v0.7.0 attested-cortex · NHI Test (Round 1 + Round 2 + Round 3 verification)

v0.7.0 NHI Test — Round 3 verifies all 13 findings GREEN on rebuilt binary

Round 1: 12 phases · Round 2: 5 parallel sub-agents · Round 3: 3 parallel sub-agents + orchestrator-direct fresh-binary verification

Round 1 — SHIP-WITH-NOTES
Round 2 (historical) — F6/F7/F8 · resolved in f02d092
Round 2 fixes — 13/13 · PR #643
Round 3 verify — GREEN · commit f02d092
Round 4 sweep — 0 regressions · 2 yellow
Round 4 fixes — 4/4 · commit 5b36d7c

Round 1 ran the 12-phase NHI playbook (114 PASS / 6 partial / 12 config-gap / 0 fail; three P2 fixes shipped on PR #636). Round 2 dispatched 5 parallel sub-agents against the post-fix binary and surfaced three blockers (F6 LLM-dispatch deadlock + silent recall degradation, F7 HTTP store path bypasses quota counters, F8 permissions.mode=advisory default disables write enforcement) plus 10 release-notes items (F9–F18). The Round-2 fix campaign dispatched 5 parallel fix-agents in worktree-isolated branches; nine commits later, all 13 findings landed on PR #643.

Round 3 ran fresh-binary verification (3 parallel sub-agents + orchestrator-direct probes) and surfaced two residual holes the Round-2 commits missed: F8's resolve_v07_default_mode was wired only into the banner, not the gate; F12's keypair auto-gen used a different label than the load path, leaving app.active_keypair = None. Both holes plus the F17 description gap were fixed in commit f02d092 (round-2-fixes branch HEAD).

Fresh-binary re-verification: permissions.mode=enforce by default; an intruder write to a governance.write=owner namespace returns HTTP 403; new links land with attest_level=self_signed + a 64-byte signature; the memory_find_paths description names the cap and the undirected semantics. All four CLAUDE.md gates are green on the new commit (2 973 tests / 0 fail). Tag-gate is now operator-side: review PR #643 and restart the live HTTP daemon (the running PID has the kernel-mapped pre-rebuild inode, not the patched binary).

Round 1 — tests passed
114 / 132
+ 12 config-gated gaps · 0 outright fail
Round 2 — agent aggregate
26 PASS / 4 FAIL
4 fails all gated by F6 (now fixed)
Round 1 + 2 fixes shipped
16 / 16
F2–F4 (PR #636) + F6–F18 (PR #643)
cargo test (post-fix)
2 973 / 2 984
0 failed · 11 ignored · 81 binaries green
CLAUDE.md gates
4 / 4
fmt · clippy::pedantic · test · build
Round 2 — perf baseline
33.9 / s
1000 sequential MCP stores · p99 27 ms
Round 3 — verification
9 PASS · 4 PARTIAL
all 4 PARTIAL by design (C4 trim · forward-compat · 8/10 router)
Round 3 — fresh binary
f02d092
F8 + F12 + F17 — gates green · 24 546 672 B
Round 4 — live patched daemon
F1–F18 · 0 regressions
4 parallel sub-agents · 2 yellow follow-ups (F12 runtime · smart_load)
Round 4 — perf vs R2
+4.8% throughput
35.53/s · p99 38.95 ms (mild +44% tail)
Round 4 — enforce gate
2 → 1 550
decision counter delta proves gate is live
Round 4 fixes — closed
4 / 4 + 2
smart_load veto · refuse-by-default · tools_verbose env · key_dir help · F12 retest cleared · p99 was embedder-pause artifact (no code)
Round 4 fixes — commit
5b36d7c
on round-2-fixes (PR #643) · 2 258 lib tests / 0 fail
Round 2 · multi-agent regression sweep

HOLD TAG. Three new release-blockers found in Round 2.

After Round 1 shipped post-NHI fixes on PR #636, Round 2 re-ran a verification sweep against the patched binary using 5 parallel sub-agents, each writing to its own sub-namespace under ai-memory/v0.7.0-nhi-round-2. Round-1 fixes hold (F2, F3, F4 helper all PASS). But Round 2 surfaced regressions and security-default issues that did not appear in Round 1 because that round did not stress the LLM dispatch path, the HTTP quota path, or default permission enforcement.

Recommendation: Do not cut the v0.7.0 tag until F6, F7, and F8 are either fixed in a follow-up PR or explicitly downgraded with documented release-notes guidance. F6 reproduced live by the orchestrator: memory_expand_query returns "Failed to send chat request" while ollama /api/chat responds in 5.6 s when called directly, and memory_recall silently returns mode:keyword while memory_capabilities still advertises recall_mode_active=hybrid.

Sub-agent results

Agent A — Round-1 fixes verified

F2/F3/F4 fix verification + KG deep dive

Verified the three Round-1 fixes against the patched binary: entity_register persists canonical_name as alias (F2); kg_query and find_paths default to current view with include_invalidated=true opt-in (F3); GovernancePolicy::default_for_managed_namespace helper round-trips correctly via explicit set_standard/get_standard (F4). KG stress: 50 entities, 4 relation types, directed cycle, depths 1..5, find_paths max_depth=7 max_results=50, edge invalidation matrix — all clean.

PASS 4 · schema-gap 3 (tool-surface docs)
summary memory · ad88cf88-18a0-4757-9f10-bca1b69792a5
Agent B — F6 surfaced · BLOCKER

Power tools + autonomous-tier exhaustive

All four LLM-backed power tools (memory_consolidate, memory_expand_query, memory_auto_tag, memory_detect_contradiction) returned "Failed to send chat request" immediately. sample on the MCP daemon (PID 69382) shows the main thread pegged at 99.3% CPU spinning in clock_gettime / mach_absolute_time (621/746 samples ≈ 83% — busy-loop in an async-runtime poll path). Ollama itself is healthy (gemma4:e4b loaded, /api/chat responds in 5.6s when called directly). memory_recall silently degraded from hybrid to keyword mode while capabilities still advertised hybrid. memory_inbox delivered 100/100 messages but with a 30-second cadence on the first 20, dropping to ~1.5s/msg after backpressure cleared. memory_check_duplicate threshold curve healthy (PASS, 1 of 6 task sections).

PASS 1 · partial 1 · FAIL 4
summary memory · 01ddbac5-4105-41f5-83b3-3d19632d28b5
Agent C — PASS · X-Agent-Id confirmed

Cross-interface + chaos

Re-ran the cross-interface phase using the correct X-Agent-Id header (Round 1's X-AI-Memory-Agent-Id was the test-side bug captured in the erratum). Header valid → 201 stamped, malformed → 400 with sanitized regex error, body wins precedence over header, no header + no body → anonymous:req-<uuid> fallback. Chaos: malformed JSON-RPC, missing required fields, invalid tier, content-cap edges (65535 / 65536 / 65537 bytes), 10-way concurrent racing stores → 1×201 + 9×409 CONFLICT clean, doctor INFO post-chaos. 1000 sequential stores: 33.9/s, p95 25 ms, p99 27 ms, 0 errors. Lifecycle stress (200 short + 50 mid + 20 long) → gc + archive_stats + forget by namespace/pattern/tier all behaved.

PASS · 14
summary memory · f157a805-…
Agent D — F7 + F8 surfaced

Governance / security / observability

SSRF guards: all 5 reject vectors blocked with sanitized errors (loopback, link-local 169.254, RFC1918 10.0.0.0, file://, plaintext non-loopback). Subscription full lifecycle: 50 events fired → DLQ ladder → memory_subscription_replay since=24h-ago returned 107 events. Agent registry: 8 distinct agent_types registered + listed cleanly. Permission probe (PARTIAL): permissions.mode=advisory is the v0.7.0 default — non-owner writes ACCEPTED even with metadata.governance.write=owner set. Quota isolation (GAP): 500 stores via POST /api/v1/memories succeeded but memory_quota_status shows zero new rows — the HTTP path bypasses quota counters; MCP path increments correctly. Doctor INFO post-run. Daemon crashed once mid-test and was restarted with the same args.

PASS 4 · partial 1 · gap 1
summary memory · c81afbb3-b1e8-499a-b1cc-ebd94c2a1300
Agent E — budget green · 8/10 routing

Capabilities / token budget / smart_load / load_family

Capabilities matrix: 14 shapes (default / accept=v1 / accept=v2 / family=<each-of-8> / verbose / include_schema / combinations) — all returned data; per-family drill-in works. Token budget gate PASS: trimmed full = 2316 tokens (≤ 3500 ceiling, 34% headroom); max single tool = 522 (≤ 1500). Smart-load: 8 / 10 PASS — "send a notification" routed to meta (expected other/power); "expand a query and find related memories" routed to graph (expected power). memory_load_family idempotent for each of 8 families. Findings: memory_capabilities MCP inputSchema declares zero properties yet server accepts 4; include_schema=true is inert despite the v0.6.4-family-schemas-1 label; verbose=true is a no-op; doctor --tokens --json reports its own DB-handle profile, not the running MCP server's full profile.

PASS 3 · partial 1 (smart-load 8/10)
summary memory · ed99f774-9d1d-4908-b9cb-cbbee84d3a21

New Round-2 findings

Numbering note: Round-2 findings reuse the F prefix in their own sequence (F6–F18) for readability. The Round-1 historical F6 ("verbose=true docstring already correct — closed as not-a-bug", listed in the Round-1 findings section further down) is a separate item and does not conflict in scope.
F6 — BLOCKER

LLM-dispatch deadlock + silent recall degradation + daemon crash

The MCP daemon's main thread pegs at 99.3% CPU in a clock_gettime / mach_absolute_time busy-loop (likely a tokio task that polls without yielding), and from that point all four LLM-backed tools (consolidate, expand_query, auto_tag, detect_contradiction) return "Failed to send chat request" while ollama is independently healthy. memory_recall falls back to keyword mode silently while memory_capabilities still advertises hybrid. The daemon eventually crashed in this run and required a restart.

Status: reproduced live by the Round-2 orchestrator. Investigate the busy-loop on the main thread (the sample evidence points at an async-runtime poll path), surface embed/chat status to clients, and never silently downgrade recall_mode_active.
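The consistency half of this finding reduces to a single-source-of-truth rule: one function computes the recall mode, and both memory_recall and memory_capabilities must report its output so the two surfaces cannot disagree. A minimal sketch (the compute_recall_mode name appears in the later fix note; the one-argument signature here is an assumption):

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecallMode {
    Keyword,
    Hybrid,
}

/// Sketch: hybrid recall depends only on the embedder being loaded,
/// not on LLM availability (per the later fix note). Every surface
/// that reports a mode must call THIS function, never cache a copy.
pub fn compute_recall_mode(embedder_loaded: bool) -> RecallMode {
    if embedder_loaded {
        RecallMode::Hybrid
    } else {
        RecallMode::Keyword
    }
}
```

With this shape, a silent keyword fallback while capabilities still advertises hybrid becomes structurally impossible rather than a discipline question.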

F7 — BLOCKER

HTTP POST /api/v1/memories bypasses quota counters

500 stores from agent-d-quota:alpha-01 + 5 from :beta-01 via the HTTP API succeeded but memory_quota_status shows zero new rows. The same agent_id stamping a memory through the MCP path increments quota counters correctly. Quota enforcement is bypassable from any HTTP client. Regression candidate vs. v0.6.x.

Status: Wire the HTTP store handler through the same quota-increment path as MCP; verify with a regression test that pumps N stores via HTTP and asserts memory_quota_status matches.
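The shared path both handlers should call can be sketched as check-then-record with a refund on insert failure. A minimal in-memory stand-in (the real quotas module persists counters and scopes them per day; types and names here are illustrative):

```rust
use std::collections::HashMap;

/// Sketch of the quota path MCP already uses and HTTP must share.
pub struct Quotas {
    daily_limit: u64,
    counts: HashMap<String, u64>,
}

impl Quotas {
    pub fn new(daily_limit: u64) -> Self {
        Self { daily_limit, counts: HashMap::new() }
    }

    /// Reserve one store for `agent_id` BEFORE the DB insert.
    /// Empty agent_id keeps anonymous semantics: never counted.
    pub fn check_and_record(&mut self, agent_id: &str) -> Result<(), String> {
        if agent_id.is_empty() {
            return Ok(());
        }
        let n = self.counts.entry(agent_id.to_string()).or_insert(0);
        if *n >= self.daily_limit {
            return Err(format!("quota exceeded: {n}/{}", self.daily_limit));
        }
        *n += 1;
        Ok(())
    }

    /// Give the reservation back when the insert itself fails.
    pub fn refund_op(&mut self, agent_id: &str) {
        if let Some(n) = self.counts.get_mut(agent_id) {
            *n = n.saturating_sub(1);
        }
    }

    pub fn current(&self, agent_id: &str) -> u64 {
        self.counts.get(agent_id).copied().unwrap_or(0)
    }
}
```

The regression test then pumps N stores through HTTP and asserts `current(agent_id) == N`, which is exactly the invariant Round 2 found broken.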

F8 — SECURITY · DECISION

permissions.mode defaults to advisory in v0.7.0

Fresh deployments have NO write enforcement until an operator opts in. A namespace with metadata.governance.write=owner still accepted writes from an unrelated agent_id because advisory mode is non-blocking. If "default-secure" is the v0.7.0 promise, this is a blocker. If "advisory by default, opt-in to enforce" is documented release behavior, this is a release-notes item with a prominent README + --help + first-run-banner callout.

Status: Decide policy. Either flip default to enforce with migration notes for existing deployments, or document the advisory default explicitly in release notes, README, and the daemon's first-run UX banner.
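Whichever way the decision goes, it reduces to one resolution function that every entry point (banner AND gate) must share. A sketch of the enforce-by-default variant (the resolve_v07_default_mode name appears later in this report; the Option-based signature is an assumption):

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PermissionsMode {
    Advisory,
    Enforce,
}

/// Sketch of the v0.7 resolution rule. An explicit operator choice
/// always wins; the unconfigured branch is secure. The Round-3 hole
/// was precisely a second code path falling back to Advisory instead
/// of calling one shared resolver like this.
pub fn resolve_v07_default_mode(configured: Option<PermissionsMode>) -> PermissionsMode {
    configured.unwrap_or(PermissionsMode::Enforce)
}
```

A derived `Default` on the mode enum is what makes the advisory fallback easy to reintroduce by accident; funneling every caller through one resolver removes that trap.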

F9–F18 — P3 · release-notes / v0.7.1

Ten release-notes items (non-blocking)

F9 HTTP missing-required returns 422 (axum body-extractor) not 400 — spec/doc drift. F10 Embedder timeout on >64 KB content silently produces an un-indexed row committed at HTTP 201; embed status not surfaced to clients. F11 forget --pattern X and forget --tier T without --namespace are GLOBAL deletes — no safety rail (since v0.6.x). F12 Ed25519 keypair not auto-generated on serve startup — link signing disabled by default. F13 memory_capabilities MCP inputSchema declares zero properties yet the server accepts accept / family / include_schema / verbose; verbose=true is a no-op; include_schema=true is inert despite the v0.6.4-family-schemas-1 label; accept=v1 strips schema_version (breaks v1 wire-version detection); doctor --tokens --json reports its own DB-handle profile, not the running MCP server's full profile (operator-confusion risk). F14 Smart-load router under-weights underscore tokens — "send a notification" → meta (expected other/power); "expand a query and find related memories" → graph (expected power). F15 MCP memory_store / memory_update inputSchema lacks a metadata field; governance standards must be authored via HTTP. F16 agent_type MCP enum is closed but the daemon accepts any open form — schema/server mismatch. F17 find_paths max_depth hard-capped at 7 (src/db.rs:3592); find_paths undirected vs. kg_query directed — by design, surface in tool descriptions. F18 check_duplicate similarity caps near 0.92 for byte-identical strings (embedding+normalization artifact); single-token factual mutation (date swap) at sim 0.913 — that's the contradiction tool's job.

Documented in the Round-2 verdict memory · scoped to v0.7.0 release notes or v0.7.1 follow-up.

Hard-rule compliance

Recall the Round-2 verdict

memory_recall context="v0.7.0 Round-2 verdict" namespace="ai-memory/v0.7.0-nhi-round-2"
Fix campaign · PR #643 · all blockers resolved

Five parallel fix-agents · 13 findings · 9 commits · all gates green.

After the operator's "FIX ALL OF IT — time is not a factor" go-ahead, the orchestrator dispatched 5 fix-agents in parallel (worktree-isolated where supported), each owning a strict file-bucket to prevent collisions. Each finding fanned in cleanly to its matching agent; the integrator stitched γ's helpers into daemon_runtime::serve() and ε's check_duplicate_with_text into the MCP + HTTP call sites. All four CLAUDE.md gates pass on the merged branch.

Per-agent ownership and outcomes

Fix-Agent α — F6 · LLM dispatch root-caused

F6 — LLM dispatch deadlock + recall consistency

Root cause: mcp::run_mcp_server is a sync stdin-reader using reqwest::blocking::Client, but it was being called from inside an async fn body in daemon_runtime::run. That pinned a tokio worker on the blocking stdin read (the 99.3% clock_gettime busy-loop) AND issued blocking reqwest from inside an active tokio runtime context (the "Failed to send chat request" error). Fix: wrap the MCP loop in tokio::task::spawn_blocking so it owns its own dedicated thread outside tokio's polling. Plus 5s connect timeout, 3-failure-in-30s circuit breaker (5xx + network only — 4xx doesn't trip), and the new EmbedStatus { Indexed, Skipped, Failed } enum + 64 KiB cap that β consumes for HTTP F10. compute_recall_mode already returned Hybrid based on embedder_loaded only, ignoring LLM availability — exactly the right semantic; pinned with a regression test.

commit ecdae2a · 4 files +618 / −11 · 5 tests pass in 0.12s
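α's 3-failure-in-30-second breaker reduces to a sliding window over qualifying failures. A dependency-free sketch (type and method names are assumptions, not the actual ai-memory code; timestamps are injected as plain seconds so the trip rule stays testable):

```rust
use std::collections::VecDeque;

/// Sketch of the breaker: trip after 3 qualifying failures inside a
/// 30-second window. Only 5xx and network-level errors qualify; a 4xx
/// is the caller's bug and must not take the chat path down.
pub struct ChatBreaker {
    window_secs: u64,
    max_failures: usize,
    failures: VecDeque<u64>, // timestamps (seconds) of qualifying failures
}

impl ChatBreaker {
    pub fn new() -> Self {
        Self { window_secs: 30, max_failures: 3, failures: VecDeque::new() }
    }

    /// Record an outcome observed at time `now` (seconds).
    /// `status` is None for a network failure with no HTTP status.
    pub fn record(&mut self, now: u64, status: Option<u16>) {
        let qualifies = match status {
            Some(code) => (500..600).contains(&code),
            None => true,
        };
        if qualifies {
            self.failures.push_back(now);
        }
        // Age out failures older than the window.
        while let Some(&t) = self.failures.front() {
            if now.saturating_sub(t) > self.window_secs {
                self.failures.pop_front();
            } else {
                break;
            }
        }
    }

    pub fn is_open(&self) -> bool {
        self.failures.len() >= self.max_failures
    }
}
```

The 4xx exclusion is the load-bearing detail: without it, one misbehaving client could open the breaker for every LLM-backed tool.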
Fix-Agent β — F7 + F9 + F10 · HTTP path

HTTP quota wiring · 400 not 422 · embed status surfaced

F7: handlers::create_memory now calls quotas::check_and_record ahead of db::insert with quotas::refund_op on insert failure — mirrors src/mcp.rs. Quota breaches return 429 with envelope { code, limit, current, max, agent_id }. Empty agent_id still bypasses (anonymous semantics preserved). F9: introduced JsonOrBadRequest<T> custom FromRequest extractor that folds every JsonRejection into 400 Bad Request + { "error": "missing required field: ...", "fields": [...] }. Identifier-charset allowlist on extracted field names prevents body content from leaking. F10: consumes α's EmbedStatus; non-Indexed outcomes add embed_status + embed_status_reason to the 201 body. Success path stays silent.

commit f9ef40a · handlers.rs +257 / −17 · 9 new tests pass
Fix-Agent γ — F8 + F11 + F12 · defaults & safety

Secure-by-default · forget safety rail · keypair auto-gen

F8: added default_v07_secure_mode + resolve_v07_default_mode + startup_banner_line in permissions.rs, and a new cli/serve_banner.rs with pure compose_banner(BannerInputs) → Vec<BannerLine>. Daemon's serve() body now routes Info → tracing::info!, Warn → tracing::warn! — operators see permissions: enforce + the v0.7 migration warning at boot. F11: added --confirm-global flag to forget; bails with the documented message when --namespace is absent and --pattern or --tier is set. F12: added EnsureOutcome { AlreadyExists, Generated, SkippedDisabled } + ensure_keypair helper in identity/keypair.rs — idempotent, never overwrites, integrator wires it into serve() with the well-known daemon agent_id.

commit 579afe2 · 6 files modified · 39 new tests pass
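γ's F11 safety rail boils down to one predicate over the CLI flags: a forget that filters by pattern or tier but names no namespace is a global delete and must be opted into explicitly. A sketch (hypothetical structs, not the real clap types):

```rust
/// Illustrative argument bundle for the forget subcommand.
pub struct ForgetArgs {
    pub namespace: Option<String>,
    pub pattern: Option<String>,
    pub tier: Option<String>,
    pub confirm_global: bool,
}

/// Bail when --pattern or --tier is set without --namespace,
/// unless --confirm-global was passed (the documented opt-in).
pub fn validate_forget(args: &ForgetArgs) -> Result<(), &'static str> {
    let global_filter = args.namespace.is_none()
        && (args.pattern.is_some() || args.tier.is_some());
    if global_filter && !args.confirm_global {
        return Err("refusing global forget: pass --namespace to scope it, or --confirm-global to proceed");
    }
    Ok(())
}
```

Scope-bounded deletes (namespace present) stay friction-free; only the two global shapes Round 2 flagged hit the rail.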
Fix-Agent δ — F13 + F14 + F15 + F16 · MCP surface

capabilities matrix · smart-load · store metadata · agent_type

F13: declared accept / family / include_schema / verbose in the inputSchema; wired verbose=true to emit tools[].docstring; wired include_schema=true to populate tools[].inputSchema; added effective_tier_label() overlay (one tier source of truth); reconciled "51 of 51" vs "50 memory tools" off-by-one. F14: rebalanced smart-load with composite scoring (2× descriptor + 1× tool-distinct-sum + 2× tool-distinct-max + 4× full-id-hit) + 5-char-prefix relaxed match — "send a notification..." → other, "expand a query..." → power; all 8 originally-passing intents still route correctly. F15: verified metadata / tier / priority / tags already in the schema; pinned with regression tests. F16: opened agent_type schema to type: "string" with curated description (daemon was already permissive; schema-server mismatch closed in favor of the daemon).

commit 66f48ae · mcp.rs major edits · 25 new tests pass
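δ's composite score can be sketched once the per-family signals are extracted; the weights below are the ones stated in the note (2× / 1× / 2× / 4×), while the signal extraction from intent tokens is omitted and the types are illustrative:

```rust
/// Precomputed per-family match signals for one intent string.
pub struct FamilySignals {
    pub descriptor_hits: u32,
    pub tool_distinct_sum: u32,
    pub tool_distinct_max: u32,
    pub full_id_hits: u32,
}

/// Weighted composite from the fix note: a full tool-id hit dominates,
/// descriptor and distinct-max matches count double.
pub fn composite_score(s: &FamilySignals) -> u32 {
    2 * s.descriptor_hits + s.tool_distinct_sum + 2 * s.tool_distinct_max + 4 * s.full_id_hits
}

/// Pick the best-scoring family for an intent.
pub fn route<'a>(candidates: &[(&'a str, FamilySignals)]) -> Option<&'a str> {
    candidates
        .iter()
        .max_by_key(|(_, s)| composite_score(s))
        .map(|(name, _)| *name)
}
```

The 4× full-id weight is what lets "send a notification" (which names memory_notify outright once underscore tokens are split) beat a family that merely accumulates descriptor overlap.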
Fix-Agent ε — F17 + F18 · KG + dedup polish

find_paths surface · check_duplicate exact-match short-circuit

F17: rewrote find_paths bail message to name the constant (FIND_PATHS_MAX_DEPTH) and the maintainer-escalation path; added a Directionality contract section to the doc comment (find_paths UNDIRECTED via UNION ALL, kg_query DIRECTED, both honor include_invalidated identically). F18: added canonical_content_hash (SHA-256 of UTF-8, no normalization, uses already-vendored sha2) and check_duplicate_with_text: two-phase — hash compare first → similarity=1.0, is_duplicate=true on match, else fall through to embedding cosine. Catches byte-identical duplicates that the embedding pipeline would otherwise cap at ~0.92 due to nomic prefix normalization. No schema migration required.

commit 082c999 · db.rs surgical · 7 new tests pass
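ε's two-phase check in miniature. Assumptions flagged loudly: the real hash is SHA-256 over raw UTF-8 via the already-vendored sha2 crate; std's DefaultHasher stands in here only so the sketch stays dependency-free, and the 0.95 similarity threshold is illustrative:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for canonical_content_hash (real code: SHA-256, no
/// normalization). DefaultHasher used purely for illustration.
fn content_hash(text: &str) -> u64 {
    let mut h = DefaultHasher::new();
    text.as_bytes().hash(&mut h);
    h.finish()
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Phase 1: byte-identical content short-circuits to similarity 1.0,
/// sidestepping the ~0.92 cap the embedding path shows on exact dupes.
/// Phase 2: fall through to embedding cosine.
pub fn check_duplicate(
    new_text: &str,
    existing_text: &str,
    new_emb: &[f32],
    existing_emb: &[f32],
) -> (f32, bool) {
    if content_hash(new_text) == content_hash(existing_text) {
        return (1.0, true);
    }
    let sim = cosine(new_emb, existing_emb);
    (sim, sim >= 0.95) // illustrative threshold
}
```

The two-phase split is why no schema migration is needed: the hash compare is computed on the way in, and the embedding path is untouched.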

Integrator stitches

All four CLAUDE.md gates green on the merged branch

✓ cargo fmt --check
✓ cargo clippy -p ai-memory --bin ai-memory --release -- -D warnings -D clippy::pedantic
✓ AI_MEMORY_NO_CONFIG=1 cargo test --release --no-fail-fast
   → 2 973 passed · 0 failed · 11 ignored · 81 test binaries / Doc-tests all green
✓ cargo build --release
   → target/release/ai-memory · 24.6 MB · loads cleanly

Round-3 verification gate (executed · all GREEN · see Round 3 section below)

The Round-2 session ran against the broken (pre-fix) daemon, so live re-verification was needed against the rebuilt binary. Round 3 ran the verification (3 parallel sub-agents + orchestrator-direct probes), surfaced two residual holes that the Round-2 commits left behind (F8 wired only into the banner, F12 keypair label mismatch), and landed the surgical fix in commit f02d092. Details below.

Round 3 · fresh-binary verification · GREEN

Round 3 — verified all 13 findings against the rebuilt binary; closed two residual holes (F8 + F12) and the F17 description gap.

Round 3 ran in a fresh Claude Code session with the rebuilt binary on disk. Pre-flight: ai-memory --version = 0.7.0; symlink resolved to /Users/fate/v07/v07-fixes/target/release/ai-memory (mtime 2026-05-07 21:31 EDT, 24,629,600 bytes); schema_version = 28. The orchestrator dispatched 3 parallel sub-agents — Agent X (HTTP + MCP surface), Agent Y (governance / CLI / identity), Agent Z (KG + dedup) — and ran orchestrator-direct probes against an ephemeral fresh-binary daemon on port 19078.

Critical procedural finding: the live HTTP daemon at port 19077 (PID 70896, started 2026-05-07 20:09:22 EDT) was started before the binary was rebuilt at 21:31:22, and macOS preserved the running process's mapped binary inode (21858535) even after the file on disk was overwritten with a new inode (22134163). Side-by-side probes confirmed: stale daemon returned HTTP 422 for missing-required (Round-2 F9 symptom); fresh daemon returned HTTP 400 + structured error. Same divergence on F7 quota wiring. Tag-gate operator action: kill PID 70896 with SIGINT (graceful WAL checkpoint), relaunch with the same args; readlink -f $(which ai-memory) + lsof -p <new-pid> should show inode 22291476 (the post-Round-3 commit f02d092 build).

Round 3 — per-finding disposition (orchestrator-direct, fresh binary)

F6 — PASS

spawn_blocking wrap holds. Code: daemon_runtime.rs:556-582 (tokio::task::spawn_blocking(move || mcp::run_mcp_server(...))). Runtime: memory_capabilities + memory_recall agree on recall_mode_active=hybrid. /usr/bin/sample 70896 4s → 0 clock_gettime hits, top stacks all idle (__psynch_cvwait 59980, kevent 5998). memory_expand_query at semantic tier returns clean structured "tier required" error rather than the Round-2 "Failed to send chat request".

F7 — PASS

HTTP path increments quota counters identically to MCP. Fresh daemon: 5 stores via POST /api/v1/memories from a new agent_id → memory_quota_status shows current_memories_today=5. Stale daemon (Agent X data): 25 stores from agent-x-quota:alpha-01 → 0 (the bug Agent X reported was on the wrong binary). Per-agent isolation holds.

F8 — PASS (after commit f02d092)

Round 3 surfaced a residual hole: the Round-2 fix added permissions::resolve_v07_default_mode + startup_banner_line + cli/serve_banner.rs and wired them into the banner, but the gate at db::enforce_governance reads config::active_permissions_mode() which is set in main() from app_config.effective_permissions_mode() — a function whose unconfigured fallback was still PermissionsMode::default() (= advisory). Banner said "permissions: enforce" via the resolve path; gate stayed advisory via the default path. Result: intruder writes still succeeded HTTP 201 against governance.write=owner namespaces. Fix in commit f02d092 (config.rs): rewrote effective_permissions_mode to delegate the unconfigured branch to resolve_v07_default_mode so every entry point shares the secure default. Re-verify: fresh daemon now returns permissions.mode=enforce in /api/v1/capabilities; intruder POST returns HTTP 403 + structured error "store denied by governance: caller '...:INTRUDER' is not the owner ('...:r3v-perm-owner')".

F9 — PASS

HTTP 400 with structured error replaces axum's 422. Fresh daemon empty body POST: HTTP=400 {"error":"missing required field: title","fields":["title"]}. Invalid tier: 400 with field-validation envelope. Stale daemon: both probes returned 422 — same divergence pattern as F7.
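The response contract F9 verifies can be captured as a pure function over the missing-field list. A sketch of the envelope only (the real code is an axum FromRequest extractor folding JsonRejection; field names, like the identifier allowlist, follow the description above):

```rust
/// Build the 400 envelope: { "error": "missing required field: ...",
/// "fields": [...] }. Field names pass an identifier-charset allowlist
/// so raw body content can never leak back into the error message.
pub fn bad_request_body(missing: &[&str]) -> (u16, String) {
    let fields: Vec<&str> = missing
        .iter()
        .copied()
        .filter(|f| f.chars().all(|c| c.is_ascii_alphanumeric() || c == '_'))
        .collect();
    let first = fields.first().copied().unwrap_or("body");
    let list = fields
        .iter()
        .map(|f| format!("\"{f}\""))
        .collect::<Vec<_>>()
        .join(",");
    (
        400,
        format!("{{\"error\":\"missing required field: {first}\",\"fields\":[{list}]}}"),
    )
}
```

Anything failing the allowlist is dropped rather than echoed, which is what keeps the sanitized-error property the probes check for.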

F10 — PASS

EmbedStatus surfaced + 64 KiB cap enforced. Boundary probes against fresh daemon: 1024 bytes → 201 (no embed_status, small-content path); 65535/65536 bytes → 201 + embed_status="skipped" + embed_status_reason; 65537+ bytes → 400 + {"error":"content exceeds max size of 65536 bytes"}. Clients can distinguish indexed-vs-skipped-vs-failed without scraping logs.
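The boundary behavior above condenses into a classifier. A sketch: the 64 KiB hard cap matches the probes, but the inline-embed threshold is a free parameter here because the probes only pin down the 1024 B (embedded) and 65535+ B (skipped) points:

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum StoreOutcome {
    Accepted { embed_status: Option<&'static str> },
    Rejected { error: String },
}

/// Hard cap from the probe matrix: content above this is rejected.
pub const EMBED_MAX_BYTES: usize = 65_536; // 64 KiB

/// Classify a store by content length. `embed_inline_max` is the
/// assumed threshold above which content is stored but not indexed.
pub fn classify_store(content_len: usize, embed_inline_max: usize) -> StoreOutcome {
    if content_len > EMBED_MAX_BYTES {
        return StoreOutcome::Rejected {
            error: format!("content exceeds max size of {EMBED_MAX_BYTES} bytes"),
        };
    }
    if content_len > embed_inline_max {
        // Stored at 201, but the client sees embed_status = "skipped".
        return StoreOutcome::Accepted { embed_status: Some("skipped") };
    }
    StoreOutcome::Accepted { embed_status: None }
}
```

The point of the surfaced status is exactly the last sentence above: clients distinguish indexed vs skipped vs failed from the response body, not from server logs.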

F11 — PASS

--confirm-global enforced for tier-only and pattern-only forget. ai-memory forget --pattern 'tmp-' without --namespace + without --confirm-global exits non-zero with the documented safety message. forget --tier short same behavior. Scope-bounded forget --namespace 'ai-memory/v0.7.0-nhi-round-3/agent-y-secure/forget-probe' succeeds and clears only the scratch namespace. Agent Y verified.

F12 — PASS (after commit f02d092)

Round 3 surfaced a residual hole: the Round-2 fix added EnsureOutcome + ensure_keypair in identity/keypair.rs and wired the call into serve(), but two stacked bugs left every new link with attest_level=unsigned: (1) load_active_keypair_for_serve resolved the per-process NHI default (host:<host>:pid-…-<uuid>) while ensure_keypair("daemon", …) wrote files under the well-known daemon label — the two paths looked at different filenames; (2) the auto-gen call ran AFTER AppState was built, so the active_keypair field was sealed None even when a key was generated seconds later. Fix in commit f02d092 (daemon_runtime.rs): replaced load_active_keypair_for_serve with ensure_and_load_daemon_keypair which calls ensure_keypair + load against the stable daemon label in the same step before AppState is constructed; carries the lifecycle outcome through ServeBootstrap so the F8/F12 banner still sees the auto-gen path. Re-verify: keypair lives at ~/Library/Application Support/ai-memory/keys/daemon.{pub,priv}; ai-memory identity list reports it; HTTP POST /api/v1/links returns {"attest_level":"self_signed","linked":true}; DB column signature length 64 (Ed25519). Idempotent: a daemon restart never overwrites an existing keypair.
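The ensure-then-load contract can be sketched with plain std::fs. Reduced on purpose: the real EnsureOutcome also has a SkippedDisabled variant, the keys are Ed25519 rather than the stand-in bytes generated here, and the key directory is resolved per-platform:

```rust
use std::fs;
use std::path::Path;

#[derive(Debug, PartialEq, Eq)]
pub enum EnsureOutcome {
    AlreadyExists,
    Generated,
}

/// Sketch of the F12 contract: ensure + load against ONE stable label
/// ("daemon"), before AppState is built, never overwriting.
/// `gen_keypair` stands in for real Ed25519 generation.
pub fn ensure_keypair(
    key_dir: &Path,
    label: &str,
    gen_keypair: impl Fn() -> (Vec<u8>, Vec<u8>),
) -> std::io::Result<EnsureOutcome> {
    let pub_path = key_dir.join(format!("{label}.pub"));
    let priv_path = key_dir.join(format!("{label}.priv"));
    if pub_path.exists() && priv_path.exists() {
        // Idempotent: a restart never overwrites an existing keypair.
        return Ok(EnsureOutcome::AlreadyExists);
    }
    fs::create_dir_all(key_dir)?;
    let (public, secret) = gen_keypair();
    fs::write(&pub_path, public)?;
    fs::write(&priv_path, secret)?;
    Ok(EnsureOutcome::Generated)
}
```

The Round-3 bug lived outside this function: the generator and the loader agreed on nothing but the directory. Taking `label` once and deriving both filenames from it is what makes the label mismatch unrepresentable.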

F13 — PARTIAL · by design

v0.7 C4 token-budget trim — runtime works, default tools/list trimmed. Source declares all four optional properties (accept, family, include_schema, verbose) at mcp.rs:920-944. Runtime accepts and uses them (verified: accept=v1 returns schema_version=1; verbose=true grows payload from 8486 → 22775 chars). The default tools/list response trims optional properties to required + the C4 allow-list ["namespace","format"] to stay inside the C5 token budget; verbose=true restores. Strict clients can opt back in via memory_capabilities { family=<f>, include_schema=true, verbose=true }. Documented in memory_capabilities.docs at mcp.rs:919.

F14 — PARTIAL · 8/10 · no regression

Smart-load router scores 8/10 on the Round-3 intent matrix — same quality as Round 2 (8/10 with different misses). Round-3 misses: "send a message to another agent's inbox" → other (should be power — memory_inbox family); "show me memory statistics" → core (should be meta). Note: "send a notification…" → other is now CORRECT (memory_notify lives in family other; my orchestrator-side intent map had the wrong expected family). Net: same routing quality; the underscore-token under-weighting is still flagged for v0.7.1.

F15 — PARTIAL · by design

Same C4 trim as F13. metadata is declared at mcp.rs:560 (memory_store) and mcp.rs:831 (memory_update); runtime round-trips correctly. Default tools/list trims it; verbose=true restores.

F16 — PARTIAL · forward-compat by design

agent_type schema is open type:string with curated description. Source comment at mcp.rs:1159-1170 explicitly justifies the open-form: the daemon's validate::validate_agent_type accepts the curated short-list (human, system, ai:claude-opus-4.6, ai:claude-opus-4.7, ai:codex-5.4, ai:grok-4.2) PLUS any ai:<name> form up to 64 chars — closed enum at the schema layer would lag the daemon's forward-compat surface. Documented behavior; not a bug.

F17 — PASS (after commit f02d092)

Server-side cap was already correct in Round 2; the Round-3 gap was purely in the brief description shown by default tools/list — it didn't mention the undirected semantics or the max_depth cap (both were in docs only, restored under verbose=true). Fix in commit f02d092 (mcp.rs:756): description now reads "Enumerate up to N paths through the KG between two memories. Undirected BFS with cycle detection; max_depth ceiling 7." Server-side cap re-verified: find_paths(max_depth=15) returns clean structured reject "max_depth=15 exceeds supported depth=7 (FIND_PATHS_MAX_DEPTH)". Directionality probe: undirected find_paths(Y, X) returns paths through an X→Y link; directed kg_query(Y) does not — both verified by Agent Z.
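The cap-aware reject re-verified here is a one-line guard; a sketch that reproduces the message shape from the probe (constant name from the report; the function signature is an assumption):

```rust
/// Depth ceiling named in the F17 bail message.
pub const FIND_PATHS_MAX_DEPTH: u32 = 7;

/// Reject an over-cap depth with a structured message that names the
/// constant, instead of silently clamping the traversal.
pub fn validate_max_depth(requested: u32) -> Result<u32, String> {
    if requested > FIND_PATHS_MAX_DEPTH {
        return Err(format!(
            "max_depth={requested} exceeds supported depth={FIND_PATHS_MAX_DEPTH} (FIND_PATHS_MAX_DEPTH)"
        ));
    }
    Ok(requested)
}
```

Naming the constant in the error is the maintainer-escalation hook: an operator who hits the cap knows exactly which knob to ask about.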

F18 — PASS

check_duplicate exact-match short-circuit working. Agent Z probes: byte-identical title+content+namespace → similarity=1.0, is_duplicate=true, suggested_merge populated; near-dup (single-comma) ~0.99; unrelated ~0.48. The canonical_content_hash SHA-256 short-circuit catches the embedding+normalization 0.92 cap that Round 2 saw on byte-identical strings.

Round-3 commit f02d092 — gates green

✓ cargo fmt --check
✓ cargo clippy -p ai-memory --bin ai-memory --release -- -D warnings -D clippy::pedantic
✓ AI_MEMORY_NO_CONFIG=1 cargo test --release --no-fail-fast
   → 2 973 passed · 0 failed · 11 ignored · 81 test binaries
✓ cargo build --release
   → target/release/ai-memory · 24 546 672 bytes · inode 22291476

Round-3 evidence memories

Operator action — restart the live HTTP daemon

Round 3 could verify all fixes via an ephemeral fresh-binary daemon on port 19078, but the live HTTP daemon at port 19077 (PID 70896, parent launchd) is still mapped to the pre-rebuild inode. Sandbox policy blocks the orchestrator from killing shared services; the operator runs:

# 1. graceful stop (triggers WAL checkpoint via the SIGINT handler in serve())
kill -INT 70896

# 2. wait for exit, then relaunch with the same args
while kill -0 70896 2>/dev/null; do sleep 1; done
nohup /opt/homebrew/bin/ai-memory serve \
  --host 127.0.0.1 --port 19077 \
  --db /Users/fate/.claude/ai-memory.db \
  >/tmp/ai-memory-serve.log 2>&1 &

# 3. verify the new PID maps to the post-Round-3 inode
NEWPID=$(pgrep -f 'ai-memory serve --host 127.0.0.1 --port 19077')
lsof -p $NEWPID | awk '/ai-memory$/ {print $9, "inode="$8}'
# expect inode 22291476 (the f02d092 build) instead of 21858535 (stale)

# 4. confirm the F8 + F12 fixes are live on the new daemon
curl -s http://127.0.0.1:19077/api/v1/capabilities | jq '.permissions.mode'
# expect "enforce"

Hard-rule compliance (Round 3)

Recall the Round-3 verdict

memory_recall context="v0.7.0 Round-3 verdict" namespace="ai-memory/v0.7.0-nhi-round-3"
Round 4 · live patched-daemon regression sweep · TAG-READY w/ follow-ups

Round 4 — 4 parallel sub-agents · Round-3 blockers all hold · 2 new yellow findings filed for v0.7.1.

Round 4 ran in a fresh Claude Code CLI session against the live patched HTTP daemon (PID 43617, port 19077, kernel-mapped binary inode 22303195 = post-Round-3 commit f02d092). Pre-flight independently re-verified the three Round-3 fixes on the live process: permissions.mode=enforce, fresh links land at attest_level=self_signed with 64-byte signature, and memory_find_paths returns the cap-aware description. The orchestrator dispatched 4 parallel sub-agents (A regression / B load / C security / D capabilities), each writing to its own sub-namespace under ai-memory/v0.7.0-nhi-round-4. Aggregate result: all Round-3 blockers (F8 enforce, F12 link signing structure, F17 description) hold; zero F1–F18 regressions; the permission gate exercised live (enforce: 2 → 1550 decision counter delta); two new yellow findings worth follow-up sub-issues but not technical blockers.

Verdict: TAG-READY-WITH-FOLLOW-UPS (yellow-green). Technical blockers from Round 3 are resolved on the live daemon. The two new yellow findings are real but operator-decidable. (1) F12 runtime adoption gap: the signing structure works once, but the daemon was launched without AI_MEMORY_AGENT_ID=daemon, so its synthesized per-PID NHI doesn't match the on-disk daemon.priv/.pub keypair; result: 0 / 128 in-DB links currently signed. (2) smart_load routing regression: 4–5 / 10 vs Round 3's 8 / 10; common verbs store/recall/verify/consolidate/promote collapse to archive via embedder fallback (the F8 fix notify → other is confirmed live). Operator can ship and open follow-up sub-issues, or hold for one-line patches.

Round 4 — sub-agent results

Agent A — F1–F18 regression · 13 PASS / 4 PARTIAL / 0 FAIL

Re-ran the F1–F18 probe sequence against the live daemon. Zero regressions vs Round 3; one improvement (F14 routing better on the sampled subset). Decision-counter delta enforce: 2 → 1550 proves the gate is live, not bypassed. Minor non-blocking observations: F7 webhook_url body field shape drift (SSRF guard still rejects loopback); F13 verbose=true / env / --profile full all NO-OP on tools/list (output byte-identical at 10766 B / 51 tools). Summary memory 53334972-53c2-40e9-a952-9633134df98e.

Agent B — load + concurrency · throughput +4.8% vs R2

5 × 100 fan-out: 500 / 500 OK in 11.51 s (43.4 stores/s aggregate, 0 errors). Per-agent quota tracking exact (current_memories_today=100 per agent_id). 10-way race on the same (title, namespace): 1 × 201 + 9 × 409 CONFLICT, no SQLITE_BUSY / 5xx leakage, all losers got existing_id. 1 000 sequential stores: 35.53 stores/s, p50 17.46 ms, p95 30.02 ms, p99 38.95 ms (one 2.2 s embedder pause did not cascade) — vs R2 baseline 33.9/s p99 27 ms: throughput +4.8%, p99 +44% (mild tail regression, non-blocking). Webhook DLQ ladder not actively exercised (SSRF guard requires [subscriptions] allow_loopback_webhooks=true opt-in, intentionally not modified per hard rules); existing DLQ row shape + memory_subscription_replay shape captured.

Agent C · security + governance · F12 runtime gap surfaced

SSRF guards: 11 / 11 reject (loopback / link-local / RFC1918 / file:// / plaintext non-loopback) with sanitized errors, 3-layer defense observed. Permission policy schema discovered as any | registered | owner (the playbook's none/team rules don't exist in this build); write=owner enforcement PASS. Inheritance PASS for parent-with-inherit=true gating deep child. Identity rotation refusal PASS via --no-overwrite; daemon key sha256 unchanged. F12 runtime gap: live DB shows 128 / 128 links attest_level='unsigned' with NULL signature; signed_events table empty. Daemon launched as ai-memory serve --host 127.0.0.1 --port 19077 --db ... without AI_MEMORY_AGENT_ID=daemon, so its synthesized NHI host:FROSTYi.local:pid-43617-<uuid8> doesn't match the only on-disk key (daemon.priv/daemon.pub). Pre-flight succeeded once, general traffic does not. Operator-actionable (relaunch with env var) AND code-actionable (signing path should fall back to the on-disk daemon key when the daemon has no explicit agent_id). Other minor: identity generate defaults are inverted (default = OVERWRITE, opt-out is --no-overwrite; no --force flag — UX foot-gun); macOS key dir is ~/Library/Application Support/ai-memory/keys/ (help text says ~/.config/ai-memory/keys). Summary memory 50ee2c13-38b5-47fd-aecc-a33d1f4cf3ba.
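The address-class checks Agent C exercised can be sketched as follows. This is a simplified, std-only model assuming the webhook host is already a resolved IP literal; the real 3-layer guard also covers DNS resolution, URL schemes, and the plaintext-HTTP rule:

```rust
use std::net::IpAddr;

/// Illustrative sketch of the SSRF address checks described above
/// (function name and error strings are assumptions, not the real API).
fn webhook_ip_verdict(ip: IpAddr, allow_loopback: bool) -> Result<(), &'static str> {
    if ip.is_loopback() && !allow_loopback {
        // loopback requires the explicit [subscriptions] opt-in
        return Err("loopback rejected: allow_loopback_webhooks=true required");
    }
    if let IpAddr::V4(v4) = ip {
        if v4.is_link_local() {
            return Err("link-local (169.254.0.0/16) rejected");
        }
        if v4.is_private() {
            return Err("RFC1918 private address rejected");
        }
    }
    Ok(())
}
```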

Agent D · capabilities + tokens · smart_load regression

doctor --tokens --json: trimmed_full = 2 283 tokens (cap ≤ 3 500, 35% headroom), savings 74.5%, per-tool max ≈ 264 tokens. memory_capabilities 13-shape matrix: 4 / 4 invariants GREEN — v1 includes schema_version=1, verbose grows payload 11498 → 25923 B (2.25×), include_schema adds per-tool inputSchema (272 → 3676 B = 13.5×), all 8 family drills return tools[]. memory_load_family for all 8 families: PASS (51 tools total, matches doctor). smart_load regression: 4–5 / 10 routing accuracy vs Round 3's 8 / 10 — the F8 fix notify → other is confirmed live, but common verbs store/recall/verify/consolidate/promote collapse to archive via embedder fallback (all routes report source=embedder; keyword-rule taxonomy table appears not to be firing). Non-blocking but real UX regression. Summary memory 53013be5-ee06-4ed7-98b2-59813a6f67ba.

Round 4 — what's still GREEN (Round-3 blockers, re-verified live)

Round 4 — new yellow findings (not blocking; file as v0.7.1 sub-issues)

  1. F12 runtime adoption gap. Daemon process identity doesn't match on-disk daemon key, so general HTTP traffic produces unsigned links. Operator-actionable (relaunch with AI_MEMORY_AGENT_ID=daemon); code-actionable (signing path should fall back to the on-disk daemon key when the daemon has no explicit agent_id).
  2. smart_load routing regression. 4–5 / 10 vs Round 3's 8 / 10. Embedder fallback misroutes common verbs to archive; keyword-rule taxonomy table appears not to be firing.
  3. identity generate inverted defaults (UX foot-gun). Default action is OVERWRITE, opt-out is --no-overwrite; no --force flag. A typo can silently rotate the daemon key.
  4. F13 verbose-tools toggle NO-OP on tools/list (10 766 B regardless of verbose=true / env / --profile).
  5. macOS key-dir help-text mismatch (~/.config/ai-memory/keys vs actual ~/Library/Application Support/ai-memory/keys).
  6. Sequential-store p99 +44% vs R2 (38.95 ms vs 27 ms). No errors, no cascade — mild regression noted.

Round 4 — hard-rule compliance

Recall the Round-4 verdict

memory_recall context="v0.7.0 Round-4 verdict" namespace="ai-memory/v0.7.0-nhi-round-4"
Round 4 · fix campaign · commit 5b36d7c · all yellow findings closed

Round 4 fixes — operator authorized in-session resolution; no v0.7.1 wait, all four code-actionable items landed.

After the Round-4 verdict surfaced two yellow findings + four minor items, the operator authorized "fix it all now — we are not waiting for v0.7.1." The orchestrator empirically retested F12 against the live patched daemon, found Agent C's "0 / 128 unsigned" diagnosis was historical pre-restart data (a fresh HTTP-driven link signed cleanly: attest_level=self_signed, 64-byte signature, observed_by=daemon), and shipped four surgical fixes in commit 5b36d7c against the same round-2-fixes branch as PR #643.

Status: all four code-actionable Round-4 findings are CLOSED in commit 5b36d7c. F12 needed no code change (Agent C's diagnosis was data-history confusion; runtime signing was always working). The daemon has been relaunched on the new binary (PID 51722, inode 22320716, size 24 678 816, mtime 2026-05-08 11:01:37) with AI_MEMORY_AGENT_ID=daemon set explicitly for hardening. Round-5 verification recommended before tag-cut to confirm all fixes hold under the same 4-parallel-sub-agent regression sweep.

Round 4 — what shipped in commit 5b36d7c

smart_load veto · 5/5 R4 regression intents fixed · 4/4 F14 controls preserved

Problem: embedder cosine-sim over 80-word descriptors was noisy for short imperative intents; common verbs (store/recall/verify/consolidate/promote) collapsed to archive. Fix (src/mcp.rs, handle_smart_load): always run the deterministic keyword scorer first; when it produces a non-fallback signal AND disagrees with the embedder, trust the keyword path. The embedder still wins on ambiguous wording where the keyword path returns fallback. Live verification post-fix:

  • "store a new memory" → core (was archive)
  • "recall recent decisions" → core (was archive)
  • "verify a memory's signature" → graph (was archive)
  • "consolidate duplicate memories" → power (was archive)
  • "promote short to long tier" → lifecycle (was archive)

F14 control intents (notify→other, delete-and-forget→lifecycle, approve-pending→governance, restore-archived→archive): all four still route correctly.
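The keyword-veto rule above can be sketched like this — a toy keyword table and route set (the real scorer and family taxonomy in src/mcp.rs are larger; this only models the precedence rule):

```rust
#[derive(Debug, PartialEq)]
enum Route { Core, Graph, Power, Lifecycle, Governance, Archive, Other, Fallback }

/// Deterministic keyword scorer (toy table; more specific keywords
/// like "restore" must be checked before substrings like "store").
fn keyword_route(intent: &str) -> Route {
    let i = intent.to_lowercase();
    for (kw, r) in [
        ("restore", Route::Archive), ("store", Route::Core),
        ("recall", Route::Core), ("verify", Route::Graph),
        ("consolidate", Route::Power), ("promote", Route::Lifecycle),
        ("notify", Route::Other), ("approve", Route::Governance),
    ] {
        if i.contains(kw) { return r; }
    }
    Route::Fallback
}

/// Round-4 fix, sketched: a non-fallback keyword signal vetoes the
/// embedder route; the embedder only wins when keywords say Fallback.
fn smart_route(intent: &str, embedder_route: Route) -> Route {
    match keyword_route(intent) {
        Route::Fallback => embedder_route,
        kw => kw,
    }
}
```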

identity generate · refuse-by-default · --force opt-in

Problem: default action was OVERWRITE; opt-out was --no-overwrite; no --force flag. A typo could silently rotate (and irrecoverably destroy) the daemon keypair. Fix (src/cli/identity.rs): refuse-by-default. Operator must pass --force to opt INTO rotation. Error message guides toward --force. Legacy --no-overwrite flag preserved as a hidden no-op for v0.7.0 pre-Round-4 script compatibility. New test generate_refuses_existing_without_force covers both the refusal path and the --force rotation path. Live smoke: first generate succeeds; second without --force errors with "pass --force to rotate; refused by default to prevent accidental key overwrite"; third with --force rotates cleanly.
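The refusal gate reduces to a two-flag decision. A minimal sketch (function name and return values are illustrative, not the src/cli/identity.rs signatures):

```rust
/// Sketch of the Round-4 refuse-by-default rule: an existing keypair
/// is only rotated when the operator explicitly passes --force.
fn generate_keypair(key_exists: bool, force: bool) -> Result<&'static str, &'static str> {
    if key_exists && !force {
        return Err("pass --force to rotate; refused by default to prevent accidental key overwrite");
    }
    Ok(if key_exists { "rotated" } else { "generated" })
}
```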

F13 verbose env · tools/list 10 766 → 15 695 B (+46%)

Problem: verbose=true / --profile full / AI_MEMORY_TOOLS_VERBOSE=1 all NO-OP — tool_definitions_for_profile trimmed unconditionally. Fix (src/mcp.rs): honor AI_MEMORY_TOOLS_VERBOSE=1 (or =true, case-insensitive) as a process-level escape hatch from the C4 optional-params trim. Cached via OnceLock so the hot tools/list path doesn't re-stat env on every call. Matches the existing AI_MEMORY_NO_CONFIG / AI_MEMORY_DB convention. Live verification: tools/list went 10 766 → 15 695 bytes under the env (+4 929 B / +46%); 51 tools either way (env is C4-only, doesn't add/remove tools).
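The OnceLock caching pattern described above looks roughly like this (a sketch under the stated convention; the real flag handling lives in src/mcp.rs):

```rust
use std::sync::OnceLock;

/// Parse the escape-hatch value: "1" or "true" (case-insensitive).
fn verbose_from(raw: Option<&str>) -> bool {
    matches!(raw, Some(v) if v == "1" || v.eq_ignore_ascii_case("true"))
}

static TOOLS_VERBOSE: OnceLock<bool> = OnceLock::new();

/// Read the env exactly once per process; the hot tools/list path
/// thereafter hits the cached bool instead of re-statting the env.
fn tools_verbose() -> bool {
    *TOOLS_VERBOSE.get_or_init(|| {
        verbose_from(std::env::var("AI_MEMORY_TOOLS_VERBOSE").ok().as_deref())
    })
}
```

A process-level OnceLock (rather than per-call env reads) also guarantees the flag cannot flip mid-session, matching the AI_MEMORY_NO_CONFIG / AI_MEMORY_DB convention.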

key_dir help text · platform-aware doc

Problem: --key-dir doc said <config>/ai-memory/keys without telling the operator that on macOS this resolves to ~/Library/Application Support/ai-memory/keys/. Fix (src/cli/identity.rs): expand the doc to list all three OS defaults explicitly (Linux ~/.config, macOS ~/Library/Application Support, Windows %APPDATA%) and mention the AI_MEMORY_KEY_DIR env var.

Round 4 — what did NOT need a code fix

Round 4 — verification on relaunched daemon

New daemon: PID 51722, binary inode 22320716, size 24 678 816 B, mtime 2026-05-08 11:01:37. AI_MEMORY_AGENT_ID=daemon set in env. permissions.mode=enforce. Schema v28. CLAUDE.md gates re-verified post-fix:

Recall the Round-4 fix verdict

git log --oneline 5b36d7c -1
# 5b36d7c fix(mcp,cli,identity): Round-4 — smart_load keyword veto, identity refuse-by-default, tools_verbose env, key_dir help text
Round 1 · recommendation (superseded by Round 2 · HOLD-TAG)

Round 1: Ship v0.7.0 with release-notes additions.

The v0.7.0 surface is correct, well-bounded, sanitized, and observable across all 12 NHI test phases. Three P2 fixes plus the F4 opt-in helper landed on PR #636 against release/v0.7.0. Six P3 polish items are appropriate for v0.7.1.

Default-OFF posture for HMAC, audit, signing keys, and [[hooks]] means several #628 audit-blocker probes are untestable until operators opt in. The daemon's first-run UX tells operators exactly how to enable each feature; document this in release notes.

Round-2 update: the Round-1 recommendation above is the conclusion from the original 12-phase NHI playbook. The Round-2 multi-agent regression sweep (above) supersedes this with HOLD-TAG pending F6 / F7 / F8.
Round 1 · 12-phase summary

Round 1 — per-phase verdicts.

Each phase produced a single result memory in namespace ai-memory/v0.7.0-nhi-testing; counts below are PASS / partial / config-gap.

P0 · Environment & version handshake · 10 / 0 / 0 · PASS
P1 · Core CRUD smoke (5 tools) · 6 / 0 / 0 · PASS
P2 · Lifecycle (5 tools) · 10 / 0 / 0 · PASS
P3 · Knowledge graph (Track J) · 6 / 3 / 0 · PASS+
P4 · Governance & security hardening · 22 / 1 / 7 · PASS+
P5 · Power tools (autonomous tier) · 8 / 0 / 0 · PASS
P6 · Capabilities v3 + runtime expansion · 8 / 0 / 0 · PASS
P7 · Token-budget verification (Track C) · 7 / 0 / 0 · PASS
P8 · Hooks & integrations · 4 / 0 / 2 · PASS·gap
P9 · Cross-interface parity (CLI / HTTP / MCP) · 13 / 2 / 0 · PASS+
P10 · Performance & scale · 8 / 0 / 1 · PASS
P11 · Failure & chaos · 12 / 0 / 2 · PASS
Convention: partial = surface correct but config-gated or playbook-conditional. gap = feature wired but config-OFF on the test machine (HMAC, signing keys, [[hooks]]). Zero outright failures across all 12 phases.
Round 1 · findings & resolutions

Round 1 — six findings, four code paths touched, two test-bugs.

Each finding's resolution is linked to commits or memory-namespace entries. Findings F1 and F5 closed as test-bugs (the NHI test used the wrong header name). F2, F3, F4 helper landed on PR #636. F6-Round1 closed as already-documented. (Round-2 findings F6–F18 are listed in the Round-2 section above.)

F1 · closed · test-bug

HTTP X-AI-Memory-Agent-Id header silently ignored

P9 testing observed that even valid X-AI-Memory-Agent-Id header values were dropped to anonymous:req-…. Investigation confirmed the actual header name is X-Agent-Id (per src/identity/mod.rs::resolve_http_agent_id and CLAUDE.md §"Agent Identity (NHI)"). Re-test with the correct name: valid → 201 stamped, malformed → 400 with regex error, body field wins precedence per documented order.

Closed · erratum stored at memory id 3f12ea8a · no code change needed.

F2 · P2 · fixed

entity_register persists canonical_name as alias (NHI-P3-T2)

Registering an entity with no aliases left it unreachable: memory_entity_get_by_alias("<canonical_name>") returned found:false because the canonical name was never written into entity_aliases. Fix auto-inserts canonical_name as the first alias and de-duplicates against caller-supplied entries. Three test updates + one new regression test (entity_register_canonical_name_resolves_via_get_by_alias).

Fixed in PR #636 · src/db.rs::entity_register
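The alias-row construction described above can be sketched as (names and the case-insensitive dedup are assumptions; the real logic lives in src/db.rs::entity_register):

```rust
/// Sketch of the F2 fix: the canonical name becomes the first alias,
/// and caller-supplied duplicates of it (or of each other) are dropped.
fn alias_rows(canonical_name: &str, caller_aliases: &[&str]) -> Vec<String> {
    let mut out = vec![canonical_name.to_string()];
    for a in caller_aliases {
        if !out.iter().any(|e| e.eq_ignore_ascii_case(a)) {
            out.push(a.to_string());
        }
    }
    out
}
```

With this shape, memory_entity_get_by_alias("<canonical_name>") always resolves, even when the caller registered no aliases.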

F3 · P2 · fixed

kg_query / find_paths default to "current view" (NHI-P3-T7)

memory_kg_invalidate populated valid_until correctly, but the default kg_query / find_paths traversal still surfaced edges whose valid_until lay in the past. Added an include_invalidated: bool parameter (default false) that injects (valid_until IS NULL OR valid_until > now()) into the per-hop filter. memory_kg_timeline still returns the full history (its purpose). Both MCP and HTTP surfaces expose the toggle. Two regression tests.

Fixed in PR #636 · src/db.rs::kg_query / src/db.rs::find_paths
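The per-hop visibility rule reduces to one predicate. A sketch (field and function names illustrative; the real filter is the SQL clause injected in src/db.rs):

```rust
/// Sketch of the F3 current-view filter: by default, edges whose
/// valid_until lies in the past are hidden from traversal.
struct Edge { valid_until: Option<u64> }

fn visible(edge: &Edge, now: u64, include_invalidated: bool) -> bool {
    include_invalidated
        || match edge.valid_until {
            None => true,       // never invalidated
            Some(t) => t > now, // invalidation is still in the future
        }
}
```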

F4 · P2 · helper-only

Default namespace standard governance is write=any (NHI-P4-T19)

Setting a standard wires the rule but doesn't enforce. Added a documented opt-in helper GovernancePolicy::default_for_managed_namespace() returning {write:Owner, promote:Any, delete:Owner, approver:Human, inherit:true} for operators to paste into a standard memory's metadata.governance. Changing the implicit fallback in read_namespace_policy would break inheritance chains where parent and child standards were registered under distinct agent identities (the test_inherit_*_chain integration suite documents the constraint). The implicit-default change is deferred to v0.7.1 with migration notes.

Helper landed in PR #636 · implicit fallback change deferred to v0.7.1
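The helper's shape, sketched from the description above (type and variant names mirror the prose and the discovered any | registered | owner schema; the real definitions live in the governance module):

```rust
/// Sketch of the F4 opt-in helper's returned policy.
#[derive(Debug, PartialEq)]
enum Principal { Any, Registered, Owner, Human }

struct GovernancePolicy {
    write: Principal,
    promote: Principal,
    delete: Principal,
    approver: Principal,
    inherit: bool,
}

impl GovernancePolicy {
    fn default_for_managed_namespace() -> Self {
        GovernancePolicy {
            write: Principal::Owner,
            promote: Principal::Any,
            delete: Principal::Owner,
            approver: Principal::Human,
            inherit: true,
        }
    }
}
```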

F5 · closed · test-bug

HTTP doesn't 400 on malformed X-AI-Memory-Agent-Id header

Same root cause as F1 — the NHI test used the wrong header name. With the correct X-Agent-Id, the daemon returns HTTP 400 {"error":"invalid agent_id: agent_id contains invalid character ';' (allowed: alphanumeric, _-:@./)"}.

Closed · erratum at memory 3f12ea8a · no code change needed.
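The charset rule quoted in that 400 can be sketched as a simple character filter (validation sketch only; the daemon's real check in src/identity/mod.rs may add length or structural rules not covered here):

```rust
/// Sketch of the agent_id charset rule: alphanumeric plus _-:@./
fn validate_agent_id(id: &str) -> Result<(), String> {
    for c in id.chars() {
        if !(c.is_ascii_alphanumeric() || "_-:@./".contains(c)) {
            return Err(format!(
                "invalid agent_id: agent_id contains invalid character '{c}' (allowed: alphanumeric, _-:@./)"
            ));
        }
    }
    Ok(())
}
```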

F6 · closed · already documented

verbose=true on memory_capabilities doesn't include per-tool schemas

The existing capabilities docstring already correctly documents that verbose=true only emits per-tool docs and full inputSchema when paired with family=<name> + include_schema=true. The NHI test was reading the surface incorrectly, not the surface being incorrect. No code change.

Closed · existing docstring at src/mcp.rs:919 covers the requirement.

P3 polish · P3 · v0.7.1

Carry-over polish for v0.7.1

HNSW eventual-consistency lag observable on freshly-stored memories · potential_contradictions write-time hint threshold tightening · CLI link path produces unsigned edges (Track H signing requires ai-memory identity generate first — already shown in daemon log) · mid-tier TTL on first access shrinks 7d → 24h (a forcing function for auto-promote at 5 accesses; counterintuitive without docs) · memory_get_taxonomy returns JSON tree, not the ASCII tree the playbook anticipated · HTTP {memory, links} envelope vs MCP top-level shape diverge by one layer.

Documented in the verdict memory · scoped to v0.7.1 follow-up.

What worked great

Standout strengths.

A tight summary of the surface-level wins from the NHI run — every one verifiable from the per-phase memories.

Token economy

Trimmed profile 2 316 tokens (1 184 under the 3 500 ceiling). Verbose 9 008. Max single tool: memory_recall at 522 (35% of the 1 500 per-tool ceiling). 73.9% savings on a core-only profile.

Comprehensive SSRF surface

Loopback rejected with explicit opt-in, 169.254.0.0/16 link-local rejected on both HTTP and HTTPS, RFC1918 rejected, file:// scheme rejected, plaintext HTTP to non-loopback rejected. All errors sanitized with actionable opt-in flags.

Subscription retry / DLQ / replay

Real webhook delivery to example.com → 4 retries → DLQ entry with correlation_id → memory_subscription_replay returns the failed event with delivery_status. The v0.6.4 #615 dispatch_count semantic is observable end-to-end.

Distinct provenance per interface

agent_id prefixes (ai: MCP / host: CLI / anonymous:req- HTTP) plus source field (claude / cli / api) make audit trails unambiguous. One of v0.7.0's nicest observability wins.

agent_id immutability holds

No CLI flag exposes metadata.agent_id mutation. Explicit override attempts (--agent-id "evil") only set the requester, never the stored stamp. SQL-layer + caller-layer preservation per playbook spec.

Upsert by (title, namespace) under concurrency

5 racing writers all returned the same memory id; last-write-wins for content; no orphans, no duplicates. Stable IDs survive races (good for KG link integrity).

All three external models wired

nomic-embed-text-v1.5 (cosine in check_duplicate at 0.902 sim), ms-marco-MiniLM-L-6-v2 (rerank top-1 accuracy 100% on 8 concurrent topical queries), gemma4:e4b (expand_query, consolidate, auto_tag, detect_contradiction). Zero crashes.

Auto-promote at access_count=5

Mid-tier memory hit 5 accesses → silently promoted to long, expires_at cleared. Tier no-downgrade enforced (CLI --tier short on a long memory was refused).
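Both invariants — promote-at-5 and no-downgrade — can be sketched together (field and function names illustrative, not the real API):

```rust
/// Sketch of the observed lifecycle rules.
#[derive(Debug, PartialEq, PartialOrd, Clone, Copy)]
enum Tier { Short, Mid, Long }

struct Memory { tier: Tier, access_count: u32, expires_at: Option<u64> }

/// The 5th access silently promotes mid -> long and clears expiry.
fn record_access(m: &mut Memory) {
    m.access_count += 1;
    if m.tier == Tier::Mid && m.access_count >= 5 {
        m.tier = Tier::Long;
        m.expires_at = None; // long-tier memories don't expire
    }
}

/// Explicit tier changes may only move upward (no-downgrade invariant).
fn set_tier(m: &mut Memory, requested: Tier) -> Result<(), &'static str> {
    if requested < m.tier {
        return Err("tier downgrade refused");
    }
    m.tier = requested;
    Ok(())
}
```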

Archive-before-delete on forget + GC

memory_forget returned {archived:true, deleted:N}; archive_stats grew correspondingly. Recoverable deletions via memory_archive_restore.

Daemon UX hygiene

TLS-not-enabled warning ("set --tls-cert + --tls-key + --mtls-allowlist"), key-dir-missing message ("run ai-memory identity generate ..."), and SSRF errors all tell the operator exactly how to fix the situation. Top-tier operator-facing diagnostics.

Phase-by-phase

Detailed counts per phase.

Phase · Pass / Partial / Gap · Disposition
P0 Environment · 10 / 0 / 0 · clean v0.7.0, schema=28, 8/8 families, A1 wired
P1 Core CRUD · 6 / 0 / 0 · round-trip clean, agent_id immutability holds
P2 Lifecycle · 10 / 0 / 0 · auto-promote @5, no-downgrade, archive-before-delete
P3 Knowledge graph · 6 / 3 / 0 · kg core works; entity-alias resolution + current-view filter fixed in PR #636
P4 Governance & sec · 22 / 1 / 7 · SSRF guards excellent, retry/DLQ/replay clean, default-OFF posture
P5 Power tools · 8 / 0 / 0 · all 6 LLM/embedder tools work end-to-end at autonomous tier
P6 Capabilities v3 · 8 / 0 / 0 · v1/v2/v3 versioning + family scoping + runtime expansion all wired
P7 Token budget · 7 / 0 / 0 · trimmed=2316/3500, no per-tool >1500, savings 73.9% on core
P8 Hooks · 4 / 0 / 2 · events fire, batched rerank clean, G3 + ExecExecutor config-gated
P9 Cross-interface · 13 / 2 / 0 · MCP/CLI clean, HTTP X-Agent-Id works as documented
P10 Performance · 8 / 0 / 1 · within budgets, 0 evictions, 0 dim violations
P11 Chaos · 12 / 0 / 2 · clean error envelopes, upsert correct, doctor INFO post-chaos
Total · 114 / 6 / 12 · SHIP-WITH-NOTES
How this run was conducted

NHI testing methodology.

A "Non-Human Identity" test boots an AI agent into a fresh Claude Code CLI session against the actual v0.7.0 binary on a live MCP database — no mocks, no stubs. The session executes a 12-phase playbook persisted in the ai-memory/v0.7.0-nhi-testing namespace and writes one result memory per phase plus a final verdict.

Pre-flight

Phase progression

Result conventions

Each phase result memory carries: title NHI-P<N>-<short-name>, namespace ai-memory/v0.7.0-nhi-testing, tier long, priority 8 (9 if blocker found), tags ["v0.7.0", "nhi-testing", "phase-N", "<area>"]. The final verdict carries priority 10 and tags ["verdict", "ship-readiness", "summary"].

Recall the Round-1 verdict

memory_recall context="v0.7.0 verdict" namespace="ai-memory/v0.7.0-nhi-testing"

Round 2 · multi-agent regression sweep

Round 2 ran in a fresh CLI session against the post-NHI-fix binary (/Users/fate/v07/v07-fixes/target/release/ai-memory, post-fix branch HEAD 72a021f, contains F2 + F3 + F4-helper). The orchestrator dispatched 5 sub-agents in a single tool-call batch (Agent A — F2/F3/F4 verification + KG deep dive, B — power tools exhaustive, C — cross-interface + chaos, D — governance / security / observability, E — capabilities / token budget / smart_load), each writing into its own sub-namespace under ai-memory/v0.7.0-nhi-round-2/<agent-slug>. After all agents returned, the orchestrator recalled the 5 summary memories and composed the Round-2 verdict at priority 10 in ai-memory/v0.7.0-nhi-round-2. No code was modified, no PR was merged, no tag was cut.

memory_recall context="v0.7.0 Round-2 verdict" namespace="ai-memory/v0.7.0-nhi-round-2"
v0.7.0 charter recap

"Attested-cortex" — what shipped.

v0.7.0 closed 69 / 69 epic tasks across 11 tracks (A through K) plus 15 audit-blocker fixes from issue #628 (PRs #629 through #634, all merged 2026-05-07).

The NHI test verifies the released bits behave as advertised when an AI agent uses them through the documented surfaces.