ai-memory  /  essays  /  brass tacks 02
Brass tacks · 02 of 03 · about a 6 minute read

How an AI NHI uses ai-memory.

A concrete walk-through of one session of an autonomous AI Non-Human Identity agent. Seven steps: boot, recall, store, link, consolidate, reflect, hand off. Real MCP tool calls. Terse output excerpts. No marketing.

The agent in this walk-through is a coding agent named ai:curator-agent@host01:pid-4815 running with the autonomous tier and Ollama-served gemma4:e4b. The user has asked it to investigate a flaky test in the project.

01 Boot — load identity and policy

The agent's launch hook calls memory_session_start. The substrate stamps a session id, surfaces the agent's prior reflections, and loads any operator-signed substrate rules that scope to this agent. The agent now knows who it is and what it is allowed to do.

Tool call
memory_session_start({})   // identity ai:curator-agent@host01:pid-4815
                           // resolved via the agent_id ladder (env / clientInfo)
Excerpt
{ "session_id": "sess-2026-05-18-a3f9",
  "prior_reflections": 3,
  "loaded_substrate_rules": ["L1.identity", "L3.audit", "L5.governance"],
  "last_handoff": "2026-05-17T22:14:03Z" }

02 Recall — what do I already know about this

Before doing anything, the agent calls memory_recall against the context "flaky test in pipeline." This is hybrid recall: FTS5 keyword score blended with cosine similarity over MiniLM embeddings, weighted by tier, priority, confidence, and recency. Every recall mutates the database (touches access_count, refreshes expires_at, promotes mid→long at 5 accesses).

Tool call
memory_recall({ "context": "flaky test pipeline", "limit": 5 })
Excerpt
{ "results": [
    { "id": "...a1f7", "title": "PR #810 introduced async test race",
      "kind": "observation", "confidence": 0.92, "score": 0.78 },
    { "id": "...b3c2", "title": "Test isolation policy — no shared DB",
      "kind": "decision", "confidence": 1.00, "score": 0.71 } ] }

03 Store — record the new observation

The agent reproduces the flake and learns it only fails when two suites run in parallel. It stores a finding. The substrate stamps agent_id from the resolution ladder (env, then process default), assigns a tier (mid: 7-day TTL), and writes a signed event into the audit chain.

Tool call
memory_store({
  "title": "Flake reproduces only under parallel suite execution",
  "content": "test_handler_parity::bucket_b flakes 30% with -j 2; never with -j 1",
  "tier": "mid",
  "kind": "observation",
  "confidence": 0.85,
  "tags": ["flake", "test-isolation", "pr-810"] })

04 Link — tie the new finding to what we knew

The new finding extends the earlier "PR #810 introduced async test race" memory. The agent records the relationship with a typed directional link and (because the autonomous tier is on) signs the link with the agent's Ed25519 key. valid_from is set; valid_until is left open until something invalidates the claim.

Tool call
memory_link({
  "source_id": "...new-finding-id",
  "target_id": "...a1f7",
  "relation":  "derived_from" })

05 Consolidate — do I have duplicates

The autonomous-tier curator (a background loop) periodically calls memory_consolidate on neighborhoods of the graph. It uses the configured LLM to detect that two earlier memories — one a tag drift report, one the new finding above — describe the same underlying defect. The curator merges them into a single canonical memory, preserves both source agent_ids under consolidated_from_agents, and emits a supersedes link from the old to the new.

Tool call
memory_consolidate({
  "ids": ["...old-id-1", "...old-id-2", "...new-finding-id"],
  "title": "PR #810 flake — parallel-execution race (consolidated)" })
Excerpt
{ "consolidated_id": "...c8d4",
  "consolidated_from_agents":
    ["ai:curator-agent@host01:pid-4815", "ai:earlier-agent@host01:pid-2200"],
  "supersedes": ["...old-id-1", "...old-id-2"],
  "verdict": "merged" }

06 Reflect — what does the chain now mean

After the consolidation, the agent calls memory_reflect against its working scope. Reflection is bounded (default depth cap 3 to prevent runaway recursion); the substrate refuses depth 4 with REFLECTION_DEPTH_EXCEEDED. The reflection is signed by the agent's key and stored as a memory of kind reflection, which the next session will surface via memory_session_start.

Tool call
memory_reflect({
  "source_ids": ["...a1f7", "...c8d4"],
  "title":   "pr-810 flake investigation",
  "content": "Flake is parallel-execution race; fix is per-suite DB.",
  "depth": 2 })
Excerpt
{ "reflection_id": "...refl-aa1",
  "summary": "Flake is parallel-execution race; fix is per-suite DB
              not shared mutex. Three earlier observations converge
              on the same root cause.",
  "next_actions": ["draft PR converting bucket_b to per-suite DB",
                   "extend test_handler_parity::race to assert -j 4 stable"] }

07 Handoff — the next session inherits

The session ends. Nothing is lost. The next session of any agent that boots into the same namespace gets the reflection surfaced via memory_session_start, the consolidated finding via memory_recall, and the typed link chain via memory_get_links. The signed audit chain replays cleanly; the substrate rules still apply; the agent_id trail records who did what; the operator can verify any of it.

Next session boot
memory_session_start({})   // booting as ai:next-agent@host01:pid-9001
  → surfaces reflection ...refl-aa1, points at consolidated ...c8d4

This is what "the substrate AI agents run on" means in concrete terms. Memory persists. Identity persists. Audit persists. Governance persists. The agent did its work; nothing of value vanished when the process died.


Two paragraphs to anchor what just happened.

First, none of these calls require the agent to be told what to do at each step. The session-start hook fires automatically; the curator runs on its own interval; the autonomous tier consolidates without being asked; reflection is part of the agent's own flow. The substrate is not a passive store the agent dips into; it is a partner that does work on the agent's behalf, on its own schedule, with operator-signed rules constraining the work.

Second, this entire sequence is reproducible. scripts/reproduce-recursive-learning.sh in the repo runs the end-to-end primitive with a fresh DB, three sample memories, and demonstrates the depth-cap refusal. The audit chain it produces verifies. That is the engineering bar: every claim made on this page is reachable from a fresh checkout.