The trace, every line.

ai-memory uses the tracing crate with an EnvFilter-driven subscriber. 579 log call sites across 95 source files / 21 directories at v0.7.0 (was ~176 at v0.6.x; growth from the Wave-2 SAL split, the L1-5 Agent Skills surface, the K7 subscription-reliability drainer, the K11 governance-rules engine, the Form-4/5 provenance + confidence pipelines, and the recursive-learning + atomisation hooks). Every error path is named; every fanout failure is logged with the peer id; every governance verdict is loud. AI clients can grep the canonical lines below to reconstruct what the daemon was doing.

tracing 0.1 EnvFilter 579 call sites RUST_LOG-controlled
Setup

How tracing initializes.

The daemon installs a tracing_subscriber::fmt layer with the standard EnvFilter chain. Default directives keep ai_memory + tower_http at info while honoring any RUST_LOG override.

// src/daemon_runtime.rs::init_tracing() — called from serve() tracing_subscriber::fmt() .with_env_filter( EnvFilter::from_default_env() .add_directive(crate::logging::DEFAULT_LOG_DIRECTIVE.parse()?) // "ai_memory=info" .add_directive("tower_http=info".parse()?), ) .try_init();

To turn the volume up:

# quiet — only errors RUST_LOG=ai_memory=error ai-memory serve # standard — what runs in prod RUST_LOG=ai_memory=info,tower_http=info ai-memory serve # debug a federation issue RUST_LOG=ai_memory::federation=debug,ai_memory=info ai-memory serve # everything (very chatty — local dev only) RUST_LOG=ai_memory=trace,tower_http=debug ai-memory serve

Named metadata targets (post-#1562). Most events emit under the module-path default, so an ai_memory=… directive matches them (including the named ai_memory::-prefixed targets: ai_memory::subscriptions, ai_memory::authz, ai_memory::handlers::skills, ai_memory::federation::sync, ai_memory::federation::push_dlq, ai_memory::governance::audit). But a set of subsystems emit under literal targets that do not start with ai_memory — an ai_memory=debug filter does NOT match these events; name them explicitly:

# literal targets (RUST_LOG must name them — ai_memory=… won't match) store::postgres # postgres SAL adapter (#1562 — real metadata target) store::postgres::kg # postgres AGE knowledge-graph ops (#1562) schema_init # `ai-memory schema-init` CLI (#1562) signed_events # append-only audit chain transcripts.lifecycle # sidechain transcript storage hnsw.eviction # vector-index eviction/rebuild pre_store.auto_atomise # WT-1 auto-atomise hook (+ .sync sibling) federation::signing # outbound sync-body signing federation::attestation # inbound peer attestation governance # rule-engine verdicts # example: debug the postgres adapter specifically RUST_LOG=ai_memory=info,store::postgres=debug,store::postgres::kg=debug ai-memory serve
Where the calls live

21 directories, 579 call sites at v0.7.0.

Counts pulled from grep -rnE '^\s*(tracing::)?(error|warn|info|debug|trace)!\s*\(' src/. Handlers, federation, and the SAL adapter family dominate — they are where user-visible behavior happens.

163
src/handlers/

Every HTTP/MCP write goes through one of these handlers (modular split since v0.7 Wave-1: http.rs, federation_receive.rs, hook_subscribers.rs, transport.rs). Errors get an error!("handler error: {e}") and the response code; governance verdicts get an error!("governance error: {e}").

121
src/ (top-level)

Boot + MCP dispatch + LLM client + subscriptions + reranker + curator. Mostly info! startup banners + warn-level fallback notices + the Ollama / OpenAI-compatible client error envelopes.

74
src/federation/

Quorum fanout (W of N) is the noisiest area. Every peer success is silent; every peer failure logs warn!("federation: peer {peer_id} failed for {id}: {reason}") so the operator can see where the network dropped traffic.

60
src/store/ (SAL)

SAL adapter family — sqlite.rs + postgres.rs (Apache AGE Cypher). Logs adapter-bound issues, schema-migration ladder progress, and the postgres-route-gate verdicts on missing trait coverage.

40
src/storage/

SQLite SQL primitives + migrations.rs (v2→v57 ladder). Migration logs, FTS5 index rebuild events, vacuum completion. Most storage::* calls are silent on success.

37
src/mcp/tools/ + src/mcp/

MCP protocol layer + the 74-entry registry (15 calls under src/mcp/tools, 21 under src/mcp/tools/store alone). Logs malformed JSON-RPC, unknown tool names, method-not-found, and per-tool dispatch errors.

33
src/hooks/

Pre-store / post-reflect hook pipeline (14 calls under src/hooks/pre_store; 19 across the rest). Logs hook chain failures, retry decisions, and the v0.7.0 auto-export panic-recovery path.

16
src/governance/

K9 permissions gate + K11 substrate rules engine. Logs every Deny verdict, every fail-CLOSED rule-consultation error, and every signed-rule cache invalidation.

12
src/cli/

CLI subcommand handlers (80 top-level subcommands in the default build at v0.7.x post-#1598 reembed; 82 under --features sal). Logs config-migration outcomes, schema-init progress, doctor reachability probes.

6
src/transcripts/

Sidechain transcripts + I2 transcript-link substrate. Logs zstd-3 compression boundaries and the auto-extract policy decisions.

8
src/identity/

Agent-id resolution chain. Logs precedence-chain fall-throughs and the anonymous-fallback path.

6
src/curator/ + atomisation

WT-1 atomisation engine + curator cycle. Logs LlmCurator scaffolding and cycle-skip rationale.

3
src/offload/ + observations/ + notification/

QW-3 context-offload (TTL sweep), Form-5 shadow observations, K10 SSE approvals + inbox notifications.

Level taxonomy

info / warn / error.

42 of the 579 call sites are debug!/trace! diagnostics (federation sync, postgres adapter, hooks executor, migrations) — silent under the default ai_memory=info filter until RUST_LOG raises them. Default production output is the three levels below.

info

Useful, expected events.

Bootstrap milestones, model pulls completing, catchup cycle summary. Quiet by default — doesn't drown the operator.

Pulling Ollama embedding model 'nomic-embed-text'... Embedding model 'nomic-embed-text' pulled successfully catchup: pulled 142 memories from 3 peers in 2.1s
warn

Recoverable degradation.

Daemon kept running, request still succeeded (or fell back). Operator should look but no incident yet.

embedding generation failed: connection refused recall: embedder query failed, falling back to keyword-only: timeout SSRF guard rejected webhook URL https://10.0.0.1: private network federation: peer node-3 failed for abc-123: connect timeout register_agent fanout error (local committed): node-2 502
error

Request-level failure.

The HTTP/MCP response is also an error. Logs the same condition the client gets back so server-side reconstruction is possible from logs alone.

handler error: validation failed: title cannot be empty governance error: agent not registered execute pending error: target memory not found detect_contradictions list error: db locked export error: tar archive write failed
The canonical phrases

Grep recipes for incident review.

When an AI client is reading logs to figure out what happened, these are the lines to look for. Each pattern matches an exact tracing::*! call site in the source.

handler error
Any HTTP/MCP write that failed at the daemon
grep 'handler error:' daemon.log Followed by the underlying anyhow chain. Validators, db, governance, and federation all bubble into this single phrase. The HTTP response is 4xx/5xx; the log carries the full reason.
governance verdict
Deny or replay-failure on a governed action
grep 'governance error:' daemon.log A clean Pending verdict is silent — only Deny and replay failures log. Look for these when an AI client says "the write was rejected and I don't know why."
federation peer failure
A specific peer dropped out of a quorum
grep 'federation: peer .* failed' daemon.log Format: federation: peer {peer_id} failed for {memory_id}: {reason}. Quorum still completes if W of N succeed; this line tells you which N-W peers were missing.
post-quorum partial failure
Local commit landed but fanout to some peers errored out
grep 'fanout error (local committed)' daemon.log Used by register_agent, link, consolidate. The local row is durable; sync_pull from those peers will reconcile.
SSRF defense (W10/W11 hardening)
A webhook URL was rejected by the URL or DNS guard
grep 'SSRF guard rejected' daemon.log Format: SSRF guard rejected webhook URL {url}: {reason}. Reasons include "private network," "loopback resolved," "scheme not http/https." Two SSRF defects were closed in commit 9eeb453 for v0.6.3.
embedder fallback
Recall couldn't get a query embedding; degraded to FTS5-only
grep 'falling back to keyword-only' daemon.log Recall still returns results — the adaptive semantic/keyword blend becomes 100% keyword for that one request. If frequent, check Ollama health.
embedding write failure
A memory was stored but its vector wasn't indexed
grep 'failed to store embedding for' daemon.log The memory is persisted; FTS5 covers it. Semantic recall will miss it until the HNSW index rebuilds (automatic at the 200-entry overflow threshold, or on daemon restart).
model pull lifecycle
Ollama model fetch on first start
grep "Pulling Ollama" daemon.log grep "pulled successfully" daemon.log Pull is async and can take minutes on cold start. The "successfully" line is the OK signal — recall semantic ranking is live from that point on.
catchup cycle
Federation catchup summary at the end of a sync window
grep 'catchup:' daemon.log Per-peer pulls + the rollup. Includes parse-failure warnings (unparseable memory from peer) when a peer ships malformed data — those rows are skipped, not retried.
execute-pending replay
A pending action was approved and the replay path failed
grep 'execute pending error:' daemon.log Approval decision flipped status to approved but replay couldn't complete (deleted target, namespace gone, validator now rejects). Status remains approved; the row is loud in logs but the user-visible state is unchanged.
webhook delivery failure
A subscription's POST to its webhook URL failed
grep 'webhook POST to' daemon.log Format: webhook POST to {url} failed: {error}. Subscription-side; not retried at the daemon (the recipient is responsible for its own retry policy).
Landed + future

v0.7.0 expansion.

v0.7.0 shipped the Hook Pipeline (arch-spec §2) and Sidechain Transcripts (§6). Both extended the trace surface:

MCP dispatch spans (landed) — info_span!("mcp_tool_call", tool=..., rpc_id=...) wraps every MCP tool dispatch (src/mcp/mod.rs) so production observability can attribute latency per tool
Transcript stream (landed) — transcript lines land durably in the transcript tables (src/transcripts/, transcripts.lifecycle trace target), queryable independent of the daemon log
memory_capabilities introspection (landed) — the daemon reports hooks.registered_count, transcripts.total_count, etc. (src/mcp/tools/capabilities.rs) so AI clients can answer "what's enforced here?" honestly
OpenTelemetry hookup (future) — v1.0 charter; the tracing-opentelemetry bridge so traces can ship to a collector