ai-memory · Tracing — every log call

Setup

How tracing initializes.

The daemon installs a tracing_subscriber::fmt layer with the standard EnvFilter chain. Default directives keep ai_memory + tower_http at info while honoring any RUST_LOG override.

// src/main.rs::serve() tracing_subscriber::fmt() .with_env_filter( EnvFilter::from_default_env() .add_directive("ai_memory=info".parse()?) .add_directive("tower_http=info".parse()?), ) .init();

To turn the volume up:

# quiet — only errors RUST_LOG=ai_memory=error ai-memory serve # standard — what runs in prod RUST_LOG=ai_memory=info,tower_http=info ai-memory serve # debug a federation issue RUST_LOG=ai_memory::federation=debug,ai_memory=info ai-memory serve # everything (very chatty — local dev only) RUST_LOG=ai_memory=trace,tower_http=debug ai-memory serve

Where the calls live

11 modules, 176 call sites.

Counts pulled from grep -rn 'tracing::error!|warn!|info!|debug!' src/. Handlers and federation dominate — they are where user-visible behavior happens.

handlers.rs

Every HTTP/MCP write goes through one of these handlers. Errors get an error!("handler error: {e}") and the response code; governance verdicts get an error!("governance error: {e}").

federation.rs

Quorum fanout (W of N) is the noisiest module. Every peer success is silent; every peer failure logs warn!("federation: peer {peer_id} failed for {id}: {reason}") so the operator can see where the network dropped traffic.

main.rs

Bootstrap, config, listener init, signal handling. Mostly info! startup banners + warn-level config fallback notices.

mcp.rs

MCP protocol layer. Logs malformed JSON-RPC, unknown tool names, and method-not-found responses for protocol-level debugging.

db.rs

SQLite-specific issues: migration logs, FTS5 index rebuild events, vacuum completion. Most db::* calls are silent on success.

subscriptions.rs

Webhook dispatch. Logs SSRF guard rejections (W11 hardening), URL parse failures, and HTTP-client build/send errors so a misconfigured webhook doesn't disappear silently.

curator.rs

Cycle-driven curator (consolidation, taxonomy refresh). Logs when a cycle starts, ends, or skips because of a recent run.

llm.rs

Ollama bridge — logs info!("Pulling Ollama embedding model …") on first run and the success line when the pull completes.

store/postgres.rs

SAL Postgres adapter (--features sal). Logs adapter-bound issues distinct from SQLite.

reranker.rs

Cross-encoder reranker. Logs lock-poisoning and tokenizer-fallback events.

identity.rs

Agent-id resolution chain. Logs once when the precedence chain falls through to anonymous.

Level taxonomy

info / warn / error.

The daemon does not emit debug! or trace! at the call sites — those are reserved for ad-hoc instrumentation under RUST_LOG. Production output is the three levels below.

info

Useful, expected events.

Bootstrap milestones, model pulls completing, catchup cycle summary. Quiet by default — doesn't drown the operator.

Pulling Ollama embedding model 'nomic-embed-text'... Embedding model 'nomic-embed-text' pulled successfully catchup: pulled 142 memories from 3 peers in 2.1s

warn

Recoverable degradation.

Daemon kept running, request still succeeded (or fell back). Operator should look but no incident yet.

embedding generation failed: connection refused recall: embedder query failed, falling back to keyword-only: timeout SSRF guard rejected webhook URL https://10.0.0.1: private network federation: peer node-3 failed for abc-123: connect timeout register_agent fanout error (local committed): node-2 502

error

Request-level failure.

The HTTP/MCP response is also an error. Logs the same condition the client gets back so server-side reconstruction is possible from logs alone.

handler error: validation failed: title cannot be empty governance error: agent not registered execute pending error: target memory not found detect_contradictions list error: db locked export error: tar archive write failed

The canonical phrases

Grep recipes for incident review.

When an AI client is reading logs to figure out what happened, these are the lines to look for. Each pattern matches an exact tracing::*! call site in the source.

handler error

Any HTTP/MCP write that failed at the daemon

grep 'handler error:' daemon.log Followed by the underlying anyhow chain. Validators, db, governance, and federation all bubble into this single phrase. The HTTP response is 4xx/5xx; the log carries the full reason.

governance verdict

Deny or replay-failure on a governed action

grep 'governance error:' daemon.log A clean Pending verdict is silent — only Deny and replay failures log. Look for these when an AI client says "the write was rejected and I don't know why."

federation peer failure

A specific peer dropped out of a quorum

grep 'federation: peer .* failed' daemon.log Format: federation: peer {peer_id} failed for {memory_id}: {reason}. Quorum still completes if W of N succeed; this line tells you which N-W peers were missing.

post-quorum partial failure

Local commit landed but fanout to some peers errored out

grep 'fanout error (local committed)' daemon.log Used by register_agent, link, consolidate. The local row is durable; sync_pull from those peers will reconcile.

SSRF defense (W10/W11 hardening)

A webhook URL was rejected by the URL or DNS guard

grep 'SSRF guard rejected' daemon.log Format: SSRF guard rejected webhook URL {url}: {reason}. Reasons include "private network," "loopback resolved," "scheme not http/https." Two SSRF defects were closed in commit 9eeb453 for v0.6.3.

embedder fallback

Recall couldn't get a query embedding; degraded to FTS5-only

grep 'falling back to keyword-only' daemon.log Recall still returns results — the 70/30 hybrid becomes 100/0 keyword for that one request. If frequent, check Ollama health.

embedding write failure

A memory was stored but its vector wasn't indexed

grep 'failed to store embedding for' daemon.log The memory is persisted; FTS5 covers it. Semantic recall will miss it until the HNSW index rebuilds (automatic at the 200-entry overflow threshold, or on daemon restart).

model pull lifecycle

Ollama model fetch on first start

grep "Pulling Ollama" daemon.log grep "pulled successfully" daemon.log Pull is async and can take minutes on cold start. The "successfully" line is the OK signal — recall semantic ranking is live from that point on.

catchup cycle

Federation catchup summary at the end of a sync window

grep 'catchup:' daemon.log Per-peer pulls + the rollup. Includes parse-failure warnings (unparseable memory from peer) when a peer ships malformed data — those rows are skipped, not retried.

execute-pending replay

A pending action was approved and the replay path failed

grep 'execute pending error:' daemon.log Approval decision flipped status to approved but replay couldn't complete (deleted target, namespace gone, validator now rejects). Status remains approved; the row is loud in logs but the user-visible state is unchanged.

webhook delivery failure

A subscription's POST to its webhook URL failed

grep 'webhook POST to' daemon.log Format: webhook POST to {url} failed: {error}. Subscription-side; not retried at the daemon (the recipient is responsible for its own retry policy).

Future

v0.7 expansion.

v0.7 brings the Hook Pipeline (arch-spec §2) and Sidechain Transcripts (§6). Both extend the trace surface — every hook fire becomes a span, every transcript line is durable. Once those land:

Hook spans — info_span!("hook", event=..., subscriber=...) wrapping each hook callback so the trace tree shows which hook ran where

Transcript stream — every governed action lands a row in the transcript table with the full payload + verdict, queryable independent of the daemon log

OpenTelemetry hookup — v1.0 charter; the tracing-opentelemetry bridge so traces can ship to a collector

memory_capabilities introspection (v0.6.3.1 quick patch) — the daemon will report hooks.registered_count, transcripts.total_count, etc. so AI clients can answer "what's enforced here?" honestly

The trace, every line.