v0.6.3 ships a bench tool with three operator-friendly flags: --baseline for regression detection, --history for JSONL accumulation, --update-performance-md for auto-spliced docs. CI runs the bench on every PR. Regressions >N% fail the gate. The numbers in PERFORMANCE.md are the public contract.
Every operation has a target p95 latency budget on a reference machine (Apple M2, 8GB free RAM, SQLite). PRs that regress beyond a configurable threshold fail the bench CI gate. v0.7's Apache AGE work will extend this with KG-query budgets; v0.8 adds task-queue p95.
Rows tagged [advisory] are published targets without a matching bench in src/bench.rs yet (Stream E follow-up — embedder-bound, background-job, or federation paths). Unmarked rows are exercised by ai-memory bench on every PR. PERFORMANCE.md is the canonical budget contract.
| Operation | Tier | p95 budget | Notes |
|---|---|---|---|
| memory_store | keyword | ≤ 5 ms | FTS5 only · single-row insert |
| memory_store | semantic | ≤ 25 ms | [advisory] + MiniLM embedding (384d) |
| memory_store | autonomous | ≤ 60 ms | [advisory] + nomic embedding (768d) |
| memory_get | any | ≤ 2 ms | [advisory] indexed PK lookup + access bump |
| memory_search | keyword | ≤ 8 ms | FTS5 query, top-20 |
| memory_recall | semantic | ≤ 35 ms | [advisory] FTS5 70% + HNSW 30% fusion |
| memory_recall | autonomous | ≤ 90 ms | [advisory] + cross-encoder rerank (top-100→top-10) |
| memory_link | any | ≤ 4 ms | [advisory] FK insert + idempotency check |
| memory_promote | any | ≤ 8 ms | [advisory] + governance verdict |
| memory_consolidate | smart | ≤ 1500 ms | [advisory] LLM-bound · Gemma 4 E2B |
| memory_kg_query | any | ≤ 50 ms | recursive CTE · depth 3 · < 1k edges |
| memory_get_taxonomy | any | ≤ 30 ms | [advisory] tree walk · default depth 8 · limit 1000 |
| memory_archive_purge | any | ≤ 200 ms | [advisory] per 1000 archived rows |
| sync_push (per row) | any | ≤ 15 ms | [advisory] peer-to-peer · TLS 1.3 |
| bulk_create | any | ≤ 2000 ms | [advisory] 100 rows + fanout · concurrent post W12 |
Reference machine: Apple M2, 16GB RAM, macOS 14, SQLite default settings, single-process daemon. Numbers degrade gracefully on lower-spec hardware; budgets are headroom-aware. Live numbers are published in PERFORMANCE.md.
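To make "p95 budget" concrete, here is a minimal sketch of how a measured p95 could be checked against a row of the table above. It assumes a nearest-rank percentile over raw latency samples in milliseconds; the function names `p95_ms` and `within_budget` are hypothetical and are not taken from src/bench.rs, which may compute percentiles differently.

```rust
/// Nearest-rank p95 over latency samples in milliseconds.
/// Returns None when there are no samples.
/// (Hypothetical helper; not the project's actual bench code.)
fn p95_ms(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Nearest-rank: rank = ceil(0.95 * n), 1-indexed, done in integer math.
    let rank = (sorted.len() * 95 + 99) / 100;
    Some(sorted[rank - 1])
}

/// True when the measured p95 fits the published budget ("≤ budget").
/// An empty sample set is treated as not meeting the budget.
fn within_budget(samples: &[f64], budget_ms: f64) -> bool {
    match p95_ms(samples) {
        Some(p95) => p95 <= budget_ms,
        None => false,
    }
}
```

For example, calling `within_budget(&samples, 5.0)` on samples collected from keyword-tier memory_store would test that row's ≤ 5 ms budget.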
--history appends each run to a JSONL file. jq, awk, or any chart tool reads the JSONL directly. Use it to keep a rolling p95 trend that survives across releases.
--update-performance-md splices the bench results into the PERFORMANCE.md file, in place between the marker comments. The next git commit captures the doc update. Operators run this after a known-good release to keep the public numbers fresh.
Every PR runs the bench job (.github/workflows/bench.yml) on a fixed-spec runner. The job pulls the baseline from the previous release tag, runs the bench, and compares the two. Regressions above the threshold fail the job and block the merge.
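The baseline comparison can be sketched as a percentage check on p95 values. This is an illustration under stated assumptions, not the bench tool's actual logic: the names `regression_pct`, `Gate`, and `gate` are hypothetical, and the threshold is passed in as a parameter because the source does not state its value.

```rust
/// Percentage change of the current p95 relative to the baseline p95.
/// Positive values mean the current run is slower.
/// Returns None when the baseline is not a positive number.
/// (Hypothetical helper; the real comparison may differ.)
fn regression_pct(baseline_ms: f64, current_ms: f64) -> Option<f64> {
    if baseline_ms > 0.0 {
        Some((current_ms - baseline_ms) / baseline_ms * 100.0)
    } else {
        None
    }
}

#[derive(Debug, PartialEq)]
enum Gate {
    Pass,
    Fail,
}

/// Fails only when the slowdown exceeds the configured threshold.
/// A missing or unusable baseline passes, since there is nothing to compare.
fn gate(baseline_ms: f64, current_ms: f64, threshold_pct: f64) -> Gate {
    match regression_pct(baseline_ms, current_ms) {
        Some(pct) if pct > threshold_pct => Gate::Fail,
        _ => Gate::Pass,
    }
}
```

With a hypothetical 10% threshold, a baseline p95 of 4 ms and a current p95 of 5 ms is a 25% regression and fails the gate, while an improvement always passes.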
Every MCP tool dispatch is wrapped in a tracing::info_span!("mcp_tool_call") with attributes including tool, elapsed_ms, outcome. Operators can chart per-tool p95/p99 against the public budgets — and when a single call regresses, the trace tree shows which sub-operation took the time. v0.7's tracing layer extends this with hook-pipeline spans.
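The shape of the data each span carries can be illustrated with a std-only stand-in. This sketch does not use the tracing crate and is not the project's dispatch code: `ToolCallRecord` and `timed_call` are hypothetical names, and the "ok"/"error" outcome labels are assumptions. It only shows how the three attributes named above (tool, elapsed_ms, outcome) could be captured around a call.

```rust
use std::time::Instant;

/// Hypothetical record mirroring the span attributes named above.
#[derive(Debug)]
struct ToolCallRecord {
    tool: String,
    elapsed_ms: f64,
    outcome: &'static str,
}

/// Runs `f`, measures its wall-clock duration, and labels the result.
/// The "ok"/"error" labels are an assumption for illustration.
fn timed_call<T, E>(
    tool: &str,
    f: impl FnOnce() -> Result<T, E>,
) -> (Result<T, E>, ToolCallRecord) {
    let start = Instant::now();
    let result = f();
    let elapsed_ms = start.elapsed().as_secs_f64() * 1000.0;
    let outcome = if result.is_ok() { "ok" } else { "error" };
    let record = ToolCallRecord {
        tool: tool.to_string(),
        elapsed_ms,
        outcome,
    };
    (result, record)
}
```

Collecting the `elapsed_ms` values per tool gives the sample sets from which per-tool p95/p99 can be charted against the public budgets.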
--features sal) past ~1M.