Persistent memory
for any AI.
Hierarchical. Temporal. Sub-100ms.

A single Rust binary that gives Claude, ChatGPT, Cursor, Windsurf, Gemini, Hermes — every MCP-compatible AI — durable, shared memory across sessions, projects, and machines. Local-first. Zero cloud dependencies. Already running on your hardware in 60 seconds.

HIERARCHY · tree of namespaces | KNOWLEDGE GRAPH · temporal-validity links | PERFORMANCE · CI-guarded budgets
Get Started in 60 Seconds · View on GitHub →
26 MCP tools
39 HTTP endpoints
28 CLI commands
5 Platforms
13 AI clients tested
0 Cloud deps
30-Second Tour

One binary. Three pillars. Every AI.

ai-memory is a self-contained Rust daemon. Every AI tool you use plugs into it via MCP and gets the same memory. Stop losing context every conversation. Stop pasting "remember that I…" into every model. Stop paying SaaS fees for what runs locally on your laptop.

Pillar 1 · Hierarchy
Memories live in a tree, not a flat blob.

projects/alpha/decisions. clients/acme/contracts. research/quantum/papers. Recall scopes to a subtree and never bleeds across contexts. Your finance memories don't leak into your code reviews.
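In spirit, subtree scoping is prefix matching with a separator guard. A minimal sketch with hypothetical types (the real query runs inside SQLite, not in application code):

```rust
// Hypothetical in-memory model; illustrates the scoping rule only.
struct Memory {
    namespace: String, // e.g. "projects/alpha/decisions"
    content: String,
}

/// In scope iff the namespace equals the query root or sits below it.
/// The "/" guard means "projects/alpha" matches "projects/alpha/decisions"
/// but never "projects/alphabet".
fn in_scope(ns: &str, root: &str) -> bool {
    ns == root || ns.starts_with(&format!("{root}/"))
}

fn recall_scoped<'a>(memories: &'a [Memory], root: &str) -> Vec<&'a Memory> {
    memories.iter().filter(|m| in_scope(&m.namespace, root)).collect()
}

fn main() {
    let mems = vec![
        Memory { namespace: "projects/alpha/decisions".into(), content: "Use WAL mode".into() },
        Memory { namespace: "clients/acme/contracts".into(), content: "Renewal in March".into() },
    ];
    // A projects/alpha recall never sees client memories.
    assert_eq!(recall_scoped(&mems, "projects/alpha").len(), 1);
}
```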

Pillar 2 · Knowledge Graph
Facts have a clock attached.

Every link between memories carries valid_from and valid_until. Ask "what was true on Feb 15?" and the system reconstructs the world as you knew it. Supersession is recorded, not destroyed.
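The as-of filter behind that question is a one-line interval check. A hedged sketch, assuming field names that mirror this page (the concrete schema may differ):

```rust
struct Link {
    subject: &'static str,
    relation: &'static str,
    object: &'static str,
    valid_from: i64,          // unix seconds
    valid_until: Option<i64>, // None = still true today
}

/// "What was true at time t?" — keep links whose validity interval covers t.
fn as_of(links: &[Link], t: i64) -> Vec<&Link> {
    links
        .iter()
        .filter(|l| l.valid_from <= t && l.valid_until.map_or(true, |u| t < u))
        .collect()
}

fn main() {
    let links = [
        // Superseded fact: valid_until was set, the row was never deleted.
        Link { subject: "api", relation: "hosted_on", object: "heroku",
               valid_from: 0, valid_until: Some(100) },
        Link { subject: "api", relation: "hosted_on", object: "fly.io",
               valid_from: 100, valid_until: None },
    ];
    // Reconstruct the world as it was at t = 50.
    assert_eq!(as_of(&links, 50)[0].object, "heroku");
}
```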

Pillar 3 · Performance
Sub-100ms. Measured. Enforced.

Every hot path has a published p95 budget. CI fails any pull request that breaks one by more than 10%. memory_session_start < 100ms. memory_recall < 50ms. No silent regressions.

The third pillar is the one most projects skip. mempalace publishes its budgets and hits them; most memory systems don't. ai-memory does — every release ships a PERFORMANCE.md table and a CI gate that enforces it.
Architecture

Four layers. Clear boundaries.

Each layer has one job. The Surface layer talks to AI clients. The Core layer reasons about memory. The Safety layer enforces what's allowed. The State layer persists everything to disk. Cross-layer dependencies flow downward only.

Surface
MCP · HTTP · CLI · Webhooks
Three inbound protocol fronts, one set of operations underneath. Stdio JSON-RPC for AI clients. mTLS-aware HTTP REST for services. Direct CLI for humans and scripts. Plus outbound webhooks for SIEM/audit.
Core
Recall · Hybrid Search · KG · Curator
FTS5 keyword + HNSW vector + cross-encoder rerank. Recursive-CTE knowledge graph traversal with temporal validity. Background curator daemon auto-tags, detects contradictions, consolidates similar memories.
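Hybrid recall boils down to score fusion before the rerank stage. A toy illustration (the weights and normalization are assumptions, and the cross-encoder pass is omitted):

```rust
struct Candidate {
    id: u64,
    bm25: f32,   // FTS5 keyword relevance, assumed pre-normalized to 0..1
    cosine: f32, // HNSW vector similarity, 0..1
}

/// Blend keyword and vector signals; the top of this fused list is what a
/// cross-encoder would rerank at higher tiers.
fn fuse(mut cands: Vec<Candidate>, alpha: f32) -> Vec<Candidate> {
    cands.sort_by(|a, b| {
        let sa = alpha * a.bm25 + (1.0 - alpha) * a.cosine;
        let sb = alpha * b.bm25 + (1.0 - alpha) * b.cosine;
        sb.partial_cmp(&sa).unwrap() // descending by fused score
    });
    cands
}

fn main() {
    let ranked = fuse(
        vec![
            Candidate { id: 1, bm25: 0.90, cosine: 0.20 },
            Candidate { id: 2, bm25: 0.30, cosine: 0.95 },
        ],
        0.5, // equal weight to both signals
    );
    assert_eq!(ranked[0].id, 2); // fused 0.625 beats 0.550
}
```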
Safety
Governance · SSRF Guard · Validation · Pending Actions
Allow / Deny / Pending decisions on every write. URL validator rejects RFC 1918 ranges and the AWS metadata IP. Strict input validation on every public surface. Approve/reject queue for AI-proposed destructive ops.
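The SSRF check is worth seeing concretely. An illustrative validator in the spirit of the guard above, not ai-memory's actual code:

```rust
use std::net::{IpAddr, Ipv4Addr};

fn is_forbidden(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            v4.is_private()                    // 10/8, 172.16/12, 192.168/16 (RFC 1918)
                || v4.is_loopback()            // 127/8
                || v4.is_link_local()          // 169.254/16 (includes 169.254.169.254)
                || v4 == Ipv4Addr::UNSPECIFIED // 0.0.0.0
        }
        IpAddr::V6(v6) => v6.is_loopback() || v6.is_unspecified(),
    }
}

fn main() {
    assert!(is_forbidden("169.254.169.254".parse().unwrap())); // AWS metadata IP
    assert!(is_forbidden("10.1.2.3".parse().unwrap()));        // RFC 1918
    assert!(!is_forbidden("93.184.216.34".parse().unwrap()));  // public address
}
```

A production guard also has to resolve hostnames before connecting and re-validate after redirects; the IP predicate above is just the core check.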
State
SQLite · WAL · Federation Sync · Archive
Single-file SQLite database. WAL mode for concurrent reads. W-of-N quorum federation across machines. Soft-delete archive tier (recoverable). No proprietary file formats. cp memory.db is a backup.
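What "no proprietary file formats" looks like in practice, as a minimal sketch using the rusqlite crate (an assumption; ai-memory's own DB layer may differ):

```rust
use rusqlite::Connection;

fn open_memory_db(path: &str) -> rusqlite::Result<Connection> {
    let conn = Connection::open(path)?;
    // WAL mode: readers proceed while the single writer writes.
    conn.pragma_update(None, "journal_mode", "WAL")?;
    Ok(conn)
}

fn main() -> rusqlite::Result<()> {
    let conn = open_memory_db("memory.db")?;
    conn.execute_batch(
        "CREATE TABLE IF NOT EXISTS memories (
             id        INTEGER PRIMARY KEY,
             namespace TEXT NOT NULL,
             content   TEXT NOT NULL
         );",
    )?;
    // One ordinary SQLite file on disk: `cp memory.db backup.db`
    // (with the daemon quiesced) really is a backup.
    Ok(())
}
```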
What ai-memory is NOT: not an agent framework, not a multi-step planner, not a tool-using autonomous worker. It's a memory layer. The host AI does the reasoning. ai-memory just remembers — and forgets exactly when you tell it to.
Operational Tiers

Pick your power level.

Four tiers, each adding capability and dependency. Start at keyword for laptop-grade text search with zero install. Climb to autonomous for self-curating memory with neural reranking. Switch tiers per-deployment.

keyword
No dependencies · < 5 MB RAM
  • SQLite FTS5 + BM25 ranking
  • Hierarchical namespaces
  • Tag + tier filters
  • Knowledge graph traversal
  • Federation sync
semantic
+ Candle MiniLM (80 MB) · 200 MB RAM
  • Everything in keyword, plus:
  • Local nomic-embed-text vectors
  • HNSW approximate-nearest-neighbor
  • Hybrid FTS5 + vector recall
  • memory_check_duplicate
smart
+ Ollama (gemma4:e2b) · 4 GB RAM
  • Everything in semantic, plus:
  • Auto-tagging on store
  • Query expansion on recall
  • Contradiction detection
  • Curator daemon (manual mode)
autonomous
+ Cross-encoder reranker · 6 GB RAM
  • Everything in smart, plus:
  • Neural cross-encoder reranking
  • Curator daemon (auto mode)
  • Auto-consolidation
  • Self-organizing namespaces
Tier ≠ feature gate. Every tier ships every API surface. Higher tiers just turn on internal acceleration. The same memory_recall tool returns better results at higher tiers, but always returns results. You can demote at any time — your data is the same on disk.
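One way to picture "acceleration, not gating" is a tier dispatch. A hypothetical sketch, with stub stages standing in for FTS5, HNSW, and the reranker:

```rust
#[derive(PartialEq, PartialOrd)]
enum Tier { Keyword, Semantic, Smart, Autonomous }

// Stubs for illustration only.
fn fts5_search(q: &str) -> Vec<String> { vec![format!("keyword hit for '{q}'")] }
fn merge_vector_hits(mut hits: Vec<String>, q: &str) -> Vec<String> {
    hits.push(format!("vector hit for '{q}'"));
    hits
}
fn cross_encoder_rerank(hits: Vec<String>, _q: &str) -> Vec<String> { hits }

/// Same memory_recall surface at every tier; higher tiers add stages.
fn recall(tier: &Tier, q: &str) -> Vec<String> {
    let mut hits = fts5_search(q); // keyword-tier baseline, always present
    if *tier >= Tier::Semantic {
        hits = merge_vector_hits(hits, q); // hybrid recall
    }
    if *tier >= Tier::Autonomous {
        hits = cross_encoder_rerank(hits, q); // neural rerank
    }
    hits
}

fn main() {
    // Always returns results, even at the lowest tier.
    assert!(!recall(&Tier::Keyword, "quorum design").is_empty());
}
```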
Memory Lifecycle

How a memory is born, grows up, and is laid to rest.

Every memory follows the same path. The system honors what humans wrote, learns what AIs are doing, and never silently forgets anything important. Compaction is opt-in, archive is reversible, hard delete is your call.

Happy path:
  • memory_store → CREATED · tier=short, access_count=0
  • memory_recall → USED · access_count++, extends TTL
  • auto_promote → PROMOTED · tier: short → mid, after 5 accesses + recency
  • memory_promote → LONG-TERM · tier=long, manual or auto
  • consolidate → MERGED · N → 1 + links, curator or manual
  • archive → RECOVERABLE · soft-deleted, restorable
Contradiction / forget path:
  • on_contradiction → FLAGGED · webhook fires, human picks winner
  • memory_link → SUPERSEDED · valid_until set, history preserved
  • memory_forget → USER-DELETED · explicit op, requires approval
Reversibility is the rule, not the exception. Every destructive operation has an undo path. Archive instead of delete. Supersede instead of overwrite. The only hard delete is memory_archive_purge --older-than-days <N>, and even that triggers governance approval.
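The promotion hop in the lifecycle above ("5 accesses + recent") reduces to a small rule. A hedged sketch; the thresholds here are illustrative, not the shipped defaults:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Tier { Short, Mid, Long }

struct MemoryMeta {
    tier: Tier,
    access_count: u32,
    secs_since_access: u64,
}

fn auto_promote(m: &mut MemoryMeta) {
    const MIN_ACCESSES: u32 = 5;           // from the lifecycle diagram
    const RECENT_SECS: u64 = 24 * 60 * 60; // "recent" window: an assumption
    if m.tier == Tier::Short
        && m.access_count >= MIN_ACCESSES
        && m.secs_since_access <= RECENT_SECS
    {
        m.tier = Tier::Mid; // short → mid; the hop to Long stays manual or gated
    }
}

fn main() {
    let mut m = MemoryMeta { tier: Tier::Short, access_count: 5, secs_since_access: 600 };
    auto_promote(&mut m);
    assert_eq!(m.tier, Tier::Mid);
}
```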
Pillar 3 · Performance

Every hot path has a published budget.

Numbers below are real measurements, not aspirational. ai-memory bench runs the canonical 1,000-memory workload and reports p50/p95/p99. CI fails any PR that exceeds budget by more than 10%. Hardware baseline: Apple M4 / 32 GB / NVMe SSD.

memory_session_start
target p95 < 100 ms · Claude Code hook
42 ms / 100 ms
memory_recall (hot)
target p95 < 50 ms · agent reasoning
18 ms / 50 ms
memory_store (no embedding)
target p95 < 20 ms · pure write
9 ms / 20 ms
memory_store (with embedding)
target p95 < 200 ms · ONNX/Ollama call
86 ms / 200 ms
memory_search (FTS5)
target p95 < 100 ms · keyword baseline
31 ms / 100 ms
memory_check_duplicate
target p95 < 50 ms · pre-write check
22 ms / 50 ms
memory_kg_query (depth ≤ 3)
target p95 < 100 ms · graph traversal
54 ms / 100 ms
memory_kg_timeline
target p95 < 100 ms · ordered facts
38 ms / 100 ms
curator cycle (1k memories)
target p95 < 60 s · background
12 s / 60 s
federation ack (W=2 quorum)
target p99 < 2 s · multi-machine
850 ms / 2 s
Every PR. Every push. Every release branch. The bench-CI workflow runs ai-memory bench on ubuntu-latest and posts a workflow summary with the table above. A regression of more than 10% on any p95 fails the build. There is no "we'll fix the latency later" path.
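The gate itself is plain arithmetic: nearest-rank p95 plus a 10% tolerance. An illustrative sketch, not the bench tool's code:

```rust
fn p95(mut samples_ms: Vec<f64>) -> f64 {
    samples_ms.sort_by(|a, b| a.partial_cmp(b).unwrap());
    // nearest-rank percentile: index = ceil(0.95 * n) - 1
    let idx = ((samples_ms.len() as f64 * 0.95).ceil() as usize).saturating_sub(1);
    samples_ms[idx]
}

/// Fail the build when measured p95 exceeds budget by more than 10%.
fn gate(measured_p95: f64, budget_ms: f64) -> Result<(), String> {
    if measured_p95 > budget_ms * 1.10 {
        Err(format!("p95 {measured_p95:.1} ms breaks {budget_ms:.0} ms budget by >10%"))
    } else {
        Ok(())
    }
}

fn main() {
    let samples: Vec<f64> = (0..1_000).map(|i| 10.0 + (i % 40) as f64).collect();
    assert!(gate(p95(samples), 50.0).is_ok()); // memory_recall: p95 < 50 ms
}
```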
Test Quality

1,809 tests. Every module above 79%.

The v0.6.3 coverage campaign took ai-memory from 56.7% line coverage to 93.08% across 9 waves of parallel agent work. 26 closers shipped ~1,200 net new tests over the ~30K-line Rust codebase. Full report: CAMPAIGN-FINAL-METRICS.md

Codebase line coverage
93.05%
42,894 / 46,099 lines covered
Total tests passing
1,809
1,600 lib + 209 integration · 0 failed · 0 ignored · 4 platforms

Per-module coverage (sorted descending)

main.rs · 100.00%
errors.rs · 100.00%
color.rs · 100.00%
mine.rs · 99.29%
toon.rs · 99.07%
replication.rs · 98.80%
subscriptions.rs · 97.61%
curator.rs · 97.13%
autonomy.rs · 96.80%
identity.rs · 96.71%
config.rs · 96.55%
validate.rs · 96.52%
models.rs · 95.64%
hnsw.rs · 95.52%
tls.rs · 94.85%
llm.rs · 94.80%
db.rs · 93.85%
daemon_runtime.rs · 93.43%
handlers.rs · 92.85%
federation.rs · 92.63%
embeddings.rs · 91.70%
mcp.rs · 91.22%
reranker.rs · 79.25%
Why reranker is the lowest. The neural BERT cross-encoder needs HF-Hub model weights, which CI doesn't pre-fetch by default. Heuristic rerank path is fully covered. Closing the gap to ≥92% is a v0.7 work item with a plan in the public assertions table.
Multi-machine Federation

Quorum-replicated. mTLS-secured. Eventually consistent.

Run ai-memory on N machines. Writes propagate as W-of-N quorum — by default 2 of 3. Reads stay local; writes acknowledge after quorum. Every peer authenticates via mTLS with fingerprint allowlist. Catchup loop closes partition windows automatically.

node-1 (leader, writer) → node-2 (replica): ACK ✓ · node-3 (replica): ACK ✓ · node-4 (replica): offline
W=2 of N=3 reached → write committed · node-4 catches up via /api/v1/sync/since when reachable
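The commit rule in that diagram, as a toy function (assumed semantics; the shipped logic lives in replication.rs, and details like whether the leader's own write counts toward W may differ):

```rust
#[derive(Debug, PartialEq)]
enum WriteOutcome {
    Committed { acks: usize },
    QuorumNotMet { acks: usize, needed: usize }, // surfaced to callers as HTTP 503
}

/// Count replica ACKs against the write quorum W.
fn commit_write(peer_acks: &[bool], w: usize) -> WriteOutcome {
    let acks = peer_acks.iter().filter(|&&ok| ok).count();
    if acks >= w {
        WriteOutcome::Committed { acks }
    } else {
        WriteOutcome::QuorumNotMet { acks, needed: w }
    }
}

fn main() {
    // node-2 and node-3 ACK, node-4 offline → W=2 reached, write committed.
    assert_eq!(
        commit_write(&[true, true, false], 2),
        WriteOutcome::Committed { acks: 2 }
    );
}
```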
Transport
mTLS · client-cert pinning

Both sides verify each other's certificate. Fingerprint allowlist prevents accidental joins. No central PKI required.

Consistency
Eventual · Lamport-clock vector cursors

Per-peer sync-state cursor advances with successful pulls. A re-joining peer catches up quickly to the current epoch.

Failure
Quorum-not-met → 503 with peer status

Caller sees structured error: which peers responded, which timed out. No silent partial writes.
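How a cursor-driven catch-up could look, sketched from the Consistency card above (the /api/v1/sync/since semantics are assumed from the endpoint name):

```rust
struct SyncState {
    peer: String,
    cursor: u64, // last Lamport stamp successfully pulled from this peer
}

struct Change {
    stamp: u64,
    payload: String,
}

/// Pull everything the peer saw after our cursor, then advance the cursor.
/// The cursor moves only on a successful pull, so a failed sync retries
/// from the same point.
fn catch_up(state: &mut SyncState, peer_log: &[Change]) -> usize {
    let new: Vec<&Change> = peer_log.iter().filter(|c| c.stamp > state.cursor).collect();
    if let Some(max) = new.iter().map(|c| c.stamp).max() {
        state.cursor = max;
    }
    new.len()
}

fn main() {
    let mut st = SyncState { peer: "node-4".into(), cursor: 2 };
    let log = [
        Change { stamp: 1, payload: "already replicated".into() },
        Change { stamp: 3, payload: "missed during partition".into() },
    ];
    assert_eq!(catch_up(&mut st, &log), 1); // one missed change pulled
    assert_eq!(st.cursor, 3);
}
```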

From One to a Million

Same binary, every scale.

ai-memory runs the same way on a developer's laptop as it does on a federation of state-government data centers. The differentiator is configuration, not code path: federation peers, mTLS allowlists, governance policies, autonomous-tier resources.

👤
SCALE 1 · INDIVIDUAL
Solo developer
1 user · 1 box
laptop, semantic tier
"I forget what I learned last week. Claude forgets between sessions."
One ai-memory mcp in the MCP config. Memory survives restarts, updates, machine swaps.
  • Cross-session context
  • Cross-AI parity (Claude, Cursor, Codex)
  • Local-first privacy
👥
SCALE 2 · STARTUP
Small team (2-25 people)
2-25 users · 1-3 nodes
shared engineering memory
"Tribal knowledge dies when someone leaves. New hires re-discover landmines."
Federated 3-node cluster. Per-namespace shared memory. ChatGPT memory but you own the data.
  • Onboarding accelerator
  • Decision audit trail
  • $0 per-seat cost
🏢
SCALE 3 · MID-MARKET
Mid-size org (25-500 people)
25-500 users · 5-15 nodes
multi-team, multi-namespace
"AI usage is exploding. So is the SaaS bill. So is the data-leakage risk."
Per-team namespace + governance. mTLS federation. SIEM webhook integration. Local data residency.
  • Centralized memory governance
  • Per-team compliance scope
  • SOC2-friendly audit trail
🏛️
SCALE 4 · ENTERPRISE
Large corporation (500-50k)
500-50,000 users · 20-100 nodes
multi-region federation
"Each business unit wants AI. Each lawyer wants air-gapped. Each auditor wants logs."
Per-BU federated cluster. Cross-region quorum. PII redaction hooks. Backup + restore + retention.
  • Air-gapped deployments
  • Per-BU policy enforcement
  • Hooks for data-loss prevention
🇺🇸
SCALE 5 · GOVERNMENT
Federal · state · local · municipal
Sovereign · Air-gapped
FedRAMP / IL5 / Sovereign
"AI is mandatory. The cloud is forbidden. The vendors are foreign."
Apache 2.0 OSS. Single Rust binary. Zero outbound calls. Hardware-attested keys (v0.7).
  • 100% air-gap operable
  • Public source-code audit
  • No vendor lock-in
The AgenticMem commercial tiers are an addition, not a requirement. Every audience above can run the OSS ai-memory binary indefinitely with no commercial product needed. The Attest / Federate / Sovereign commercial tiers add hardware-backed identity, managed federation, and FedRAMP-certified deployments — but only when the operator chooses.
Roadmap

From v0.6.3 today to v1.0 federation maturity.

Public release sequence. Each release ships one demoable headline plus operational substrate the next release builds on. No version skipping. No quiet feature drift. Public ROADMAP.md →

v0.6.3
Structured Memory + Performance · SHIPPED rc1
May 2026 · 4-week sprint
Hierarchy + Knowledge Graph + Performance Budgets. The grand-slam release.
  • Hierarchical namespace paths (projects/alpha/decisions)
  • Temporal-validity knowledge graph (recursive CTE today, AGE-ready for v0.7)
  • Published p95/p99 budgets + CI guard (bench.yml)
  • 93.84% test coverage (1,886 lib tests + 49+ integration, across the ubuntu/macos/windows CI matrix; v0.6.3.1)
  • Two SSRF defects discovered during campaign — fixed before tag
v0.7
Attested Identity
Ed25519 attested identity and hardware-backed keys (TPM / HSM / Secure Enclave): the FIPS-grade key-handling work referenced throughout this page, plus closing the reranker coverage gap to ≥92%.
v0.8
Coordination Primitives
Q4 2026
Distributed task queue, typed cognition, CRDTs. The multi-agent substrate.
  • Task queue with attested claim/complete/abandon (replay-safe)
  • Typed cognition: Goal / Plan / Step / Observation / Decision relations
  • CRDT merge for concurrent multi-agent edits (G-Counter, OR-Set, LWW-Register)
  • Compaction pipeline with verify+rollback (typed-cognition supersession)
v0.9
Agentic Substrate
Q1 2027
Function calling, skill memories, streaming.
v1.0
Federation Maturity
Q2 2027
Auto-discovery (mDNS), end-to-end encryption, MVCC, OpenTelemetry standardization, strict semver discipline.
vs Alternatives

How ai-memory stacks up.

Honest comparison against the practical alternatives. Each has its place; ai-memory's place is "single binary, local-first, every AI, sub-100ms".

Capability comparison · columns: ai-memory | Vector DB (Chroma, Qdrant, etc.) | SaaS memory (ChatGPT memory, etc.) | mempalace | Raw text (notes, READMEs). Capabilities compared:
  • AI-agnostic (works with any MCP client)
  • Cross-session persistence
  • Hierarchical namespaces
  • Temporal-validity knowledge graph
  • Published latency budgets + CI guard
  • Hybrid recall (FTS + vector + reranker)
  • Federation across machines
  • Local-first · zero cloud deps
  • Single binary install
  • mTLS federation
  • Self-curating background daemon
  • Apache 2.0 OSS · auditable source
  • Air-gap deployable
  • Per-namespace governance
  • Webhook subscriptions for SIEM
  • Sub-100ms session-start budget · ai-memory: ✓ 42 ms
Where mempalace wins: longer track record, larger community, Python-native stack. Where ai-memory wins: single Rust binary (no Python install), AI-agnostic via MCP (works with Claude/Cursor/Windsurf/etc., not just one host), federation primitives (W-of-N quorum), Apache 2.0 commercial-friendly license. Both are correct choices for different operators.
Trust Ladder

Five steps from laptop to FedRAMP.

Each step strengthens the trust boundary without breaking the layer below it. The OSS binary is operable at every step. The AgenticMem commercial tiers add managed services on top of what's already shipped.

1
v0.6.3 · OSS
Local-first
SQLite WAL, bound to 127.0.0.1, no outbound traffic. Personal-machine baseline.
2
v0.6.3 · OSS
mTLS federation
Client-cert pinning, fingerprint allowlist, no central PKI. Multi-machine baseline.
3
AgenticMem · Federate
Managed federation
Managed multi-cluster federation on top of the OSS mTLS baseline.
4
AgenticMem · Attest
Hardware-backed keys
TPM / HSM / Secure Enclave for key storage. Compliance-ready Q3 2026.
5
AgenticMem · Sovereign
FedRAMP / IL5
Government-grade deployments. Air-gapped, audited, contractually committed.
By the Numbers

Everything quantifiable, in one place.

Every quantitative claim ai-memory makes, sourced from the post-v0.6.3 codebase and the public CAMPAIGN-FINAL-METRICS document.

Line coverage
93.05%
42,894 / 46,099 lines
Region coverage
93.11%
73,150 / 78,564 regions
Function coverage
92.55%
3,527 / 3,811 functions
Tests passing
1,809
1,600 lib + 209 integration · 0 failed · 0 ignored
MCP tools
26
Claude / Cursor / Codex / Continue
HTTP endpoints
39
REST + SSE + mTLS
CLI commands
28
store, recall, sync, ...
Session-start p95
42ms
budget < 100 ms
Recall p95 (hot)
18ms
budget < 50 ms
Federation ack p99
850ms
budget < 2 s · W=2
main.rs lines
75
down from 4,511 · 98.3% reduction
Modules at 100%
7
main, errors, color, lib, ...
Modules ≥ 90%
39
of 47 total
Distribution targets
5
macOS arm64/x64 · Linux arm64/x64 · Win x64
Package channels
5
Homebrew · APT · COPR · GHCR · crates.io
SSRF defects fixed
2
found + fixed in v0.6.3 campaign
Cloud dependencies
0
100% local-first
License
Apache 2.0
commercial-friendly OSS
Why This Matters

Memory is the missing layer in the AI stack.

For the individual

You have ~30 conversations a day with one or more AIs. Each starts cold. Each ends with knowledge that vanishes. Over a year that's roughly 11,000 lost contexts — a year's worth of relationship-building with the most powerful tool you've ever owned, evaporated every 4 hours.

ai-memory turns those 11,000 cold-starts into one continuous conversation that learns about you over time.

For the team

A 25-person engineering team using Claude collectively burns ~600 cold-start latencies per day. At 200ms each, that's 2 minutes/day of pure latency — but the bigger cost is the re-paste: explaining the same project context, repeatedly, to AIs that can't share what they learned.

A federated ai-memory cluster shares understood context across the team. New hires walk into the conversation already in progress.

For the enterprise

Every GenAI vendor wants your data. Every compliance officer wants it on your premises. Every architect wants it durable. Every CFO wants it predictable. The only stack that satisfies all four constraints is a local-first memory layer with a published latency contract — sitting underneath whatever AI vendor you happen to use today.

ai-memory is that layer. Apache 2.0. Single binary. mTLS federation. CI-guarded budgets. Auditable from git clone to deployed binary in 60 seconds.

For the public sector

AI mandates are real. Cloud bans are real. Foreign-vendor concerns are real. An OSS Rust binary that runs entirely on your hardware, requires zero outbound traffic, and ships with auditable source code is the only AI-memory primitive that works for federal, state, local, and municipal deployments.

v0.7's attested-identity work targets FIPS-grade key handling. v1.0's federation maturity work targets multi-region resilience. Today's v0.6.3 already runs air-gapped with no compromise.

Visualization Atlas

Twenty-three pages. Every facet, dedicated visualizations.

This page is the hub. Three concentric rings: six audience-facing pages (release spotlight, feature matrix, data flow, integrations, audiences, release pipeline), twelve feature deep-dives (tiers, rules, TTLs, archival, encryption, hierarchies, KG, autonomous, A2A, lifecycle, performance, credits), and five SME-detail references (schema, types, validators, governance, tracing). Pick what your audience needs.

Feature Deep-Dive Atlas

Eleven big-sell features, one page each.

The grand-slam features. Tiers + TTLs (the retention story). Rules (per-namespace policy stack). Archival (two-stage soft-then-hard delete). Encryption (SQLCipher, mTLS, HMAC). Hierarchies (8-level memory trees). Knowledge Graph (4 relations + temporal validity). Autonomous (Gemma 4-powered). A2A messaging. Full lifecycle. Performance + bench tool. Credits (Google Gemma 4, Nomic, Hugging Face, Ollama, SQLite, Rust ecosystem).

▸ THREE TIERS
Memory Tiers

Short (6h) / Mid (7d) / Long (no TTL). Mirrors human memory architecture. Promotion path with governance gates. The default that lets agents forget noise.

memory-tiers.html →
▸ FIVE LAYERS
Memory Rules

Validation → scope → governance → namespace standard → parent inheritance. Five rule layers, every refusal named with a reason. Multi-tenant isolation, compliance retention, AI-supervisor patterns.

memory-rules.html →
▸ EIGHT DIALS
TTL Controls

Per-write expires_at + ttl_secs. Per-tier defaults. Daemon-config overrides. Access-driven extension. archive_on_gc. Every dial that controls memory lifetime.

ttl-controls.html →
▸ SOFT THEN HARD
Archival

archive → restore or purge. Five archive MCP tools. archive_on_gc soft-delete. auto_purge retention windows. Compliance patterns for GDPR, retention SLAs, forensics.

archival.html →
▸ FOUR SURFACES
Encryption

SQLCipher AES-256 at-rest. mTLS + fingerprint allowlist for federation. HMAC-SHA256 webhooks. Signed git tags + SBOM. v0.7 Ed25519 attested identity roadmap.

encryption.html →
▸ MEMORY TREES
Hierarchies

8-level deep namespace paths. 5 visibility scopes (private/team/unit/org/collective). Namespace standards inheritance. memory_get_taxonomy tree walker. v0.6.3 Stream A.

hierarchies.html →
▸ STRUCTURED COGNITION
Knowledge Graph

4 relation types. Entity registry with alias resolution. Temporal validity columns. memory_kg_query / kg_timeline / kg_invalidate. v0.6.3 Streams B + C.

knowledge-graph.html →
▸ POWERED BY GEMMA 4
Autonomous

Auto-tag, consolidate, expand-query, contradiction detection, memory reflection, session-start. All powered by Google's open-source Gemma 4 via Ollama. Local-first.

autonomous.html →
▸ AGENT-TO-AGENT
A2A Messaging

memory_notify pushes to inbox (federation-aware). memory_subscribe webhooks fan out events. HMAC-SHA256 signed dispatch. Two patterns, one toolkit.

a2a-messaging.html →
▸ END TO END
Lifecycle

store → access → consolidate → promote → archive → restore or purge. Six stages, eleven transitions, every transition leaves an audit trail. Timeline visualization.

lifecycle.html →
▸ MEASURED + GATED
Performance

Public p95 budgets per operation. bench tool with --baseline / --history / --update-performance-md. CI bench gate that fails on regressions. v0.6.3 Streams E + F.

performance.html →
▸ THANKS
Credits

Open-source acknowledgements. Google for Gemma 4. Nomic AI for embeddings. Hugging Face for tokenizers + reranker. Ollama, SQLite, the Rust ecosystem.

credits.html →
SME Deep-Dive Atlas

Five reference pages. Every column, every type, every check.

When the audience-facing pages are not enough — when an evaluating engineer needs to see every SQL column, every Rust type, every validator, every governance verdict, every log line. These pages are the reference contract for AI clients integrating against ai-memory.

The atlas is print-friendly too. Every page in the visualization set has a @media print stylesheet that strips chrome, switches to a white background, and applies page-break protection. Print as a PDF for board decks, procurement reviews, or a "give this to your security team" packet.

Run it locally in 60 seconds.

No signup. No telemetry. No SaaS. brew install ai-memory or cargo install ai-memory — your laptop, your data, your AI.