AI agents forget everything between sessions; vector databases store text but not meaning, identity, or governance. ai-memory is the substrate layer that gives autonomous AI agents persistent typed memory, a knowledge graph, and operator-signed governance rules — in a single Apache-2.0 Rust binary, deployable on a laptop or a fleet.
Out of the box, an LLM call is stateless. Conversation history is rebuilt token by token on every turn, which is expensive and error-prone, and it is lost again once the context window rolls over. Most teams reach for a vector database; that buys text similarity search but nothing else: no typed memory, no temporal validity, no signed audit trail, no operator policy, no identity for the AI itself.
For real autonomous agents with a Non-Human Identity (NHI), meaning agents that act on behalf of an org, persist across sessions, direct multiple subordinate agents, and carry policy that survives a model swap, you need a substrate, not a similarity index.
| Capability | Vector DB only | ai-memory |
|---|---|---|
| Semantic similarity search | Yes | Yes (hybrid: FTS5 + cosine) |
| Typed memory kinds | No | 10 governed kinds (Form-6 vocabulary) |
| Knowledge graph with temporal validity | No | Apache AGE, valid_from / valid_until |
| Ed25519-signed memory links | No | Per-link attest_level |
| Operator-signed substrate rules | No | L1–L6, key on disk |
| HMAC-required event subscriptions | No | SSRF gate by default |
| NHI agent_id semantics | No | Resolution ladder, preservation invariants |
| Autonomous tier (consolidate / contradict / auto-tag) | No | LLM-backed, vendor-agnostic: local Ollama or 15+ cloud and self-hosted backends (xAI Grok, OpenAI, Anthropic, Gemini, DeepSeek, Kimi, Qwen, Mistral, Groq, Together, Cerebras, OpenRouter, Fireworks, LMStudio, vLLM) |
| Single binary, zero cloud dependency | Often hosted | Single Rust binary, sqlite default |
| Apache-2.0 | Mixed | Yes |
The comparison is not "ai-memory replaces your vector DB"; it is "ai-memory gives you the substrate that holds a vector DB if you want one, plus 9 things a vector DB cannot give you alone."
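
To make the "hybrid: FTS5 + cosine" row concrete, here is a minimal sketch of one way to blend a keyword score with cosine similarity. The function names, the alpha weight, and the max-normalisation are illustrative assumptions, not ai-memory's actual ranking. The keyword score is assumed to be pre-scaled so that higher means a better match.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def hybrid_rank(query_vec, candidates, alpha=0.5):
    """Rank candidates by alpha * keyword + (1 - alpha) * semantic.

    candidates: list of (memory_id, keyword_score, embedding) tuples.
    alpha is a hypothetical tuning knob, not a documented ai-memory setting.
    """
    max_kw = max((kw for _, kw, _ in candidates), default=0.0)
    scored = []
    for memory_id, kw, embedding in candidates:
        kw_norm = kw / max_kw if max_kw > 0 else 0.0  # scale keyword score to [0, 1]
        semantic = cosine(query_vec, embedding)
        scored.append((alpha * kw_norm + (1 - alpha) * semantic, memory_id))
    scored.sort(reverse=True)  # best blended score first
    return [memory_id for _, memory_id in scored]
```

A memory that matches on both keywords and meaning outranks one that matches on only one of the two, which is the practical point of a hybrid search.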
Apache-2.0. No copyleft. No CLA. Permits commercial embedding.
Storage layer is SQLite (the world's most-deployed database). Export to JSONL or PostgreSQL+AGE is first-class. No proprietary format.
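
Because the store is a plain SQLite file, a JSONL export can in principle be reproduced with nothing but a standard library. The sketch below is not the ai-memory exporter; the table name passed in is hypothetical, and it assumes every column value is JSON-serialisable (no BLOBs).

```python
import json
import sqlite3

def export_jsonl(conn, table, out):
    """Write every row of `table` to the file-like `out`, one JSON object per line.

    `table` must be a trusted identifier: it is interpolated into the SQL text.
    Returns the number of rows written.
    """
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    count = 0
    for row in cursor:
        out.write(json.dumps(dict(zip(columns, row)), sort_keys=True) + "\n")
        count += 1
    return count
```

Each output line is an independent JSON document, so the export can be streamed, diffed, or loaded into another database row by row.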
Written in Rust. A clean cargo audit is a release gate. The binary can be statically linked; reproducible builds are on the v1.0 roadmap.
v0.7.0 secure-default: permissions enforced by default, SSRF gate on webhooks, signed audit chain, HMAC subscriptions, optional sqlcipher for at-rest encryption.
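
As a rough illustration of what an HMAC-required subscription implies for a receiver, the sketch below signs and verifies an event body with HMAC-SHA256. The choice of SHA-256, the hex encoding, and the function names are assumptions made for the example; the source does not specify ai-memory's actual signature format.

```python
import hashlib
import hmac

def sign(secret: bytes, body: bytes) -> str:
    """Return the hex HMAC-SHA256 of `body` under the shared `secret`."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Check a received signature against the body in constant time."""
    expected = sign(secret, body)
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(expected.encode(), signature_hex.encode())
```

A receiver would reject any event whose signature fails verification, so a tampered body or a wrong shared secret is refused rather than processed.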
One process, one DB file. Failure modes are SQLite failure modes (well-understood). Migrations dry-run-tested by the maintainer dogfood loop before every release tag.
v0.7.0 is the third major release. 74 MCP entries at --profile full, 89 HTTP routes (75 unique paths), 80 CLI subcommands (82 with --features sal or --features sal-postgres). CI gates on Linux x64/arm64, macOS x64/arm64, Windows x64, plus iOS + Android cross-compile (Posture 1a, #1068). Test campaigns are reproducible and publicly logged under docs/v0.7.0/test-campaign-*.
tracing output, append-only audit chain, capabilities endpoint for health checks.
AI_MEMORY_LLM_BACKEND: Ollama for local/free, xAI Grok / OpenAI / Anthropic / Gemini / DeepSeek / Kimi / Qwen / Mistral / Groq / Together / Cerebras / OpenRouter / Fireworks / LMStudio / vLLM / llama.cpp-server for cloud or self-hosted GPU. GPU is optional: local Ollama needs one; cloud vendors shift inference to their side. Keyword and semantic tiers need no LLM at all.
There is no SaaS billing surface in ai-memory itself. The only cost you incur is your own hosting plus (optionally) your own LLM call budget, paid directly to whichever vendor you choose.
v0.7.0 unlocks an unusually wide deployment surface because the LLM substrate (provider-agnostic, #1067) and the mobile cross-compile gates (#1068) decouple ai-memory from any single OS, vendor, or hardware floor.
| Posture | Where it runs | CPU / RAM / GPU floor | LLM source | Cost shape |
|---|---|---|---|---|
| 1a. Cellphone / tablet | iOS (arm64) + Android (arm64-v8a, armeabi-v7a, x86, x86_64) — in-app embed via FFI | 1 core / 256 MB / none | Cloud (xAI / OpenAI / Anthropic / Gemini) | Vendor-metered |
| 1b. Laptop / workstation | macOS arm64/x64, Linux arm64/x64, Windows x64 | 2 core / 4 GB / optional | Local Ollama or cloud | $0 local / vendor-metered cloud |
| 2. CPU-only cloud VPS | Any $5/mo VPS (Linode, Hetzner, DigitalOcean droplet) | 1 vCPU / 1 GB / none | Cloud LLM only (no local inference floor) | ~$5/mo host + LLM-metered |
| 3. CPU-only container (Plan C) | GHCR Docker image, K8s, ECS, Cloud Run | 1 vCPU / 512 MB / none | Cloud LLM (env-injected) | Container-metered + LLM-metered |
| 4. CPU-only sidecar (in-pod) | Sidecar to an existing app pod | 0.25 vCPU / 256 MB / none | Cloud LLM via vendor API | Negligible host + LLM-metered |
| 5. GPU workstation | Dev box with NVIDIA / Apple-Silicon NPU | 4 core / 8 GB / 8 GB VRAM | Local Ollama (gemma3:4b, llama3, qwen3) | $0 marginal once hardware owned |
| 6. GPU server (single-node) | Bare-metal or cloud GPU instance | 8 core / 32 GB / 24 GB VRAM | Local vLLM / Ollama / llama.cpp-server (OpenAI-compatible) | $0.5-3/hr GPU instance |
| 7. Private DC vLLM cluster | On-prem K8s + vLLM autoscaler | Cluster-scale | Self-hosted vLLM (OpenAI-compatible endpoint) | Capex + power; no per-token fees |
| 8. Multi-region federation (T4-T5) | Multi-region quorum sync, per-region LLM choice | 3+ nodes | Mixed: each region picks Ollama / cloud / vLLM independently | Region-aggregated |
| 9. Air-gapped / SCIF | No-internet enclave | Bring your own | Local Ollama or self-hosted vLLM only (no cloud egress) | $0 marginal post-deploy |
| 10. Edge / IoT | arm64 SBC (Raspberry Pi 5, Jetson Nano) | 2 core / 2 GB / optional NPU | Cloud LLM (default) or tiny local model | Hardware + LLM-metered |
The operator picks a backend via the universal AI_MEMORY_LLM_BACKEND precedence ladder (CLI flag > env var > config.toml > compiled default). No code changes are needed between postures: same Rust binary, same MCP / HTTP / CLI surfaces.
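
The precedence ladder can be sketched as a first-match lookup. Only the AI_MEMORY_LLM_BACKEND variable name and the ordering come from the text above; the function name, the llm_backend config key, and the "ollama" compiled default are hypothetical stand-ins.

```python
from typing import Mapping, Optional

COMPILED_DEFAULT = "ollama"  # hypothetical; the real compiled default is not stated here

def resolve_llm_backend(
    cli_flag: Optional[str],
    env: Mapping[str, str],
    config: Mapping[str, str],
) -> str:
    """Return the first value set, in order: CLI flag > env var > config.toml > default."""
    candidates = (
        cli_flag,                          # highest precedence
        env.get("AI_MEMORY_LLM_BACKEND"),  # environment variable
        config.get("llm_backend"),         # hypothetical key parsed from config.toml
    )
    for value in candidates:
        if value:  # skip None and empty strings
            return value
    return COMPILED_DEFAULT
```

In real use the env argument would typically be os.environ and config the parsed config.toml table; passing them in explicitly keeps the resolution rule easy to test.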
If your question is "can this run a secure, multi-region fleet of AI agents — teams, swarms, hives — under our compliance regime?", the answer is a public, reproducible artifact rather than a slide deck. The Grand Slam reference architecture is a 15-node, 3-region federated hive (do-1461) with W=2 quorum replication and three independently encrypted legs — each leg proven both positive (traffic flows when keys are right) and negative (traffic refused when they are not).
The same fleet was destroyed and rebuilt from nothing in two independent clean-room rounds; both rounds returned 119/119 verify checks green, and the round-2 fleet passed the 150/150 full-spectrum suite (regression, crypto, federation, zero-touch trust, A2A, AI-NHI, NSA-gap, curator groups). Identity is Ed25519 end to end; enrollment at fleet scale is CA-rooted Zero-Touch Trust; the security posture maps control-by-control onto NSA CSI MCP guidance with live-fleet test citations.
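
For readers unfamiliar with the W=2 quorum rule mentioned above, the sketch below shows the general idea: a write counts as durable only once at least W replicas acknowledge it. This is a generic illustration under assumed names; the source does not describe ai-memory's actual replication protocol, and each replica is modelled here as a plain callable that raises ConnectionError when unreachable.

```python
from typing import Any, Callable, Iterable

def quorum_write(
    replicas: Iterable[Callable[[Any], None]],
    record: Any,
    w: int = 2,
) -> bool:
    """Send `record` to every replica and report whether at least `w` acknowledged."""
    acks = 0
    for send in replicas:
        try:
            send(record)  # a normal return is treated as an acknowledgement
        except ConnectionError:
            continue      # unreachable replica: no ack, keep trying the others
        acks += 1
    return acks >= w
```

With three replicas and W=2, the write still succeeds when one replica is down, and is reported as failed when two are.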
Per-agent signed events, programmable hooks, operator-signed substrate rules L1–L6, capabilities v3, real permission system. 6/6 PASS on the NHI Discovery Gate vs. live xAI Grok 4.3.
Cross-region replication discipline, signed-link verifiers, first-class Python and TypeScript SDKs, lock-contention work on the shared mutex.
Frozen MCP tool + HTTP API contract, reproducible-build verification, evidence-pack tooling for compliance auditors.
ai-memory is dogfooded by the maintainer's own multi-agent Claude Code workflow (the same workflow that builds every release). Every release/v0.7.x.y branch sits in real use for at least 24 hours against the operator's live MCP database before a tag is cut. Migration round-trips (now through schema v57) are tested against the operator's own DB on every commit. The project is also exercised by the IronClaw A2A 4-domain campaign on Docker against xAI Grok 4.3.
External adoption is intentionally early-and-honest; if you are evaluating for production, the recommended path is to dogfood it against a non-critical agent first and reach out via GitHub issues for any gaps.
Every release ships with a public test campaign directory: pinned binary SHA, every test in expected-vs-actual form, every issue closed with retest evidence. The release-gate-final campaign (2026-05-22) returned SHIP-RECOMMENDED on 7,321 PASS / 0 FAIL across 269 test binaries with 22 issues (#1120-#1141) fixed in-campaign — no deferrals to v0.8.0. The subsequent final-baseline regression (2026-05-31, off a pristine volume-wiped rig) ran both backends end to end: 15,951 PASS / 0 FAIL (sqlite 7,458 + Postgres/AGE 8,493), reproduced at 15,952 / 0 on an independent round-2 re-run — full provenance on the frozen-claims evidence page.