ai-memory for decision makers — value, risk, cost, roadmap

The problem

AI agents have no memory and no identity.

Off-the-shelf, an LLM call is stateless. Conversation history has to be rebuilt and re-sent on every turn, which is expensive and error-prone, and it is lost again once the context window rolls over. Most teams reach for a vector database; that buys text similarity search but nothing else: no typed memory, no temporal validity, no signed audit trail, no operator policy, no identity for the AI itself.

For real autonomous agents with a Non-Human Identity (NHI), meaning agents that act on behalf of an org, persist across sessions, coordinate multiple subordinate agents, and carry policy that survives a model swap, you need a substrate, not a similarity index.

The comparison

ai-memory vs. vector-DB-only.

Capability	Vector DB only	ai-memory
Semantic similarity search	Yes	Yes (hybrid: FTS5 + cosine)
Typed memory kinds	No	10 governed kinds (Form-6 vocabulary)
Knowledge graph with temporal validity	No	Apache AGE, `valid_from` / `valid_until`
Ed25519-signed memory links	No	Per-link attest_level
Operator-signed substrate rules	No	L1–L6, key on disk
HMAC-required event subscriptions	No	SSRF gate by default
NHI agent_id semantics	No	Resolution ladder, preservation invariants
Autonomous tier (consolidate / contradict / auto-tag)	No	LLM-backed, vendor-agnostic: local Ollama or 15+ cloud and self-hosted providers (xAI Grok, OpenAI, Anthropic, Gemini, DeepSeek, Kimi, Qwen, Mistral, Groq, Together, Cerebras, OpenRouter, Fireworks, LMStudio, vLLM)
Single binary, zero cloud dependency	Often hosted	Single Rust binary, sqlite default
Apache-2.0	Mixed	Yes

The comparison is not "ai-memory replaces your vector DB"; it is "ai-memory gives you the substrate that holds a vector DB if you want one, plus 9 things a vector DB cannot give you alone."
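
To make the "temporal validity" row concrete, here is a minimal sketch of an as-of query over links that carry valid_from / valid_until. It is an illustration under stated assumptions, not ai-memory's schema: the table name links, the subject / predicate / object columns, and the sample rows are all hypothetical, and it uses plain SQLite rather than Apache AGE.

```python
import sqlite3

def facts_as_of(conn, subject, as_of):
    """Return (predicate, object) pairs whose validity window covers `as_of`.

    A NULL valid_until means the fact is still current. ISO-8601 date
    strings compare correctly as plain text, so no date parsing is needed.
    """
    return conn.execute(
        """
        SELECT predicate, object FROM links
        WHERE subject = ?
          AND valid_from <= ?
          AND (valid_until IS NULL OR valid_until > ?)
        ORDER BY valid_from
        """,
        (subject, as_of, as_of),
    ).fetchall()

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE links (subject TEXT, predicate TEXT, object TEXT,"
    " valid_from TEXT NOT NULL, valid_until TEXT)"
)
conn.executemany(
    "INSERT INTO links VALUES (?, ?, ?, ?, ?)",
    [
        ("agent-7", "reports_to", "ops-lead", "2025-01-01", "2025-06-01"),
        ("agent-7", "reports_to", "sec-lead", "2025-06-01", None),
    ],
)

print(facts_as_of(conn, "agent-7", "2025-03-15"))
print(facts_as_of(conn, "agent-7", "2025-09-01"))
```

The point of the sketch is that a superseded fact is closed out rather than overwritten, so a question about March and a question about September can both be answered from the same store. A similarity index alone has no such notion of when a fact held.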

Risk profile

What you are signing up for.

License risk

Apache-2.0. No copyleft. No CLA. Permits commercial embedding.

Vendor lock-in

Storage layer is SQLite (the world's most-deployed database). Export to JSONL or PostgreSQL+AGE is first-class. No proprietary format.

Supply chain

Written in Rust; a clean cargo audit is a release gate. Binary is statically linkable; reproducible builds are on the v1.0 roadmap.

Security posture

v0.7.0 secure-default: permissions enforced by default, SSRF gate on webhooks, signed audit chain, HMAC subscriptions, optional sqlcipher for at-rest encryption.
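
For readers who want to see what "HMAC subscriptions" implies for a receiving service, the sketch below shows the general technique of signing and verifying an event body with a shared secret. The choice of SHA-256, hex encoding, and the function names sign and verify are assumptions made for illustration; the source does not specify ai-memory's actual signature format or header names.

```python
import hashlib
import hmac

def sign(secret: bytes, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 tag over the raw event body (assumed format)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify(secret: bytes, body: bytes, signature: str) -> bool:
    """Recompute the tag and compare in constant time to avoid timing leaks."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected.encode(), signature.encode())

# Hypothetical secret and payload, for illustration only.
secret = b"shared-subscription-secret"
body = b'{"event": "memory.created", "id": 42}'
tag = sign(secret, body)

print(verify(secret, body, tag))                 # untampered body
print(verify(secret, body + b" ", tag))          # body altered in transit
print(verify(b"wrong-secret", body, tag))        # receiver holds the wrong key
```

A receiver that rejects any delivery failing this check cannot be fed forged events by a third party who lacks the secret, which is the property the "HMAC-required" default is meant to guarantee.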

Operational risk

One process, one DB file. Failure modes are SQLite failure modes (well-understood). Migrations are dry-run-tested in the maintainer's dogfood loop before every release tag.

Project maturity

v0.8.0 is the latest shipped release. 101 MCP entries at --profile full (100 callable), 92 HTTP route registrations (78 unique paths), 87 CLI subcommands (89 with --features sal or --features sal-postgres). CI gates on Linux x64/arm64, macOS x64/arm64, Windows x64, plus iOS + Android cross-compile (Posture 1a, #1068). Test campaigns are reproducible and publicly logged under docs/v0.7.0/test-campaign-*.

Total cost

What it actually costs to run.

License: $0. Apache-2.0, perpetual.
Hosting: $0 to whatever you spend on the host. Single binary. Runs on a developer laptop; runs on a $5/mo VPS; runs on a multi-region fleet. No mandatory cloud.
Infrastructure: SQLite or Postgres+AGE. SQLite is bundled. Postgres+AGE is optional for the T3+ graph backend.
Ops: one process to monitor. Standard tracing output, append-only audit chain, capabilities endpoint for health checks.
LLM cost: optional, vendor-agnostic, deployment-flexible. Post-#1067, every tier accepts any LLM provider via AI_MEMORY_LLM_BACKEND: Ollama for local/free, or xAI Grok / OpenAI / Anthropic / Gemini / DeepSeek / Kimi / Qwen / Mistral / Groq / Together / Cerebras / OpenRouter / Fireworks / LMStudio / vLLM / llama.cpp-server for cloud or self-hosted GPU. A GPU is optional: local Ollama runs faster with one but can fall back to CPU, and cloud vendors shift inference to their side. Keyword and semantic tiers need no LLM at all.

There is no SaaS billing surface in ai-memory itself. The only cost you incur is your own hosting plus (optionally) your own LLM call budget, paid directly to whichever vendor you choose.

Deployment matrix

10 deployment postures — from cellphone to private DC.

v0.7.0 unlocks an unusually wide deployment surface because the LLM substrate (provider-agnostic, #1067) and the mobile cross-compile gates (#1068) decouple ai-memory from any single OS, vendor, or hardware floor.

Posture	Where it runs	CPU / RAM / GPU floor	LLM source	Cost shape
1a. Cellphone / tablet	iOS (arm64) + Android (arm64-v8a, armeabi-v7a, x86, x86_64) — in-app embed via FFI	1 core / 256 MB / none	Cloud (xAI / OpenAI / Anthropic / Gemini)	Vendor-metered
1b. Laptop / workstation	macOS arm64/x64, Linux arm64/x64, Windows x64	2 core / 4 GB / optional	Local Ollama or cloud	$0 local / vendor-metered cloud
2. CPU-only cloud VPS	Any $5/mo VPS (Linode, Hetzner, DigitalOcean droplet)	1 vCPU / 1 GB / none	Cloud LLM only (no local inference floor)	~$5/mo host + LLM-metered
3. CPU-only container (Plan C)	GHCR Docker image, K8s, ECS, Cloud Run	1 vCPU / 512 MB / none	Cloud LLM (env-injected)	Container-metered + LLM-metered
4. CPU-only sidecar (in-pod)	Sidecar to an existing app pod	0.25 vCPU / 256 MB / none	Cloud LLM via vendor API	Negligible host + LLM-metered
5. GPU workstation	Dev box with NVIDIA / Apple-Silicon NPU	4 core / 8 GB / 8 GB VRAM	Local Ollama (gemma3:4b, llama3, qwen3)	$0 marginal once hardware owned
6. GPU server (single-node)	Bare-metal or cloud GPU instance	8 core / 32 GB / 24 GB VRAM	Local vLLM / Ollama / llama.cpp-server (OpenAI-compatible)	$0.5-3/hr GPU instance
7. Private DC vLLM cluster	On-prem K8s + vLLM autoscaler	Cluster-scale	Self-hosted vLLM (OpenAI-compatible endpoint)	Capex + power; no per-token fees
8. Multi-region federation (T4-T5)	Multi-region quorum sync, per-region LLM choice	3+ nodes	Mixed: each region picks Ollama / cloud / vLLM independently	Region-aggregated
9. Air-gapped / SCIF	No-internet enclave	Bring your own	Local Ollama or self-hosted vLLM only (no cloud egress)	$0 marginal post-deploy
10. Edge / IoT	arm64 SBC (Raspberry Pi 5, Jetson Nano)	2 core / 2 GB / optional NPU	Cloud LLM (default) or tiny local model	Hardware + LLM-metered

Operator picks via the universal AI_MEMORY_LLM_BACKEND precedence ladder (CLI flag > env var > config.toml > compiled default). No code changes between postures — same Rust binary, same MCP / HTTP / CLI surfaces.
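
The precedence ladder described above can be sketched in a few lines. This is an illustration of the ordering only, not ai-memory's implementation: the config key llm_backend, the compiled default "ollama", and the backend name strings are hypothetical. Only the AI_MEMORY_LLM_BACKEND variable name and the ordering (CLI flag > env var > config.toml > compiled default) come from the text.

```python
import os

COMPILED_DEFAULT = "ollama"  # hypothetical; the real compiled default is not stated

def resolve_llm_backend(cli_flag=None, env=None, config=None):
    """Return (backend, source) using the first non-empty rung of the ladder."""
    env = os.environ if env is None else env
    config = config or {}
    ladder = [
        ("cli", cli_flag),
        ("env", env.get("AI_MEMORY_LLM_BACKEND")),
        ("config.toml", config.get("llm_backend")),  # hypothetical key name
        ("default", COMPILED_DEFAULT),
    ]
    for source, value in ladder:
        if value:  # skip rungs that are unset or empty
            return value, source

# Same inputs, progressively removing the higher-priority rungs.
env = {"AI_MEMORY_LLM_BACKEND": "xai"}
config = {"llm_backend": "vllm"}
print(resolve_llm_backend("openai", env, config))
print(resolve_llm_backend(None, env, config))
print(resolve_llm_backend(None, {}, config))
print(resolve_llm_backend(None, {}, {}))
```

Returning the source alongside the value is a design choice of this sketch: it lets an operator confirm which rung actually won when a posture behaves unexpectedly.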

Enterprise & federal architecture

The federated story is proven live, not promised.

If your question is "can this run a secure, multi-region fleet of AI agents — teams, swarms, hives — under our compliance regime?", the answer is a public, reproducible artifact rather than a slide deck. The Grand Slam reference architecture is a 15-node, 3-region federated hive (do-1461) with W=2 quorum replication and three independently encrypted legs — each leg proven both positive (traffic flows when keys are right) and negative (traffic refused when they are not).

The same fleet was destroyed and rebuilt from nothing in two independent clean-room rounds; both rounds returned 119/119 verify checks green, and the round-2 fleet passed the 150/150 full-spectrum suite (regression, crypto, federation, zero-touch trust, A2A, AI-NHI, NSA-gap, curator groups). Identity is Ed25519 end to end; enrollment at fleet scale is CA-rooted Zero-Touch Trust; the security posture maps control-by-control onto NSA CSI MCP guidance with live-fleet test citations.
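
As a rough illustration of what W=2 means in practice, the sketch below treats a write as committed only when at least two peers acknowledge it. It is an in-process simulation under stated assumptions: the peer callables, the quorum_write helper, and the use of ConnectionError to model an unreachable node are hypothetical, and the source does not describe the real replication protocol or whether the local write counts toward W.

```python
def quorum_write(peers, record, w=2):
    """Attempt the write on every peer; report success if at least `w` acknowledge."""
    acks = []
    for name, store in peers.items():
        try:
            store(record)          # a peer acknowledges by returning normally
            acks.append(name)
        except ConnectionError:
            continue               # unreachable peers simply do not count
    return len(acks) >= w, acks

def healthy(log):
    """Build a hypothetical peer that appends every record to `log`."""
    def store(record):
        log.append(record)
    return store

def unreachable(record):
    """Hypothetical peer that is down or partitioned away."""
    raise ConnectionError("peer down")

# One peer down: two acknowledgements remain, so the write meets W=2.
ok, acks = quorum_write(
    {"a": healthy([]), "b": unreachable, "c": healthy([])}, {"id": 1}
)
print(ok, acks)

# Two peers down: a single acknowledgement is below quorum.
ok2, acks2 = quorum_write(
    {"a": healthy([]), "b": unreachable, "c": unreachable}, {"id": 1}
)
print(ok2, acks2)
```

The business consequence is that a single node or link failure does not block writes, while a write seen by only one node is never reported as durable.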

Grand Slam reference architecture: 15 nodes · 3 regions · W=2 quorum · proven live (do-1461)
NSA control-by-control test matrix: Each control → a live-hive test artifact
Zero-Touch Trust: CA-rooted federation identity · O(1) enrollment
Reproducible baselines: Two-round clean-room fleet reproduction · 119/119 both rounds
Encryption at rest: sqlcipher build, passphrase-file discipline
Architectures T1→T5: Laptop → team → enterprise → region → global hive

Roadmap

Where this is going.

v0.7.0 — attested-cortex (shipped, GA)

Ed25519 attestation chain + 25-event hook pipeline + Apache AGE acceleration.

Per-agent signed events, programmable hooks, operator-signed substrate rules L1–L6, capabilities v3, real permission system. 6/6 PASS on the NHI Discovery Gate vs. live xAI Grok 4.3.

v0.8.0 — distributed coordination (shipped, GA)

Distributed coordination substrate + typed cognition + federation hardening.

Actions, leases, signals, attested checkpoints, and scheduled routines (#1709); typed-cognition lifecycle state; federation hardening — secure-default peer enrollment (#1789), signed action-transition replication (#1718), and per-write content attestation (#1464).

v1.0 — targeted

Wire-format stability, reproducible builds, SOC 2-friendly audit packaging.

Frozen MCP tool + HTTP API contract, reproducible-build verification, evidence-pack tooling for compliance auditors.

Who is using it

The dogfooders.

ai-memory is dogfooded by the maintainer's own multi-agent Claude Code workflow (the same workflow that builds every release). Every release/v0.8.x.y branch sits in real use for at least 24 hours against the operator's live MCP database before a tag is cut. Migration round-trips (now through schema v78) are tested against the operator's own DB on every commit. The project is also exercised by the IronClaw A2A 4-domain campaign on Docker against xAI Grok 4.3.

External adoption is intentionally early-and-honest; if you are evaluating for production, the recommended path is to dogfood it against a non-critical agent first and reach out via GitHub issues for any gaps.

Test results

The evidence behind SHIP.

Every release ships with a public test campaign directory: pinned binary SHA, every test in expected-vs-actual form, every issue closed with retest evidence. The release-gate-final campaign (2026-05-22) returned SHIP-RECOMMENDED on 7,321 PASS / 0 FAIL across 269 test binaries with 22 issues (#1120-#1141) fixed in-campaign — no deferrals to v0.8.0. The subsequent final-baseline regression (2026-05-31, off a pristine volume-wiped rig) ran both backends end to end: 15,951 PASS / 0 FAIL (sqlite 7,458 + Postgres/AGE 8,493), reproduced at 15,952 / 0 on an independent round-2 re-run — full provenance on the frozen-claims evidence page.

Latest campaign index: 2026-05-22 · SHIP-RECOMMENDED · tip fd172f2cf
For decision-makers: Verdict, risk profile, cost, vector-DB comparison, scope honesty
For non-technical readers: Plain-English version of what it does and what it means
For SME engineers: Full reproducibility + per-issue root-cause table

Evaluate ai-memory.