ai-memory ships as a single static Rust binary with four operational modes. This page is the operator's path: install, configure, deploy, observe, harden, upgrade. Every claim links to the reference doc that backs it.
Pick the path that matches how your fleet is already managed. All three deliver the same binary.
brew install alphaonedev/tap/ai-memory
sudo dnf copr enable alpha-one-ai/ai-memory
sudo dnf install ai-memory
cargo install ai-memory
Full matrix (Windows, .deb manual install, Docker, source) is in INSTALL.md.
Build + test gates run on:
iOS: aarch64-apple-ios + simulator slices via ai-memory-ios.xcframework.tar.gz
Android: jniLibs/<abi>/ layout via ai-memory-android.tar.gz
Mobile coverage is layered: every PR and push exercises cargo check --target {aarch64-apple-ios,aarch64-linux-android} (~80% bit-rot coverage), and a push to release/** triggers iOS Simulator and Android emulator runtime tests on the scoped ~50-test mobile subset (FTS5+WAL, HNSW CPU recall, embedder CPU path, LLM TLS handshake). Details: release-notes.md and tests/mobile/README.md in the repo.
Post-#1067 (provider-agnostic LLM substrate) + post-#1068 (mobile cross-compile gates), the same Rust binary slots into ten distinct deployment postures. No code branches between them — just env-var configuration of AI_MEMORY_LLM_BACKEND, AI_MEMORY_LLM_BASE_URL, AI_MEMORY_LLM_MODEL, AI_MEMORY_LLM_API_KEY.
| # | Posture | CPU / RAM / GPU floor | LLM source | Notes |
|---|---|---|---|---|
| 1a | Cellphone / tablet | 1 core / 256 MB / none | Cloud (xAI, OpenAI, Anthropic, Gemini) | iOS arm64 + Android arm64 / armv7 / x86 / x86_64; in-app FFI embed (#1068) |
| 1b | Laptop / workstation | 2 core / 4 GB / optional | Ollama local OR cloud | macOS arm64/x64, Linux arm64/x64, Win x64 |
| 2 | CPU-only cloud VPS ($5/mo) | 1 vCPU / 1 GB / none | Cloud only | Linode / Hetzner / DO droplet |
| 3 | CPU-only container (Plan C) | 1 vCPU / 512 MB / none | Cloud (env-injected) | GHCR Docker, K8s, ECS, Cloud Run |
| 4 | CPU-only sidecar | 0.25 vCPU / 256 MB / none | Cloud | In-pod sidecar to existing app |
| 5 | GPU workstation | 4 core / 8 GB / 8 GB VRAM | Local Ollama / llama.cpp | Dev box NVIDIA or Apple-Silicon NPU |
| 6 | GPU server (single-node) | 8 core / 32 GB / 24 GB VRAM | Local vLLM / Ollama | OpenAI-compatible endpoint inside the box |
| 7 | Private DC vLLM cluster | Cluster-scale | Self-hosted vLLM (OAI-compat) | On-prem K8s + vLLM autoscaler |
| 8 | Multi-region federation (T4-T5) | 3+ nodes | Mixed per-region | Quorum sync + per-region LLM choice |
| 9 | Air-gapped / SCIF | Bring your own | Ollama / vLLM ONLY (no cloud egress) | No outbound internet permitted |
| 10 | Edge / IoT (arm64 SBC) | 2 core / 2 GB / optional NPU | Cloud or tiny local | Raspberry Pi 5, Jetson Nano |
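Because a posture is selected purely through environment variables, a launcher script can check that the intended variables are set before starting the binary. The following is a minimal sketch, not something ai-memory ships: `missing_env` is a hypothetical helper, and which of the four variables a given backend actually requires is an assumption to confirm against ADMIN_GUIDE.md.

```shell
# missing_env NAME...: print the name of every listed variable that is unset or empty.
# Hypothetical helper for illustration; ai-memory does not ship it.
missing_env() {
  for name in "$@"; do
    # Indirect lookup of the variable called $name (names are trusted literals).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      printf '%s\n' "$name"
    fi
  done
}

# Example: a cloud-backed posture, assumed here to need all four variables.
missing=$(missing_env AI_MEMORY_LLM_BACKEND AI_MEMORY_LLM_BASE_URL \
                      AI_MEMORY_LLM_MODEL AI_MEMORY_LLM_API_KEY)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "unset LLM variables:" $missing >&2
fi
```

For a local or self-hosted backend, the list passed to `missing_env` would presumably be shorter (for example, no API key).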
Example for a self-hosted vLLM endpoint: AI_MEMORY_LLM_BACKEND=openai-compatible + AI_MEMORY_LLM_BASE_URL=https://vllm.internal/v1. Same binary, same data, same MCP / HTTP / CLI surfaces.
Default config path: ~/.config/ai-memory/config.toml on Linux/macOS (resolved per service user; there is no --config flag). Skip loading entirely via AI_MEMORY_NO_CONFIG=1 (intended for tests).
The full precedence ladder is CLI flag > AI_MEMORY_* env var > config.toml field > compiled default.
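The precedence ladder (CLI flag > AI_MEMORY_* env var > config.toml field > compiled default) amounts to "first non-empty value wins". A minimal sketch of that rule, assuming each rung has already been read into a string; `resolve_setting`, its argument order, and the /srv/env.db path are illustrative and not the binary's actual resolver:

```shell
# resolve_setting CLI ENV CONFIG DEFAULT
# Prints the first non-empty argument, mirroring:
#   CLI flag > AI_MEMORY_* env var > config.toml field > compiled default
resolve_setting() {
  for candidate in "$1" "$2" "$3" "$4"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1   # every rung was empty
}

# No CLI flag is given, so the environment value beats config.toml.
resolve_setting "" "/srv/env.db" "/var/lib/ai-memory/ai-memory.db" "ai-memory.db"
# prints /srv/env.db
```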
# config.toml — minimal production
db = "/var/lib/ai-memory/ai-memory.db"
tier = "semantic" # keyword | semantic | smart | autonomous
api_key = "<random-high-entropy-key>" # daemon API key; Plan C injects it
# at boot from AI_MEMORY_API_KEY
[identity]
anonymize_default = false
[permissions]
mode = "enforce" # v0.7.0 secure default
[limits] # operator-tunable resource caps (#1156 follow-up)
max_memories_per_day = 1000 # per-agent daily memory-write quota
max_storage_bytes = 104857600 # per-agent storage cap (bytes; 100 MiB)
max_links_per_day = 5000 # per-agent daily link-write quota
max_page_size = 1000 # list/bulk/sync page-size cap (OOM guard)
The [limits] section makes both the per-agent daily write quota and the list/bulk/sync page-size cap operator-tunable without recompiling. Each field resolves AI_MEMORY_MAX_* env > [limits] > compiled default; a non-positive or unparseable value falls through to the default (so a stray 0 can never clamp every list response to empty). The three quota fields seed fresh agent_quotas rows; max_page_size bounds per-request in-memory materialization as an OOM guard.
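The fall-through rule for [limits] can be sketched as below. This is an illustration under assumptions, not the binary's code: `resolve_limit` and `is_positive_int` are hypothetical helpers, and the sketch assumes an invalid value on one rung is skipped in favor of the next rung down, ending at the compiled default.

```shell
# is_positive_int VALUE: succeed only for a decimal integer greater than zero.
is_positive_int() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, signed, or containing any non-digit
  esac
  [ "$1" -gt 0 ]
}

# resolve_limit ENV_VALUE CONFIG_VALUE DEFAULT
# Assumed order: AI_MEMORY_MAX_* env > [limits] field > compiled default.
resolve_limit() {
  for candidate in "$1" "$2"; do
    if is_positive_int "$candidate"; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  printf '%s\n' "$3"
}

# A stray 0 in the environment is ignored instead of clamping pages to empty.
resolve_limit "0" "" "1000"
# prints 1000
```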
Bind address, port, and TLS are serve flags, not config fields: ai-memory serve --host 127.0.0.1 --port 9077 --tls-cert /etc/ai-memory/tls.crt --tls-key /etc/ai-memory/tls.key (add --mtls-allowlist for fingerprint-pinned client certs). When api_key is set, clients authenticate with the x-api-key request header — the canonical surface; the legacy ?api_key= query parameter is deprecated (it leaks into access logs and proxies; see #1574).
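As a sketch of header-based authentication, the helper below sends the key in the x-api-key header and never in the URL, so it stays out of access logs. `api_get`, the `AI_MEMORY_URL` variable, and the loopback default are assumptions for illustration; /api/v1/capabilities is the capabilities endpoint described on this page, and whether that endpoint requires a key is not stated here.

```shell
# Base URL of the daemon; hypothetical default matching the serve flags above.
AI_MEMORY_URL="${AI_MEMORY_URL:-https://127.0.0.1:9077}"

# api_get PATH: GET a path with the key in the x-api-key header.
# --fail makes curl exit non-zero on HTTP errors such as 401.
api_get() {
  curl --fail --silent --show-error \
       --header "x-api-key: ${AI_MEMORY_API_KEY}" \
       "${AI_MEMORY_URL}$1"
}

# Usage (requires a running daemon and AI_MEMORY_API_KEY in the environment):
#   api_get /api/v1/capabilities
```

With a private CA, curl would additionally need `--cacert` pointing at the CA bundle.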
Every knob — including the full environment-variable table, secret-vs-config classification, and precedence regression tests — is enumerated in ADMIN_GUIDE.md and CONFIG_SCHEMA.md.
ai-memory mcp: long-lived stdio subprocess, launched by the AI client. Best for single-developer local use.
ai-memory serve: Axum REST on port 9077, 89 route registrations / 75 unique URL paths. Best for shared org deployments with mTLS or API-key gating.
ai-memory curator: long-running consolidation + reflection loop. Provider-agnostic (#1067): backs onto Ollama (local) or any OpenAI-compatible vendor via AI_MEMORY_LLM_BACKEND (xAI Grok, OpenAI, Anthropic, Gemini, DeepSeek, Kimi, Qwen, Mistral, Groq, Together, Cerebras, OpenRouter, Fireworks, LMStudio, vLLM, llama.cpp-server). Best as a sidecar to the HTTP daemon.
ai-memory sync-daemon: signed-event replication between peer nodes. Best for multi-region hive (T3+) deployments.
The same binary covers all four, distinguished only by the subcommand. Architectures T1 (single agent) through T5 (global hive) are walked through in architectures.html.
[Unit]
Description=ai-memory HTTP daemon
Wants=network-online.target
After=network-online.target
[Service]
Type=simple
User=ai-memory
Group=ai-memory
ExecStart=/usr/bin/ai-memory --db /var/lib/ai-memory/ai-memory.db serve
Restart=on-failure
RestartSec=2s
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
Three observability surfaces, all on by default:
tracing-subscriber via RUST_LOG. JSON output supported.
ai-memory audit verify.
GET /api/v1/capabilities returns pre-computed calibration strings for monitoring tooling. Never includes secrets.
Full surface in telemetry.md and signed-events-v4.md.
permissions.mode = "enforce" from v0.7.0 onward. Advisory/off requires explicit opt-out.
AI_MEMORY_ALLOW_LOOPBACK_WEBHOOKS defaults to false; loopback URLs are rejected.
AI_MEMORY_OPERATOR_PUBKEY; treat that env var as override-authority.
Plan C reads AI_MEMORY_API_KEY from env and writes the config at boot; never logged.
--db-passphrase-file (also mode 0400). See encryption.html.
AI_MEMORY_OPERATOR_PUBKEY controls rule signing. Lock down host access; keep the operator key offline when possible.
Full hardening checklist: SECURITY.md.
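The hardening list mentions --db-passphrase-file with mode 0400. A minimal sketch of creating such a file, assuming a random hex string read from /dev/urandom is an acceptable passphrase; `write_secret_file` and the example path are hypothetical, and the passphrase format ai-memory expects should be confirmed in encryption.html.

```shell
# write_secret_file PATH: create PATH holding 32 random bytes as 64 hex chars, mode 0400.
write_secret_file() {
  (
    umask 077   # the file is never group- or world-readable, even briefly
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n' > "$1"
  ) && chmod 0400 "$1"
}

# Example (path is illustrative):
#   write_secret_file /etc/ai-memory/db-passphrase
```

The file should be owned by the service user that runs the daemon, since mode 0400 grants read access to the owner only.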
Plan C is the production container recipe. Image boots, reads AI_MEMORY_API_KEY + AI_MEMORY_DB from env, renders config.toml, and exec's the daemon. No baked-in secrets.
docker run -d \
--name ai-memory \
-p 9077:9077 \
-e AI_MEMORY_API_KEY="$(cat /run/secrets/ai-memory-key)" \
-e AI_MEMORY_DB=/data/ai-memory.db \
-v /var/lib/ai-memory:/data \
ghcr.io/alphaonedev/ai-memory:0.7.0
Per-issue context (#845 hardened entrypoint, never-leak invariant): integrations/README.md.
The SQLite store carries an integer schema version (currently v57 at v0.7.0; was v33 at v0.6.4). Recent migrations:
v55 made the list_memories_updated_since federation-catchup query sargable and added idx_memories_updated_at per #1476, so peer catch-up scans get a true index range instead of a sequential scan.
v54 backfilled tier-default expires_at onto legacy NULL-expiry mid/short rows per #1466, closing the TTL-leak immortal-rows class.
v53 scoped the memories_au FTS5 sync trigger to (title, content, tags) only per #1418, so non-FTS column updates no longer fire a needless sync.
v52 added the transcript_line_dedup table per #1389 L4, backing the sha256-keyed idempotency layer for turn-capture.
v51 added the federation_nonces table per #1255 / PR #1296, so peer-replay-prevention nonces persist across daemon restarts.
v50 extended the agent_quotas PRIMARY KEY from (agent_id) to (agent_id, namespace) per #1156, so per-namespace K8 quota allotments hold even when a single agent operates across many namespaces (pre-v50 rows backfill to the _global sentinel namespace).
Migrations run on first open of the new binary; the migration set is dry-run-tested against the operator's own DB by the project's dogfood script before a release ships.
Back up the database before upgrading: cp ai-memory.db ai-memory.db.bak. The binary additionally takes its own automatic pre-migration snapshot (#1576): before any schema-mutating migration it writes a transactionally-consistent VACUUM INTO sibling named <db>.pre-migration-v<FROM>-to-v<TO>-<token>.bak, and refuses to mutate the schema if that snapshot fails.
After the upgrade, verify: ai-memory stats && ai-memory audit verify.
Version-to-version upgrade notes: MIGRATION_v0.7.md, v0.7.0 release notes.
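Given the pre-migration snapshot naming pattern described above, the automatic snapshots for a database can be located with a glob. A minimal sketch: `list_pre_migration_snapshots` is a hypothetical helper, and it assumes the snapshot is written to the same directory as the database, as "sibling" suggests.

```shell
# list_pre_migration_snapshots DB_PATH
# Prints files matching <db>.pre-migration-v<FROM>-to-v<TO>-<token>.bak, one per line.
list_pre_migration_snapshots() {
  for snapshot in "$1".pre-migration-v*-to-v*-*.bak; do
    # An unmatched glob stays literal, so keep only paths that exist.
    if [ -e "$snapshot" ]; then
      printf '%s\n' "$snapshot"
    fi
  done
}

# Example:
#   list_pre_migration_snapshots /var/lib/ai-memory/ai-memory.db
```

An empty result after an upgrade that changed the schema version would be worth investigating, since the binary is described as refusing to migrate without a snapshot.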
Every release ships with a public test campaign directory under docs/v0.7.0/test-campaign-YYYY-MM-DD/. The latest release-gate-final campaign (2026-05-22) verifies the operator-facing surfaces (install, configure, deploy, observe, harden, upgrade) against tip fd172f2cf, taken after the 22-issue fix batch (#1120-#1141, no v0.8.0 deferrals).