For operators / SREs

Deploy ai-memory in production.

ai-memory ships as a single static Rust binary with four operational modes. This page is the operator's path: install, configure, deploy, observe, harden, upgrade. Every claim links to the reference doc that backs it.

1. Install

One binary, three packaging paths.

Pick the path that matches how your fleet is already managed. All three deliver the same binary.

Homebrew (macOS, Linuxbrew)

brew install alphaonedev/tap/ai-memory

COPR (Fedora / RHEL / Rocky)

sudo dnf copr enable alpha-one-ai/ai-memory
sudo dnf install ai-memory

cargo (any platform with Rust 1.96+)

cargo install ai-memory

Full matrix (Windows, .deb manual install, Docker, source) is in INSTALL.md.

Supported platforms (v0.7.0)

Build + test gates run on:

Mobile coverage is layered. Every PR and push runs cargo check --target {aarch64-apple-ios,aarch64-linux-android} (~80% bit-rot coverage). Every push to release/** additionally triggers iOS Simulator and Android emulator runtime tests on the scoped ~50-test mobile subset (FTS5+WAL, HNSW CPU recall, embedder CPU path, LLM TLS handshake). Details: release-notes.md and tests/mobile/README.md in the repo.

Deployment matrix (10 postures)

One binary. 10 deployment shapes.

Post-#1067 (provider-agnostic LLM substrate) + post-#1068 (mobile cross-compile gates), the same Rust binary slots into ten distinct deployment postures. No code branches between them — just env-var configuration of AI_MEMORY_LLM_BACKEND, AI_MEMORY_LLM_BASE_URL, AI_MEMORY_LLM_MODEL, AI_MEMORY_LLM_API_KEY.

| # | Posture | CPU / RAM / GPU floor | LLM source | Notes |
|---|---|---|---|---|
| 1a | Cellphone / tablet | 1 core / 256 MB / none | Cloud (xAI, OpenAI, Anthropic, Gemini) | iOS arm64 + Android arm64 / armv7 / x86 / x86_64; in-app FFI embed (#1068) |
| 1b | Laptop / workstation | 2 core / 4 GB / optional | Ollama local OR cloud | macOS arm64/x64, Linux arm64/x64, Win x64 |
| 2 | CPU-only cloud VPS ($5/mo) | 1 vCPU / 1 GB / none | Cloud only | Linode / Hetzner / DO droplet |
| 3 | CPU-only container (Plan C) | 1 vCPU / 512 MB / none | Cloud (env-injected) | GHCR Docker, K8s, ECS, Cloud Run |
| 4 | CPU-only sidecar | 0.25 vCPU / 256 MB / none | Cloud | In-pod sidecar to existing app |
| 5 | GPU workstation | 4 core / 8 GB / 8 GB VRAM | Local Ollama / llama.cpp | Dev box NVIDIA or Apple-Silicon NPU |
| 6 | GPU server (single-node) | 8 core / 32 GB / 24 GB VRAM | Local vLLM / Ollama | OpenAI-compatible endpoint inside the box |
| 7 | Private DC vLLM cluster | Cluster-scale | Self-hosted vLLM (OAI-compat) | On-prem K8s + vLLM autoscaler |
| 8 | Multi-region federation (T4-T5) | 3+ nodes | Mixed per-region | Quorum sync + per-region LLM choice |
| 9 | Air-gapped / SCIF | Bring your own | Ollama / vLLM ONLY (no cloud egress) | No outbound internet permitted |
| 10 | Edge / IoT (arm64 SBC) | 2 core / 2 GB / optional NPU | Cloud or tiny local | Raspberry Pi 5, Jetson Nano |

Posture transitions are env-var-only. To switch a Plan C container from posture 3 (cloud LLM via OpenAI) to posture 7 (self-hosted vLLM), set AI_MEMORY_LLM_BACKEND=openai-compatible and AI_MEMORY_LLM_BASE_URL=https://vllm.internal/v1. The binary, the data, and the MCP / HTTP / CLI surfaces stay the same. Precedence ladder: CLI flag > AI_MEMORY_* env var > config.toml > compiled default.

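The precedence ladder can be pictured as a first-non-empty lookup across the four layers. The following is a minimal sketch of that idea in POSIX shell, not the binary's actual resolution code. The function name resolve_setting is hypothetical, and treating an empty string as "not set" is an assumption made for the illustration.

```shell
#!/bin/sh
# Illustrative only: mimics the documented ladder
#   CLI flag > AI_MEMORY_* env var > config.toml > compiled default
# It is not ai-memory's implementation.
#
# Usage: resolve_setting CLI_VALUE ENV_VALUE FILE_VALUE DEFAULT_VALUE
resolve_setting() {
    # Walk the three overridable layers in priority order.
    for candidate in "$1" "$2" "$3"; do
        if [ -n "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    # Nothing was set explicitly, so fall back to the default.
    printf '%s\n' "$4"
}
```

Called with an empty CLI value, an env value of https://vllm.internal/v1, and any file or default values, the sketch prints the env value, which is the posture 3 to posture 7 switch described above.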
2. Configure

config.toml essentials.

Default config path: ~/.config/ai-memory/config.toml on Linux/macOS, resolved per service user; there is no --config flag. To skip config loading entirely, set AI_MEMORY_NO_CONFIG=1 (intended for tests). The full ladder is CLI flag > AI_MEMORY_* env var > config.toml field > compiled default.

# config.toml — minimal production
db = "/var/lib/ai-memory/ai-memory.db"
tier = "semantic"      # keyword | semantic | smart | autonomous
api_key = "<random-high-entropy-key>"  # daemon API key; Plan C injects it
                                        # at boot from AI_MEMORY_API_KEY

[identity]
anonymize_default = false

[permissions]
mode = "enforce"       # v0.7.0 secure default

[limits]                                 # operator-tunable resource caps (#1156 follow-up)
max_memories_per_day = 1000              # per-agent daily memory-write quota
max_storage_bytes    = 104857600         # per-agent storage cap (bytes; 100 MiB)
max_links_per_day    = 5000              # per-agent daily link-write quota
max_page_size        = 1000              # list/bulk/sync page-size cap (OOM guard)

The [limits] section makes both the per-agent daily write quota and the list/bulk/sync page-size cap operator-tunable without recompiling. Each field resolves AI_MEMORY_MAX_* env > [limits] > compiled default; a non-positive or unparseable value falls through to the default (so a stray 0 can never clamp every list response to empty). The three quota fields seed fresh agent_quotas rows; max_page_size bounds per-request in-memory materialization as an OOM guard.
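The fall-through rule for [limits] can be sketched as follows. This is a hedged illustration in POSIX shell, not the daemon's code: is_positive_int and resolve_limit are hypothetical helpers, the default passed in is a placeholder rather than the real compiled default, and the sketch assumes an invalid value at one layer falls through to the next layer down.

```shell
#!/bin/sh
# is_positive_int VALUE
# Succeeds only for a string of decimal digits containing a non-zero digit.
# Empty strings, signs, and non-numeric text are all rejected.
is_positive_int() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
    esac
    case "$1" in
        *[1-9]*) return 0 ;;       # at least one non-zero digit: value > 0
    esac
    return 1                       # all zeros
}

# resolve_limit ENV_VALUE FILE_VALUE DEFAULT_VALUE
# Mirrors the documented order: AI_MEMORY_MAX_* env > [limits] > default.
# ASSUMPTION: an invalid value at one layer falls through to the next layer.
resolve_limit() {
    for candidate in "$1" "$2"; do
        if is_positive_int "$candidate"; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    printf '%s\n' "$3"
}
```

With this shape a stray 0 in either layer is simply skipped, which is the property the paragraph above describes: a zero can never clamp list responses to empty.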

Bind address, port, and TLS are serve flags, not config fields: ai-memory serve --host 127.0.0.1 --port 9077 --tls-cert /etc/ai-memory/tls.crt --tls-key /etc/ai-memory/tls.key (add --mtls-allowlist for fingerprint-pinned client certs). When api_key is set, clients authenticate with the x-api-key request header — the canonical surface; the legacy ?api_key= query parameter is deprecated (it leaks into access logs and proxies; see #1574).
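One way to send the x-api-key header without putting the key on a command line is to feed it to curl through a config stream (curl reads options from stdin with --config -). The sketch below is an assumption-laden example, not an official client: curl_auth_config, the key file path, and DAEMON_URL are hypothetical, and it assumes the key contains no double quotes or backslashes.

```shell
#!/bin/sh
# curl_auth_config KEYFILE
# Prints a curl config line that sets the x-api-key request header.
# ASSUMPTION: the key has no '"' or '\' characters that would need escaping.
curl_auth_config() {
    # Command substitution strips the trailing newline from the key file.
    key=$(cat "$1") || return 1
    printf 'header = "x-api-key: %s"\n' "$key"
}

# Example use (hypothetical file path and URL, shown as a comment so that
# sourcing this file performs no network access):
#
#   curl_auth_config /etc/ai-memory/api.key \
#     | curl --config - --fail --silent --show-error "$DAEMON_URL"
```

Because the header travels over stdin, the key does not appear in the process list or in shell history, which is consistent with the reason given above for deprecating the ?api_key= query parameter.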

Every knob — including the full environment-variable table, secret-vs-config classification, and precedence regression tests — is enumerated in ADMIN_GUIDE.md and CONFIG_SCHEMA.md.

3. Deploy

Four operational modes.

stdio MCP
ai-memory mcp — long-lived stdio subprocess, launched by the AI client. Best for single-developer local use.
HTTP daemon
ai-memory serve — Axum REST on port 9077, 89 route registrations / 75 unique URL paths. Best for shared org deployments with mTLS or API-key gating.
autonomous curator
ai-memory curator — long-running consolidation + reflection loop. Provider-agnostic (#1067): backs onto Ollama (local) or any OpenAI-compatible vendor via AI_MEMORY_LLM_BACKEND — xAI Grok, OpenAI, Anthropic, Gemini, DeepSeek, Kimi, Qwen, Mistral, Groq, Together, Cerebras, OpenRouter, Fireworks, LMStudio, vLLM, llama.cpp-server. Best as a sidecar to the HTTP daemon.
quorum sync
ai-memory sync-daemon — signed-event replication between peer nodes. Best for multi-region hive (T3+) deployments.

The same binary covers all four, distinguished only by the subcommand. Architectures T1 (single agent) through T5 (global hive) are walked through in architectures.html.

systemd unit for the HTTP daemon

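[Unit]
Description=ai-memory HTTP daemon
# After= only orders startup; Wants= is what pulls network-online.target in.
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
# Run as a dedicated unprivileged service user.
User=ai-memory
Group=ai-memory
ExecStart=/usr/bin/ai-memory --db /var/lib/ai-memory/ai-memory.db serve
Restart=on-failure
RestartSec=2s
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
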
4. Observe

Telemetry, audit, signed events.

Three observability surfaces, all on by default:

Full surface in telemetry.md and signed-events-v4.md.

5. Harden

The v0.7.0 secure-default posture.

  1. Permissions on by default. permissions.mode = "enforce" from v0.7.0 onward. Advisory/off requires explicit opt-out.
  2. SSRF gate on webhooks. AI_MEMORY_ALLOW_LOOPBACK_WEBHOOKS defaults false; loopback URLs rejected.
  3. Operator-signed substrate rules (L1–L6). Governance rules carry Ed25519 signatures that are verified against AI_MEMORY_OPERATOR_PUBKEY; treat that env var as override-authority.
  4. HMAC-required subscriptions. Webhook subscribers must register an HMAC secret; unsigned events are rejected.
  5. API-key file mode 0400. The Plan C entrypoint reads AI_MEMORY_API_KEY from env and writes the config at boot; never logged.
  6. SQLcipher build available. Enables at-rest encryption with --db-passphrase-file (also mode 0400). See encryption.html.

Override authority warning. Anyone who can set AI_MEMORY_OPERATOR_PUBKEY decides which signing key the daemon trusts, and therefore controls rule signing. Lock down host access; keep the operator private key offline when possible.
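The mode-0400 requirement in items 5 and 6 is easy to check from a deploy script. The sketch below uses POSIX find with an exact -perm match, so it avoids the GNU/BSD differences in stat. The helper name is_mode_0400 and the file paths in the usage comment are hypothetical.

```shell
#!/bin/sh
# is_mode_0400 FILE
# Succeeds only if FILE exists and its permission bits are exactly 0400.
# find prints the path only when -perm matches exactly; -prune stops it
# from descending if a directory is passed by mistake.
is_mode_0400() {
    [ -n "$(find "$1" -prune -perm 400 2>/dev/null)" ]
}

# Example use (hypothetical paths, shown as a comment):
#
#   for f in /etc/ai-memory/api.key /etc/ai-memory/db.passphrase; do
#       is_mode_0400 "$f" || echo "WARNING: $f is not mode 0400" >&2
#   done
```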

Full hardening checklist: SECURITY.md.

6. Container deployment (Plan C)

One image, env-driven config.

Plan C is the production container recipe. The image boots, reads AI_MEMORY_API_KEY and AI_MEMORY_DB from the environment, renders config.toml, and execs the daemon. No secrets are baked into the image.

# Quote the substitution so the key is passed as one unsplit argument.
docker run -d \
  --name ai-memory \
  -p 9077:9077 \
  -e AI_MEMORY_API_KEY="$(cat /run/secrets/ai-memory-key)" \
  -e AI_MEMORY_DB=/data/ai-memory.db \
  -v /var/lib/ai-memory:/data \
  ghcr.io/alphaonedev/ai-memory:0.7.0
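An alternative to passing the key with -e is docker run's --env-file option, which keeps the value out of the docker command line. The following is a sketch under assumptions rather than part of the Plan C recipe: write_env_file and the output path are hypothetical, and the key is assumed to be a single line.

```shell
#!/bin/sh
# write_env_file KEYFILE OUTFILE
# Writes the two variables Plan C reads into a docker --env-file.
# Docker takes everything after '=' verbatim, so the values are not quoted.
write_env_file() {
    keyfile=$1
    out=$2
    key=$(cat "$keyfile") || return 1
    (
        # Restrict permissions for a newly created file (0600).
        # An existing file keeps whatever mode it already had.
        umask 077
        printf 'AI_MEMORY_API_KEY=%s\nAI_MEMORY_DB=%s\n' \
            "$key" "/data/ai-memory.db" > "$out"
    )
}

# Example use (hypothetical env-file path, shown as a comment):
#
#   write_env_file /run/secrets/ai-memory-key /run/ai-memory.env
#   docker run -d --name ai-memory -p 9077:9077 \
#     --env-file /run/ai-memory.env \
#     -v /var/lib/ai-memory:/data \
#     ghcr.io/alphaonedev/ai-memory:0.7.0
```

The environment is still visible through docker inspect to anyone with Docker access on the host, so this narrows exposure rather than eliminating it.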

Per-issue context (#845 hardened entrypoint, never-leak invariant): integrations/README.md.

7. Upgrade

Schema-versioned, migration-tested.

The SQLite store carries an integer schema version (currently v57 at v0.7.0; was v33 at v0.6.4). Notable migrations in that range:

v55 made the list_memories_updated_since federation-catchup query sargable and added idx_memories_updated_at per #1476 so peer catch-up scans get a true index range instead of a seq scan.

v54 backfilled tier-default expires_at onto legacy NULL-expiry mid/short rows per #1466, closing the TTL-leak immortal-rows class.

v53 scoped the memories_au FTS5 sync trigger to (title, content, tags) only per #1418 so non-FTS column updates no longer fire a needless sync.

v52 added the transcript_line_dedup table per #1389 L4, backing the sha256-keyed idempotency layer for turn-capture.

v51 added the federation_nonces table per #1255 / PR #1296 so peer-replay-prevention nonces persist across daemon restarts.

v50 extended agent_quotas PRIMARY KEY from (agent_id) to (agent_id, namespace) per #1156 so per-namespace K8 quota allotments hold even when a single agent operates across many namespaces (pre-v50 rows backfill to the _global sentinel namespace).

Migrations run on first open of the new binary; the migration set is dry-run-tested against the operator's own DB by the project's dogfood script before a release ships.
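Because migrations run on first open of the new binary, an operator may want a file-level copy of the database before upgrading. The sketch below is one conservative way to do that and is not a procedure taken from the project docs. It assumes the daemon is stopped first so no connection holds the database open, and it copies the -wal and -shm sidecar files when they exist. The function name backup_db, the backup directory, and the systemd unit name in the usage comment are hypothetical.

```shell
#!/bin/sh
# backup_db DB_PATH DEST_DIR
# Copies the SQLite database and any WAL sidecar files into DEST_DIR.
# ASSUMPTION: the daemon is stopped, so the files are not changing.
backup_db() {
    db=$1
    dest=$2
    [ -f "$db" ] || { echo "no database at $db" >&2; return 1; }
    mkdir -p "$dest" || return 1
    for suffix in "" "-wal" "-shm"; do
        if [ -f "$db$suffix" ]; then
            # -p preserves mode and timestamps on the copy.
            cp -p "$db$suffix" "$dest/" || return 1
        fi
    done
}

# Example use (hypothetical unit name and backup directory):
#
#   systemctl stop ai-memory
#   backup_db /var/lib/ai-memory/ai-memory.db /var/backups/ai-memory/pre-upgrade
#   # install the new binary, then:
#   systemctl start ai-memory
```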

Version-to-version upgrade notes: MIGRATION_v0.7.md, v0.7.0 release notes.

Test results

Evidence behind the SHIP verdict.

Every release ships with a public test campaign directory under docs/v0.7.0/test-campaign-YYYY-MM-DD/. The latest release-gate-final campaign (2026-05-22) verifies the operator-facing surfaces — install, configure, deploy, observe, harden, upgrade — against tip fd172f2cf post-22-issue fix batch (#1120-#1141, no v0.8.0 deferrals).
