ai-memory for operators — install, deploy, harden, observe

1. Install

One binary, three packaging paths.

Pick the path that matches how your fleet is already managed. All three deliver the same binary.

Homebrew (macOS, Linuxbrew)

brew install alphaonedev/tap/ai-memory

COPR (Fedora / RHEL / Rocky)

sudo dnf copr enable alpha-one-ai/ai-memory
sudo dnf install ai-memory

cargo (any platform with Rust 1.96+)

cargo install ai-memory

Full matrix (Windows, .deb manual install, Docker, source) is in INSTALL.md.

Supported platforms (v0.9.0)

Build + test gates run on:

Linux x86_64, aarch64 (Ubuntu LTS, RHEL/Fedora, Debian)
macOS x86_64, arm64 (12+)
Windows x86_64
iOS aarch64 (cross-compile gate, #1068; aarch64-apple-ios + simulator slices via ai-memory-ios.xcframework.tar.gz)
Android aarch64, armv7, x86, x86_64 (cross-compile gate, #1068; jniLibs/<abi>/ layout via ai-memory-android.tar.gz)

Mobile coverage is layered. Every PR and push runs cargo check --target {aarch64-apple-ios,aarch64-linux-android} (~80% bit-rot coverage); pushes to release/** additionally trigger iOS Simulator and Android emulator runtime tests on the scoped ~50-test mobile subset (FTS5+WAL, HNSW CPU recall, embedder CPU path, LLM TLS handshake). Details: release-notes.md and tests/mobile/README.md in the repo.

Deployment matrix (10 postures)

One binary. 10 deployment shapes.

Since #1067 (provider-agnostic LLM substrate) and #1068 (mobile cross-compile gates), the same Rust binary fits ten distinct deployment postures. Nothing in the code branches between them; only the configuration of AI_MEMORY_LLM_BACKEND, AI_MEMORY_LLM_BASE_URL, AI_MEMORY_LLM_MODEL, and AI_MEMORY_LLM_API_KEY changes.

#	Posture	CPU / RAM / GPU floor	LLM source	Notes
1a	Cellphone / tablet	1 core / 256 MB / none	Cloud (xAI, OpenAI, Anthropic, Gemini)	iOS arm64 + Android arm64 / armv7 / x86 / x86_64; in-app FFI embed (#1068)
1b	Laptop / workstation	2 core / 4 GB / optional	Ollama local OR cloud	macOS arm64/x64, Linux arm64/x64, Win x64
2	CPU-only cloud VPS ($5/mo)	1 vCPU / 1 GB / none	Cloud only	Linode / Hetzner / DO droplet
3	CPU-only container (Plan C)	1 vCPU / 512 MB / none	Cloud (env-injected)	GHCR Docker, K8s, ECS, Cloud Run
4	CPU-only sidecar	0.25 vCPU / 256 MB / none	Cloud	In-pod sidecar to existing app
5	GPU workstation	4 core / 8 GB / 8 GB VRAM	Local Ollama / llama.cpp	Dev box NVIDIA or Apple-Silicon NPU
6	GPU server (single-node)	8 core / 32 GB / 24 GB VRAM	Local vLLM / Ollama	OpenAI-compatible endpoint inside the box
7	Private DC vLLM cluster	Cluster-scale	Self-hosted vLLM (OAI-compat)	On-prem K8s + vLLM autoscaler
8	Multi-region federation (T4-T5)	3+ nodes	Mixed per-region	Quorum sync + per-region LLM choice
9	Air-gapped / SCIF	Bring your own	Ollama / vLLM ONLY (no cloud egress)	No outbound internet permitted
10	Edge / IoT (arm64 SBC)	2 core / 2 GB / optional NPU	Cloud or tiny local	Raspberry Pi 5, Jetson Nano

Posture transitions are env-var-only. To move a Plan C container from posture 3 (cloud LLM via OpenAI) to posture 7 (self-hosted vLLM), set AI_MEMORY_LLM_BACKEND=openai-compatible and AI_MEMORY_LLM_BASE_URL=https://vllm.internal/v1. The binary, the data, and the MCP / HTTP / CLI surfaces stay the same. Precedence ladder: CLI flag > AI_MEMORY_* env var > config.toml > compiled default.
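The posture switch can be scripted as a dry run. The sketch below builds the Plan C `docker run` argv for posture 7 and shows that only -e entries differ between postures. POSTURE_7 reuses the backend and base URL quoted above. The model name and the helper name docker_run_argv are hypothetical placeholders.

```python
# Sketch: one Plan C image, with the LLM posture chosen purely through env vars.
# Hypothetical: the model name "example-model" and the helper docker_run_argv.
import shlex

IMAGE = "ghcr.io/alphaonedev/ai-memory:0.9.0"

POSTURE_7 = {
    "AI_MEMORY_LLM_BACKEND": "openai-compatible",
    "AI_MEMORY_LLM_BASE_URL": "https://vllm.internal/v1",
    "AI_MEMORY_LLM_MODEL": "example-model",  # hypothetical model name
}


def docker_run_argv(llm_env: dict) -> list:
    """Build the docker run argv; only the LLM -e entries vary by posture."""
    argv = [
        "docker", "run", "-d", "--name", "ai-memory", "-p", "9077:9077",
        "-v", "/var/lib/ai-memory:/data",
        "-e", "AI_MEMORY_DB=/data/ai-memory.db",
        # "-e NAME" with no value forwards NAME from the caller's environment,
        # so the API key value never appears in argv or shell history.
        "-e", "AI_MEMORY_API_KEY",
    ]
    for name in sorted(llm_env):
        argv += ["-e", f"{name}={llm_env[name]}"]
    argv.append(IMAGE)
    return argv


# Printable form, e.g. for a runbook or a dry run.
print(shlex.join(docker_run_argv(POSTURE_7)))
```

Forwarding AI_MEMORY_API_KEY by name is a deliberate departure from the $(cat ...) form in the Plan C example. The volume and image are unchanged, so moving back to a cloud posture only means swapping the dict.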

2. Configure

config.toml essentials.

Default config path: ~/.config/ai-memory/config.toml on Linux/macOS, resolved against the home directory of the user the service runs as; there is no --config flag. Set AI_MEMORY_NO_CONFIG=1 to skip loading it entirely (intended for tests). The full ladder is CLI flag > AI_MEMORY_* env var > config.toml field > compiled default.

# config.toml — minimal production
db = "/var/lib/ai-memory/ai-memory.db"
tier = "semantic"      # keyword | semantic | smart | autonomous
api_key = "<random-high-entropy-key>"  # daemon API key; Plan C injects it
                                        # at boot from AI_MEMORY_API_KEY

[identity]
anonymize_default = false

[permissions]
mode = "enforce"       # v0.7.0 secure default

[limits]                                 # operator-tunable resource caps (#1156 follow-up)
max_memories_per_day = 1000              # per-agent daily memory-write quota
max_storage_bytes    = 104857600         # per-agent storage cap (bytes; 100 MiB)
max_links_per_day    = 5000              # per-agent daily link-write quota
max_page_size        = 1000              # list/bulk/sync page-size cap (OOM guard)

The [limits] section makes both the per-agent quotas and the list/bulk/sync page-size cap operator-tunable without recompiling. Each field resolves in the order AI_MEMORY_MAX_* env var > [limits] > compiled default. A non-positive or unparseable value falls through to the default, so a stray 0 can never clamp every list response to empty. The three quota fields seed newly created agent_quotas rows; max_page_size bounds how much each request materializes in memory, as an OOM guard.
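The fall-through rule is easiest to see in code. This is a minimal sketch of the documented order, not ai-memory's implementation, and it rests on three assumptions:

- Each env var name is AI_MEMORY_ plus the upper-cased field name. This is a guess at the exact spelling within the AI_MEMORY_MAX_* family.
- An invalid value at either layer is skipped in favor of the next layer.
- DEFAULTS reuses the config example's numbers, which may not match the real compiled defaults.

```python
# Sketch of [limits] resolution: env var > [limits] table > compiled default,
# with non-positive or unparseable values treated as absent at each layer.
# Assumed: env names are "AI_MEMORY_" + FIELD.upper(); DEFAULTS are illustrative.
from typing import Mapping, Optional

DEFAULTS = {
    "max_memories_per_day": 1000,
    "max_storage_bytes": 104_857_600,
    "max_links_per_day": 5000,
    "max_page_size": 1000,
}


def positive_int(raw: object) -> Optional[int]:
    """Parse raw as a positive integer; None if missing, non-positive, or unparseable."""
    if raw is None or isinstance(raw, bool):
        return None
    try:
        value = int(str(raw).strip())
    except ValueError:
        return None
    return value if value > 0 else None


def resolve_limit(field: str, env: Mapping[str, str], limits: Mapping[str, object]) -> int:
    """Return the first valid value from env, then [limits], else the default."""
    for raw in (env.get("AI_MEMORY_" + field.upper()), limits.get(field)):
        value = positive_int(raw)
        if value is not None:
            return value
    return DEFAULTS[field]
```

With AI_MEMORY_MAX_PAGE_SIZE=0 exported and no [limits] entry, resolve_limit("max_page_size", ...) returns 1000 instead of clamping every page to empty.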

Bind address, port, and TLS are serve flags, not config fields: ai-memory serve --host 127.0.0.1 --port 9077 --tls-cert /etc/ai-memory/tls.crt --tls-key /etc/ai-memory/tls.key (add --mtls-allowlist for fingerprint-pinned client certs). When api_key is set, clients authenticate with the x-api-key request header — the canonical surface; the legacy ?api_key= query parameter is deprecated (it leaks into access logs and proxies; see #1574).
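On the client side, a standard-library sketch of header-based authentication follows. It assumes four things:

- The daemon runs with --tls-cert/--tls-key on 127.0.0.1:9077.
- The certificate passed as cafile is trusted and lists 127.0.0.1 in its subjectAltName.
- GET /api/v1/capabilities (the monitoring endpoint described under Observe) accepts the key.
- The body is returned as plain text, because its exact shape is not specified here.

```python
# Sketch: send the API key in the x-api-key header, never as ?api_key=.
# Assumptions: TLS on 127.0.0.1:9077 with a cert trusted via cafile.
import ssl
import urllib.request
from typing import Optional


def capabilities_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET for /api/v1/capabilities with the key in a header."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/capabilities",
        headers={"x-api-key": api_key},
        method="GET",
    )


def fetch_capabilities(base_url: str, api_key: str, cafile: Optional[str] = None) -> str:
    """Send the request over verified TLS and return the response body as text."""
    ctx = ssl.create_default_context(cafile=cafile)
    req = capabilities_request(base_url, api_key)
    with urllib.request.urlopen(req, context=ctx, timeout=5) as resp:
        return resp.read().decode("utf-8")
```

A typical call is fetch_capabilities("https://127.0.0.1:9077", key, cafile="/etc/ai-memory/tls.crt"). Here key is read from wherever you keep the secret. Because the key never enters the URL, it stays out of proxy and access logs.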

Every knob — including the full environment-variable table, secret-vs-config classification, and precedence regression tests — is enumerated in ADMIN_GUIDE.md and CONFIG_SCHEMA.md.

3. Deploy

Four operational modes.

stdio MCP: ai-memory mcp — long-lived stdio subprocess, launched by the AI client. Best for single-developer local use.
HTTP daemon: ai-memory serve — Axum REST on port 9077, 92 route registrations / 78 unique URL paths. Best for shared org deployments with mTLS or API-key gating.
autonomous curator: ai-memory curator — long-running consolidation + reflection loop. Provider-agnostic (#1067): backs onto Ollama (local) or any OpenAI-compatible vendor via AI_MEMORY_LLM_BACKEND — xAI Grok, OpenAI, Anthropic, Gemini, DeepSeek, Kimi, Qwen, Mistral, Groq, Together, Cerebras, OpenRouter, Fireworks, LMStudio, vLLM, llama.cpp-server. Best as a sidecar to the HTTP daemon.
quorum sync: ai-memory sync-daemon — signed-event replication between peer nodes. Best for multi-region hive (T3+) deployments.

The same binary covers all four, distinguished only by the subcommand. Architectures T1 (single agent) through T5 (global hive) are walked through in architectures.html.
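For the stdio MCP mode, the AI client spawns ai-memory mcp and exchanges newline-delimited JSON-RPC 2.0 messages, as the MCP stdio transport specifies. The sketch below shows that handshake for a smoke test. The protocolVersion and clientInfo values are placeholders; the server negotiates the version it supports. The sketch also assumes the first stdout line after initialize is the server's response.

```python
# Sketch of the client side of stdio MCP: newline-delimited JSON-RPC 2.0.
# Assumed: placeholder protocolVersion/clientInfo; first stdout line is the reply.
import json
import subprocess


def frame(message: dict) -> bytes:
    """Serialize one JSON-RPC message as a single newline-terminated line."""
    # json.dumps without indent never emits a raw newline (newlines inside
    # strings are escaped), so the trailing "\n" cleanly delimits the message.
    return (json.dumps(message) + "\n").encode("utf-8")


def initialize_request(request_id: int = 1) -> dict:
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        },
    }


def initialize_server(binary: str = "ai-memory") -> dict:
    """Spawn the stdio server, perform the initialize handshake, then shut it down."""
    proc = subprocess.Popen([binary, "mcp"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    try:
        proc.stdin.write(frame(initialize_request()))
        proc.stdin.flush()
        response = json.loads(proc.stdout.readline())
        # The client confirms the handshake with an initialized notification.
        proc.stdin.write(frame({"jsonrpc": "2.0", "method": "notifications/initialized"}))
        proc.stdin.flush()
        return response
    finally:
        proc.stdin.close()  # closing stdin is the stdio transport's shutdown signal
        proc.wait(timeout=5)
```

Normally the AI client handles this for you. Calling initialize_server() by hand is mainly a quick check that the installed binary starts and answers.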

systemd unit for the HTTP daemon

[Unit]
Description=ai-memory HTTP daemon
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=ai-memory
Group=ai-memory
ExecStart=/usr/bin/ai-memory --db /var/lib/ai-memory/ai-memory.db serve
Restart=on-failure
RestartSec=2s
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

4. Observe

Telemetry, audit, signed events.

Three observability surfaces, all on by default:

tracing logs — standard tracing-subscriber via RUST_LOG. JSON output supported.
append-only audit chain — per-event Ed25519-signed records under the configured audit dir. Replayable via ai-memory audit verify.
capabilities v3 — GET /api/v1/capabilities returns pre-computed calibration strings for monitoring tooling. Never includes secrets.

Full surface in telemetry.md and signed-events-v4.md.
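Real verification is `ai-memory audit verify`. To illustrate what an append-only chain buys you, here is a minimal hash-linked log check. The record layout (prev_hash, payload, hash), SHA-256 linking, and canonical JSON are hypothetical, not ai-memory's on-disk format. The per-event Ed25519 signature check is omitted because Python's standard library has no Ed25519.

```python
# Hash-linked append-only log (hypothetical layout). Editing a record is caught at
# that record, or at its successor if the editor also recomputes its hash; deleting
# a middle record breaks the next link. Truncating the tail needs a signed head.
import hashlib
import json
from typing import List, Optional

GENESIS = "0" * 64  # stand-in "previous hash" for the first record


def record_hash(prev_hash: str, payload: dict) -> str:
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def append(chain: List[dict], payload: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev_hash": prev, "payload": payload, "hash": record_hash(prev, payload)})


def first_break(chain: List[dict]) -> Optional[int]:
    """Index of the first record whose link or hash fails to verify, else None."""
    prev = GENESIS
    for i, rec in enumerate(chain):
        if rec["prev_hash"] != prev or rec["hash"] != record_hash(prev, rec["payload"]):
            return i
        prev = rec["hash"]
    return None
```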

5. Harden

The v0.7.0 secure-default posture.

Permissions on by default. permissions.mode = "enforce" from v0.7.0 onward. Advisory/off requires explicit opt-out.
SSRF gate on webhooks. AI_MEMORY_ALLOW_LOOPBACK_WEBHOOKS defaults false; loopback URLs rejected.
Operator-signed substrate rules (L1–L6). Governance rules are Ed25519-signed by AI_MEMORY_OPERATOR_PUBKEY; treat that env var as override-authority.
HMAC-required subscriptions. Webhook subscribers must register an HMAC secret; unsigned events are rejected.
API-key file mode 0400. The Plan C entrypoint reads AI_MEMORY_API_KEY from the environment and writes it into the config file at boot; the key is never logged.
SQLCipher build available. Enables at-rest encryption with --db-passphrase-file (also mode 0400). See encryption.html.
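A subscriber has to verify each delivery against the secret it registered. The sketch below shows that receiver-side check with a constant-time comparison. It assumes HMAC-SHA256, hex-encoded, computed over the raw request body. The algorithm, the encoding, and the header carrying the signature are not confirmed on this page, so check the subscription docs before relying on them.

```python
# Receiver-side HMAC check for a webhook delivery.
# Assumed: HMAC-SHA256, hex digest, over the raw (unparsed) request body.
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Constant-time comparison of the expected and received signatures."""
    expected = sign(secret, body).encode("ascii")
    received = received_signature.strip().lower().encode("utf-8")
    return hmac.compare_digest(expected, received)
```

Verify before parsing the JSON, and reject the delivery when verify returns False. hmac.compare_digest avoids leaking how many leading characters matched.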

Override authority warning. Anyone who sets AI_MEMORY_OPERATOR_PUBKEY controls rule signing. Lock down host access; keep the operator key offline when possible.

Full hardening checklist: SECURITY.md.
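The SSRF gate above can be illustrated with a loopback check on webhook URLs. This is not the daemon's code. It inspects only literal hosts: DNS names are not resolved, so a name pointing at 127.0.0.1 (or a rebinding attack) needs a resolve-then-check step in a real gate.

```python
# Illustration of a loopback SSRF check for webhook URLs (literal hosts only).
import ipaddress
from urllib.parse import urlsplit


def targets_local_host(url: str) -> bool:
    host = urlsplit(url).hostname  # lower-cased; IPv6 brackets stripped
    if not host:
        return True  # no usable host: fail closed
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name; not resolved in this sketch
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped  # ::ffff:127.0.0.1 is still loopback
    # 0.0.0.0 and :: are "unspecified" but reach the local host on common stacks.
    return addr.is_loopback or addr.is_unspecified


def accept_webhook_url(url: str, allow_loopback: bool = False) -> bool:
    """allow_loopback plays the role of AI_MEMORY_ALLOW_LOOPBACK_WEBHOOKS (default false)."""
    return allow_loopback or not targets_local_host(url)
```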

6. Container deployment (Plan C)

One image, env-driven config.

Plan C is the production container recipe. The image boots, reads AI_MEMORY_API_KEY and AI_MEMORY_DB from the environment, renders config.toml, and execs the daemon. No secrets are baked into the image.

docker run -d \
  --name ai-memory \
  -p 9077:9077 \
  -e AI_MEMORY_API_KEY=$(cat /run/secrets/ai-memory-key) \
  -e AI_MEMORY_DB=/data/ai-memory.db \
  -v /var/lib/ai-memory:/data \
  ghcr.io/alphaonedev/ai-memory:0.9.0
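The render step of such an entrypoint can be sketched as follows. It reads the two variables, quotes them as TOML strings, and writes the file with mode 0400 from the moment it exists. The helper names are hypothetical, and the real #845 entrypoint may differ in both output and mechanics.

```python
# Sketch of a Plan C-style render step: env -> config.toml, written mode 0400.
# Hypothetical helpers; the shipped entrypoint (#845) may differ.
import os
from typing import Mapping


def toml_string(value: str) -> str:
    """Quote value as a TOML basic string."""
    out = []
    for ch in value:
        if ch in ('"', "\\"):
            out.append("\\" + ch)
        elif ord(ch) < 0x20 or ord(ch) == 0x7F:
            out.append("\\u%04X" % ord(ch))  # control chars must be escaped
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'


def render_config(env: Mapping[str, str]) -> str:
    missing = [k for k in ("AI_MEMORY_API_KEY", "AI_MEMORY_DB") if not env.get(k)]
    if missing:
        # Report variable names only; never echo secret values.
        raise ValueError("missing required env: " + ", ".join(missing))
    return (
        f"db = {toml_string(env['AI_MEMORY_DB'])}\n"
        f"api_key = {toml_string(env['AI_MEMORY_API_KEY'])}\n"
        "\n[permissions]\n"
        'mode = "enforce"\n'
    )


def write_config(path: str, text: str) -> None:
    """Write via a temp file created 0400, then atomically replace the target."""
    tmp = path + ".tmp"
    if os.path.exists(tmp):
        os.remove(tmp)
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o400)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(text)
    os.replace(tmp, path)
```

Creating the file with 0o400 in the open call, rather than calling chmod afterwards, leaves no window in which the key is readable by others. Re-rendering on every boot works because os.replace swaps the whole file.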

Per-issue context (#845 hardened entrypoint, never-leak invariant): integrations/README.md.

7. Upgrade

Schema-versioned, migration-tested.

The SQLite store carries an integer schema version: currently v78 at v0.9.0 (v71 at v0.8.1, v33 at v0.6.4). Notable recent migrations:

v55 made the list_memories_updated_since federation-catchup query sargable and added idx_memories_updated_at (#1476), so peer catch-up scans get a true index range instead of a sequential scan.
v54 backfilled the tier-default expires_at onto legacy mid/short rows with NULL expiry (#1466), closing the TTL-leak class of immortal rows.
v53 scoped the memories_au FTS5 sync trigger to (title, content, tags) only (#1418), so updates to non-FTS columns no longer fire a needless sync.
v52 added the transcript_line_dedup table (#1389 L4), backing the sha256-keyed idempotency layer for turn capture.
v51 added the federation_nonces table (#1255 / PR #1296), so peer replay-prevention nonces persist across daemon restarts.
v50 extended the agent_quotas PRIMARY KEY from (agent_id) to (agent_id, namespace) (#1156), so per-namespace K8 quota allotments hold even when a single agent operates across many namespaces; pre-v50 rows backfill to the _global sentinel namespace.

Migrations run on first open of the new binary. Before a release ships, the project's dogfood script dry-runs the migration set against the operator's own DB.

Back up the DB before upgrading: cp ai-memory.db ai-memory.db.bak. The binary also takes its own automatic pre-migration snapshot (#1576). Before any schema-mutating migration it writes a transactionally consistent sibling copy via VACUUM INTO, named <db>.pre-migration-v<FROM>-to-v<TO>-<token>.bak, and it refuses to mutate the schema if that snapshot fails.
Stop the daemon, install the new binary, restart. Migrations run automatically.
Validate post-upgrade: ai-memory stats && ai-memory audit verify.
Rollback (#1576): there is no migration downgrade path — forward-only by design. Stop the daemon, reinstall the previous binary, restore the pre-migration snapshot over the live file, restart. Writes that landed between migration and rollback are lost with the restore; drain traffic first if they matter.
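A plain cp is generally safe only after the daemon has shut down cleanly. If the live store runs in WAL mode (the mobile subset's FTS5+WAL coverage suggests it does), committed pages can still sit in the -wal sidecar, and a cp of the main file misses them. The sketch below instead takes a consistent manual backup with SQLite's VACUUM INTO, the mechanism the binary's own snapshot uses. It assumes a stock (non-SQLCipher) database, since Python's bundled sqlite3 cannot open encrypted stores, and SQLite 3.27 or newer.

```python
# Consistent online backup via SQLite's VACUUM INTO (requires SQLite >= 3.27).
# Assumes an unencrypted DB; SQLCipher stores need the SQLCipher toolchain.
import os
import sqlite3
import tempfile


def consistent_backup(db_path: str, dest_path: str) -> None:
    """Write a transactionally consistent copy of db_path to dest_path."""
    if os.path.exists(dest_path):
        raise FileExistsError(dest_path)  # VACUUM INTO will not overwrite
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("VACUUM INTO ?", (dest_path,))
    finally:
        conn.close()


def self_check() -> int:
    """Back up a throwaway WAL-mode DB while a writer holds it open; return rows in the copy."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "ai-memory.db")
        dst = os.path.join(tmp, "ai-memory.db.bak")
        writer = sqlite3.connect(src)
        writer.execute("PRAGMA journal_mode=WAL")
        writer.execute("CREATE TABLE t (x INTEGER)")
        writer.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])
        writer.commit()  # these rows may live only in the -wal file for now
        consistent_backup(src, dst)
        writer.close()
        copy = sqlite3.connect(dst)
        try:
            return copy.execute("SELECT COUNT(*) FROM t").fetchone()[0]
        finally:
            copy.close()
```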

Version-to-version upgrade notes: MIGRATION_v0.7.md, v0.7.0 release notes.

Test results

Evidence behind the SHIP verdict.

Every release ships with a public test campaign directory under docs/v0.7.0/test-campaign-YYYY-MM-DD/. The latest release-gate-final campaign (2026-05-22) verifies the operator-facing surfaces (install, configure, deploy, observe, harden, upgrade) against tip fd172f2cf, after the 22-issue fix batch #1120-#1141 (no v0.8.0 deferrals).

Latest campaign index: 2026-05-22 · SHIP-RECOMMENDED · 7,321 PASS / 0 FAIL
For non-technical readers: plain-English verdict and what it means for you
For decision-makers: risk, cost, comparison, roadmap
For SME engineers: reproducibility, methodology, per-issue root cause
