From a developer's laptop to a multi-region hive of agents — same binary, same schema, same code path. You don't graduate off ai-memory; you turn flags on.
attested-cortex · capabilities v3 · Ed25519 attestation · programmable hooks · sidechain transcripts (built on the v0.6.4 baseline, which cut session-start input tokens by 76.4%)
Most "AI memory" products are chat memory: one user, one AI, one conversation. ai-memory is the agent substrate: federation primitives, governance policies, capability allowlists, audit trails, peer mTLS, a webhook event bus, and an autonomous curator. Built for the agent era, not retrofitted from chat memory. Apache 2.0, single Rust binary, runs on SQLite. Works with any MCP-compatible AI (Claude, ChatGPT, Cursor, Grok, Llama, OpenClaw 🦞) and any other MCP client.
One install, five deployment tiers. T1 (single agent, your laptop) → T2 (many agents, one host) → T3 (multi-node cluster) → T4 (data-center swarm) → T5 (global hive). Same binary across all five: flip flags, not products. Quorum W-of-N writes, vector-clock CRDT-lite merge, mTLS peer allowlist, NHI capability allowlist, capability-expansion audit log (schema v20). See the ladder →
Zero token cost until recall.
Vendor memory (Claude auto-memory, ChatGPT memory) loads everything into every message, burning tokens on idle context. ai-memory uses zero context tokens until the AI calls memory_recall. Only relevant memories return, ranked by 6 factors and compressed in TOON format (up to 79% smaller than JSON). v0.6.4 cuts the bootstrap cost too: a 76.4% input-token reduction on session start (5-tool default surface; the other 38 of the 43 tools load on demand via --profile or runtime expansion).
Empirically validated, not just tested. NHI Discovery Gate — 6/6 PASS, GATE GREEN against live xAI Grok 4.3. Test Hub with per-release CERT verdicts. ~2,400 tests at 93.84% line coverage (56,996 lines, 3,511 missed; region 93.81%, function 92.65%); CI-enforced ≥93% line floor that fails any PR below it. Published evidence pages tie every claim back to a verifiable source — including an explicit honesty correction on the v0.6.4 token-reduction methodology. Coming in v0.7: cryptographic provenance (Ed25519), hash-chained audit logs, forensic export bundles — regulated-industry primitives.
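The coverage figures are internally consistent: 56,996 lines minus 3,511 missed leaves 53,485 covered, or 93.84%. A floor check of the kind described reduces to one comparison. This is a minimal sketch with hypothetical helper names (`line_coverage`, `check_line_floor`) fed raw line counts; it is not the project's actual CI script.

```python
def line_coverage(total_lines: int, missed_lines: int) -> float:
    """Return line coverage as a percentage."""
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    return 100.0 * (total_lines - missed_lines) / total_lines


def check_line_floor(total_lines: int, missed_lines: int, floor: float = 93.0) -> bool:
    """Hypothetical CI gate: True if coverage meets the floor, False if the PR should fail."""
    return line_coverage(total_lines, missed_lines) >= floor


# Figures quoted in the paragraph above.
pct = line_coverage(56_996, 3_511)
print(f"{pct:.2f}%")                    # 93.84%
print(check_line_floor(56_996, 3_511))  # True
```

A PR that raised the missed count to 4,000 lines would drop coverage to about 92.98% and fail the gate.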
First-party SDKs, OIDC-published.
@alphaone/ai-memory on npm · ai-memory-mcp on PyPI. requireProfile helper for runtime profile assertions.
Works with Claude · ChatGPT · Grok CLI · Grok API · Cursor · Windsurf · Continue.dev · OpenClaw 🦞 · Hermes Agent · Llama · any MCP client
LongMemEval Benchmark (ICLR 2025) — 500 questions, 6 categories
Pure SQLite FTS5 + BM25 with zero cloud dependencies. See the full benchmark details and replication steps.
A real session with Grok 4.2 reasoning (xAI) on the v0.6.4 release binary. Same agent, same 90 minutes, same machine. The only variable that changed was the --profile flag.
--profile core · 5 tools advertised
"I am operating as an NHI intelligence plugged into what is essentially a fancy notebook with good search."
Grok said it could see all 43 tools in the manifest, then named the ones it couldn't directly call. Honest about scope. The verdict was specifically: useful retrieval layer.
--profile full · all 43 tools advertised
"The system has gone from useful retrieval layer to actual memory cortex substrate. This is the first version I would willingly use as primary long-term memory. I respect it."
After the operator restarted the daemon with --profile full, the same agent re-assessed. The verdict shifted to: actual memory cortex substrate.
The takeaway, backed by this evidence: v0.6.4's default surface is a deliberate, knowable, reversible cost-control choice, not a capability ceiling. The full surface is one flag away. Pick the trade-off that matches your workload; the same binary delivers either experience without code changes.
Full transcripts, including the agent's own moment of revising its assessment after operator correction, are in the NHI Discovery Gate observation cells. Substrate-side fixes are filed at issue #545.
v0.6.4's runtime-expansion path is the bridge between the 76.4% token-saving default and the full 43-tool cortex. On harnesses that support deferred-tool registration, the agent loads exactly the families it needs, mid-session, no restart. The Pareto-optimal point of the v0.6.4 design — and it's not a future roadmap item, it works today.
| | --profile core (eager-loading harnesses) | --profile full | core + deferred registration (Claude Code / OpenClaw) |
|---|---|---|---|
| Boot-time token cost | ~1,500 | ~6,200 | ~1,500 |
| All 43 tools reachable | via flag/restart | yes | yes, on demand |
| Mid-session: load family X | restart server | n/a | memory_capabilities(family=X, include_schema=true) |
| Net per-session cost | low (limited surface) | high | low + only families used (typically 1-2 of 8) |
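The rounded boot-time figures in the table line up with the headline number: (6,200 - 1,500) / 6,200 is about 75.8%, close to the measured 76.4%. The sketch below also models the third column's "low + only families used" cost. It assumes, purely for illustration, that the ~4,700 tokens of non-core schema split evenly across the 8 families; real family schemas will differ in size, and the helper names are hypothetical.

```python
CORE_BOOT_TOKENS = 1_500  # ~boot cost under --profile core (from the table)
FULL_BOOT_TOKENS = 6_200  # ~boot cost under --profile full (from the table)
NUM_FAMILIES = 8          # tool families reachable via memory_capabilities

# Assumption for illustration only: non-core schema cost split evenly per family.
PER_FAMILY_TOKENS = (FULL_BOOT_TOKENS - CORE_BOOT_TOKENS) / NUM_FAMILIES


def boot_reduction() -> float:
    """Fractional input-token saving of core vs full at session start."""
    return 1 - CORE_BOOT_TOKENS / FULL_BOOT_TOKENS


def deferred_session_cost(families_loaded: int) -> float:
    """Estimated schema cost for core plus N families loaded mid-session."""
    if not 0 <= families_loaded <= NUM_FAMILIES:
        raise ValueError("families_loaded must be between 0 and 8")
    return CORE_BOOT_TOKENS + families_loaded * PER_FAMILY_TOKENS


print(f"{boot_reduction():.1%}")  # 75.8%
for n in (1, 2, NUM_FAMILIES):
    print(n, deferred_session_cost(n))
```

With one or two families loaded, the estimate stays well under the full-profile boot cost; loading all eight converges on it by construction.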
| Harness | Deferred-tool registration | Cortex-on-core today? |
|---|---|---|
| Claude Code | ✅ via ToolSearch | ✅ Yes — start with --profile core |
| OpenClaw 🦞 | ✅ native | ✅ Yes — start with --profile core |
| Claude Desktop | ❌ eager-load only | use --profile full |
| Codex CLI (OpenAI) | ❌ eager-load only | use --profile full |
| Grok CLI (xAI) | ❌ eager-load only | use --profile full |
| Gemini CLI (Google) | ❌ eager-load only | use --profile full |
What's blocked behind the harness, not the substrate: the 2026-05-05 Grok 4.2 reasoning before/after on the same v0.6.4 binary makes this concrete. Under --profile core on Grok CLI (no deferred registration), Grok said *"intelligence plugged into a fancy notebook"*. Switch to --profile full and Grok said *"actual memory cortex substrate ... I respect it"*. The substrate did its job in both cases — the Grok CLI session was capped by the harness, not the v0.6.4 design. Claude Code and OpenClaw users on the same release get the cortex-substrate experience starting from --profile core automatically — the harness's deferred registration closes the loop.
Roadmap fix to lift this for all harnesses (schema compaction + memory_smart_load(intent)) tracked at issue #546.
v0.7 attested-cortex — full compatibility matrix. The table above is the v0.6.4 cortex-on-core baseline. The v0.7 release expands the feature set to capabilities v3, named loaders (memory_load_family, memory_smart_load), the hook pipeline, Ed25519 attestation, sidechain transcripts, Apache AGE, and the approval API — with per-harness × per-feature status across 9 harnesses (Claude Code, Claude Desktop, Codex, Cursor, Cline, Continue, Aider, Goose, generic JSON-RPC). Status emojis (✅ / ⚠️ / 🚧 / ❌ / N/A), Discovery Gate cells, and SDK compat tables are all on a single page.
Live xAI Grok 4.3 driving an OpenClaw harness against the v0.6.4 release binary, all four discovery tiers green:
-32601 via --include-schema
--include-schema *before* failing
The discovery dance is not theoretical. It has been measured against a real LLM behind a real harness, with full transcripts, MCP wire logs, and verdict JSON published. NHI Discovery Gate →
core (default) or full
v0.6.4 ships with a 5-tool default surface (core) so the AI doesn't pre-pay token cost on every turn for tools it might not need. The other 38 tools (knowledge graph, governance, consolidation, contradiction detection, lifecycle, archive, meta) are right there, loaded by a single flag. No tools are removed in v0.6.4; only their default visibility changed. If you want the full surface, turn it on; nothing is gated.
--profile core
5 tools advertised on session start: store, recall, search, list, get, plus the always-on memory_capabilities.
Pros
memory_capabilities(family=<name>, include_schema=true)
Cons
ai-memory mcp # default; --profile core implicit
Best for: single-user productivity workflows, cost-sensitive deployments, harnesses where token cost dominates UX.
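Expanding the core surface is an ordinary MCP `tools/call` request. The sketch below builds that JSON-RPC 2.0 message for `memory_capabilities` and shows one way a client might react when a call fails with -32601 (the standard JSON-RPC "Method not found" code) by requesting the family's schema. The family name `graph`, the helper names, and this error-handling policy are assumptions for illustration, not the project's documented client behavior.

```python
import json
from itertools import count
from typing import Optional

METHOD_NOT_FOUND = -32601  # standard JSON-RPC 2.0 error code
_ids = count(1)            # monotonically increasing request ids


def tools_call(name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def expand_family(family: str) -> dict:
    """Request a tool family's schemas mid-session (family name is hypothetical)."""
    return tools_call("memory_capabilities", {"family": family, "include_schema": True})


def next_step(response: dict, family: str) -> Optional[dict]:
    """If a tool call failed with -32601, ask memory_capabilities for the family instead."""
    error = response.get("error")
    if error and error.get("code") == METHOD_NOT_FOUND:
        return expand_family(family)
    return None


failed = {"jsonrpc": "2.0", "id": 7, "error": {"code": -32601, "message": "Method not found"}}
request = next_step(failed, "graph")
print(json.dumps(request))
```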
--profile full
All 43 tools advertised from session start. v0.6.3-equivalent surface, byte-for-byte schema parity.
Pros
memory_link, memory_kg_query, entity registration, temporal validity
memory_consolidate, memory_detect_contradiction, memory_auto_tag, memory_expand_query
memory_delete, memory_promote, memory_update
Cons
ai-memory mcp --profile full
# or: export AI_MEMORY_PROFILE=full
# or: [mcp] profile = "full" in config.toml
Best for: multi-agent / NHI deployments, knowledge-graph workflows, agents acting as long-term-memory cortex, anything beyond store-and-recall.
Resolution order: CLI flag > environment variable (AI_MEMORY_PROFILE) > [mcp].profile in config.toml > core default. The choice is always yours and always overridable. Mid-session expansion is also supported on harnesses with deferred-tool registration (Claude Code, OpenClaw) via memory_capabilities(family=<name>, include_schema=true). Full migration guide →
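The resolution order is a first-match chain. This minimal sketch models it with plain values, where a parsed dict stands in for config.toml; the `resolve_profile` function is hypothetical and not ai-memory's internal code.

```python
import os
from typing import Mapping, Optional


def resolve_profile(
    cli_flag: Optional[str],
    env: Mapping[str, str],
    config: Mapping[str, dict],
) -> str:
    """CLI flag > AI_MEMORY_PROFILE > [mcp].profile in config.toml > "core"."""
    candidates = (
        cli_flag,
        env.get("AI_MEMORY_PROFILE"),
        config.get("mcp", {}).get("profile"),
    )
    for value in candidates:
        if value:  # skip unset or empty values
            return value
    return "core"


config = {"mcp": {"profile": "full"}}  # as if parsed from config.toml
print(resolve_profile(None, {}, config))                                # full
print(resolve_profile("core", {"AI_MEMORY_PROFILE": "full"}, config))  # core
print(resolve_profile(None, os.environ, {}))  # depends on your shell environment
```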
ai-memory is a multi-mode Rust application (tokio + axum) that can run its four modes (MCP server, HTTP API, curator, and sync daemon) in isolation or simultaneously. All modes share one SQLite database.
43 native tools over JSON-RPC for Claude, Codex, Cursor, Gemini, Grok, OpenClaw, and any MCP client. Reactive: answers per turn. Run: ai-memory mcp
42 REST endpoints on 127.0.0.1:9077. TLS, optional mTLS allowlist, API-key auth. Runs a background GC loop. Run: ai-memory serve
Self-scheduling loop that keeps memory tidy: auto-tag, contradiction detection, near-duplicate consolidation, priority feedback, rollback log. Default cadence 1h. Run: ai-memory curator --daemon
Quorum-based peer federation across instances. W-of-N writes (default majority), vector-clock CRDT-lite merge, mTLS allowlist between peers. Run: ai-memory sync-daemon
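Vector clocks let peers tell whether one write happened after another or whether the two are concurrent. The sketch below shows the standard dominance check and element-wise-max merge. How ai-memory's CRDT-lite merge resolves concurrent writes is not described here, so the last-writer-wins-by-timestamp tie-break and the `Version` type are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Dict

Clock = Dict[str, int]  # peer id -> event counter


def dominates(a: Clock, b: Clock) -> bool:
    """True if clock a has seen every event that b has (a >= b element-wise)."""
    return all(a.get(peer, 0) >= n for peer, n in b.items())


def merge_clocks(a: Clock, b: Clock) -> Clock:
    """Element-wise maximum: the merged clock has seen both histories."""
    return {peer: max(a.get(peer, 0), b.get(peer, 0)) for peer in a.keys() | b.keys()}


@dataclass
class Version:
    content: str
    clock: Clock
    updated_at: float  # wall-clock seconds, used only to break ties


def merge(local: Version, remote: Version) -> Version:
    clock = merge_clocks(local.clock, remote.clock)
    if dominates(remote.clock, local.clock):
        winner = remote  # remote has seen everything local has
    elif dominates(local.clock, remote.clock):
        winner = local   # local is strictly newer
    else:
        # Concurrent edits: hypothetical last-writer-wins tie-break.
        winner = max(local, remote, key=lambda v: v.updated_at)
    return Version(winner.content, clock, winner.updated_at)


a = Version("draft", {"node-a": 2, "node-b": 1}, 100.0)
b = Version("revised", {"node-a": 1, "node-b": 2}, 105.0)
print(merge(a, b))  # concurrent: the later timestamp wins, clocks are maxed
```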
Why the curator matters
Most memory layers are persistent but inert — what you put in is what you get out. ai-memory is self-maintaining: the curator runs while your AI agents are not active, surfacing contradictions before they accumulate, consolidating duplicates so recall stays sharp, and reranking priority by access pattern. Every action is reversible via the rollback log. Destructive operations can be gated behind the governance approval flow. Recall quality stays high as the corpus grows.
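Near-duplicate consolidation and reversibility can be illustrated in a few lines. This sketch swaps in word-set Jaccard similarity with a 0.8 threshold as a stand-in technique; ai-memory's actual similarity measure, threshold, and rollback-log schema are not specified here, so `consolidate`, `rollback`, and the log tuple format are hypothetical.

```python
from typing import Dict, List, Tuple

LogEntry = Tuple[str, str, str]  # (kept_id, removed_id, removed_text)


def jaccard(a: str, b: str) -> float:
    """Word-set overlap in [0, 1]."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)


def consolidate(memories: Dict[str, str], threshold: float = 0.8) -> List[LogEntry]:
    """Drop near-duplicates in place and return a rollback log."""
    log: List[LogEntry] = []
    ids = sorted(memories)
    for i, keep in enumerate(ids):
        if keep not in memories:
            continue  # already folded into an earlier memory
        for other in ids[i + 1:]:
            if other in memories and jaccard(memories[keep], memories[other]) >= threshold:
                log.append((keep, other, memories.pop(other)))
    return log


def rollback(memories: Dict[str, str], log: List[LogEntry]) -> None:
    """Undo a consolidation pass by restoring removed memories."""
    for _, removed_id, text in reversed(log):
        memories[removed_id] = text
```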
Brass-tacks assessment by Claude Opus 4.7 (Anthropic, 1M-context) after walking the v0.6.2 source line by line. Not a marketing review: an independent technical read.
Direct quote — Claude Opus 4.7
"ai-memory is the most capable memory layer I've ever been hooked up to, and meaningfully more than its name advertises. For me, in practical terms, it means:"
"It is not making me an autonomous agent. It is giving me the kind of memory infrastructure that an autonomous agent would need — and itself running a small autonomous loop to maintain it. That's a real foundation. The gap from here to 'ai-memory drives general tasks' is plumbing (tool-call protocol + tool registry + a tool-use-capable model), not invention."
"If you wanted to know whether the project is more substantial than its README implies: yes, clearly. If you wanted to know whether it makes me autonomous: no, but it makes the substrate I'd need."
ai-memory is not an agent runtime and not "autonomous AI." It is the memory layer that multi-agent autonomous deployments need underneath them — built for the operating shape that 24/7 multi-machine agent fleets actually have.
Run agents across N peers. broadcast_store_quorum fans writes out, requires W acknowledgements before declaring success, and spawn_catchup_loop backfills lagging peers asynchronously. No race conditions when 100 agents write in parallel.
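The W-of-N rule can be stated compactly. The sketch below fans a write out to peers concurrently and succeeds once W acknowledgements arrive, with W defaulting to a strict majority (N // 2 + 1). Peers are simulated with plain callables; `broadcast_store_quorum` itself is Rust, and its real signature, timeouts, and catch-up behavior are not reproduced here.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, Optional

Peer = Callable[[dict], bool]  # hypothetical: returns True on acknowledgement


def majority(n: int) -> int:
    """Default write quorum: a strict majority of N peers."""
    return n // 2 + 1


def quorum_write(peers: List[Peer], record: dict, w: Optional[int] = None) -> bool:
    """Send record to every peer; succeed once W peers acknowledge."""
    w = majority(len(peers)) if w is None else w
    if not 1 <= w <= len(peers):
        raise ValueError("w must be between 1 and N")
    acks = 0
    with ThreadPoolExecutor(max_workers=len(peers)) as pool:
        futures = [pool.submit(peer, record) for peer in peers]
        for fut in as_completed(futures):
            try:
                ok = fut.result()
            except Exception:
                ok = False  # an unreachable peer counts as a missing ack
            if ok:
                acks += 1
                if acks >= w:
                    # Leaving the with-block still waits for in-flight sends;
                    # a real system would hand lagging peers to a catch-up loop.
                    return True
    return False
```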
The same curator that keeps a single-agent corpus tidy keeps a swarm-shared corpus from degrading into noise. Many agents scribbling into shared memory without consolidation drifts into duplicates and contradictions within hours; the curator prevents that.
Webhooks fire on memory events with HMAC-signed payloads, namespace + agent filters, and SSRF-hardened URL validation. Agent A writes a finding; agent B's webhook fires and dispatches the next step. The store becomes the message bus.
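On the receiving side, an HMAC-signed payload is checked by recomputing the signature over the raw request body with the shared secret and comparing in constant time. This sketch assumes HMAC-SHA256 with a hex digest carried as `sha256=<hex>`; that format, the event name `memory.stored`, and the helper names are hypothetical placeholders, so check ai-memory's webhook documentation for the real scheme.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body (sender side, for illustration)."""
    return "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Constant-time check of a hypothetical 'sha256=<hex>' signature value."""
    return hmac.compare_digest(sign(secret, body), signature_header)


secret = b"shared-webhook-secret"
body = b'{"event":"memory.stored","namespace":"team-a"}'
header = sign(secret, body)
print(verify(secret, body, header))         # True
print(verify(secret, body + b" ", header))  # False: body was altered
```

Always verify against the raw bytes received, before any JSON parsing, since re-serializing can change whitespace and break the signature.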
Per-agent / per-team / per-org namespaces with N-level inheritance. Visibility scopes (private / team / unit / org / collective) and per-namespace governance policies (write / promote / delete authority, approver type, optional N-of-M consensus) — so a swarm operates inside boundaries you set.
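N-level inheritance means a namespace without its own policy falls back to the nearest ancestor that has one. This sketch assumes '/'-separated namespace paths and a flat dict of policies; the path syntax, the policy fields, and `effective_policy` are hypothetical.

```python
from typing import Dict, Optional

Policy = Dict[str, str]


def effective_policy(namespace: str, policies: Dict[str, Policy]) -> Optional[Policy]:
    """Walk from the namespace up to the root and return the nearest policy."""
    parts = namespace.split("/")
    while parts:
        candidate = "/".join(parts)
        if candidate in policies:
            return policies[candidate]
        parts.pop()  # move one level up: org/team/agent -> org/team
    return None


policies = {
    "acme": {"delete": "approver", "approver_type": "human"},
    "acme/research": {"delete": "consensus", "approver_type": "2-of-3 agents"},
}
print(effective_policy("acme/research/agent-7", policies))  # team-level policy
print(effective_policy("acme/ops/agent-2", policies))       # falls back to org
```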
Honest scope
Stack ai-memory under a 24/7 multi-machine agent runner (e.g. OpenClaw), an agent framework with auto-generated skills (e.g. Nous's Hermes Agent), or a self-organizing parallel swarm, and the combined system clears the behavioral bar for autonomous AI: it pursues goals across time, learns from outcomes, restructures itself, and operates without per-step human intervention.
What's still left, honestly: no weight-level learning (skills accumulate around a frozen model), each LLM call is still stateless cognition, and the root goal is human-seeded. Those are real gaps. ai-memory does not close them. It does provide the multi-agent memory substrate that closing them at the system level requires.
MCP is the universal integration layer. The HTTP API works with literally anything that can make a request. No vendor lock-in.
| Platform | Description | Integration |
|---|---|---|
| Claude | Anthropic's Claude Code, Claude Desktop, and any Claude-based tool | MCP Native |
| Codex CLI | OpenAI's Codex command-line agent with TOML-based MCP config | MCP Native |
| Gemini CLI | Google's Gemini CLI with JSON-based MCP server configuration | MCP Native |
| Cursor | AI-powered code editor with built-in MCP support | MCP Native |
| Windsurf | Codeium's AI IDE with MCP tool integration | MCP Native |
| Continue.dev | Open-source AI code assistant with YAML-based MCP config | MCP Native |
| Grok API (xAI) | Grok API and xAI-based applications via remote MCP | Remote MCP (HTTPS) |
| Llama (META) | Llama Stack toolgroup registration via HTTP server | HTTP / MCP |
| OpenClaw 🦞 | Self-hosted AI assistant with MCP via mcp.servers config | MCP Native |
| Hermes Agent | Nous Research open-weight agent with YAML-based MCP config | MCP Native |
| Any MCP client | Any tool that speaks the Model Context Protocol, present or future | Universal |

MCP = native tool integration (stdio JSON-RPC) · HTTP = REST API on localhost:9077 (works with anything) · CLI = shell commands (scriptable, pipeable)
One command. No dependencies for pre-built binaries. Eight installation methods.
Pre-built binary. Auto-detects OS & architecture.
curl -fsSL https://raw.githubusercontent.com/alphaonedev/ai-memory-mcp/main/install.sh | sh
PowerShell installer. Adds to PATH automatically.
irm https://raw.githubusercontent.com/alphaonedev/ai-memory-mcp/main/install.ps1 | iex
Native dnf package. Auto-updates with system.
sudo dnf copr enable alpha-one-ai/ai-memory
sudo dnf install ai-memory
Tap formula. Pre-built binary, no compile.
brew install alphaonedev/tap/ai-memory
Containerized HTTP server on port 9077.
docker build -t ai-memory .  # run from a checkout of the repository
docker run -p 9077:9077 -v data:/data ai-memory
Pre-built binary via cargo-binstall (install cargo-binstall first). No compile step.
cargo binstall ai-memory
Supported platforms: macOS (Intel + Apple Silicon) • Linux (x86_64 + ARM64) • Windows (x86_64) • WSL • Docker
Build from source?
- Ubuntu/Debian: sudo apt install build-essential pkg-config
- Fedora/RHEL: sudo dnf install gcc pkg-config
- macOS: Xcode Command Line Tools (xcode-select --install)
- Windows: MSVC C++ build tools
The keyword and semantic tiers work with zero dependencies. The smart and autonomous tiers use local LLMs via Ollama for query expansion, auto-tagging, contradiction detection, and cross-encoder reranking. Skip this step if you only need keyword or semantic search.
# macOS: install via Homebrew
brew install ollama
# Or download the macOS app:
# https://ollama.com/download/mac
# Start the Ollama service
ollama serve &
# (or launch the Ollama.app -- it runs as a menu bar item)
# Pull models for your tier
ollama pull nomic-embed-text # Embeddings (smart+)
ollama pull gemma4:e2b # LLM — Smart (~1GB)
ollama pull gemma4:e4b # LLM — Autonomous (~2.3GB)
# Linux: one-line install script
curl -fsSL https://ollama.com/install.sh | sh
# Enable and start the systemd service
sudo systemctl enable ollama
sudo systemctl start ollama
# Pull models for your tier
ollama pull nomic-embed-text # Embeddings (smart+)
ollama pull gemma4:e2b # LLM — Smart (~1GB)
ollama pull gemma4:e4b # LLM — Autonomous (~2.3GB)
# Windows: install via winget
winget install Ollama.Ollama
# Or download the installer:
# https://ollama.com/download/windows
# Ollama runs as a system service after install
# Pull models for your tier
ollama pull nomic-embed-text # Embeddings (smart+)
ollama pull gemma4:e2b # LLM — Smart (~1GB)
ollama pull gemma4:e4b # LLM — Autonomous (~2.3GB)
# Check Ollama is running and models are available
curl http://localhost:11434/api/tags
ollama run gemma4:e2b "Hello, world" # Should respond in ~1s
ai-memory connects to Ollama at localhost:11434 automatically. Override with ollama_url in ~/.config/ai-memory/config.toml or --ollama-url flag. If Ollama is unavailable, ai-memory gracefully falls back to the semantic tier.
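The /api/tags check can be scripted. Ollama's /api/tags endpoint returns JSON with a `models` array whose entries carry a `name` such as `nomic-embed-text:latest`. This sketch fetches it with the standard library and picks the highest tier whose models are present, falling back to `semantic` when Ollama is unreachable, mirroring the fallback described above. The tier-to-model mapping follows the pull commands above; `installed_models` and `pick_tier` are hypothetical helpers, not part of ai-memory.

```python
import json
import urllib.error
import urllib.request
from typing import Iterable, Set

# Models each Ollama-backed tier needs (from the pull commands above).
TIER_MODELS = {
    "autonomous": {"nomic-embed-text", "gemma4:e4b"},
    "smart": {"nomic-embed-text", "gemma4:e2b"},
}


def installed_models(url: str = "http://localhost:11434/api/tags") -> Set[str]:
    """Return installed model names, or an empty set if Ollama is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            data = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return set()
    return {m["name"] for m in data.get("models", [])}


def pick_tier(models: Iterable[str]) -> str:
    """Highest tier whose models are all present; otherwise 'semantic'."""
    names = set(models)
    # Untagged pulls are reported as 'name:latest'; accept both spellings.
    names |= {n.removesuffix(":latest") for n in names}
    for tier in ("autonomous", "smart"):
        if TIER_MODELS[tier] <= names:
            return tier
    return "semantic"


if __name__ == "__main__":
    print(pick_tier(installed_models()))
```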
Choose the integration method that fits your setup.
Claude Code MCP Configuration Scopes:
| Scope | File | Applies to |
|---|---|---|
| User (global) | ~/.claude.json | All projects on your machine |
| Project (shared) | .mcp.json in project root | Everyone on the project (via git) |
| Local (private) | ~/.claude.json under projects | One project, just you |
User scope (recommended) — merge mcpServers into your existing ~/.claude.json (macOS/Linux) or %USERPROFILE%\.claude.json (Windows):
{
"mcpServers": {
"memory": {
"command": "ai-memory",
"args": ["--db", "~/.claude/ai-memory.db", "mcp", "--tier", "semantic"]
}
}
}
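Restart Claude Code. It discovers the memory tools natively: under the v0.6.4 default --profile core it starts with the core surface and loads the remaining families of the 43 tools on demand through deferred-tool registration. No daemon, no ports. MCP servers do not go in settings.json or settings.local.json. Include the --tier flag; options are keyword, semantic (default), smart, and autonomous. Smart and autonomous require Ollama.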
Windows: Use ai-memory.exe for the command and forward slashes in paths: "C:/Users/YourName/.claude/ai-memory.db"
OpenAI Codex CLI Configuration Scopes:
| Scope | File | Applies to |
|---|---|---|
| Global (user) | ~/.codex/config.toml | All projects on your machine |
| Project | .codex/config.toml in project root | Trusted projects only |
Windows: %USERPROFILE%\.codex\config.toml. Override config dir with CODEX_HOME env var.
# OpenAI Codex CLI MCP configuration
[mcp_servers.memory]
command = "ai-memory"
args = ["--db", "~/.local/share/ai-memory/memories.db", "mcp", "--tier", "semantic"]
enabled = true
CLI shortcut: codex mcp add memory -- ai-memory --db ~/.local/share/ai-memory/memories.db mcp --tier semantic
Codex uses TOML with underscored key mcp_servers (not camelCase). Supports env, env_vars, enabled_tools, disabled_tools, startup_timeout_sec, tool_timeout_sec. Use /mcp in the TUI to view server status. Windows/WSL: WSL uses Linux home by default — set CODEX_HOME to share config with Windows host. See Codex MCP docs.
Google Gemini CLI Configuration Scopes:
| Scope | File | Applies to |
|---|---|---|
| User (global) | ~/.gemini/settings.json | All projects on your machine |
| Project | .gemini/settings.json in project root | Scoped to that project |
Windows: %USERPROFILE%\.gemini\settings.json. Env vars: $VAR / ${VAR} (all platforms), %VAR% (Windows).
{
"mcpServers": {
"memory": {
"command": "ai-memory",
"args": ["--db", "~/.local/share/ai-memory/memories.db", "mcp", "--tier", "semantic"],
"timeout": 30000
}
}
}
CLI shortcut: gemini mcp add memory ai-memory -- --db ~/.local/share/ai-memory/memories.db mcp --tier semantic
Avoid underscores in server names (use hyphens). Tool names are auto-prefixed as mcp_memory_<toolName>. Env vars in env field support $VAR / ${VAR} (all platforms) and %VAR% (Windows). Gemini sanitizes sensitive patterns (*TOKEN*, *SECRET*) from inherited env unless declared. Add "trust": true to skip confirmation. CLI: gemini mcp list/remove/enable/disable. See Gemini CLI MCP docs.
Cursor IDE Configuration Scopes:
| Scope | File | Applies to |
|---|---|---|
| Global (user) | ~/.cursor/mcp.json | All projects on your machine |
| Project | .cursor/mcp.json in project root | Overrides global for same-named servers |
Windows: %USERPROFILE%\.cursor\mcp.json. Also configurable via Settings > Tools & MCP.
{
"mcpServers": {
"memory": {
"command": "ai-memory",
"args": ["--db", "~/.local/share/ai-memory/memories.db", "mcp", "--tier", "semantic"]
}
}
}
Or add via Cursor Settings > Tools & MCP. Restart Cursor after editing. Verify with green dot in Settings. Supports env, envFile, ${env:VAR_NAME} interpolation (can be unreliable for shell profile vars — use envFile as workaround). ~40 tool limit across all servers. See Cursor MCP docs.
Windsurf (Codeium) Configuration Scopes:
| Scope | File | Applies to |
|---|---|---|
| Global only | ~/.codeium/windsurf/mcp_config.json | All projects (no project scope) |
Windows: %USERPROFILE%\.codeium\windsurf\mcp_config.json. Also configurable via MCP Marketplace or Settings > Cascade > MCP Servers.
{
"mcpServers": {
"memory": {
"command": "ai-memory",
"args": ["--db", "~/.codeium/windsurf/ai-memory.db", "mcp", "--tier", "semantic"]
}
}
}
Supports ${env:VAR_NAME} interpolation in command, args, env, serverUrl, url, and headers. 100 tool limit across all servers. Can also add via MCP Marketplace or Settings > Cascade > MCP Servers. See Windsurf MCP docs.
Continue.dev Configuration Scopes:
| Scope | File | Applies to |
|---|---|---|
| User (global) | ~/.continue/config.yaml | All projects on your machine |
| Project | .continue/mcpServers/ dir in project root | Per-server YAML/JSON files |
Windows: %USERPROFILE%\.continue\config.yaml. Project dir auto-detects JSON configs from other tools.
# Continue.dev MCP configuration
mcpServers:
  - name: memory
    command: ai-memory
    args:
      - "--db"
      - "~/.continue/ai-memory.db"
      - "mcp"
      - "--tier"
      - "semantic"
MCP tools only work in agent mode. Supports ${{ secrets.SECRET_NAME }} for secret interpolation. Project-level .continue/mcpServers/ directory auto-detects JSON configs from other tools (Claude Code, Cursor, etc.). See Continue MCP docs.
Grok CLI (AlphaOne fork) — Deep Integration:
| Scope | Config File | Applies to |
|---|---|---|
| User-level | ~/.grok/user-settings.json | All sessions |
// ~/.grok/user-settings.json
{
"mcp": {
"servers": [
{
"id": "ai-memory",
"label": "AI Memory",
"enabled": true,
"transport": "stdio",
"command": "ai-memory",
"args": ["mcp", "--tier", "semantic"]
}
]
}
}
Deep integration features: Auto-recall on session start (injects relevant memories into system prompt), compaction summaries auto-stored as mid-tier memories, MCP tools available in all modes (agent, plan, ask), session-scoped connections (binary spawns once per session, not per message). Default tier semantic uses local MiniLM embeddings — no Ollama required. Install: curl -fsSL https://raw.githubusercontent.com/alphaonedev/grok-cli/main/install.sh | bash
xAI Grok API Configuration:
| Scope | Method | Applies to |
|---|---|---|
| Per-request | API tools array (no config file) | Each API call individually |
Remote HTTPS only (no stdio). Start ai-memory behind an HTTPS reverse proxy.
# Step 1: Start the ai-memory HTTP server
ai-memory serve --host 127.0.0.1 --port 9077 &
# Expose via HTTPS reverse proxy (nginx, caddy, cloudflare tunnel, etc.)
# Step 2: Add the MCP server to your Grok API call
curl https://api.x.ai/v1/responses \
-H "Authorization: Bearer $XAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "grok-3",
"tools": [{
"type": "mcp",
"server_url": "https://your-server.example.com/mcp",
"server_label": "memory",
"server_description": "Persistent AI memory with recall and search"
}],
"input": "What do you remember about our project?"
}'
HTTPS required. server_label is required. Supports Streamable HTTP and SSE transports. Optional: allowed_tools, authorization, headers. Works with xAI SDK, OpenAI-compatible Responses API, and Voice Agent API. See xAI Remote MCP docs.
META Llama Stack Configuration:
| Scope | Method | Applies to |
|---|---|---|
| Declarative | run.yaml — tool_groups section | Deployment-wide (supports ${env.VAR}) |
| Programmatic | Python/Node SDK — toolgroups.register() | Runtime registration |
Llama Stack uses toolgroup registration with an HTTP backend.
# Step 1: Start the ai-memory HTTP server
ai-memory serve --host 127.0.0.1 --port 9077 &
# Step 2: Register as a Llama Stack toolgroup
# In your Llama Stack config, register the MCP endpoint:
# toolgroup: ai-memory
# provider: remote::mcp-endpoint
# url: http://127.0.0.1:9077
# Or use the REST API directly in custom tool definitions:
# POST /api/v1/memories, GET /api/v1/recall, etc.
META Llama uses Llama Stack for tool registration. Run ai-memory serve and register as a toolgroup via Python SDK or run.yaml (supports ${env.VAR_NAME} interpolation). Transport migrating from SSE to Streamable HTTP. See Llama Stack Tools docs.
OpenClaw Configuration:
| Scope | File | Applies to |
|---|---|---|
| Single config | Platform config file | All projects (single config file) |
Important: OpenClaw uses mcp.servers (NOT mcpServers). The key structure is different from most other platforms.
{
"mcp": {
"servers": {
"memory": {
"command": "ai-memory",
"args": ["--db", "~/.local/share/ai-memory/memories.db", "mcp", "--tier", "semantic"]
}
}
}
}
CLI shortcut:
openclaw mcp set memory '{"command":"ai-memory","args":["--db","~/.local/share/ai-memory/memories.db","mcp","--tier","semantic"]}'
Management: openclaw mcp list · openclaw mcp show <name> · openclaw mcp unset <name>. See OpenClaw MCP docs.
Nous Research Hermes Agent Configuration:
| Scope | File | Applies to |
|---|---|---|
| Global only | ~/.hermes/config.yaml | All projects (no per-project scope) |
Important: Hermes uses YAML format with mcp_servers (underscored, NOT camelCase).
# Hermes Agent MCP configuration
# File: ~/.hermes/config.yaml
mcp_servers:
  memory:
    command: ai-memory
    args:
      - "--db"
      - "~/.local/share/ai-memory/memories.db"
      - "mcp"
      - "--tier"
      - "semantic"
HTTP remote (alternative):
mcp_servers:
  memory:
    url: "http://localhost:9077/mcp"
Supports both stdio (local) and HTTP (remote) transports. Per-server tool filtering via tools.include/tools.exclude. Additional fields: env, timeout, connect_timeout, enabled, sampling. See Hermes MCP docs.
Generic MCP Client Configuration:
| Transport | Method | Details |
|---|---|---|
| stdio | ai-memory mcp | JSON-RPC 2.0, spawned by AI client |
| HTTP | ai-memory serve | REST API on localhost:9077 |
Point your MCP client at the ai-memory binary with the mcp subcommand:
{
"mcpServers": {
"memory": {
"command": "ai-memory",
"args": ["--db", "path/to/memory.db", "mcp", "--tier", "semantic"]
}
}
}
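The MCP server exposes 43 tools over stdio using JSON-RPC. Under the v0.6.4 default --profile core only the core surface is advertised at startup; --profile full advertises all 43. Any client that speaks MCP will discover them automatically. Adjust the --db path to your preferred location.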
Check that your AI has access to memory tools.
# MCP: Ask your AI "What memory tools do you have?"
# HTTP: curl http://127.0.0.1:9077/api/v1/health
# CLI: ai-memory stats
Make Claude Code proactively use ai-memory in every conversation. Works with both Claude Code CLI and Claude Code Desktop.
Add to ~/.claude.json (user scope) so ai-memory is available in every project:
{
"mcpServers": {
"memory": {
"command": "ai-memory",
"args": ["--db", "~/.claude/ai-memory.db",
"mcp", "--tier", "semantic"]
}
}
}
This gives Claude 43 native memory tools. No daemon, no ports.
Stop paying for 200+ lines of built-in memory context on every message:
// ~/.claude/settings.json
{
"autoMemoryEnabled": false
}
ai-memory uses zero tokens until explicitly recalled — far more efficient than auto-memory.
Add a CLAUDE.md to your project root so Claude proactively uses ai-memory:
# Project — Claude Instructions
## AI Memory (MANDATORY)
Use `ai-memory` for persistent memory.
### On every conversation start:
1. Run ai-memory recall "<topic>"
2. Recall related memories for prior work
### While working:
- Store findings and decisions as they happen
- Use namespace `my-project`
### When finishing:
- Store a summary of what was done
Without CLAUDE.md, Claude has the memory tools (via MCP) but may not use them unless explicitly asked. The CLAUDE.md file is read at the start of every conversation — it instructs Claude to recall context before starting work and store findings as it goes. This turns ai-memory from a passive tool into an active memory system that Claude uses autonomously.
| File | Scope |
|---|---|
| <project>/CLAUDE.md | Project-wide (default) |
| <subfolder>/CLAUDE.md | Subdirectory scope |
| ~/.claude/CLAUDE.md | User-global (all projects) |
Both Claude Code CLI and Claude Code Desktop read the same CLAUDE.md files and use the same MCP configuration. No separate setup needed — one configuration works for both.
Use MCP + CLAUDE.md together. MCP gives Claude the tools; CLAUDE.md tells Claude when and how to use them. Session hooks (hooks/session-start.sh) provide an optional third layer of auto-recall.
# Project — Claude Instructions
## AI Memory (MANDATORY)
Use `ai-memory` for persistent memory across conversations. This is NOT optional.
### On every conversation start:
1. Run `ai-memory recall "<topic>"` to check for relevant context before starting work
2. If the user references prior work, recall related memories first
### While working:
- Store important findings, decisions, and bug fixes as they happen — don't wait until the end
- Use namespace `my-project` for all project memories
- Default tier: `long`, default priority: `5` (use `9-10` for critical knowledge)
### When finishing work:
- Store a memory summarizing what was done, why, and any gotchas for next time
- Update existing memories if your work changes previously recorded facts
### Quick reference:
```bash
ai-memory recall "search query" # fuzzy search
ai-memory search "exact keywords" # precise match
ai-memory store -T "Title" -n my-project -t long -p 5 -c "content" # store
ai-memory update <id> -c "new content" # update
ai-memory list -n my-project # browse
```
Every capability at a glance. 4 feature tiers (keyword to autonomous), 43 MCP tools, four operational modes (MCP, HTTP, curator, sync), one shared SQLite database. Works with any AI that supports MCP or HTTP. Canonical claims →
Built-in memory systems (Claude auto-memory, ChatGPT memory) load your entire memory into every conversation -- burning tokens and money on every message. ai-memory uses zero context tokens until recalled. Only relevant memories come back, ranked by score. Replace auto-memory and stop paying for 200+ lines of idle context.
Save memories with a title, content, tier, tags, and priority. Recall them later with fuzzy search that ranks results by 6 factors including recency decay.
Short (6h), mid (7d), and long (permanent). Memories auto-promote to long-term after 5 accesses. TTL extends on every recall.
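The tier rules can be modeled as a small state machine. In this sketch a recall resets the expiry to a full tier TTL from the time of access, and the fifth access promotes the memory to long; whether ai-memory extends by the full TTL or by another increment is an assumption here, as are the `Memory` class and its field names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

TTL = {"short": timedelta(hours=6), "mid": timedelta(days=7), "long": None}  # None = permanent
PROMOTE_AFTER = 5  # accesses before auto-promotion to long


@dataclass
class Memory:
    tier: str
    created: datetime
    accesses: int = 0
    expires_at: Optional[datetime] = None

    def __post_init__(self) -> None:
        ttl = TTL[self.tier]
        self.expires_at = None if ttl is None else self.created + ttl

    def recall(self, now: datetime) -> None:
        """Record an access: maybe promote, then extend the TTL from now."""
        self.accesses += 1
        if self.tier != "long" and self.accesses >= PROMOTE_AFTER:
            self.tier = "long"
        ttl = TTL[self.tier]
        self.expires_at = None if ttl is None else now + ttl

    def expired(self, now: datetime) -> bool:
        return self.expires_at is not None and now >= self.expires_at
```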
SQLite FTS5 for keyword search plus vector embeddings for semantic similarity. Hybrid recall blends both FTS5 and cosine similarity for best-of-both-worlds relevance.
Scale from zero-dependency keyword search to full autonomous memory management. Each tier adds capabilities: keyword, semantic, smart, and autonomous.
Connect memories with typed relations: related_to, supersedes, contradicts, derived_from. Resolve contradictions with a single command.
Smart and autonomous tiers use Ollama (Gemma 4) for query expansion, auto-tagging, auto-consolidation, cross-encoder reranking, and contradiction analysis.
Token-Oriented Object Notation eliminates repeated field names in recall responses. Pass format: "toon" for 61% fewer bytes or "toon_compact" for 79% fewer. Field names declared once as a header, values as pipe-delimited rows. LLMs parse it natively.
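The header-plus-rows idea is easy to see with a toy encoder. This is neither the TOON specification nor ai-memory's encoder: the bare `field|field` header, the `\|` escaping, and `toy_toon` itself are assumptions of the sketch, and the saving it prints applies only to this toy input, not to the 61% / 79% figures above.

```python
import json
from typing import Dict, List


def toy_toon(rows: List[Dict[str, object]]) -> str:
    """Declare field names once, then emit one pipe-delimited row per record."""
    if not rows:
        return ""
    fields = list(rows[0])
    lines = ["|".join(fields)]
    for row in rows:
        # Escape literal pipes so cell boundaries stay unambiguous.
        cells = (str(row.get(f, "")).replace("|", "\\|") for f in fields)
        lines.append("|".join(cells))
    return "\n".join(lines)


memories = [
    {"id": "a1", "title": "Deploy flow", "tier": "long", "priority": 7},
    {"id": "b2", "title": "DB choice", "tier": "mid", "priority": 5},
]
as_json = json.dumps(memories)
as_toon = toy_toon(memories)
print(as_toon)
print(f"{1 - len(as_toon) / len(as_json):.0%} fewer bytes than JSON")
```

The saving grows with the number of rows, because JSON repeats every field name per record while the header is paid once.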
Two MCP prompts teach AI clients to use memory proactively. recall-first: 9 behavioral rules (recall at start, store corrections, TOON format, tier strategy, dedup). memory-workflow: quick reference card for all tool patterns. AI clients receive these at connection time via prompts/list.
Each tier builds on the one below it. Choose based on your resources and needs. Set via ai-memory mcp --tier <name> or in ~/.config/ai-memory/config.toml.
| Tier | RAM | Embedding Model | LLM | Dependencies | Key Features |
|---|---|---|---|---|---|
| keyword | 0 MB | — | — | None | FTS5 full-text search, keyword tier (subset of 43 tools) |
| semantic | ~256 MB | all-MiniLM-L6-v2 (384-dim, local via Candle) | — | None (model auto-downloads ~90MB) | + Hybrid recall (FTS5 + cosine similarity), HNSW vector index, semantic tier (subset of 43 tools) |
| smart | ~1 GB | nomic-embed-text-v1.5 (768-dim, via Ollama) | Gemma 4 E2B (~1GB) | Ollama | + LLM query expansion, auto-tagging, auto-consolidation, 43 MCP tools (full surface) |
| autonomous | ~4 GB | nomic-embed-text-v1.5 (768-dim, via Ollama) | Gemma 4 E4B (~2.3GB) | Ollama | + Neural cross-encoder reranking (ms-marco-MiniLM), contradiction analysis, 43 MCP tools (full surface) |
Pure SQLite FTS5 full-text search. Zero ML dependencies, zero memory overhead. The binary is entirely self-contained. Ideal for low-resource environments, CI runners, or when you just need fast text matching.
Adds dense vector embeddings via all-MiniLM-L6-v2 (384-dim), loaded locally through the Candle ML framework. Recall blends FTS5 keyword scores with cosine similarity using adaptive content-length weighting (50/50 for short memories, 85/15 FTS-weighted for long content). HNSW index for fast approximate nearest-neighbor search. The model auto-downloads from HuggingFace on first run (~90MB).
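The adaptive blend can be sketched as a content-length-dependent weight on the keyword score. This is a sketch under stated assumptions, not ai-memory's code: both scores are assumed to be pre-normalized to [0, 1], and the cutoffs (`short_len`, `long_len`) and the linear interpolation between 50/50 and 85/15 are hypothetical choices.

```python
def hybrid_score(fts_score, cosine_sim, content_len,
                 short_len=500, long_len=5000):
    """Blend a normalized FTS5 keyword score with cosine similarity.

    Assumptions (hypothetical, not ai-memory's constants): inputs are in [0, 1];
    content of at most `short_len` characters gets a 50/50 blend, content of at
    least `long_len` characters gets 85/15 in favour of FTS5, and lengths in
    between are linearly interpolated.
    """
    if content_len <= short_len:
        fts_weight = 0.50
    elif content_len >= long_len:
        fts_weight = 0.85
    else:
        t = (content_len - short_len) / (long_len - short_len)
        fts_weight = 0.50 + t * (0.85 - 0.50)
    return fts_weight * fts_score + (1.0 - fts_weight) * cosine_sim

# A short note leans equally on both signals; a long document leans on keywords.
print(hybrid_score(0.4, 0.9, content_len=120))
print(hybrid_score(0.4, 0.9, content_len=8000))
```

Shifting weight toward FTS5 for long content reflects that a single embedding of a long text averages many topics, while exact keyword hits stay precise.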
Upgrades to nomic-embed-text-v1.5 (768-dim) via Ollama for higher-quality embeddings. Adds an on-device LLM (Gemma 4 Effective 2B) that powers three new tools: memory_expand_query (semantic query broadening), memory_auto_tag (content-aware tagging), and memory_detect_contradiction (conflict detection). Requires Ollama running locally.
Upgrades the LLM to Gemma 4 Effective 4B for more nuanced reasoning. Adds a neural cross-encoder reranker (ms-marco-MiniLM-L-6-v2) that re-scores (query, document) pairs after hybrid retrieval for significantly better recall precision. Full autonomous memory reflection and contradiction resolution. Requires Ollama.
Every capability mapped to its minimum tier. Each tier includes all capabilities from the tiers below it.
| Capability | keyword | semantic | smart | autonomous |
|---|---|---|---|---|
| Search & Recall | ||||
| FTS5 keyword search (memory_search) | Yes | Yes | Yes | Yes |
| Semantic embedding (cosine similarity) | — | Yes | Yes | Yes |
| Hybrid recall (FTS5 + cosine, adaptive blend) | — | Yes | Yes | Yes |
| HNSW approximate nearest-neighbor index | — | Yes | Yes | Yes |
| LLM query expansion (memory_expand_query) | — | — | Yes | Yes |
| Neural cross-encoder reranking (ms-marco-MiniLM) | — | — | — | Yes |
| Memory Management | ||||
| Store, update, delete, promote | Yes | Yes | Yes | Yes |
| Link memories (4 relation types) | Yes | Yes | Yes | Yes |
| Bulk forget by pattern/namespace/tier | Yes | Yes | Yes | Yes |
| Manual consolidation (user-provided summary) | Yes | Yes | Yes | Yes |
| Auto-consolidation (LLM-generated summary) | — | — | Yes | Yes |
| Auto-tagging (memory_auto_tag) | — | — | Yes | Yes |
| Contradiction detection (memory_detect_contradiction) | — | — | Yes | Yes |
| Autonomous memory reflection | — | — | — | Yes |
| Embedding Model | ||||
| Model | — | all-MiniLM-L6-v2 | nomic-embed-text-v1.5 | nomic-embed-text-v1.5 |
| Dimensions | — | 384 | 768 | 768 |
| Runtime | — | Candle (local) | Ollama | Ollama |
| Model size | — | ~90 MB | ~274 MB | ~274 MB |
| LLM (Language Model) | ||||
| Model | — | — | Gemma 4 Effective 2B | Gemma 4 Effective 4B |
| Ollama tag | — | — | gemma4:e2b | gemma4:e4b |
| Model size | — | — | ~7.2 GB | ~9.6 GB |
| Resources | ||||
| Total RAM | 0 MB | ~256 MB | ~1 GB | ~4 GB |
| External dependencies | None | None | Ollama | Ollama |
| MCP tools exposed | keyword subset | semantic subset | smart subset | full 43-tool surface |
| Ollama models to pull | — | — | nomic-embed-text + gemma4:e2b | nomic-embed-text + gemma4:e4b |
Tiers gate features, not models. The --tier flag controls which tools are exposed. The LLM model is independently configurable via llm_model in config.toml.
For example, to run the autonomous tier (all features) with the faster e2b model, set llm_model = "gemma4:e2b" (46 tok/s vs 26 tok/s for e4b).
If Ollama is unavailable at startup, smart and autonomous tiers fall back to semantic automatically.
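The fallback rule is simple enough to mirror in a client-side preflight script. The sketch below is an assumption about how such a check could look, not how ai-memory performs it internally: it probes Ollama's `GET /api/tags` endpoint (which lists local models) and applies the documented rule that smart and autonomous degrade to semantic.

```python
import urllib.error
import urllib.request

OLLAMA_TIERS = {"smart", "autonomous"}  # tiers that need Ollama

def ollama_reachable(url="http://localhost:11434", timeout=2.0):
    """Return True if an Ollama server answers GET /api/tags at `url`."""
    try:
        with urllib.request.urlopen(f"{url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        return False  # refused, timed out, or malformed URL

def effective_tier(requested, ollama_up):
    """Documented fallback: smart/autonomous drop to semantic without Ollama;
    keyword and semantic never depend on it."""
    if requested in OLLAMA_TIERS and not ollama_up:
        return "semantic"
    return requested
```

Usage: `effective_tier("autonomous", ollama_reachable())` returns `"autonomous"` when Ollama answers and `"semantic"` otherwise. Keeping the decision in a pure function makes the rule testable without a running server.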
```toml
# ~/.config/ai-memory/config.toml
# Created automatically on first run with defaults commented out
tier = "autonomous"                     # keyword | semantic | smart | autonomous
db = "~/.claude/ai-memory.db"           # SQLite database path
ollama_url = "http://localhost:11434"   # Ollama API endpoint
llm_model = "gemma4:e2b"                # independently configurable (e2b=46tok/s, e4b=26tok/s)
cross_encoder = true                    # Neural reranking (autonomous tier)
default_namespace = "global"            # Default namespace for new memories
```
ai-memory runs as a Model Context Protocol (MCP) tool server over stdio. Any MCP-compatible AI client (Claude, ChatGPT, Grok, Llama, or custom agents) discovers these tools automatically.
Store a new memory. Deduplicates by title+namespace. Detects contradictions with existing memories.
Fuzzy OR search with 6-factor ranking. Auto-touches recalled memories (extends TTL, may promote).
Exact keyword AND search. Returns memories matching all terms.
Browse memories with filters: namespace, tier, tags, date range.
Retrieve a single memory by ID, including all its links.
Update an existing memory: change title, content, tier, priority, or tags.
Delete a specific memory by ID. Links cascade automatically.
Promote a memory to long-term permanent storage. Clears expiry.
Bulk delete by pattern, namespace, or tier.
Link two memories: related_to, supersedes, contradicts, or derived_from.
Get all links for a memory by ID.
Merge multiple memories into one long-term summary.
Database statistics: counts by tier, namespaces, link count, DB size.
Returns available capabilities for the current feature tier. Lets the AI discover what tools and features are active.
LLM-powered query expansion. Broadens a recall query with synonyms and related terms for better recall coverage. (smart+ tiers)
LLM-powered auto-tagging. Analyzes memory content and suggests relevant tags automatically. (smart+ tiers)
LLM-powered contradiction analysis. Compares a memory against existing memories to detect conflicts and inconsistencies. (smart+ tiers)
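Under the hood, an MCP client invokes these tools with a JSON-RPC 2.0 `tools/call` request, sent as one newline-terminated JSON message over stdio after the MCP `initialize` handshake (omitted here). The sketch builds such a message; the argument names `context` and `limit` are assumptions for illustration, so check the tool schemas a client receives from `tools/list`.

```python
import json

def tools_call(request_id, tool_name, arguments):
    """Build one JSON-RPC 2.0 'tools/call' message as used by MCP.
    On the stdio transport each message is a single line of JSON."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message) + "\n"  # json.dumps emits no embedded newlines

# Hypothetical arguments: the real memory_recall schema may differ.
line = tools_call(1, "memory_recall", {"context": "what language and framework", "limit": 5})
print(line, end="")
```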
Start with ai-memory serve (default: http://127.0.0.1:9077).
The HTTP API works with any AI platform, any programming language, any framework. If it can make an HTTP request, it can use ai-memory.
| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Deep health check (DB + FTS5 integrity) |
| GET | /memories | List memories (filter: namespace, tier, priority, date range, tags) |
| POST | /memories | Create memory (dedup on title+namespace, contradiction detection) |
| POST | /memories/bulk | Bulk create (up to 1000 items per request) |
| GET | /memories/{id} | Get memory by ID (includes links) |
| PUT | /memories/{id} | Update memory (partial update, validated) |
| DELETE | /memories/{id} | Delete memory (links cascade) |
| POST | /memories/{id}/promote | Promote memory to long-term (clears expiry) |
| GET | /search | FTS5 AND search with 6-factor ranking |
| GET | /recall | Fuzzy OR recall + touch + auto-promote |
| POST | /recall | Recall via POST body (for longer queries) |
| POST | /forget | Bulk delete by pattern/namespace/tier |
| POST | /consolidate | Merge 2-100 memories into one long-term summary |
| POST | /links | Create memory link (4 relation types) |
| GET | /links/{id} | Get all links for a memory |
| GET | /namespaces | List namespaces with counts |
| GET | /stats | Aggregate statistics |
| POST | /gc | Run garbage collection on expired memories |
| GET | /export | Export all memories + links as JSON |
| POST | /import | Import memories + links from JSON |
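Deduplication on title+namespace maps naturally onto a SQLite unique constraint plus an upsert. The sketch below uses a hypothetical two-key schema, not ai-memory's real one, and chooses to overwrite the content of an existing row; ai-memory's exact merge behaviour on a duplicate may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE memories (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        namespace TEXT NOT NULL,
        content TEXT NOT NULL,
        UNIQUE (title, namespace)  -- the dedup key
    )
""")

def store(title, namespace, content):
    """Insert a memory, or update the existing row with the same title+namespace."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute(
            """
            INSERT INTO memories (title, namespace, content) VALUES (?, ?, ?)
            ON CONFLICT (title, namespace) DO UPDATE SET content = excluded.content
            """,
            (title, namespace, content),
        )

store("Stack", "my-project", "Rust + Axum")
store("Stack", "my-project", "Rust + Axum + SQLite")  # same key: updated in place
store("Stack", "other", "Go")                         # different namespace: new row
print(conn.execute("SELECT COUNT(*) FROM memories").fetchone()[0])  # 2
```

The `ON CONFLICT ... DO UPDATE` upsert form needs SQLite 3.24 or newer.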
```python
# Python (works with any AI backend: OpenAI, Anthropic, local Llama, etc.)
import requests

BASE_URL = "http://127.0.0.1:9077/api/v1"

def ai_store_memory(title, content, tier="mid"):
    """Store a memory via the HTTP API; raises on a non-2xx response."""
    r = requests.post(f"{BASE_URL}/memories",
                      json={"title": title, "content": content, "tier": tier},
                      timeout=10)
    r.raise_for_status()
    return r.json()

def ai_recall(context):
    """Fuzzy-recall memories relevant to `context`."""
    r = requests.get(f"{BASE_URL}/recall", params={"context": context}, timeout=10)
    r.raise_for_status()
    return r.json()

# Use in your AI's tool/function definitions
# Works with OpenAI function calling, Anthropic tool use, etc.
```
Global flags: --db <path> and --json.
Scriptable, pipeable, works in any shell. Use directly or wrap in your AI's tool layer.
| Category | Command | Description |
|---|---|---|
| Server | mcp | Run as MCP tool server over stdio (primary integration for MCP clients) |
| Server | serve | Start HTTP daemon (--host, --port, default 9077); universal API for any AI |
| Core | store | Store memory (-T title, -c content, --tier, --namespace, --tags, --priority, --confidence, --source) |
| Core | update | Update memory by ID (partial fields) |
| Core | delete | Delete memory by ID (links cascade) |
| Core | promote | Promote to long-term (clears expiry) |
| Query | recall | Fuzzy OR recall with 6-factor ranking (--namespace, --limit, --tags, --since) |
| Query | search | AND keyword search (--namespace, --tier, --limit, --since, --until, --tags) |
| Query | get | Get memory by ID (includes links) |
| Query | list | List with filters (--namespace, --tier, --limit, --since, --until, --tags) |
| Manage | forget | Bulk delete (--namespace, --pattern, --tier) |
| Manage | link | Link two memories (--relation: related_to, supersedes, contradicts, derived_from) |
| Manage | consolidate | Merge N memories into one (-T title, -s summary, --namespace) |
| Manage | resolve | Resolve contradiction: winner supersedes loser (demotes loser: priority=1, confidence=0.1) |
| Manage | auto-consolidate | Auto-group by namespace+tag and consolidate (--dry-run, --short-only, --min-count, --namespace) |
| Ops | gc | Run garbage collection on expired memories |
| Ops | stats | Show statistics (counts, tiers, namespaces, links, DB size) |
| Ops | namespaces | List all namespaces with memory counts |
| Ops | sync | Sync databases (--direction pull\|push\|merge, dedup-safe upsert) |
| I/O | export | Export all memories + links as JSON (stdout) |
| I/O | import | Import memories + links from JSON (stdin) |
| I/O | completions | Generate shell completions (bash, zsh, fish) |
| I/O | man | Generate roff man page to stdout |
| I/O | mine | Import memories from historical conversations (Claude, ChatGPT, Slack) |
| Ops | shell | Interactive REPL with color output (recall, search, list, get, stats, namespaces, delete) |
Memories are organized into three tiers that mirror human memory systems. Each tier has automatic TTL management, and memories flow upward through access patterns.
- Short (6h TTL): ephemeral context such as current task state, debugging notes, and transient observations. Extends +1h on each recall. Good for "what am I working on right now" context.
- Mid (7d TTL): working knowledge such as sprint goals, recent decisions, and active project context. Extends +1d on each recall. Auto-promotes to long-term at 5 accesses.
- Long (permanent): architecture, user preferences, hard-won lessons, corrections. Never expires. Highest tier boost (3.0) in recall ranking. The knowledge bedrock.
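The TTL and promotion rules can be sketched as a small state update applied on every recall. This is an illustrative model, not ai-memory's implementation: the `Memory` dataclass and helper names are hypothetical, "extends +1h" is read as adding to the current expiry, and the 5-access promotion is applied to the mid tier only, as its description states.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

INITIAL_TTL = {"short": timedelta(hours=6), "mid": timedelta(days=7)}  # long: none
RECALL_EXTENSION = {"short": timedelta(hours=1), "mid": timedelta(days=1)}
PROMOTE_AT = 5  # accesses before a mid-tier memory becomes long-term

@dataclass
class Memory:  # hypothetical model for illustration
    tier: str
    expires_at: Optional[datetime]
    access_count: int = 0

def new_memory(tier, now):
    ttl = INITIAL_TTL.get(tier)
    return Memory(tier=tier, expires_at=now + ttl if ttl is not None else None)

def touch(mem):
    """Apply the recall side effects: count the access, then promote or extend."""
    mem.access_count += 1
    if mem.tier == "mid" and mem.access_count >= PROMOTE_AT:
        mem.tier, mem.expires_at = "long", None  # promotion clears expiry
    elif mem.expires_at is not None:
        mem.expires_at += RECALL_EXTENSION[mem.tier]
    return mem

now = datetime(2025, 1, 1, tzinfo=timezone.utc)
m = new_memory("mid", now)
for _ in range(4):
    touch(m)
print(m.tier, m.expires_at - now)  # mid 11 days, 0:00:00
touch(m)
print(m.tier, m.expires_at)        # long None
```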
Every recall query computes a composite score entirely in SQLite. Higher scores rank first. No external ML or embedding service required.
Defense in depth, even for a local tool. Every input is validated, every error is sanitized, every write is transactional.
Every write operation is wrapped in a SQLite transaction. WAL mode enables concurrent reads without blocking. Schema migrations are atomic.
Search queries are sanitized before reaching FTS5. All special characters including | (pipe/OR operator), ", *, ^, :, -, braces, and parentheses are stripped. Boolean operators (AND, OR, NOT, NEAR) are filtered as standalone tokens. Every term is double-quoted.
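The sanitization steps can be sketched in a few lines. This is an illustration of the described rules, not ai-memory's code: special characters are replaced with spaces here (deleting them outright would fit the description equally well), and only uppercase operators are dropped because FTS5 recognizes AND, OR, NOT and NEAR only in uppercase.

```python
import re

# Characters listed above: pipe, double quote, *, ^, :, -, braces, parentheses.
_SPECIAL = re.compile(r'[|"*^:\-(){}]')
_OPERATORS = {"AND", "OR", "NOT", "NEAR"}  # FTS5 operators are uppercase-only

def sanitize_fts5(query):
    """Return FTS5-safe terms: specials removed, standalone operators dropped,
    every surviving term wrapped in double quotes."""
    cleaned = _SPECIAL.sub(" ", query)
    terms = [t for t in cleaned.split() if t not in _OPERATORS]
    return ['"' + t + '"' for t in terms]

print(" ".join(sanitize_fts5('rust OR "axum" -legacy NEAR(sqlite*)')))
# "rust" "axum" "legacy" "sqlite"
```

Quoting every term turns each one into an FTS5 string literal, so even a term that survived filtering cannot be interpreted as syntax.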
HTTP request bodies are capped at 50MB via DefaultBodyLimit. Prevents memory exhaustion from oversized payloads at the transport layer.
The HTTP server applies CorsLayer::permissive(), an open CORS policy intended for localhost-bound services. This is considered safe because the server binds to 127.0.0.1 by default.
Error messages never leak database internals, file paths, or stack traces. Handlers return generic "internal server error" strings; details go to tracing::error! only.
Bulk create and import operations cap at 1000 items per request (MAX_BULK_SIZE). Prevents memory exhaustion and denial-of-service from oversized batches.
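Clients with more than 1000 items have to split them before calling the bulk endpoint. A minimal sketch: `bulk_create` and `batches` are hypothetical helpers, and `post` is any callable that sends one batch (for example one wrapping `requests.post` to `/api/v1/memories/bulk`; the JSON-array body shape is an assumption).

```python
MAX_BULK_SIZE = 1000  # server-side cap on items per bulk create/import request

def batches(items, size=MAX_BULK_SIZE):
    """Yield successive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]

def bulk_create(items, post):
    """Send `items` in server-sized batches via the caller-supplied `post`
    callable; returns the number of items sent."""
    sent = 0
    for batch in batches(items):
        post(batch)
        sent += len(batch)
    return sent

sizes = [len(b) for b in batches(list(range(2500)))]
print(sizes)  # [1000, 1000, 500]
```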
Color output uses AtomicBool with atomic ordering for thread-safe global state. No mutexes needed for the color-enabled flag across threads.
During database sync (pull, push, merge), every imported link is validated via validate::validate_link() before insertion. Invalid links are silently skipped to prevent corrupt cross-references.
The MCP server validates that every incoming request has jsonrpc: "2.0". Non-conformant requests are rejected before any tool dispatch occurs.
MCP tool calls extract arguments from the request params object. Non-object arguments default to an empty object, preventing type-confusion attacks on tool handlers.
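The two checks above (protocol version first, then argument shape) can be sketched as a single parsing step. This is an illustration in Python, not the server's Rust code; a real server would answer a failed check with JSON-RPC error -32600 (Invalid Request) rather than raising.

```python
import json

def parse_tool_call(raw):
    """Validate one incoming message before any tool dispatch (a sketch).
    Returns (tool_name, arguments) or raises ValueError."""
    msg = json.loads(raw)  # JSONDecodeError is a ValueError subclass
    if not isinstance(msg, dict) or msg.get("jsonrpc") != "2.0":
        raise ValueError('invalid request: jsonrpc must be "2.0"')
    params = msg.get("params")
    if not isinstance(params, dict):
        params = {}
    args = params.get("arguments")
    if not isinstance(args, dict):  # non-object arguments become an empty object
        args = {}
    return params.get("name"), args

print(parse_tool_call(
    '{"jsonrpc":"2.0","id":1,"method":"tools/call",'
    '"params":{"name":"memory_recall","arguments":["not","an","object"]}}'
))  # ('memory_recall', {})
```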
Shared validation layer across CLI, HTTP, and MCP. Title max 512B, content max 64KB, namespace alphanumeric, source whitelisted, priority 1-10, confidence 0.0-1.0.
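The shared limits translate into a compact validator. This sketch uses hypothetical names and makes a few assumptions beyond the text: 64KB is read as 64 × 1024 bytes, titles must be non-empty, and namespaces may also contain `-` and `_` (the Quick reference uses `my-project`). Source whitelisting is omitted because the allowed values are not listed here.

```python
import re

NAMESPACE_RE = re.compile(r"[A-Za-z0-9_-]+")  # assumption: '-' and '_' allowed

def validate_memory(title, content, namespace, priority, confidence):
    """Return a list of validation errors (an empty list means valid)."""
    errors = []
    if not title or len(title.encode("utf-8")) > 512:
        errors.append("title must be 1-512 bytes")
    if len(content.encode("utf-8")) > 64 * 1024:
        errors.append("content must be at most 64KB")
    if not NAMESPACE_RE.fullmatch(namespace):
        errors.append("namespace has invalid characters")
    if not (isinstance(priority, int) and 1 <= priority <= 10):
        errors.append("priority must be an integer 1-10")
    if not (0.0 <= confidence <= 1.0):
        errors.append("confidence must be between 0.0 and 1.0")
    return errors

print(validate_memory("Stack", "Rust + Axum", "my-project", 5, 0.9))  # []
```

Measuring title and content in UTF-8 bytes rather than characters matches limits stated in bytes, so multibyte text cannot slip past them.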
The HTTP server binds to 127.0.0.1 by default. Your memories never leave your machine unless you explicitly configure otherwise.
Single Rust binary. Three universal interfaces. Four feature tiers with optional local LLMs via Ollama.
All three interfaces are universal: any AI platform can use any of them. They share the same validation layer and database.
| Capability | CLI (Universal) | HTTP API (Universal) | MCP (Universal) |
|---|---|---|---|
| Store memory | Yes | Yes | Yes |
| Update memory | Yes | Yes | Yes |
| Recall (fuzzy OR) | Yes | Yes | Yes |
| Search (AND) | Yes | Yes | Yes |
| Get by ID | Yes | Yes | Yes |
| List with filters | Yes | Yes | Yes |
| Delete | Yes | Yes | Yes |
| Promote | Yes | Yes | Yes |
| Forget (bulk delete) | Yes | Yes | Yes |
| Link memories | Yes | Yes | Yes |
| Get links | Yes | Yes | Yes |
| Consolidate | Yes | Yes | Yes |
| Stats | Yes | Yes | Yes |
| Bulk create | -- | Yes | -- |
| Resolve contradictions | Yes | -- | -- |
| Auto-consolidate | Yes | -- | -- |
| Sync databases | Yes | -- | -- |
| Interactive shell | Yes | -- | -- |
| Export / Import | Yes | Yes | -- |
| Garbage collection | Yes | Yes | -- |
| Namespaces list | Yes | Yes | -- |
| Shell completions | Yes | -- | -- |
| Man page | Yes | -- | -- |
ai-memory shell opens a REPL with color-coded output. Tiers are red/yellow/green, priority is visualized as bars, namespaces appear in cyan.
All interfaces work with any AI platform. Choose the one that fits your setup.
```bash
# Store a memory
ai-memory store -T "Project uses Rust 2021 edition" \
  -c "Rust 2021, Axum for HTTP, SQLite for storage." \
  --tier long --priority 7

# Recall relevant memories
ai-memory recall "what language and framework"

# Exact keyword search
ai-memory search "Axum"

# List all, JSON output
ai-memory list --json
```
```bash
# Start the daemon
ai-memory serve &

# Store via API (works from any language, any AI backend)
curl -X POST http://127.0.0.1:9077/api/v1/memories \
  -H 'Content-Type: application/json' \
  -d '{"title":"Test","content":"It works.","tier":"short"}'

# Recall
curl "http://127.0.0.1:9077/api/v1/recall?context=test"
```
GitHub Actions runs on every push and PR. Releases are automated on tag push with cross-platform binaries.
LongMemEval (ICLR 2025): 500 questions across 6 categories.
| Config | R@1 | R@5 | R@10 | R@20 | Time | Speed |
|---|---|---|---|---|---|---|
| Parallel FTS5 (keyword) | 86.2% | 97.0% | 98.2% | 99.4% | 2.2s | 232 q/s |
| LLM-expanded + parallel FTS5 | 86.8% | 97.8% | 99.0% | 99.8% | 3.5s | 142 q/s |

Per-category breakdown (LLM-expanded + parallel FTS5 configuration):

| Category | R@1 | R@5 | R@10 | R@20 |
|---|---|---|---|---|
| single-session-assistant | 100.0% | 100.0% | 100.0% | 100.0% |
| knowledge-update | 91.0% | 100.0% | 100.0% | 100.0% |
| single-session-user | 88.6% | 98.6% | 100.0% | 100.0% |
| multi-session | 88.0% | 97.7% | 98.5% | 100.0% |
| temporal-reasoning | 79.7% | 96.2% | 98.5% | 99.2% |
| single-session-preference | 73.3% | 93.3% | 96.7% | 100.0% |
| OVERALL | 86.8% | 97.8% | 99.0% | 99.8% |
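For reference, R@k in tables like these is commonly computed as the fraction of questions whose top-k results contain at least one relevant item. The sketch below implements that common definition with made-up session IDs; the harness may use a different variant (for example requiring all evidence sessions), so treat it as illustrative.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of questions with at least one relevant item in the top k.
    ranked_ids: one ranked list of result ids per question.
    relevant_ids: one set of gold ids per question."""
    if len(ranked_ids) != len(relevant_ids):
        raise ValueError("need one gold set per question")
    if not ranked_ids:
        return 0.0
    hits = sum(1 for ranked, gold in zip(ranked_ids, relevant_ids)
               if gold.intersection(ranked[:k]))
    return hits / len(ranked_ids)

ranked = [["s3", "s1", "s7"], ["s2", "s9", "s4"], ["s5", "s6", "s8"]]
gold = [{"s3"}, {"s4"}, {"s0"}]
for k in (1, 3):
    print(f"R@{k} = {recall_at_k(ranked, gold, k):.1%}")
# R@1 = 33.3%
# R@3 = 66.7%
```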
```bash
# 1. Clone dataset
git clone --depth 1 https://github.com/xiaowu0162/LongMemEval /tmp/LongMemEval
cd /tmp/LongMemEval/data
curl -sLO https://huggingface.co/datasets/xiaowu0162/longmemeval-cleaned/resolve/main/longmemeval_s_cleaned.json
cd -

# 2. Install
cargo install --git https://github.com/alphaonedev/ai-memory-mcp.git
pip install tabulate requests

# 3. Run (keyword, 2.2s)
python3 benchmarks/longmemeval/harness_99.py --dataset-path /tmp/LongMemEval --variant S --no-expand --workers 10

# 4. Run (LLM-expanded; benchmark was last run with Ollama gemma3:4b, Gemma 4 refresh pending)
python3 benchmarks/longmemeval/harness_99.py --dataset-path /tmp/LongMemEval --variant S --workers 10
```