Persistent Memory for Any AI

Zero token cost until recalled. Built-in memory systems load your entire memory into every message. ai-memory uses zero context tokens until the AI calls memory_recall — only relevant memories come back, ranked and compressed via TOON format (79% smaller than JSON).

Works with Claude · ChatGPT · Grok · Cursor · Windsurf · Continue.dev · OpenClaw · Llama · any MCP client

97.8% Recall@5 · 17 MCP Tools · 4 Feature Tiers · 161 Tests
Get Started in 60 Seconds

LongMemEval Benchmark (ICLR 2025) — 500 questions, 6 categories

97.8% R@5 (489/500)
99.0% R@10 (495/500)
99.8% R@20 (499/500)
2.2s 232 q/s (keyword)
$0 Cloud API costs

Pure SQLite FTS5 + BM25 — zero cloud dependencies — full benchmark details & replication steps

Works With Any AI Platform

MCP is the universal integration layer. The HTTP API works with literally anything that can make a request. No vendor lock-in.


Claude Code

Anthropic's Claude Code, Claude Desktop, and any Claude-based tool

MCP Native

OpenAI Codex CLI

OpenAI's Codex command-line agent with TOML-based MCP config

MCP Native

Google Gemini CLI

Google's Gemini CLI with JSON-based MCP server configuration

MCP Native

Cursor IDE

AI-powered code editor with built-in MCP support

MCP Native

Windsurf

Codeium's AI IDE with MCP tool integration

MCP Native

Continue.dev

Open-source AI code assistant with YAML-based MCP config

MCP Native

xAI Grok

Grok and any xAI-based applications via remote MCP

Remote MCP (HTTPS)

META Llama

Llama Stack toolgroup registration via HTTP server

HTTP / MCP

OpenClaw

Self-hosted AI assistant with MCP via mcp.servers config

MCP Native

Any MCP Client

Any tool that speaks the Model Context Protocol -- present or future

Universal

MCP = native tool integration (stdio JSON-RPC)  |  HTTP = REST API on localhost:9077 (works with anything)  |  CLI = shell commands (scriptable, pipeable)

Install

One command. No dependencies for pre-built binaries. Three installation methods.

Recommended

macOS / Linux

Pre-built binary. Auto-detects OS & architecture.

curl -fsSL https://raw.githubusercontent.com/alphaonedev/ai-memory-mcp/main/install.sh | sh

Windows

PowerShell installer. Adds to PATH automatically.

irm https://raw.githubusercontent.com/alphaonedev/ai-memory-mcp/main/install.ps1 | iex

Cargo (crates.io)

From source. Needs Rust + C compiler.

cargo install ai-memory

Docker

Containerized HTTP server on port 9077.

docker build -t ai-memory .
docker run -p 9077:9077 -v data:/data ai-memory

cargo-binstall

Pre-built binary via cargo. No compile step.

cargo binstall ai-memory

Supported platforms: macOS (Intel + Apple Silicon)  •  Linux (x86_64 + ARM64)  •  Windows (x86_64)  •  WSL  •  Docker

Build from source? Ubuntu/Debian: sudo apt install build-essential pkg-config  •  Fedora/RHEL: sudo dnf install gcc pkg-config  •  macOS: Xcode CLT (pre-installed)  •  Windows: MSVC C++ build tools

  1. Optional: Ollama for Smart & Autonomous tiers

    The keyword and semantic tiers work with zero dependencies. The smart and autonomous tiers add LLM-powered query expansion, auto-tagging, and neural reranking via Ollama.

  2. Install Ollama (Smart & Autonomous tiers)

    The smart and autonomous tiers use local LLMs via Ollama for query expansion, auto-tagging, contradiction detection, and cross-encoder reranking. Skip this step if you only need keyword or semantic search.

    macOS

    # Install via Homebrew
    brew install ollama
    
    # Or download the macOS app:
    # https://ollama.com/download/mac
    
    # Start the Ollama service
    ollama serve &
    # (or launch the Ollama.app -- it runs as a menu bar item)
    
    # Pull models for your tier
    ollama pull nomic-embed-text  # Embeddings (smart+)
    ollama pull gemma4:e2b        # LLM -- Smart (~1GB)
    ollama pull gemma4:e4b        # LLM -- Autonomous (~2.3GB)

    Linux

    # One-line install script
    curl -fsSL https://ollama.com/install.sh | sh
    
    # Enable and start the systemd service
    sudo systemctl enable ollama
    sudo systemctl start ollama
    
    # Pull models for your tier
    ollama pull nomic-embed-text  # Embeddings (smart+)
    ollama pull gemma4:e2b        # LLM -- Smart (~1GB)
    ollama pull gemma4:e4b        # LLM -- Autonomous (~2.3GB)

    Windows

    # Install via winget
    winget install Ollama.Ollama
    
    # Or download the installer:
    # https://ollama.com/download/windows
    
    # Ollama runs as a system service after install
    
    # Pull models for your tier
    ollama pull nomic-embed-text  # Embeddings (smart+)
    ollama pull gemma4:e2b        # LLM -- Smart (~1GB)
    ollama pull gemma4:e4b        # LLM -- Autonomous (~2.3GB)

    Verify Ollama

    # Check Ollama is running and models are available
    curl http://localhost:11434/api/tags
    ollama run gemma4:e2b "Hello, world"   # Should respond in ~1s

    ai-memory connects to Ollama at localhost:11434 automatically. Override with ollama_url in ~/.config/ai-memory/config.toml or --ollama-url flag. If Ollama is unavailable, ai-memory gracefully falls back to the semantic tier.
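If you want to script the same availability check, a minimal probe against the Ollama tags endpoint looks like this. The function name and timeout are illustrative; ai-memory's own startup probe is internal.

```python
from urllib.request import urlopen
from urllib.error import URLError

def ollama_available(url="http://localhost:11434", timeout=2.0):
    """Return True if an Ollama server answers at `url` (illustrative probe).

    ai-memory performs an equivalent check at startup; when it fails,
    the smart and autonomous tiers fall back to semantic."""
    try:
        with urlopen(f"{url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        return False
```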

  3. Configure your AI platform

    Choose the integration method that fits your setup.

    Claude Code · Codex CLI · Gemini CLI · Cursor · Windsurf · Continue.dev · Grok · Llama · OpenClaw · Any MCP Client

    Claude Code MCP Configuration Scopes:

    Scope            | File                          | Applies to
    User (global)    | ~/.claude.json                | All projects on your machine
    Project (shared) | .mcp.json in project root     | Everyone on the project (via git)
    Local (private)  | ~/.claude.json under projects | One project, just you

    User scope (recommended) -- merge mcpServers into your existing ~/.claude.json (macOS/Linux) or %USERPROFILE%\.claude.json (Windows):

    {
      "mcpServers": {
        "memory": {
          "command": "ai-memory",
          "args": ["--db", "~/.claude/ai-memory.db", "mcp", "--tier", "semantic"]
        }
      }
    }

    Restart Claude Code. It will discover all 17 memory tools natively. No daemon, no ports. MCP servers do not go in settings.json or settings.local.json. The --tier flag is required -- options: keyword, semantic (default), smart, autonomous. Smart/autonomous require Ollama.

    Windows: Use ai-memory.exe for the command and forward slashes in paths: "C:/Users/YourName/.claude/ai-memory.db"

    OpenAI Codex CLI Configuration Scopes:

    Scope         | File                               | Applies to
    Global (user) | ~/.codex/config.toml               | All projects on your machine
    Project       | .codex/config.toml in project root | Trusted projects only

    Windows: %USERPROFILE%\.codex\config.toml. Override config dir with CODEX_HOME env var.

    # OpenAI Codex CLI MCP configuration
    [mcp_servers.memory]
    command = "ai-memory"
    args = ["--db", "~/.local/share/ai-memory/memories.db", "mcp", "--tier", "semantic"]
    enabled = true

    CLI shortcut: codex mcp add memory -- ai-memory --db ~/.local/share/ai-memory/memories.db mcp --tier semantic

    Codex uses TOML with underscored key mcp_servers (not camelCase). Supports env, env_vars, enabled_tools, disabled_tools, startup_timeout_sec, tool_timeout_sec. Use /mcp in the TUI to view server status. Windows/WSL: WSL uses Linux home by default -- set CODEX_HOME to share config with Windows host. See Codex MCP docs.

    Google Gemini CLI Configuration Scopes:

    Scope         | File                                  | Applies to
    User (global) | ~/.gemini/settings.json               | All projects on your machine
    Project       | .gemini/settings.json in project root | Scoped to that project

    Windows: %USERPROFILE%\.gemini\settings.json. Env vars: $VAR / ${VAR} (all platforms), %VAR% (Windows).

    {
      "mcpServers": {
        "memory": {
          "command": "ai-memory",
          "args": ["--db", "~/.local/share/ai-memory/memories.db", "mcp", "--tier", "semantic"],
          "timeout": 30000
        }
      }
    }

    CLI shortcut: gemini mcp add memory ai-memory -- --db ~/.local/share/ai-memory/memories.db mcp --tier semantic

    Avoid underscores in server names (use hyphens). Tool names are auto-prefixed as mcp_memory_<toolName>. Env vars in env field support $VAR / ${VAR} (all platforms) and %VAR% (Windows). Gemini sanitizes sensitive patterns (*TOKEN*, *SECRET*) from inherited env unless declared. Add "trust": true to skip confirmation. CLI: gemini mcp list/remove/enable/disable. See Gemini CLI MCP docs.

    Cursor IDE Configuration Scopes:

    Scope         | File                             | Applies to
    Global (user) | ~/.cursor/mcp.json               | All projects on your machine
    Project       | .cursor/mcp.json in project root | Overrides global for same-named servers

    Windows: %USERPROFILE%\.cursor\mcp.json. Also configurable via Settings > Tools & MCP.

    {
      "mcpServers": {
        "memory": {
          "command": "ai-memory",
          "args": ["--db", "~/.local/share/ai-memory/memories.db", "mcp", "--tier", "semantic"]
        }
      }
    }

    Or add via Cursor Settings > Tools & MCP. Restart Cursor after editing. Verify with green dot in Settings. Supports env, envFile, ${env:VAR_NAME} interpolation (can be unreliable for shell profile vars -- use envFile as workaround). ~40 tool limit across all servers. See Cursor MCP docs.

    Windsurf (Codeium) Configuration Scopes:

    Scope       | File                                | Applies to
    Global only | ~/.codeium/windsurf/mcp_config.json | All projects (no project scope)

    Windows: %USERPROFILE%\.codeium\windsurf\mcp_config.json. Also configurable via MCP Marketplace or Settings > Cascade > MCP Servers.

    {
      "mcpServers": {
        "memory": {
          "command": "ai-memory",
          "args": ["--db", "~/.codeium/windsurf/ai-memory.db", "mcp", "--tier", "semantic"]
        }
      }
    }

    Supports ${env:VAR_NAME} interpolation in command, args, env, serverUrl, url, and headers. 100 tool limit across all servers. Can also add via MCP Marketplace or Settings > Cascade > MCP Servers. See Windsurf MCP docs.

    Continue.dev Configuration Scopes:

    Scope         | File                                      | Applies to
    User (global) | ~/.continue/config.yaml                   | All projects on your machine
    Project       | .continue/mcpServers/ dir in project root | Per-server YAML/JSON files

    Windows: %USERPROFILE%\.continue\config.yaml. Project dir auto-detects JSON configs from other tools.

    # Continue.dev MCP configuration
    mcpServers:
      - name: memory
        command: ai-memory
        args:
          - "--db"
          - "~/.continue/ai-memory.db"
          - "mcp"
          - "--tier"
          - "semantic"

    MCP tools only work in agent mode. Supports ${{ secrets.SECRET_NAME }} for secret interpolation. Project-level .continue/mcpServers/ directory auto-detects JSON configs from other tools (Claude Code, Cursor, etc.). See Continue MCP docs.

    xAI Grok Configuration:

    Scope       | Method                           | Applies to
    Per-request | API tools array (no config file) | Each API call individually

    Remote HTTPS only (no stdio). Start ai-memory behind an HTTPS reverse proxy.

    # Step 1: Start the ai-memory HTTP server
    ai-memory serve --host 127.0.0.1 --port 9077 &
    # Expose via HTTPS reverse proxy (nginx, caddy, cloudflare tunnel, etc.)
    
    # Step 2: Add the MCP server to your Grok API call
    curl https://api.x.ai/v1/responses \
      -H "Authorization: Bearer $XAI_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "grok-3",
        "tools": [{
          "type": "mcp",
          "server_url": "https://your-server.example.com/mcp",
          "server_label": "memory",
          "server_description": "Persistent AI memory with recall and search"
        }],
        "input": "What do you remember about our project?"
      }'

    HTTPS required. server_label is required. Supports Streamable HTTP and SSE transports. Optional: allowed_tools, authorization, headers. Works with xAI SDK, OpenAI-compatible Responses API, and Voice Agent API. See xAI Remote MCP docs.

    META Llama Stack Configuration:

    Scope        | Method                                   | Applies to
    Declarative  | run.yaml -- tool_groups section          | Deployment-wide (supports ${env.VAR})
    Programmatic | Python/Node SDK -- toolgroups.register() | Runtime registration

    Llama Stack uses toolgroup registration with an HTTP backend.

    # Step 1: Start the ai-memory HTTP server
    ai-memory serve --host 127.0.0.1 --port 9077 &
    
    # Step 2: Register as a Llama Stack toolgroup
    # In your Llama Stack config, register the MCP endpoint:
    #   toolgroup: ai-memory
    #   provider: remote::mcp-endpoint
    #   url: http://127.0.0.1:9077
    
    # Or use the REST API directly in custom tool definitions:
    #   POST /api/v1/memories, GET /api/v1/recall, etc.

    META Llama uses Llama Stack for tool registration. Run ai-memory serve and register as a toolgroup via Python SDK or run.yaml (supports ${env.VAR_NAME} interpolation). Transport migrating from SSE to Streamable HTTP. See Llama Stack Tools docs.

    OpenClaw Configuration:

    Scope         | File                 | Applies to
    Single config | Platform config file | All projects (single config file)

    Important: OpenClaw uses mcp.servers (NOT mcpServers). The key structure is different from most other platforms.

    {
      "mcp": {
        "servers": {
          "memory": {
            "command": "ai-memory",
            "args": ["--db", "~/.local/share/ai-memory/memories.db", "mcp", "--tier", "semantic"]
          }
        }
      }
    }

    CLI shortcut:

    openclaw mcp set memory '{"command":"ai-memory","args":["--db","~/.local/share/ai-memory/memories.db","mcp","--tier","semantic"]}'

    Management: openclaw mcp list · openclaw mcp show <name> · openclaw mcp unset <name>. See OpenClaw MCP docs.

    Generic MCP Client Configuration:

    Transport | Method          | Details
    stdio     | ai-memory mcp   | JSON-RPC 2.0, spawned by AI client
    HTTP      | ai-memory serve | REST API on localhost:9077

    Point your MCP client at the ai-memory binary with the mcp subcommand:

    {
      "mcpServers": {
        "memory": {
          "command": "ai-memory",
          "args": ["--db", "path/to/memory.db", "mcp", "--tier", "semantic"]
        }
      }
    }

    The MCP server exposes 17 tools over stdio using JSON-RPC. Any client that speaks MCP will discover them automatically. Adjust the --db path to your preferred location.
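Under the hood the stdio transport is newline-delimited JSON-RPC 2.0. The first two messages a client sends can be sketched as below; the message shapes follow the MCP spec, while the clientInfo values are placeholders.

```python
import json

def jsonrpc(method, params=None, id=1):
    """Build one JSON-RPC 2.0 request line for the MCP stdio transport."""
    msg = {"jsonrpc": "2.0", "id": id, "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)

# 1. Open the session (protocolVersion and clientInfo are placeholder values)
init = jsonrpc("initialize", {
    "protocolVersion": "2024-11-05",
    "capabilities": {},
    "clientInfo": {"name": "example-client", "version": "0.1"},
}, id=1)

# 2. Discover the tools the server exposes (ai-memory returns up to 17)
list_tools = jsonrpc("tools/list", id=2)
```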

  4. Verify it works

    Check that your AI has access to memory tools.

    # MCP: Ask your AI "What memory tools do you have?"
    # HTTP: curl http://127.0.0.1:9077/api/v1/health
    # CLI:  ai-memory stats

What It Does

Every capability at a glance. 4 feature tiers (keyword to autonomous), 17 MCP tools, three interfaces, one shared database. Works with any AI that supports MCP or HTTP.


Zero Token Cost

Built-in memory systems (Claude auto-memory, ChatGPT memory) load your entire memory into every conversation -- burning tokens and money on every message. ai-memory uses zero context tokens until recalled. Only relevant memories come back, ranked by score. Replace auto-memory and stop paying for 200+ lines of idle context.


Store and Recall

Save memories with a title, content, tier, tags, and priority. Recall them later with fuzzy search that ranks results by 6 factors including recency decay.


Three-Tier Memory

Short (6h), mid (7d), and long (permanent). Memories auto-promote to long-term after 5 accesses. TTL extends on every recall.


Full-Text + Semantic Search

SQLite FTS5 for keyword search plus vector embeddings for semantic similarity. Hybrid recall blends both FTS5 and cosine similarity for best-of-both-worlds relevance.


4 Feature Tiers

Scale from zero-dependency keyword search to full autonomous memory management. Each tier adds capabilities: keyword, semantic, smart, and autonomous.


Memory Links

Connect memories with typed relations: related_to, supersedes, contradicts, derived_from. Resolve contradictions with a single command.


LLM-Powered Features

Smart and autonomous tiers use Ollama (Gemma 4) for query expansion, auto-tagging, auto-consolidation, cross-encoder reranking, and contradiction analysis.


TOON Format

Token-Oriented Object Notation eliminates repeated field names in recall responses. Pass format: "toon" for 61% fewer bytes or "toon_compact" for 79% fewer. Field names declared once as a header, values as pipe-delimited rows. LLMs parse it natively.
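As an illustration of the idea (not the exact TOON grammar ai-memory emits), a header-once, pipe-delimited encoding can be sketched as:

```python
def to_toon(records):
    """Illustrative TOON-style encoder: field names once in a header row,
    then one pipe-delimited row of values per record."""
    if not records:
        return ""
    fields = list(records[0])
    header = "|".join(fields)
    rows = ["|".join(str(r[f]) for f in fields) for r in records]
    return "\n".join([header, *rows])

memories = [
    {"id": 1, "title": "Project uses Rust", "score": 8.4},
    {"id": 2, "title": "DB is SQLite", "score": 5.7},
]
toon = to_toon(memories)
# Header appears once; JSON would repeat every field name in every record.
```

The byte savings grow with the number of records, since field names are paid for exactly once.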


MCP Prompts

Two MCP prompts teach AI clients to use memory proactively. recall-first: 9 behavioral rules (recall at start, store corrections, TOON format, tier strategy, dedup). memory-workflow: quick reference card for all tool patterns. AI clients receive these at connection time via prompts/list.

Feature Tiers (4 Levels)

Each tier builds on the one below it. Choose based on your resources and needs. Set via ai-memory mcp --tier <name> or in ~/.config/ai-memory/config.toml.

Tier       | RAM     | Embedding Model                              | LLM                  | Dependencies                      | Key Features
keyword    | 0 MB    | None                                         | None                 | None                              | FTS5 full-text search, 13 MCP tools
semantic   | ~256 MB | all-MiniLM-L6-v2 (384-dim, local via Candle) | None                 | None (model auto-downloads ~90MB) | + Hybrid recall (FTS5 + cosine similarity), HNSW vector index, 14 MCP tools
smart      | ~1 GB   | nomic-embed-text-v1.5 (768-dim, via Ollama)  | Gemma 4 E2B (~1GB)   | Ollama                            | + LLM query expansion, auto-tagging, auto-consolidation, 17 MCP tools
autonomous | ~4 GB   | nomic-embed-text-v1.5 (768-dim, via Ollama)  | Gemma 4 E4B (~2.3GB) | Ollama                            | + Neural cross-encoder reranking (ms-marco-MiniLM), contradiction analysis, 17 MCP tools

Keyword Tier

Pure SQLite FTS5 full-text search. Zero ML dependencies, zero memory overhead. The binary is entirely self-contained. Ideal for low-resource environments, CI runners, or when you just need fast text matching.

Semantic Tier (default)

Adds dense vector embeddings via all-MiniLM-L6-v2 (384-dim), loaded locally through the Candle ML framework. Recall blends FTS5 keyword scores with cosine similarity using adaptive content-length weighting (50/50 for short memories, 85/15 FTS-weighted for long content). HNSW index for fast approximate nearest-neighbor search. The model auto-downloads from HuggingFace on first run (~90MB).
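The adaptive weighting can be sketched as follows. The 50/50 and 85/15 endpoints come from the description above; the linear interpolation between them and the length thresholds are illustrative assumptions, not the exact implementation:

```python
def hybrid_score(fts_score, cosine_sim, content_len,
                 short_len=200, long_len=2000):
    """Blend FTS5 and cosine scores with content-length-adaptive weights.

    The endpoints (50/50 for short content, 85/15 FTS-weighted for long)
    are from the docs; the linear ramp and thresholds are assumptions."""
    if content_len <= short_len:
        w_fts = 0.5
    elif content_len >= long_len:
        w_fts = 0.85
    else:
        t = (content_len - short_len) / (long_len - short_len)
        w_fts = 0.5 + t * 0.35
    return w_fts * fts_score + (1 - w_fts) * cosine_sim
```

The intuition: keyword match is a stronger signal for long documents, while short memories lean more on embedding similarity.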

Smart Tier

Upgrades to nomic-embed-text-v1.5 (768-dim) via Ollama for higher-quality embeddings. Adds an on-device LLM (Gemma 4 Effective 2B) that powers three new tools: memory_expand_query (semantic query broadening), memory_auto_tag (content-aware tagging), and memory_detect_contradiction (conflict detection). Requires Ollama running locally.

Autonomous Tier

Upgrades the LLM to Gemma 4 Effective 4B for more nuanced reasoning. Adds a neural cross-encoder reranker (ms-marco-MiniLM-L-6-v2) that re-scores (query, document) pairs after hybrid retrieval for significantly better recall precision. Full autonomous memory reflection and contradiction resolution. Requires Ollama.

Capability Matrix

Every capability mapped to its minimum tier. Each tier includes all capabilities from the tiers below it.

Capability                                            | keyword | semantic | smart | autonomous
Search & Recall
FTS5 keyword search (memory_search)                   | Yes | Yes | Yes | Yes
Semantic embedding (cosine similarity)                | --  | Yes | Yes | Yes
Hybrid recall (FTS5 + cosine, adaptive blend)         | --  | Yes | Yes | Yes
HNSW approximate nearest-neighbor index               | --  | Yes | Yes | Yes
LLM query expansion (memory_expand_query)             | --  | --  | Yes | Yes
Neural cross-encoder reranking (ms-marco-MiniLM)      | --  | --  | --  | Yes
Memory Management
Store, update, delete, promote                        | Yes | Yes | Yes | Yes
Link memories (4 relation types)                      | Yes | Yes | Yes | Yes
Bulk forget by pattern/namespace/tier                 | Yes | Yes | Yes | Yes
Manual consolidation (user-provided summary)          | Yes | Yes | Yes | Yes
Auto-consolidation (LLM-generated summary)            | --  | --  | Yes | Yes
Auto-tagging (memory_auto_tag)                        | --  | --  | Yes | Yes
Contradiction detection (memory_detect_contradiction) | --  | --  | Yes | Yes
Autonomous memory reflection                          | --  | --  | --  | Yes
Embedding Model
Model                                                 | None | all-MiniLM-L6-v2 | nomic-embed-text-v1.5 | nomic-embed-text-v1.5
Dimensions                                            | --   | 384              | 768                   | 768
Runtime                                               | --   | Candle (local)   | Ollama                | Ollama
Model size                                            | --   | ~90 MB           | ~274 MB               | ~274 MB
LLM (Language Model)
Model                                                 | --   | --   | Gemma 4 Effective 2B | Gemma 4 Effective 4B
Ollama tag                                            | --   | --   | gemma4:e2b           | gemma4:e4b
Model size                                            | --   | --   | ~7.2 GB              | ~9.6 GB
Resources
Total RAM                                             | 0 MB | ~256 MB | ~1 GB  | ~4 GB
External dependencies                                 | None | None    | Ollama | Ollama
MCP tools exposed                                     | 13   | 14      | 17     | 17
Ollama models to pull                                 | --   | --      | nomic-embed-text + gemma4:e2b | nomic-embed-text + gemma4:e4b

Tiers gate features, not models. The --tier flag controls which tools are exposed. The LLM model is independently configurable via llm_model in config.toml. For example, run autonomous tier (all features) with the faster e2b model: llm_model = "gemma4:e2b" (46 tok/s vs 26 tok/s for e4b). If Ollama is unavailable at startup, smart and autonomous tiers fall back to semantic automatically.

Configuration File

# ~/.config/ai-memory/config.toml
# Created automatically on first run with defaults commented out

tier = "autonomous"                   # keyword | semantic | smart | autonomous
db = "~/.claude/ai-memory.db"         # SQLite database path
ollama_url = "http://localhost:11434" # Ollama API endpoint
llm_model = "gemma4:e2b"             # independently configurable (e2b=46tok/s, e4b=26tok/s)
cross_encoder = true                 # Neural reranking (autonomous tier)
default_namespace = "global"         # Default namespace for new memories

17 MCP Tools (Universal Integration)

ai-memory runs as a Model Context Protocol (MCP) tool server over stdio. Any MCP-compatible AI client -- Claude, ChatGPT, Grok, Llama, or custom agents -- discovers these tools automatically.

[Architecture diagram: AI clients (Claude, ChatGPT, Grok, Llama) connect over stdio or HTTP to the MCP server (JSON-RPC, up to 17 tools, --tier keyword|semantic|smart|autonomous), backed by rusqlite (SQLite + FTS5, WAL mode, HNSW index). Smart+ tiers call Ollama locally (nomic-embed-text, Gemma 4 E2B/E4B) for query expansion, auto-tagging, cross-encoder reranking, and contradiction detection. Example launch: ai-memory --db path/to/memory.db mcp --tier smart]
memory_store

Store a new memory. Deduplicates by title+namespace. Detects contradictions with existing memories.

memory_recall

Fuzzy OR search with 6-factor ranking. Auto-touches recalled memories (extends TTL, may promote).

memory_search

Exact keyword AND search. Returns memories matching all terms.

memory_list

Browse memories with filters: namespace, tier, tags, date range.

memory_get

Retrieve a single memory by ID, including all its links.

memory_update

Update an existing memory: change title, content, tier, priority, or tags.

memory_delete

Delete a specific memory by ID. Links cascade automatically.

memory_promote

Promote a memory to long-term permanent storage. Clears expiry.

memory_forget

Bulk delete by pattern, namespace, or tier.

memory_link

Link two memories: related_to, supersedes, contradicts, or derived_from.

memory_get_links

Get all links for a memory by ID.

memory_consolidate

Merge multiple memories into one long-term summary.

memory_stats

Database statistics: counts by tier, namespaces, link count, DB size.

memory_capabilities

Returns available capabilities for the current feature tier. Lets the AI discover what tools and features are active.

memory_expand_query

LLM-powered query expansion. Broadens a recall query with synonyms and related terms for better recall coverage. (smart+ tiers)

memory_auto_tag

LLM-powered auto-tagging. Analyzes memory content and suggests relevant tags automatically. (smart+ tiers)

memory_detect_contradiction

LLM-powered contradiction analysis. Compares a memory against existing memories to detect conflicts and inconsistencies. (smart+ tiers)

20 HTTP API Endpoints (Universal Fallback)

Start with ai-memory serve (default: http://127.0.0.1:9077). The HTTP API works with any AI platform, any programming language, any framework. If it can make an HTTP request, it can use ai-memory.

Method | Endpoint               | Description
GET    | /health                | Deep health check (DB + FTS5 integrity)
GET    | /memories              | List memories (filter: namespace, tier, priority, date range, tags)
POST   | /memories              | Create memory (dedup on title+namespace, contradiction detection)
POST   | /memories/bulk         | Bulk create (up to 1000 items per request)
GET    | /memories/{id}         | Get memory by ID (includes links)
PUT    | /memories/{id}         | Update memory (partial update, validated)
DELETE | /memories/{id}         | Delete memory (links cascade)
POST   | /memories/{id}/promote | Promote memory to long-term (clears expiry)
GET    | /search                | FTS5 AND search with 6-factor ranking
GET    | /recall                | Fuzzy OR recall + touch + auto-promote
POST   | /recall                | Recall via POST body (for longer queries)
POST   | /forget                | Bulk delete by pattern/namespace/tier
POST   | /consolidate           | Merge 2-100 memories into one long-term summary
POST   | /links                 | Create memory link (4 relation types)
GET    | /links/{id}            | Get all links for a memory
GET    | /namespaces            | List namespaces with counts
GET    | /stats                 | Aggregate statistics
POST   | /gc                    | Run garbage collection on expired memories
GET    | /export                | Export all memories + links as JSON
POST   | /import                | Import memories + links from JSON

Integration Examples

# Python (works with any AI backend: OpenAI, Anthropic, local Llama, etc.)
import requests

def ai_store_memory(title, content, tier="mid"):
    requests.post("http://127.0.0.1:9077/api/v1/memories", json={
        "title": title, "content": content, "tier": tier
    })

def ai_recall(context):
    r = requests.get("http://127.0.0.1:9077/api/v1/recall", params={"context": context})
    return r.json()

# Use in your AI's tool/function definitions
# Works with OpenAI function calling, Anthropic tool use, etc.

25 CLI Commands (Universal)

Global flags: --db <path> and --json. Scriptable, pipeable, works in any shell. Use directly or wrap in your AI's tool layer.

Category | Command          | Description
Server   | mcp              | Run as MCP tool server over stdio (primary integration for MCP clients)
Server   | serve            | Start HTTP daemon (--host, --port, default 9077) -- universal API for any AI
Core     | store            | Store memory (-T title, -c content, --tier, --namespace, --tags, --priority, --confidence, --source)
Core     | update           | Update memory by ID (partial fields)
Core     | delete           | Delete memory by ID (links cascade)
Core     | promote          | Promote to long-term (clears expiry)
Query    | recall           | Fuzzy OR recall with 6-factor ranking (--namespace, --limit, --tags, --since)
Query    | search           | AND keyword search (--namespace, --tier, --limit, --since, --until, --tags)
Query    | get              | Get memory by ID (includes links)
Query    | list             | List with filters (--namespace, --tier, --limit, --since, --until, --tags)
Manage   | forget           | Bulk delete (--namespace, --pattern, --tier)
Manage   | link             | Link two memories (--relation: related_to, supersedes, contradicts, derived_from)
Manage   | consolidate      | Merge N memories into one (-T title, -s summary, --namespace)
Manage   | resolve          | Resolve contradiction: winner supersedes loser (demotes loser: priority=1, confidence=0.1)
Manage   | auto-consolidate | Auto-group by namespace+tag and consolidate (--dry-run, --short-only, --min-count, --namespace)
Ops      | gc               | Run garbage collection on expired memories
Ops      | stats            | Show statistics (counts, tiers, namespaces, links, DB size)
Ops      | namespaces       | List all namespaces with memory counts
Ops      | sync             | Sync databases (--direction pull|push|merge, dedup-safe upsert)
Ops      | shell            | Interactive REPL with color output (recall, search, list, get, stats, namespaces, delete)
I/O      | export           | Export all memories + links as JSON (stdout)
I/O      | import           | Import memories + links from JSON (stdin)
I/O      | completions      | Generate shell completions (bash, zsh, fish)
I/O      | man              | Generate roff man page to stdout
I/O      | mine             | Import memories from historical conversations (Claude, ChatGPT, Slack)

Three-Tier Memory

Memories are organized into three tiers that mirror human memory systems. Each tier has automatic TTL management, and memories flow upward through access patterns.

Short-Term

6h

Ephemeral context. Current task state, debugging notes, transient observations.

Extends +1h on each recall. Good for "what am I working on right now" context.

Mid-Term

7d

Working knowledge. Sprint goals, recent decisions, active project context.

Extends +1d on recall. Auto-promotes to long-term at 5 accesses.

Long-Term

Permanent. Architecture, user preferences, hard-won lessons, corrections.

Never expires. Highest tier boost (3.0) in recall ranking. The knowledge bedrock.
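The TTL and promotion rules above can be sketched as code. The constants come from the descriptions in this section; the function and field names are illustrative:

```python
from datetime import datetime, timedelta

# Initial TTL per tier (expiry = creation time + TTL; long never expires)
TTL = {"short": timedelta(hours=6), "mid": timedelta(days=7), "long": None}
# TTL extension applied on each recall
EXTEND = {"short": timedelta(hours=1), "mid": timedelta(days=1)}
PROMOTE_AT = 5  # accesses before a mid-term memory becomes long-term

def touch(memory):
    """Recall side effects: count the access, extend TTL, auto-promote mid -> long."""
    memory["access_count"] += 1
    if memory["tier"] == "mid" and memory["access_count"] >= PROMOTE_AT:
        memory["tier"] = "long"
        memory["expires_at"] = None  # long-term never expires
    elif memory["tier"] in EXTEND:
        memory["expires_at"] += EXTEND[memory["tier"]]
    return memory
```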

[Lifecycle diagram: Store (create, dedup on title+namespace) → Recall (touch + rank, +1 priority every 10 accesses) → TTL Extend (short +1h, mid +1d) → Auto-Promote (mid to long at 5 accesses, clears expiry) → Consolidate (merge N to 1, auto-consolidate groups) → Contradiction detect]

6-Factor Recall Scoring

Every recall query computes a composite score entirely in SQLite. Higher scores rank first. No external ML or embedding service required.

score = fts_rank * -1 + priority * 0.5 + MIN(access_count, 50) * 0.1 + confidence * 2.0 + tier_boost + 1/(1 + days * 0.1)
FTS Relevance -- SQLite FTS5 rank (negated: lower = better)
Priority -- 1-10 weighted by 0.5 (range: 0.5 - 5.0)
Access Count -- capped at 50, weighted by 0.1 (range: 0.0 - 5.0, rewards frequent use)
Confidence -- 0.0-1.0 weighted by 2.0 (range: 0.0 - 2.0)
Tier Boost -- long=3.0, mid=1.0, short=0.0
Recency -- 1/(1 + days_since_update * 0.1), today=1.0, 10d=0.5
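Worked example: a long-term memory with FTS rank -4.0, priority 7, 12 accesses, confidence 0.9, last updated 10 days ago. The function just restates the formula above (the real computation happens inside SQLite):

```python
def recall_score(fts_rank, priority, access_count, confidence, tier, days):
    """6-factor composite recall score, restated from the formula above."""
    tier_boost = {"long": 3.0, "mid": 1.0, "short": 0.0}[tier]
    return (fts_rank * -1                  # FTS relevance (negated rank)
            + priority * 0.5               # priority 1-10
            + min(access_count, 50) * 0.1  # access count, capped at 50
            + confidence * 2.0             # confidence 0.0-1.0
            + tier_boost                   # long=3.0, mid=1.0, short=0.0
            + 1 / (1 + days * 0.1))        # recency decay

score = recall_score(fts_rank=-4.0, priority=7, access_count=12,
                     confidence=0.9, tier="long", days=10)
# 4.0 + 3.5 + 1.2 + 1.8 + 3.0 + 0.5 = 14.0
```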

Recency Decay Curve

[Decay curve: factor = 1/(1 + days * 0.1) over days since last update -- today: 1.00, 10d: 0.50, 20d: 0.33, 50d: 0.17]

Security

Defense in depth, even for a local tool. Every input is validated, every error is sanitized, every write is transactional.

Transaction Safety

Every write operation is wrapped in a SQLite transaction. WAL mode enables concurrent reads without blocking. Schema migrations are atomic.

FTS5 Injection Prevention

Search queries are sanitized before reaching FTS5. All special characters including | (pipe/OR operator), ", *, ^, :, -, braces, and parentheses are stripped. Boolean operators (AND, OR, NOT, NEAR) are filtered as standalone tokens. Every term is double-quoted.
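That strategy can be sketched in a few lines (illustrative Python, not the actual Rust sanitizer):

```python
import re

FTS5_OPERATORS = {"AND", "OR", "NOT", "NEAR"}

def sanitize_fts5(query):
    """Strip FTS5 syntax characters, drop bare boolean operators,
    then double-quote every remaining term (sketch of the strategy)."""
    cleaned = re.sub(r'[|"*^:\-{}()]', " ", query)
    terms = [t for t in cleaned.split() if t.upper() not in FTS5_OPERATORS]
    return " ".join(f'"{t}"' for t in terms)

safe = sanitize_fts5('rust OR title:"dro*p"')
# Operators and syntax characters are gone; each surviving term is quoted,
# so FTS5 treats everything as literal phrase tokens.
```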

Body Size Limits

HTTP request bodies are capped at 50MB via DefaultBodyLimit. Prevents memory exhaustion from oversized payloads at the transport layer.

CORS (Permissive for Localhost)

The HTTP server applies CorsLayer::permissive() -- open CORS policy appropriate for localhost-bound services. Safe because the server defaults to 127.0.0.1 binding.

Sanitized Error Responses

Error messages never leak database internals, file paths, or stack traces. Handlers return generic "internal server error" strings; details go to tracing::error! only.

Bulk Limits (1000)

Bulk create and import operations cap at 1000 items per request (MAX_BULK_SIZE). Prevents memory exhaustion and denial-of-service from oversized batches.

AtomicBool Thread Safety

Color output uses AtomicBool with atomic ordering for thread-safe global state. No mutexes needed for the color-enabled flag across threads.

Link Validation in Sync

During database sync (pull, push, merge), every imported link is validated via validate::validate_link() before insertion. Invalid links are silently skipped to prevent corrupt cross-references.

JSON-RPC Version Validation

The MCP server validates that every incoming request has jsonrpc: "2.0". Non-conformant requests are rejected before any tool dispatch occurs.

Arguments Validation

MCP tool calls extract arguments from the request params object. Non-object arguments default to an empty object, preventing type-confusion attacks on tool handlers.

Input Validation

Shared validation layer across CLI, HTTP, and MCP. Title max 512B, content max 64KB, namespace alphanumeric, source whitelisted, priority 1-10, confidence 0.0-1.0.
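A sketch of those limits as a single validator (the limits are from the text; the function shape and error messages are illustrative):

```python
def validate_memory(title, content, namespace="global",
                    priority=5, confidence=1.0):
    """Enforce the shared limits: title <= 512 bytes, content <= 64 KB,
    alphanumeric namespace, priority 1-10, confidence 0.0-1.0."""
    if not title or len(title.encode()) > 512:
        raise ValueError("title must be 1-512 bytes")
    if len(content.encode()) > 64 * 1024:
        raise ValueError("content exceeds 64KB")
    if not namespace.isalnum():
        raise ValueError("namespace must be alphanumeric")
    if not 1 <= priority <= 10:
        raise ValueError("priority must be 1-10")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be 0.0-1.0")
    return True
```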

Localhost-Only Binding

The HTTP server binds to 127.0.0.1 by default. Your memories never leave your machine unless you explicitly configure otherwise.

Architecture

Single Rust binary. Three universal interfaces. Four feature tiers with optional local LLMs via Ollama.

[Architecture diagram: Claude / ChatGPT / Grok / Llama / any MCP client reach the CLI (25 commands), MCP server (17 tools, stdio), or HTTP API (20 endpoints, Axum) -- all universal. Requests pass through a shared validation layer (validate.rs) with structured errors (errors.rs), then the feature tiers: Keyword (FTS5 only, 0 MB, 13 tools), Semantic (MiniLM-L6 384d, 256 MB, 14 tools, candle local), Smart (nomic 768d + Gemma4 E2B, 1 GB, 17 tools), Autonomous (nomic 768d + Gemma4 E4B + reranker, 4 GB, 17 tools + cross-encoder). Smart+ tiers call Ollama at localhost:11434 (gemma4:e2b / e4b, nomic-embed-text). Storage: SQLite + FTS5 + HNSW (db.rs), WAL mode, schema v3, embeddings, 161 tests; tiers short 6h / mid 7d / long forever; instant-distance HNSW, cosine similarity, 6-factor ranking]

Feature Matrix

All three interfaces are universal -- any AI platform can use any of them. They share the same validation layer and database.

Capability             | CLI | HTTP API | MCP
Store memory           | Yes | Yes | Yes
Update memory          | Yes | Yes | Yes
Recall (fuzzy OR)      | Yes | Yes | Yes
Search (AND)           | Yes | Yes | Yes
Get by ID              | Yes | Yes | Yes
List with filters      | Yes | Yes | Yes
Delete                 | Yes | Yes | Yes
Promote                | Yes | Yes | Yes
Forget (bulk delete)   | Yes | Yes | Yes
Link memories          | Yes | Yes | Yes
Get links              | Yes | Yes | Yes
Consolidate            | Yes | Yes | Yes
Stats                  | Yes | Yes | Yes
Bulk create            | --  | Yes | --
Resolve contradictions | Yes | --  | --
Auto-consolidate       | Yes | --  | --
Sync databases         | Yes | --  | --
Interactive shell      | Yes | --  | --
Export / Import        | Yes | Yes | --
Garbage collection     | Yes | Yes | --
Namespaces list        | Yes | Yes | --
Shell completions      | Yes | --  | --
Man page               | Yes | --  | --

Interactive Shell

ai-memory shell opens a REPL with color-coded output. Tiers are red/yellow/green, priority is visualized as bars, namespaces appear in cyan.

$ ai-memory shell
ai-memory shell -- type 'help' for commands, 'quit' to exit
memory> recall database setup
  [long]  Project uses PostgreSQL 15          score: 8.42
          Main database is PostgreSQL 15 with pgvector for embeddings...
  [mid]   Database migration to v3            score: 5.71
          Sprint goal: migrate schema from v2 to v3 by end of week...
  [short] Debug: connection pool exhausted    score: 2.38
          Seeing connection pool exhaustion under load in staging...
  3 memory(ies) recalled
memory> stats
  total: 47, links: 12, db: 284 KB
  long: 18  mid: 21  short: 8

Usage Examples

All interfaces work with any AI platform. Choose the one that fits your setup.

CLI Usage

# Store a memory
ai-memory store -T "Project uses Rust 2021 edition" \
  -c "Rust 2021, Axum for HTTP, SQLite for storage." \
  --tier long --priority 7

# Recall relevant memories
ai-memory recall "what language and framework"

# Exact keyword search
ai-memory search "Axum"

# List all, JSON output
ai-memory list --json

HTTP API Usage

# Start the daemon
ai-memory serve &

# Store via API (works from any language, any AI backend)
curl -X POST http://127.0.0.1:9077/api/v1/memories \
  -H 'Content-Type: application/json' \
  -d '{"title":"Test","content":"It works.","tier":"short"}'

# Recall
curl "http://127.0.0.1:9077/api/v1/recall?context=test"

CI/CD Pipeline

GitHub Actions runs on every push and PR. Releases are automated on tag push with cross-platform binaries.

[Pipeline: Push → fmt check → clippy -D warnings → test (161 tests) → build release. Release job builds Linux + macOS binaries (ubuntu-latest + macos-latest; x86_64-linux + aarch64-darwin) on tag push v*]

LongMemEval Benchmark

ICLR 2025 dataset, 500 questions, 6 categories

Results

Config                       | R@1   | R@5   | R@10  | R@20  | Time | Speed
Parallel FTS5 (keyword)      | 86.2% | 97.0% | 98.2% | 99.4% | 2.2s | 232 q/s
LLM-expanded + parallel FTS5 | 86.8% | 97.8% | 99.0% | 99.8% | 3.5s | 142 q/s

Per-Category Breakdown (LLM-expanded)

Category                  | R@1    | R@5    | R@10   | R@20
single-session-assistant  | 100.0% | 100.0% | 100.0% | 100.0%
knowledge-update          | 91.0%  | 100.0% | 100.0% | 100.0%
single-session-user       | 88.6%  | 98.6%  | 100.0% | 100.0%
multi-session             | 88.0%  | 97.7%  | 98.5%  | 100.0%
temporal-reasoning        | 79.7%  | 96.2%  | 98.5%  | 99.2%
single-session-preference | 73.3%  | 93.3%  | 96.7%  | 100.0%
OVERALL                   | 86.8%  | 97.8%  | 99.0%  | 99.8%
499/500 recalled at R@20 · $0 cloud API costs · 3.5s recall on 10 cores · Pure SQLite FTS5 + BM25
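Recall@k counts a question as answered when the gold evidence appears in the top k retrieved results. A toy computation of the metric:

```python
def recall_at_k(ranked_ids, gold_id, k):
    """1 if the gold item appears in the top-k ranked results, else 0."""
    return int(gold_id in ranked_ids[:k])

# 3 toy questions: gold found at rank 1, rank 4, and missing from top 5
runs = [(["a", "b"], "a"), (["x", "y", "z", "g"], "g"), (["m", "n"], "q")]
r_at_5 = sum(recall_at_k(ids, gold, 5) for ids, gold in runs) / len(runs)
# 2 of 3 questions recalled within the top 5
```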

Reproduce

# 1. Clone dataset
git clone --depth 1 https://github.com/xiaowu0162/LongMemEval /tmp/LongMemEval
cd /tmp/LongMemEval/data
curl -sLO https://huggingface.co/datasets/xiaowu0162/longmemeval-cleaned/resolve/main/longmemeval_s_cleaned.json
cd -

# 2. Install
cargo install --git https://github.com/alphaonedev/ai-memory-mcp.git
pip install tabulate requests

# 3. Run (keyword -- 2.2s)
python3 benchmarks/longmemeval/harness_99.py --dataset-path /tmp/LongMemEval --variant S --no-expand --workers 10

# 4. Run (LLM-expanded -- requires Ollama with gemma3:4b)
python3 benchmarks/longmemeval/harness_99.py --dataset-path /tmp/LongMemEval --variant S --workers 10