Auto-tagging. Auto-consolidation. Query expansion. Contradiction detection. Memory reflection. These are the features that turn ai-memory from a store into an agent. They are local-first by default (Gemma via Ollama) and, as of v0.7.0 (#1067 + #1146), provider-agnostic by config: an [llm] section in ~/.config/ai-memory/config.toml can route any of them through xAI Grok, OpenAI, Anthropic, Gemini, DeepSeek, Kimi, Qwen, Mistral, Groq, Together, Cerebras, OpenRouter, Fireworks, LMStudio, vLLM, or any other OpenAI-compatible endpoint. Local stays the default because of the privacy story below; cloud backends unlock the CPU-only and cellphone postures. Canonical schema (single source of truth for every surface): CONFIG_SCHEMA.md. Per-vendor recipes (config.toml plus the MCP env-block override): integrations/llm-backends.md.
In its local-first default, every autonomous feature on this page is made possible by Google's Gemma 4 family, released under an open-weights license. Gemma 4 Effective 2B (~1 GB Q4) and Gemma 4 Effective 4B (~2.3 GB Q4) are the two models ai-memory targets: small enough to run locally on a laptop, capable enough to drive real agent reasoning. Thank you to the Gemma team and to Google for choosing to ship these models open. ai-memory is materially better because of it, and the entire local-first agent ecosystem stands on this contribution.
See the credits page for the full open-source acknowledgement and license enumeration.
Autonomous features unlock as the operator allocates more memory to the daemon. The keyword tier needs no extra RAM: it runs on FTS5 alone. The semantic tier loads embeddings (~256 MB). The smart tier adds Gemma 4 E2B (~1 GB). The autonomous tier upgrades to Gemma 4 E4B plus a cross-encoder reranker (~4 GB total). The RAM figures below assume the local-model path (Ollama serving Gemma plus a local embedder). When you point [llm] and [embeddings] at a remote API (see No GPU required below), the daemon host carries no model weights at all, apart from the ~90 MB CPU cross-encoder if reranking is enabled.
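To make the tier arithmetic concrete, here is a rough sketch that picks the highest tier fitting a RAM budget on the daemon host. It is a hypothetical helper, not part of ai-memory: the names LOCAL_TIER_MB, host_footprint_mb and highest_tier are invented for illustration, the numbers are the approximate figures quoted above, and it assumes the tiers are cumulative (smart = embeddings + Gemma 4 E2B) and that reranking only runs in the autonomous tier.

```python
# Approximate host RAM per tier in MB, using the figures quoted above.
# Assumption: tiers are cumulative, so "smart" = embeddings (~256 MB) + Gemma 4 E2B (~1 GB).
LOCAL_TIER_MB = {
    "keyword": 0,           # FTS5 only, no model weights
    "semantic": 256,        # local embedding model
    "smart": 256 + 1024,    # embeddings + Gemma 4 E2B
    "autonomous": 4096,     # Gemma 4 E4B + cross-encoder reranker, ~4 GB total
}
TIER_ORDER = ["keyword", "semantic", "smart", "autonomous"]
RERANKER_MB = 90  # CPU cross-encoder: the only weights left locally with remote [llm] + [embeddings]


def host_footprint_mb(tier: str, remote_models: bool, rerank: bool = True) -> int:
    """Approximate RAM the daemon host spends on model weights for a tier."""
    if not remote_models:
        return LOCAL_TIER_MB[tier]
    # Remote LLM and embeddings: nothing is loaded locally except the reranker,
    # which this sketch assumes is only active in the autonomous tier.
    return RERANKER_MB if (tier == "autonomous" and rerank) else 0


def highest_tier(budget_mb: int, remote_models: bool = False) -> str:
    """Return the highest tier whose footprint fits in budget_mb."""
    best = "keyword"
    for tier in TIER_ORDER:  # footprints never decrease along this order
        if host_footprint_mb(tier, remote_models) <= budget_mb:
            best = tier
    return best
```

With these numbers, highest_tier(512) lands on "semantic" and highest_tier(2048) on "smart", while highest_tier(100, remote_models=True) already reaches "autonomous", which is the point of the remote-API posture.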
derived_from KG relation, so provenance survives. This is the biological-memory analog of sleep-driven episodic-to-long-term consolidation.
contradicts relation: when the LLM flags a contradiction, the system can auto-link the pair so future recall surfaces the conflict.
toon_compact.
From this point on, the autonomous MCP tools are live. Wire them into your AI client (Claude Code, Cursor, Codex, Continue, etc.); see the integrations atlas.
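The consolidation and contradiction behavior described above comes down to two kinds of knowledge-graph edge. The toy graph below is a hypothetical stand-in, not ai-memory's storage layer: the class and method names are invented, and the LLM steps (writing the summary, flagging the contradiction) are replaced by arguments passed in by hand. Consolidation writes one derived_from edge from the new memory to each source, so provenance survives; a flagged contradiction becomes a contradicts edge. Treating contradicts as symmetric at recall time is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryGraph:
    """Toy in-memory knowledge graph (hypothetical, for illustration only)."""
    memories: dict[str, str] = field(default_factory=dict)
    edges: set[tuple[str, str, str]] = field(default_factory=set)  # (src, relation, dst)

    def add(self, mem_id: str, text: str) -> None:
        self.memories[mem_id] = text

    def consolidate(self, new_id: str, summary: str, source_ids: list[str]) -> None:
        # In ai-memory the summary would come from the LLM; here it is passed in.
        self.add(new_id, summary)
        for src in source_ids:
            # Provenance: the consolidated memory points back at every source.
            self.edges.add((new_id, "derived_from", src))

    def link_contradiction(self, a: str, b: str) -> None:
        # Called once the LLM has flagged a and b as contradicting each other.
        self.edges.add((a, "contradicts", b))

    def provenance(self, mem_id: str) -> list[str]:
        return sorted(d for s, r, d in self.edges if s == mem_id and r == "derived_from")

    def conflicts(self, mem_id: str) -> list[str]:
        # Surface a contradiction from either end of the edge.
        found = set()
        for s, r, d in self.edges:
            if r != "contradicts":
                continue
            if s == mem_id:
                found.add(d)
            elif d == mem_id:
                found.add(s)
        return sorted(found)


g = MemoryGraph()
g.add("m1", "Deploys run on Fridays.")
g.add("m2", "Friday afternoon is deploy time.")
g.add("m3", "Deploys are frozen on Fridays.")
g.consolidate("c1", "The team deploys on Fridays.", ["m1", "m2"])
g.link_contradiction("c1", "m3")
print(g.provenance("c1"))  # ['m1', 'm2']
print(g.conflicts("m3"))   # ['c1']
```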
[llm] at any LLM you have API access to.
Nothing on this page is hard-wired to a GPU, to Ollama, or to Gemma. The Ollama + Gemma quick-start above is the local-first default, not a requirement. As of v0.7.0 (#1067 + #1146), any LLM reachable over an HTTP endpoint can drive every autonomous feature. That covers hosts with no GPU where you still want ai-memory in autonomous mode with --profile full: wherever you have API access, the full feature set is available. The backend is a config choice ([llm] in config.toml or the AI_MEMORY_LLM_* env vars), resolved through the canonical precedence ladder CLI > env > [llm] > legacy > default.
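The precedence ladder reads as "first layer that sets the key wins". A minimal sketch of that lookup follows; the resolve function and the key names in the usage lines (model, endpoint) are hypothetical, and how each layer is populated (CLI flag parsing, the exact AI_MEMORY_LLM_* variable names, config.toml parsing, legacy keys) is deliberately left out; CONFIG_SCHEMA.md is the authority on those. Treating an explicit None as "unset" is a design choice of this sketch.

```python
from collections.abc import Mapping
from typing import Any, Optional

# Highest precedence first, matching the ladder CLI > env > [llm] > legacy > default.
LADDER = ("cli", "env", "llm", "legacy", "default")


def resolve(key: str, layers: Mapping[str, Mapping[str, Any]]) -> tuple[Any, Optional[str]]:
    """Return (value, layer_name) from the highest-precedence layer that sets key.

    Missing layers are skipped, and a value of None counts as "not set", so a
    lower layer can still supply it. Returns (None, None) if no layer sets key.
    """
    for name in LADDER:
        layer = layers.get(name, {})
        if layer.get(key) is not None:
            return layer[key], name
    return None, None


# Hypothetical example: the env layer overrides config.toml for the model,
# while the endpoint only comes from the [llm] section.
layers = {
    "cli": {},
    "env": {"model": "model-from-env"},
    "llm": {"model": "model-from-config", "endpoint": "http://localhost:8000/v1"},
    "default": {"model": "model-default", "endpoint": None},
}
print(resolve("model", layers))     # ('model-from-env', 'env')
print(resolve("endpoint", layers))  # ('http://localhost:8000/v1', 'llm')
```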
Two CPU-only / air-gap postures are described below. The model names are examples, not requirements; substitute whatever model your provider or internal endpoint serves.
Secrets discipline: the key is referenced by env-var name (api_key_env) or by a mode-0400 file (api_key_file); an inline api_key = "…" literal is rejected at parse time. Canonical schema + per-vendor recipes: CONFIG_SCHEMA.md · integrations/llm-backends.md.