ai-memory · Autonomous — agent intelligence powered by Gemma 4

▸ Thanks

Powered by Gemma 4 — open-sourced by Google.

Every autonomous feature on this page is made possible by Google's Gemma 4 family, released under an open-weights license. Gemma 4 Effective 2B (~1 GB Q4) and Gemma 4 Effective 4B (~2.3 GB Q4) are the two models ai-memory targets: small enough to run locally on a laptop, capable enough to drive real agent reasoning. Thank you to the Gemma team and to Google for choosing to ship these models open. ai-memory is materially better because of it, and the entire local-first agent ecosystem stands on this contribution.

Gemma 4 E2B

~1 GB Q4 · Smart tier · Google · ai.google.dev/gemma

Gemma 4 E4B

~2.3 GB Q4 · Autonomous tier · Google · ai.google.dev/gemma

Ollama

Local LLM serving · MIT · ollama.com

nomic-embed-text

Embeddings · Apache 2.0 · Nomic AI · nomic.ai

MiniLM-L6-v2

Embeddings · Apache 2.0 · Hugging Face · model card

cross-encoder/ms-marco

Reranker · Apache 2.0 · Hugging Face · model card

See the credits page for the full open-source acknowledgement and license enumeration.

The four feature tiers

Pick the model size your hardware allows.

Autonomous features unlock as the operator allocates more memory to the daemon. The Keyword tier needs no extra RAM, just SQLite FTS5. The Semantic tier loads MiniLM embeddings (~256 MB). The Smart tier switches to nomic-embed-text embeddings and adds Gemma 4 E2B (~1 GB). The Autonomous tier upgrades to Gemma 4 E4B plus a cross-encoder reranker (~4 GB total).
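As a rough illustration of the tier ladder, the hypothetical helper below picks the most capable tier whose approximate RAM cost fits a given budget. The thresholds are the approximate figures quoted on this page; pick_tier and TIER_RAM_MB are illustrative names, not part of ai-memory's configuration or API.

```python
# Hypothetical helper: map an extra-RAM budget (MB) to the highest tier it affords.
# Thresholds are the approximate figures quoted on this page, not exact limits.
TIER_RAM_MB = [
    ("autonomous", 4096),  # Gemma 4 E4B + nomic-embed + cross-encoder (~4 GB)
    ("smart", 1024),       # Gemma 4 E2B + nomic-embed (~1 GB)
    ("semantic", 256),     # MiniLM-L6-v2 embeddings + HNSW index (~256 MB)
    ("keyword", 0),        # FTS5 only, no extra RAM
]

def pick_tier(extra_ram_mb: int) -> str:
    """Return the most capable tier whose approximate RAM cost fits the budget."""
    for tier, cost_mb in TIER_RAM_MB:  # ordered from most to least capable
        if extra_ram_mb >= cost_mb:
            return tier
    return "keyword"  # negative budgets still fall back to FTS5

print(pick_tier(300))  # semantic
```

The list is ordered from most to least capable, so the first threshold the budget clears wins.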

Keyword

RAM: 0 MB extra · always available

FTS5 keyword search only. The lowest-overhead option — runs on a Raspberry Pi.

▸ available: keyword_search   × not available: semantic_search, auto_tag, consolidate, contradictions

Semantic

RAM: ~256 MB · MiniLM-L6-v2

Adds 384-dimensional embeddings and an HNSW vector index. Recall becomes hybrid: 70% FTS5 keyword score, 30% semantic similarity.
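The page does not spell out how the 70/30 blend is computed. One plausible reading, sketched below purely as an assumption, is a weighted linear fusion of min-max normalized keyword scores and cosine similarities. It assumes keyword scores are already oriented so that higher means more relevant (SQLite's FTS5 bm25() returns numerically lower values for better matches, so those would need negating first). hybrid_rank and its inputs are hypothetical.

```python
# Sketch of 70/30 hybrid scoring. ASSUMPTION: linear fusion of min-max
# normalized keyword scores and cosine similarities; ai-memory may differ.
import math

KEYWORD_WEIGHT = 0.7
SEMANTIC_WEIGHT = 0.3

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; assumes higher already means more relevant."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 1.0 for k, v in scores.items()}

def hybrid_rank(keyword_scores: dict[str, float],
                query_vec: list[float],
                doc_vecs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Blend keyword relevance (70%) with vector similarity (30%), best first."""
    kw = min_max(keyword_scores)
    sem = {doc_id: cosine(query_vec, vec) for doc_id, vec in doc_vecs.items()}
    fused = {doc_id: KEYWORD_WEIGHT * kw.get(doc_id, 0.0)
             + SEMANTIC_WEIGHT * sem.get(doc_id, 0.0)
             for doc_id in set(kw) | set(sem)}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

A document found only by the vector index still competes, but it can contribute at most 0.3 to its fused score.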

▸ available: keyword_search, semantic_search, hybrid_recall   × not available: auto_tag, consolidate

Smart

RAM: ~1 GB · nomic-embed + Gemma 4 E2B

Switches to 768-dimensional nomic-embed-text embeddings and adds Google's Gemma 4 E2B for reasoning. This unlocks the LLM-driven features below.

▸ available: everything in Semantic, plus auto_tag, consolidate, expand_query, contradiction

Autonomous

RAM: ~4 GB · nomic-embed + Gemma 4 E4B + cross-encoder

Top tier — Google's Gemma 4 E4B for stronger reasoning, plus cross-encoder reranking for top-k recall precision. The full agent intelligence stack.

▸ available: everything in Smart, plus cross_encoder_reranking, session_start (LLM-driven), memory_reflection (planned, v0.7+)

The six autonomous features

What Gemma 4 unlocks.

memory_auto_tag · Smart tier+

The LLM reads a memory's title and content and proposes tags. New tags are merged into the existing tag set; nothing is overwritten. Operators use this to tag bulk-imported memories without writing rules.

// MCP: memory_auto_tag
{"id": "550e8400-e29b-41d4-a716-446655440000"}
→ {"id": "…",
   "new_tags": ["okr", "q3", "engineering"],
   "all_tags": ["draft", "okr", "q3", "engineering"]}
// Gemma 4 reads title+content and returns 3-5 relevant tags. The new tags are
// merged with whatever was already there.
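The "merge, never overwrite" rule can be sketched in a few lines. merge_tags is a hypothetical helper written for illustration; ai-memory's own implementation is not shown on this page. It keeps existing tags in order and appends only proposed tags that are not already present, which reproduces the new_tags/all_tags split in the example above.

```python
# Hypothetical sketch of merge-without-overwrite tagging.
def merge_tags(existing: list[str], proposed: list[str]) -> tuple[list[str], list[str]]:
    """Return (new_tags, all_tags): existing order kept, unseen tags appended."""
    seen = set(existing)
    new_tags = []
    for tag in proposed:
        if tag not in seen:  # skip tags the memory already has (and repeats)
            seen.add(tag)
            new_tags.append(tag)
    return new_tags, existing + new_tags

new_tags, all_tags = merge_tags(["draft"], ["okr", "q3", "engineering", "draft"])
# new_tags == ["okr", "q3", "engineering"]
# all_tags == ["draft", "okr", "q3", "engineering"]
```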

memory_consolidate · Smart tier+

Collapses 2 to 100 memories into a single derived summary. Each source memory is linked to the consolidated output via a derived_from knowledge-graph (KG) relation, so provenance survives. This is the biological-memory analog of sleep-driven consolidation, in which episodic memories become long-term ones.

// MCP: memory_consolidate
{
  "ids": ["id-1", "id-2", "id-3", …],   // 2-100 ids
  "title": "Q3 OKR — consolidated retrospective",
  "namespace": "alphaone/eng"
}
→ {"consolidated_id": "…", "summary": "", "source_count": 12, "links_created": 12}
// each source → derived_from edge
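A minimal sketch of the bookkeeping side of consolidation: validate the 2-100 id limit and emit one derived_from edge per distinct source. The Edge type, provenance_edges, and the edge direction (consolidated memory derived_from each source) are assumptions for illustration, not ai-memory's data model.

```python
# Hypothetical sketch: validate a consolidation request and build provenance
# edges. Names, types, and edge direction are illustrative assumptions.
from dataclasses import dataclass

MIN_IDS, MAX_IDS = 2, 100  # limits stated for memory_consolidate

@dataclass(frozen=True)
class Edge:
    from_id: str   # the consolidated summary
    to_id: str     # one source memory
    relation: str

def provenance_edges(consolidated_id: str, source_ids: list[str]) -> list[Edge]:
    """One derived_from edge per distinct source memory, in input order."""
    unique = list(dict.fromkeys(source_ids))  # drop duplicates, keep order
    if not MIN_IDS <= len(unique) <= MAX_IDS:
        raise ValueError(f"need {MIN_IDS}-{MAX_IDS} distinct ids, got {len(unique)}")
    return [Edge(from_id=consolidated_id, to_id=sid, relation="derived_from")
            for sid in unique]
```

Deduplicating before counting keeps links_created equal to the number of distinct sources, matching the source_count/links_created pairing in the example output.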

memory_expand_query · Smart tier+

Takes a short user query and expands it into a richer set of related terms. Used to widen recall when the literal query doesn't match enough rows. Especially useful for vague natural-language queries against a corpus that uses precise jargon.

// MCP: memory_expand_query
{"query": "how do we deploy"}
→ {"original": "how do we deploy",
   "expanded_terms": ["deploy", "deployment", "release", "ship", "rollout",
                      "kubernetes", "ci pipeline", "container registry"]}
// The caller can then run memory_search across the expanded set.
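The page leaves the fan-out to the caller. The sketch below runs one search per expanded term and merges the ranked lists with reciprocal rank fusion (RRF), a common merging technique chosen here for illustration; the page does not say ai-memory uses it. The search callable is a hypothetical stand-in for memory_search that returns memory ids in rank order, not its real signature.

```python
# Fan out an expanded query and merge the per-term result lists.
# ASSUMPTIONS: `search` stands in for memory_search and returns memory ids in
# rank order; merging uses reciprocal rank fusion (our choice, not ai-memory's).
from collections import defaultdict
from typing import Callable

def search_expanded(terms: list[str],
                    search: Callable[[str], list[str]],
                    k: int = 60,
                    limit: int = 10) -> list[str]:
    scores: dict[str, float] = defaultdict(float)
    for term in terms:
        for rank, memory_id in enumerate(search(term), start=1):
            scores[memory_id] += 1.0 / (k + rank)  # RRF contribution
    ranked = sorted(scores, key=scores.__getitem__, reverse=True)
    return ranked[:limit]
```

Memories that show up under several expanded terms accumulate score, so a result matching "deploy", "release", and "ship" outranks one that matches a single term.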

memory_detect_contradiction · Smart tier+

Compares two memories and reports whether they contradict each other. This powers the contradicts relation in the v0.6.3 knowledge graph (KG): when the LLM flags a contradiction, the system can auto-link the pair so future recall surfaces the conflict.

// MCP: memory_detect_contradiction
{
  "id_a": "id of the older memory",
  "id_b": "id of the newer memory"
}
→ {"contradicts": true,
   "memory_a": {"id": "…", "title": "We use Postgres"},
   "memory_b": {"id": "…", "title": "We migrated to MySQL"}}
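A hypothetical sketch of the auto-link step: given a result shaped like the output above, add a contradicts edge only on a positive verdict, and only once. The in-memory edge set and the edge direction (newer memory contradicts older memory) are assumptions for illustration; the page does not describe how ai-memory stores the relation.

```python
# Hypothetical auto-link on a positive verdict. Result shape mirrors the
# memory_detect_contradiction output; edge store and direction are assumed.
def link_if_contradicts(result: dict, edges: set[tuple[str, str, str]]) -> bool:
    """Add a (newer, 'contradicts', older) edge; return True if one was added."""
    if not result.get("contradicts"):
        return False
    older = result["memory_a"]["id"]
    newer = result["memory_b"]["id"]
    edge = (newer, "contradicts", older)
    if edge in edges:
        return False  # already linked; re-running detection stays idempotent
    edges.add(edge)
    return True
```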

cross_encoder_reranking · Autonomous tier

The cross-encoder scores each candidate returned by hybrid recall against the query and reorders the candidates for precision. Keyword + vector recall produce the candidate set; the cross-encoder is the final pass that puts the best match first. It adds ~50 ms to a recall but materially improves top-1 quality.

// Implicit: applied automatically during memory_recall when
//   1. the Autonomous tier is configured, AND
//   2. the cross-encoder model loaded successfully at startup.
//
// The recall pipeline becomes:
//   FTS5 70% ⊕ HNSW 30% → candidate set (top-100 typical)
//   → cross-encoder rerank → final top-K (default 10)
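The rerank stage itself is simple once a scorer exists. The sketch below takes any callable that scores (query, passage) pairs; the predict method of sentence-transformers' CrossEncoder, loaded with an ms-marco checkpoint, has that shape, but how ai-memory loads and calls its model is not shown here. The toy word-overlap scorer exists only so the example runs without downloading a model.

```python
# Sketch of the final rerank stage. `score_pairs` is any cross-encoder-style
# scorer taking (query, passage) pairs and returning one score per pair.
from typing import Callable, Sequence

def rerank(query: str,
           candidates: Sequence[tuple[str, str]],  # (memory_id, text)
           score_pairs: Callable[[list[tuple[str, str]]], Sequence[float]],
           top_k: int = 10) -> list[tuple[str, float]]:
    """Score every candidate against the query and keep the best top_k."""
    if not candidates:
        return []
    pairs = [(query, text) for _, text in candidates]
    scores = score_pairs(pairs)
    scored = [(memory_id, float(s)) for (memory_id, _), s in zip(candidates, scores)]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Toy scorer for illustration only: counts query words present in the passage.
def overlap_scorer(pairs: list[tuple[str, str]]) -> list[int]:
    return [len(set(q.lower().split()) & set(p.lower().split())) for q, p in pairs]
```

Scoring the whole candidate set in one call lets a real model batch the pairs, which is where most of the ~50 ms goes.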

memory_session_start · Autonomous tier (LLM-driven)

Call at the start of an agent session. It recalls the most relevant memories for the session's stated context and can optionally have the LLM summarize them into a session brief. Think of it as the agent's "morning briefing": it pulls in the right context without explicit recall calls peppered through the prompt.

// MCP: memory_session_start
{
  "context": "continuing the q3 OKR review thread from yesterday",
  "namespace": "alphaone/eng/leadership",
  "as_agent": "alphaone/eng/leadership/alice",
  "summarize": true   // LLM-generate a brief from the recall hits
}
→ {
  "recalled": [ /* 12 top memories, ranked, with budget_tokens cap respected */ ],
  "session_brief": "Yesterday's discussion focused on..."   // Gemma 4 summary
}
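The output above mentions a budget_tokens cap. One simple way to honour such a cap, sketched here as an assumption rather than ai-memory's actual logic, is a greedy fill over already-ranked memories. The ~4-characters-per-token estimate is a rough heuristic standing in for a real tokenizer, and fit_to_budget is a hypothetical name.

```python
# Greedy fill of a token budget. ASSUMPTIONS: input is ranked best-first and
# tokens are estimated as ceil(len(text) / 4); a real tokenizer would differ.
import math

def estimate_tokens(text: str) -> int:
    return math.ceil(len(text) / 4)

def fit_to_budget(ranked: list[dict], budget_tokens: int) -> list[dict]:
    """Keep memories in rank order, skipping any that would overflow the budget."""
    kept, used = [], 0
    for memory in ranked:
        cost = estimate_tokens(memory["content"])
        if used + cost > budget_tokens:
            continue  # a shorter, lower-ranked memory may still fit
        kept.append(memory)
        used += cost
    return kept
```

Skipping rather than stopping at the first overflow trades strict rank order for fuller use of the budget; stopping early is the simpler alternative.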

Why local matters

Every byte stays on the host.

No content leaves your machine. Auto-tag, consolidate, expand-query, and contradiction detection all prompt Gemma 4 with your memory contents. Because Ollama serves Gemma locally, those contents never leave the host. No SaaS provider sees your data, and there is no API key to leak.
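To make the "stays on the host" point concrete, here is a minimal standard-library sketch of a prompt sent to a local Ollama daemon. The endpoint and payload fields follow Ollama's /api/generate REST API; the model tag matches the pull command in the quick start below, and the helper names are illustrative, not how ai-memory itself talks to Ollama.

```python
# Minimal sketch of a local Ollama call using only the standard library.
# Endpoint and payload follow Ollama's /api/generate API; helper names are ours.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # loopback: the prompt never leaves the host

def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local daemon and return the generated text."""
    req = build_generate_request(model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Usage, with Ollama running: generate("gemma4:e2b", "Suggest 3 tags for: Q3 OKR review notes"). With "stream" set to false, Ollama returns a single JSON object instead of a stream of chunks.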

No telemetry. Neither ai-memory nor Ollama ships telemetry. Your agent's memory operations stay between you and the daemon.

Air-gap compatible. Once Gemma 4 is pulled (one-time download), the daemon runs offline. Useful for regulated environments, classified work, or anywhere outbound network egress is restricted.

Cost-stable. No per-token billing; the model is yours. Auto-tagging 100 000 memories costs the same in API fees as auto-tagging none: nothing.

Made possible by Google's open-weight Gemma 4. Without Google's choice to ship Gemma open, every feature on this page would require a paid hosted API. The local-first agent ecosystem is much smaller without Gemma 4 in it.

Operator quick start

From zero to autonomous in 4 commands.

# 1. Install Ollama
$ brew install ollama
$ ollama serve &

# 2. Pull Gemma 4 (one-time, ~2.3 GB for E4B)
$ ollama pull gemma4:e4b
# or, for the smaller Smart-tier model:
$ ollama pull gemma4:e2b

# 3. Configure ai-memory for the autonomous tier
$ mkdir -p ~/.config/ai-memory
$ cat > ~/.config/ai-memory/config.toml <<EOF
tier = "autonomous"
ollama_url = "http://localhost:11434"
cross_encoder = true
EOF

# 4. Start the daemon
$ ai-memory serve

# Verify the autonomous features came online (from a second terminal):
$ ai-memory capabilities
→ "tier": "autonomous"
→ "features": { "auto_tag": true, "auto_consolidation": true, "cross_encoder_reranking": true, "embedder_loaded": true }
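The verify step can be scripted. This sketch assumes, without confirmation from this page, that the capabilities output can be obtained as JSON shaped like the lines above; missing_features and REQUIRED_FEATURES are hypothetical names, and the parsing would need adapting if the CLI prints a different format.

```python
# Hypothetical post-install check. ASSUMPTION: `ai-memory capabilities` output
# is available as JSON shaped like the quick-start output above.
import json

REQUIRED_FEATURES = ("auto_tag", "auto_consolidation",
                     "cross_encoder_reranking", "embedder_loaded")

def missing_features(capabilities_json: str, expected_tier: str = "autonomous") -> list[str]:
    """Return a list of problems; an empty list means the tier is fully online."""
    caps = json.loads(capabilities_json)
    problems = []
    if caps.get("tier") != expected_tier:
        problems.append(f"tier is {caps.get('tier')!r}, expected {expected_tier!r}")
    features = caps.get("features", {})
    problems += [name for name in REQUIRED_FEATURES if not features.get(name)]
    return problems
```

One way to feed it is subprocess.run(["ai-memory", "capabilities"], capture_output=True, text=True).stdout, provided that output really is JSON.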

From this point on, the autonomous MCP tools are live. Wire them into your AI client (Claude Code, Cursor, Codex, Continue, etc.); see the integrations atlas.
