Core: ships today, v0.6.3 · Postgres + pgvector: GA targeted, v0.7

Tier 4 — Data-center swarm

Hundreds of nodes. Thousands of agents. Multiple racks. One coherent memory fabric.

100s of nodes · 1000s of agents · multi-rack · 3-of-5 quorum · mTLS allowlist

The core fabric ships in v0.6.3: the peer mesh, W-of-N quorum-writes, mTLS with fingerprint allowlist, federated governance, pending decisions, namespace metadata, capabilities introspection v2. The piece still maturing toward GA at this scale is the shared distributed store (Postgres + pgvector, behind the sal-postgres Cargo feature today, GA targeted v0.7).

The diagram distinguishes shipping vs roadmap: solid edges = today, indigo dashed = v0.7 GA piece.

Architecture diagram

Multi-rack swarm · pgvector backbone (roadmap).

T4 · Multi-rack swarm · pgvector backbone (roadmap) · quorum writes (today)
(Diagram.) Three racks, each running four ai-memory nodes with 8 scope-isolated agents per node: Rack A · zone-a · 10.1.0.0/24 (ai-memory-0-0 … ai-memory-0-3), Rack B · zone-a · 10.1.1.0/24 (ai-memory-1-0 … ai-memory-1-3), Rack C · zone-b · 10.1.2.0/24 (ai-memory-2-0 … ai-memory-2-3). A federated control plane spans the racks and carries namespace_meta, governance policy, pending decisions, and capabilities v2. Callouts: the quorum-write contract ships today (--quorum-writes N, --quorum-peers a,b,c, 503 quorum_not_met on timeout, ADR-0001 realised); the mTLS peer mesh with SHA-256 fingerprint allowlist (--tls-cert, --tls-key, --mtls-allowlist) refuses unknown certs; the Postgres + pgvector shared store and shared HNSW index (--features sal-postgres) is the v0.7 Track B roadmap piece.
Edge legend: sync_push mesh (today) · federated control plane · pgvector / quorum (roadmap) · mTLS peer transport.
Solid lines ship today. Dashed indigo lines are v0.7 Track B (Postgres + pgvector backbone). The federated control plane already propagates governance policy and pending decisions across all racks via sync_push.
Quorum-write contract

Shipping today.

The headline T4 capability is already wired:

ADR-0001 is realised, not aspirational. Phase 1 (scaffold) and Phase 2 (wired into the write path) have shipped. Phase 3 (chaos harness) and Phase 4 (formal convergence-bound report) remain open; in the meantime the convergence guarantees are exercised in production by the a2a-gate scenario suite (CHANGELOG #325/#326/#327).
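As a minimal sketch of the contract in action, consider a hypothetical 2-of-3 lab on one host. Ports, paths, and the 2-of-3 sizing are illustrative, and the mTLS flags from the deployment recipe below are omitted for brevity; only flags documented on this page are used.

# lab node 1 of 3: a write succeeds only if the 2-ack quorum is met within --quorum-timeout-ms
ai-memory --db /tmp/ai-memory-lab-1.db serve \
  --bind 127.0.0.1:9071 \
  --tier semantic \
  --quorum-writes 2 \
  --quorum-peers https://127.0.0.1:9072,https://127.0.0.1:9073 \
  --quorum-timeout-ms 1500 \
  --catchup-interval-secs 15
# below the 2-ack bar (peers down or slow past the timeout), the node answers
# 503 quorum_not_met; the periodic catch-up reconciles once peers return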

Track B roadmap

Postgres + pgvector backbone.

The single-SQLite-per-node ceiling bites at T4 scale (>10⁶ memories per node, hot writers). The structural fix is a shared store.

Today (v0.6.x): sal-postgres Cargo feature flag exists; sqlx + pgvector deps wired. The Postgres adapter compiles and runs; correctness fixes shipped in v0.6.0 pre-tag (#294 SAL upsert key alignment, #295 metadata.agent_id immutability via jsonb_set, #296 tier-downgrade protection via SQL tier_rank(), #297 schema parity with 6 tables + generated scope_idx column). That work is the foundation.

v0.7 GA: Performance maturation, migration tool for SQLite→Postgres on existing fleets, pgvector index tuning at >10⁷ memories, official deployment recipes.
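For orientation only, here is a sketch of the index-tuning surface that work targets, using standard pgvector (≥ 0.5) HNSW syntax. The memories table and embedding column are placeholder names, not the shipped sal-postgres schema; the connection URL is reused from the GA recipe below.

# placeholder table/column names; not the shipped schema
psql "postgres://ai-memory@pg-cluster.svc.cluster.local:5432/store" <<'SQL'
CREATE EXTENSION IF NOT EXISTS vector;
-- build-time parameters: higher m / ef_construction buys recall at >10^7 rows
-- in exchange for a slower, larger index build
CREATE INDEX IF NOT EXISTS memories_embedding_hnsw
  ON memories USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);
-- query-time recall/latency knob, set per session
SET hnsw.ef_search = 100;
SQL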

What the shared store unlocks at GA: cluster-wide memory capacity beyond the per-node HNSW RAM ceiling (10⁸+ memories), a single shared pgvector index instead of per-node embedding drift, the Postgres operational ecosystem (PITR, replicas, central monitoring), and a supported SQLite→Postgres migration path for existing fleets.

What ships today at scale

Every primitive used in the T3 diagram extends to T4.

The topology just gets denser.

Limits

Honest ceilings.

Dimension | Today (v0.6.3) | v0.7 GA
Total memories per cluster | bounded by per-node HNSW RAM | pgvector → 10⁸+
Write durability across racks | W-of-N quorum (--quorum-writes N) | unchanged
Partition tolerance | quorum-bounded divergence; 503 quorum_not_met on shortfall | unchanged
Operational primitives | per-node SQLite, replicated; backups per node | Postgres ecosystem (PITR, replicas, central monitoring)
Vector index drift | each node embeds & indexes independently | shared pgvector
mTLS | enforced via fingerprint allowlist | unchanged

A 10-rack, 100-node fleet with 3-of-5 quorum is in scope today. The piece still maturing for v0.7 is the Postgres backbone: cross-node vector-index drift and the per-node SQLite operational footprint are what the shared store fixes.

Deployment recipe

Today: quorum + mTLS allowlist.

# rack-a/node-2 — quorum 3-of-5 across the rack, mTLS enforced
ai-memory --db /var/lib/ai-memory/store.db serve \
  --bind 0.0.0.0:9077 \
  --tier semantic \
  --quorum-writes 3 \
  --quorum-peers https://rack-a-1:9077,https://rack-a-3:9077,https://rack-b-1:9077,https://rack-b-2:9077,https://rack-c-1:9077 \
  --quorum-timeout-ms 1500 \
  --tls-cert /etc/ai-memory/tls.crt \
  --tls-key /etc/ai-memory/tls.key \
  --mtls-allowlist /etc/ai-memory/peer-fingerprints.txt \
  --quorum-client-cert /etc/ai-memory/client.crt \
  --quorum-client-key /etc/ai-memory/client.key \
  --quorum-ca-cert /etc/ai-memory/ca.crt \
  --catchup-interval-secs 15
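One way to see the identity gate from the outside (a sketch: the exact curl error depends on the TLS stack, and rack-a-2 is an illustrative hostname for the node above):

# no client cert presented: the allowlist gate should cut the connection at the TLS handshake
curl -sS --cacert /etc/ai-memory/ca.crt https://rack-a-2:9077/ \
  || echo "rejected: no allowlisted client certificate"
# an allowlisted cert/key pair completes the handshake (the response body is irrelevant here)
curl -sS --cacert /etc/ai-memory/ca.crt \
  --cert /etc/ai-memory/client.crt --key /etc/ai-memory/client.key \
  https://rack-a-2:9077/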

v0.7 GA target — same node, swap to Postgres-backed store:

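# rack-a/node-2: same quorum contract, shared Postgres store replaces per-node SQLite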
ai-memory serve \
  --store-url postgres://ai-memory@pg-cluster.svc.cluster.local:5432/store \
  --bind 0.0.0.0:9077 \
  --tier semantic \
  --quorum-writes 3 \
  --quorum-peers https://rack-a-1:9077,https://rack-a-3:9077,https://rack-b-1:9077

The peer-fingerprints.txt file lists trusted SHA-256 fingerprints, one per line, with optional : separators and # comments. Passing --mtls-allowlist is what makes mTLS enforcement mandatory: if a peer's certificate fingerprint isn't on the list, the connection is refused at the TLS handshake.
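For reference, here is the shape of that file plus one way to extract a fingerprint with openssl. The fingerprint values below are placeholders, not real certificates.

# /etc/ai-memory/peer-fingerprints.txt (placeholder entries, one trusted peer per line)
# rack-a peers
9F:2C:44:7A:0B:E1:5D:88:36:C0:AA:19:DE:72:4F:03:B5:61:E9:8C:2D:70:41:FA:6B:95:08:D3:1E:C7:52:A4
# rack-b peers (colon separators are optional)
b1d48e03a7f26c5990de1b3a44f8c27e6650aa91837cd2045b9ef16273c80d4a

# print a peer cert's SHA-256 fingerprint; keep only the hex after the '='
openssl x509 -in peer.crt -noout -fingerprint -sha256 | cut -d= -f2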

Wiring

Governance, skills, and attestations at T4.

Source

Source-of-truth references.