CORE TODAY · v0.7.0 PG+pgvector + Apache AGE · v0.7.0 SHIPPED

Tier 4 — Data-center swarm

Hundreds of nodes. Thousands of agents. Multiple racks. One coherent memory fabric. The core fabric shipped at v0.6.3; the shared distributed store (Postgres + pgvector + Apache AGE) shipped GA at v0.7.0.

100s nodes · 1000s agents · multi-rack · 3-of-5 quorum · mTLS allowlist

The v0.6.3 core fabric comprises the peer mesh, W-of-N quorum writes, mTLS with a fingerprint allowlist, federated governance, pending decisions, namespace metadata, and capabilities introspection v2. The shared distributed store (Postgres + pgvector + Apache AGE) shipped at v0.7.0 via `--features sal-postgres`.

In the diagram, solid edges are always-on at v0.7.0; the Postgres + pgvector + Apache AGE layer is enabled via `--features sal-postgres`.

Architecture diagram

Multi-rack swarm · pgvector + Apache AGE backbone (shipped at v0.7.0).

[Diagram: T4 multi-rack swarm with quorum writes always-on.]
Federated control plane: namespace_meta · governance policy · pending decisions · capabilities v2
Rack A · zone-a · 10.1.0.0/24: ai-memory-0-0 to ai-memory-0-3, 8 scope-isolated agents per node
Rack B · zone-a · 10.1.1.0/24: ai-memory-1-0 to ai-memory-1-3, 8 scope-isolated agents per node
Rack C · zone-b · 10.1.2.0/24: ai-memory-2-0 to ai-memory-2-3, 8 scope-isolated agents per node
Postgres + pgvector + Apache AGE (--features sal-postgres): shared store · shared HNSW · shipped at v0.7.0
Quorum-write contract (ships today): --quorum-writes N · --quorum-peers a,b,c · 503 quorum_not_met on timeout · ADR-0001 realised
mTLS peer mesh + fingerprint allowlist: --tls-cert · --tls-key · --mtls-allowlist (SHA-256) · peer-mesh identity gate that refuses unknown certs
Legend: sync_push mesh (today) · federated control plane · pgvector / quorum / Apache AGE (shipped v0.7.0) · mTLS peer transport
Solid lines ship today. Dashed indigo lines are Track B, the Postgres + pgvector backbone that shipped at v0.7.0. The federated control plane already propagates governance policy and pending decisions across all racks via sync_push.
Quorum-write contract

Shipping today.

The big T4-relevant capability that's already wired:

ADR-0001 is realised, not aspirational. Phase 1 (scaffold) and Phase 2 (wired into the write path) have shipped. Phase 3 (chaos harness) and Phase 4 (formal convergence-bound report) are still outstanding; in the meantime, the convergence guarantees in production are exercised by the a2a-gate scenario suite (CHANGELOG #325/#326/#327).
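The W-of-N contract can be illustrated with a small shell sketch. The `quorum_decision` helper below is hypothetical and is not how ai-memory implements its write path; it only shows the rule that a write is accepted once at least W peers acknowledge it, and otherwise fails with `503 quorum_not_met`. It assumes W counts peer acknowledgements only; whether the local commit counts toward W is not stated here.

```shell
# Hypothetical illustration of a W-of-N quorum decision (not ai-memory code).
# Usage: quorum_decision W outcome...
# Each outcome is "ack", "timeout", or "error"; only "ack" counts toward W.
quorum_decision() {
  local required="$1"
  shift
  local acks=0 outcome
  for outcome in "$@"; do
    if [ "$outcome" = "ack" ]; then
      acks=$((acks + 1))
    fi
  done
  if [ "$acks" -ge "$required" ]; then
    echo "ok acks=$acks/$#"
  else
    # Mirrors the documented failure mode: 503 quorum_not_met on shortfall.
    echo "503 quorum_not_met acks=$acks/$# required=$required"
    return 1
  fi
}

# 3-of-5: three acknowledgements arrive before the timeout, so the write stands.
quorum_decision 3 ack ack ack timeout error   # prints: ok acks=3/5
```

With the same five peers and only two acknowledgements, the helper prints `503 quorum_not_met acks=2/5 required=3` and returns non-zero, which is the case a client would need to retry or surface.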

Track B — SHIPPED v0.7.0

Postgres + pgvector backbone.

The single-SQLite-per-node ceiling bites at T4 scale (>10⁶ memories per node, hot writers). The structural fix is a shared store.

Foundation (v0.6.x): the sal-postgres Cargo feature flag existed, with sqlx + pgvector deps wired. The Postgres adapter compiled and ran; correctness fixes shipped in v0.6.0 pre-tag (#294 SAL upsert key alignment, #295 metadata.agent_id immutability via jsonb_set, #296 tier-downgrade protection via SQL tier_rank(), #297 schema parity with 6 tables + generated scope_idx column). That work is the foundation.

v0.7.0 (shipped): performance maturation is complete; the `ai-memory migrate --from sqlite:// --to postgres://` migration tool is wired; pgvector + Apache AGE recipes live in docs/migration-v0.7.0-postgres.md; and official deployment is via `--features sal-postgres`.

What the shared store unlocks at GA:

What ships today at scale

Every primitive used in the T3 diagram extends to T4.

The topology just gets denser.

Limits

Honest ceilings.

| Dimension | v0.6.4 | v0.7.0 (shipped) |
| --- | --- | --- |
| Total memories per cluster | bounded by per-node HNSW RAM | pgvector → 10⁸+ |
| Write durability across racks | W-of-N quorum (--quorum-writes N) | unchanged |
| Partition tolerance | quorum-bounded divergence; 503 quorum_not_met on shortfall | unchanged |
| Operational primitives | per-node SQLite, replicated; backups per node | Postgres ecosystem (PITR, replicas, central monitoring) |
| Vector index drift | each node embeds & indexes independently | shared pgvector |
| mTLS | enforced via fingerprint allowlist | unchanged |

A 10-rack, 100-node fleet with 3-of-5 quorum is in scope today. The Postgres backbone that shipped at v0.7.0 targets the two remaining pain points: vector-index drift across nodes and the per-node SQLite operational footprint.

Deployment recipe

Today: quorum + mTLS allowlist.

# rack-a/node-2: quorum 3-of-5 with peers in racks a, b, and c; mTLS enforced
ai-memory --db /var/lib/ai-memory/store.db serve \
  --host 0.0.0.0 --port 9077 \
  --quorum-writes 3 \
  --quorum-peers https://rack-a-1:9077,https://rack-a-3:9077,https://rack-b-1:9077,https://rack-b-2:9077,https://rack-c-1:9077 \
  --quorum-timeout-ms 1500 \
  --tls-cert /etc/ai-memory/tls.crt \
  --tls-key /etc/ai-memory/tls.key \
  --mtls-allowlist /etc/ai-memory/peer-fingerprints.txt \
  --quorum-client-cert /etc/ai-memory/client.crt \
  --quorum-client-key /etc/ai-memory/client.key \
  --quorum-ca-cert /etc/ai-memory/ca.crt \
  --catchup-interval-secs 15
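
Before rolling out a recipe like this, it can help to check how much slack the quorum setting leaves. The sketch below is a hypothetical pre-flight helper, not part of the ai-memory CLI. It assumes W is measured against the entries in `--quorum-peers` only, and reports how many of those peers can be unreachable before writes start failing.

```shell
# Hypothetical pre-flight check (not an ai-memory command).
# Usage: quorum_slack W comma,separated,peers
# Prints how many listed peers may be unreachable while W acks are still possible.
quorum_slack() {
  local required="$1" peers
  # Count the non-empty comma-separated entries in the peer list.
  peers=$(printf '%s' "$2" | awk -F',' '{ for (i = 1; i <= NF; i++) if ($i != "") n++ } END { print n + 0 }')
  if [ "$required" -gt "$peers" ]; then
    echo "invalid: --quorum-writes $required exceeds $peers peers" >&2
    return 1
  fi
  echo "$((peers - required))"
}

# Five peers with --quorum-writes 3: two peers may be down.
quorum_slack 3 "https://rack-a-1:9077,https://rack-a-3:9077,https://rack-b-1:9077,https://rack-b-2:9077,https://rack-c-1:9077"   # prints: 2
```

Under the same assumption, a three-entry peer list with `--quorum-writes 3` yields a slack of 0, meaning every listed peer must acknowledge each write.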

v0.7.0 — same node, swap to Postgres-backed store:

ai-memory serve \
  --store-url postgres://ai-memory@pg-cluster.svc.cluster.local:5432/store \
  --host 0.0.0.0 --port 9077 \
  --quorum-writes 3 \
  --quorum-peers https://rack-a-1:9077,https://rack-a-3:9077,https://rack-b-1:9077

The peer-fingerprints.txt file lists trusted SHA-256 fingerprints (one per line, optional : separators, # comments). The --mtls-allowlist flag's presence is what makes mTLS enforcement required — if a peer's cert isn't on the list, the connection is refused at TLS handshake.
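
The allowlist format described above is simple enough to lint before deployment. The following sketch is a hypothetical helper, not an ai-memory tool: it strips `#` comments and `:` separators, lowercases the remainder, and checks that each entry is 64 hexadecimal characters (the length of a SHA-256 digest). The lowercase, separator-free output is a choice made for this example; how ai-memory normalizes entries internally is not specified here.

```shell
# Hypothetical allowlist linter (not part of ai-memory).
# Reads an allowlist on stdin, prints one normalized fingerprint per line,
# and exits non-zero if any entry is not a 64-hex-character SHA-256 digest.
check_allowlist() {
  awk '
    {
      line = $0
      sub(/#.*/, "", line)        # drop comments
      gsub(/[ \t\r:]/, "", line)  # drop whitespace and ":" separators
      if (line == "") next        # skip blank and comment-only lines
      line = tolower(line)
      if (line ~ /^[0-9a-f]+$/ && length(line) == 64) {
        print line
      } else {
        printf "line %d: not a SHA-256 fingerprint\n", NR > "/dev/stderr"
        bad = 1
      }
    }
    END { exit bad }
  '
}

# Usage (path taken from the recipe above):
#   check_allowlist < /etc/ai-memory/peer-fingerprints.txt
```

Running it in CI against the file that will be passed to `--mtls-allowlist` catches truncated or mistyped fingerprints before a node starts refusing legitimate peers.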

Wiring

Governance, skills, and attestations at T4.

Source

Source-of-truth references.