ai-memory · T3 — Multi-node cluster

This is the first distributed tier. Each node runs an ai-memory process backed by its own SQLite. Writes, links, archives, namespace metadata, and governance decisions fan out to every peer over POST /api/v1/sync/push. Vector-clock causality (sync/since) lets a peer that drops off the network catch up cleanly. The recall pipeline stays local on each node — fast hot reads, no cross-node round trips.

The big upgrade in current builds: the quorum-write contract is wired in. Set --quorum-writes W --quorum-peers a,b,c and every HTTP write fans out to every peer, returning OK only after the local commit plus W-1 peer acks land within --quorum-timeout-ms. Falling short returns HTTP 503 quorum_not_met. This is real durability, not best-effort.
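A minimal sketch of the ack-counting step, assuming each outbound peer POST reports back over a channel as `true` (applied) or `false` (error). `await_quorum` and `QuorumOutcome` are hypothetical names for illustration, not the `finalise_quorum` API in src/handlers.rs. The local commit counts as the first of the W votes, so only W-1 peer acks are awaited before the deadline. `Met` corresponds to the OK response and `NotMet` to 503 quorum_not_met.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Hypothetical result of one quorum write.
#[derive(Debug, PartialEq)]
pub enum QuorumOutcome {
    Met { acks: usize },    // would map to HTTP OK
    NotMet { acks: usize }, // would map to HTTP 503 quorum_not_met
}

/// Sketch: wait for W-1 peer acks on `rx` until `timeout` elapses.
/// The local commit is assumed to have already happened (vote #1).
pub fn await_quorum(w: usize, rx: Receiver<bool>, timeout: Duration) -> QuorumOutcome {
    let needed = w.saturating_sub(1);
    let deadline = Instant::now() + timeout;
    let mut acks = 0;
    while acks < needed {
        // Shrink the wait each round so the overall deadline is respected.
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(true) => acks += 1,                         // peer applied the push
            Ok(false) => {}                                // peer answered with an error
            Err(RecvTimeoutError::Timeout) => break,       // deadline lapsed
            Err(RecvTimeoutError::Disconnected) => break,  // every peer has answered
        }
    }
    if acks >= needed {
        QuorumOutcome::Met { acks }
    } else {
        QuorumOutcome::NotMet { acks }
    }
}
```

Under this contract, the deployment recipe's --quorum-writes 2 with three peers needs one peer ack, so writes keep succeeding as long as at least one of node-b, node-c, node-d answers in time.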

What ships today

The mesh is real. Source-cited.

W-of-N quorum-write contract — src/main.rs lines 405-411: --quorum-writes W --quorum-peers <list> is wired into the write path. src/handlers.rs:442-454 calls broadcast_store_quorum, then finalise_quorum to count peer acks; 503 quorum_not_met is returned when the deadline (--quorum-timeout-ms, default 2000) lapses without W-1 acks. ADR-0001 is realised, not aspirational. The same wiring applies to broadcast_link_quorum, broadcast_consolidate_quorum, broadcast_pending_decision_quorum, and broadcast_namespace_meta_quorum.
sync_push fanout for every write type except link deletion — src/federation.rs ships 10 broadcast functions: store, delete, archive, restore, link, consolidate, pending action, pending decision, namespace metadata, and namespace metadata clear. Each is invoked from the corresponding handler in handlers.rs after the local commit lands.
Vector clocks + catch-up — Every memory has a per-peer clock in SyncState. A peer that drops off polls GET /api/v1/sync/since?peer=node-a&clock=7400 and gets only the deltas after that point. The serve daemon also runs a --catchup-interval-secs loop (default 30s) that pulls peers proactively for any updates missed during partition.
mTLS peer mesh with fingerprint allowlist — --tls-cert + --tls-key switches serve to TLS; adding --mtls-allowlist <path> enforces client-cert mTLS where every connection's peer must present a cert whose fingerprint is on the allowlist. Quorum POSTs use --quorum-client-cert / --quorum-client-key / --quorum-ca-cert for the outbound side.
Federated governance — broadcast_namespace_meta_quorum (src/federation.rs:1007-1075) propagates GovernancePolicy changes to every peer with quorum semantics. A new strict policy on agents/secops set on node-A is enforced on all 4 nodes within seconds, with the same W-of-N durability as memory writes.
Federated pending decisions — broadcast_pending_decision_quorum (src/federation.rs:918-995) means an approve/reject on one node turns into a committed (or audited) state on every peer with quorum-bounded latency.
Per-node governance gate — Each node enforces its local policy at write time. The federated metadata keeps the policy itself consistent across nodes.

This is real, durable, multi-node consistency for a knowledge mesh.

What's still roadmap

Honest about the remaining gaps.

Strong consensus on the governance plane — Quorum writes give bounded staleness, not strong consistency. Two operators on opposite sides of a partition can each approve a pending action; reconciliation is last-decision-wins on rejoin. Distributed consensus (Raft/Paxos) is v1.0+ territory and only makes sense for the governance plane, not the memory write path.
Cryptographic attestation — Agent identity is claimed (metadata.agent_id, taken from a clap-parsed CLI flag or the AI_MEMORY_AGENT_ID env var), not signed. The signature column on memory_links (v0.6.3 schema v15) is reserved for the v0.7 attestation track, which will sign every link with the originating agent's keypair.
Postgres + pgvector backbone — Compiles today behind --features sal-postgres; correctness fixes shipped in v0.6.x (#294-#297). Performance maturation and GA are targeted for v0.7.
CRDT-lite link tombstones — Delete-link fanout is deferred; the CHANGELOG (#325) tracks it for v0.7. Local DELETE /api/v1/links works today.

If your fleet needs strong cross-region governance consensus, that's v1.0+. Everything listed under "What ships today" is shipping now; the other gaps above are targeted for v0.7.

Walkthrough

What's actually happening.

Write path on node-A

Agent a1 on node-A calls memory_store.
Local governance gate runs against the federated policy. Allow → continue, Pending → queue, Deny → reject.
Local INSERT succeeds. WAL fsynced. Vector clock bumped.
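broadcast_store_quorum spawns one HTTP POST /api/v1/sync/push per configured peer (B, C, D).
Each peer receives the push, validates it, applies it, bumps its own SyncState entry for node-A, and acks.
With --quorum-writes W set, the response to the agent waits for W-1 peer acks (or returns 503 quorum_not_met once --quorum-timeout-ms lapses); with quorum writes off, it returns as soon as the local commit lands.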
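The push/apply steps above can be sketched as follows, assuming a scalar per-origin clock and a set-backed store so that a re-delivered push is harmless. `Node`, `Push`, `local_store`, and `apply_push` are hypothetical names; the real SyncState layout in the source may differ.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical push payload: origin node, the origin's clock after the write, and the memory id.
#[derive(Clone, Debug)]
pub struct Push {
    pub origin: String,
    pub clock: u64,
    pub memory_id: String,
}

/// Sketch of one node's replication state (not the real SyncState layout).
#[derive(Default)]
pub struct Node {
    pub id: String,
    pub clock: u64,                       // this node's own write counter
    pub sync_state: HashMap<String, u64>, // highest clock applied, per origin peer
    pub store: HashSet<String>,           // stand-in for the SQLite memories table
}

impl Node {
    pub fn new(id: &str) -> Self {
        Node { id: id.to_string(), ..Default::default() }
    }

    /// Local write: commit, bump the node's own clock, and return the push to fan out.
    pub fn local_store(&mut self, memory_id: &str) -> Push {
        self.store.insert(memory_id.to_string());
        self.clock += 1;
        Push { origin: self.id.clone(), clock: self.clock, memory_id: memory_id.to_string() }
    }

    /// Peer side: apply the push, raise the SyncState entry for its origin, and ack.
    /// The set-backed store makes a duplicate delivery a no-op.
    pub fn apply_push(&mut self, p: &Push) -> bool {
        self.store.insert(p.memory_id.clone());
        let seen = self.sync_state.entry(p.origin.clone()).or_insert(0);
        *seen = (*seen).max(p.clock);
        true // the ack the sender counts toward W-1
    }
}
```

Taking the maximum rather than overwriting keeps the SyncState entry correct even if pushes from node-A arrive out of order.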

A peer rejoins after a partition

Node-C drops off the cluster for 4 hours. When it comes back, its supervisor calls:

curl "http://node-a:9077/api/v1/sync/since?peer=node-c&clock=7350"

Node-A returns the delta — every memory, link, namespace_meta change, and pending decision that happened after clock 7350. Node-C applies them in causal order, bumps its own state, and re-enters the steady-state mesh.
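A sketch of both sides of that catch-up exchange, assuming the delta is a flat change log keyed by node-A's clock. `Change`, `changes_since`, and `apply_delta` are illustrative names, not the sync/since handler in src/handlers.rs. The client keeps the highest clock it applied as the cursor for its next poll.

```rust
/// Hypothetical change-log entry; `kind` stands in for memory, link,
/// namespace_meta, or pending-decision rows.
#[derive(Clone, Debug, PartialEq)]
pub struct Change {
    pub clock: u64,
    pub kind: &'static str,
    pub key: String,
}

/// Server side: every change strictly after `since`, sorted into clock order.
pub fn changes_since(log: &[Change], since: u64) -> Vec<Change> {
    let mut delta: Vec<Change> = log.iter().filter(|c| c.clock > since).cloned().collect();
    delta.sort_by_key(|c| c.clock);
    delta
}

/// Client side: apply the delta in order and return the cursor for the next poll.
pub fn apply_delta(cursor: u64, delta: &[Change], applied: &mut Vec<String>) -> u64 {
    let mut next = cursor;
    for c in delta {
        applied.push(c.key.clone()); // stand-in for writing the row locally
        next = next.max(c.clock);
    }
    next
}
```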

Federated governance — a real example

# On node-A: tighten policy on a sensitive namespace
ai-memory --bind node-a:9077 namespace set-standard org/legal/contracts \
  --governance '{"write":"approve","delete":"approve"}'

Within seconds:

namespace_meta row updated on node-A.
broadcast_namespace_meta_quorum fires POST /api/v1/sync/push to B, C, D.
Each peer applies the new policy.
Any agent on any node attempting to write into org/legal/contracts now hits the local governance gate, which queues the write as Pending.
An approver agent on any node calls memory_pending_approve. The decision broadcasts via broadcast_pending_decision_quorum. All four nodes commit (or audit-reject) the queued write coherently.
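The local gate each node runs can be sketched as a lookup of the operation in the federated policy. Only the `approve` level appears in the example above; the `deny` level and the allow-when-unlisted fallback are assumptions of this sketch, and `gate`, `Policy`, and `GateDecision` are hypothetical names.

```rust
use std::collections::HashMap;

/// The three outcomes of the write-path gate: continue, queue, reject.
#[derive(Debug, PartialEq)]
pub enum GateDecision {
    Allow,
    Pending,
    Deny,
}

/// Hypothetical in-memory mirror of the --governance JSON: operation -> level.
pub type Policy = HashMap<&'static str, &'static str>;

/// Evaluate the namespace's federated policy for one operation.
/// "approve" comes from the documented example; "deny" and the
/// allow-by-default fallback are assumptions.
pub fn gate(policy: Option<&Policy>, op: &str) -> GateDecision {
    match policy.and_then(|p| p.get(op)).copied() {
        Some("approve") => GateDecision::Pending,
        Some("deny") => GateDecision::Deny,
        _ => GateDecision::Allow,
    }
}
```

Because every node holds the same replicated policy, the same call gives the same answer on node-A through node-D; only the approve/reject decision itself needs to travel, via broadcast_pending_decision_quorum.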

That's the honest version of "federated governance" — eventually consistent, but coherent.

Deployment recipe

Quorum writes + mTLS peer mesh.

All real flags, all in v0.6.3:

# node-a: W=2, so each write needs the local commit plus 1 peer ack (of 3 peers) within 2s
ai-memory --db /var/lib/ai-memory/store.db serve \
  --bind 0.0.0.0:9077 \
  --tier semantic \
  --quorum-writes 2 \
  --quorum-peers https://node-b:9077,https://node-c:9077,https://node-d:9077 \
  --quorum-timeout-ms 2000 \
  --tls-cert /etc/ai-memory/tls.crt \
  --tls-key /etc/ai-memory/tls.key \
  --mtls-allowlist /etc/ai-memory/peer-fingerprints.txt \
  --quorum-client-cert /etc/ai-memory/client.crt \
  --quorum-client-key /etc/ai-memory/client.key \
  --quorum-ca-cert /etc/ai-memory/ca.crt \
  --catchup-interval-secs 30

The peer-fingerprints.txt file is a newline-delimited list of SHA-256 fingerprints (with or without : separators; comments start with #). serve refuses any peer whose cert fingerprint is not on the list — that's the peer-mesh identity gate.
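A sketch of parsing and checking that allowlist format, assuming fingerprints are compared as lowercase hex with separators stripped. `parse_allowlist` and `is_allowed` are hypothetical helpers, and rejecting lines that are not 64 hex digits is an assumption of this sketch. Hashing the presented certificate is left out, so the peer's fingerprint is taken as an input string.

```rust
use std::collections::HashSet;

/// Normalise one fingerprint: drop ':' separators and whitespace, lowercase the hex.
fn normalise(fp: &str) -> String {
    fp.chars()
        .filter(|c| *c != ':' && !c.is_whitespace())
        .collect::<String>()
        .to_ascii_lowercase()
}

/// One SHA-256 fingerprint per line, optional ':' separators,
/// '#' starts a comment line, blank lines ignored.
pub fn parse_allowlist(text: &str) -> Result<HashSet<String>, String> {
    let mut set = HashSet::new();
    for (i, line) in text.lines().enumerate() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let fp = normalise(line);
        // SHA-256 is 32 bytes, i.e. 64 hex digits once separators are gone.
        if fp.len() != 64 || !fp.chars().all(|c| c.is_ascii_hexdigit()) {
            return Err(format!("line {}: not a SHA-256 fingerprint", i + 1));
        }
        set.insert(fp);
    }
    Ok(set)
}

/// Identity gate: is the presented peer-cert fingerprint on the list?
pub fn is_allowed(allow: &HashSet<String>, presented: &str) -> bool {
    allow.contains(&normalise(presented))
}
```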

For long-running pull-based reconciliation, run a sync-daemon alongside serve:

ai-memory sync-daemon \
  --peers https://node-b:9077,https://node-c:9077,https://node-d:9077 \
  --interval 2 \
  --client-cert /etc/ai-memory/client.crt \
  --client-key /etc/ai-memory/client.key

Dimension	T3 ceiling	When it bites
Cluster size	~10 nodes before fanout latency dominates	Beyond that, move to T4
Concurrent writes per node	~10–20 per node (the T2 ceiling)	Each node is independently bottlenecked by its own mutex
Write durability	W-of-N quorum-writes when `--quorum-writes >= 1`; local-WAL-only when 0	Choose your contract per deployment
Cross-node consistency	Eventual; vector clocks resolve order; ties break last-writer-wins	Use `memory_detect_contradiction` to surface drift
Vector index	Per-node, independent	Each node holds its own HNSW; embedding cost paid N times
TLS	mTLS supported, not enforced by default	Enforce before joining real peers
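The cross-node consistency row combines two rules: causal order from vector clocks when it exists, and last-writer-wins when two writes are concurrent. The sketch below assumes a map-based vector clock and uses a wall-clock timestamp plus node id as the tie-break; `Version`, `updated_at_ms`, and `resolve` are hypothetical, not fields from the ai-memory schema.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

/// Vector clock: node id -> counter (a missing entry counts as 0).
pub type VClock = HashMap<String, u64>;

#[derive(Debug, PartialEq)]
pub enum Causality {
    Before,
    After,
    Equal,
    Concurrent,
}

/// Standard vector-clock comparison over the union of both key sets.
pub fn compare(a: &VClock, b: &VClock) -> Causality {
    let (mut a_less, mut b_less) = (false, false);
    for k in a.keys().chain(b.keys()) {
        let x = a.get(k).copied().unwrap_or(0);
        let y = b.get(k).copied().unwrap_or(0);
        match x.cmp(&y) {
            Ordering::Less => a_less = true,
            Ordering::Greater => b_less = true,
            Ordering::Equal => {}
        }
    }
    match (a_less, b_less) {
        (false, false) => Causality::Equal,
        (true, false) => Causality::Before,
        (false, true) => Causality::After,
        (true, true) => Causality::Concurrent,
    }
}

/// Hypothetical versioned value carrying the tie-break fields.
pub struct Version<'a> {
    pub clock: VClock,
    pub updated_at_ms: u64,
    pub node: &'a str,
    pub value: &'a str,
}

/// Causal order wins when it exists; concurrent writes fall back to
/// last-writer-wins on (timestamp, node id).
pub fn resolve<'a>(a: &'a Version<'a>, b: &'a Version<'a>) -> &'a Version<'a> {
    match compare(&a.clock, &b.clock) {
        Causality::After | Causality::Equal => a,
        Causality::Before => b,
        Causality::Concurrent => {
            if (a.updated_at_ms, a.node) >= (b.updated_at_ms, b.node) { a } else { b }
        }
    }
}
```

Only the `Concurrent` branch can silently discard a write, which is why the table points at `memory_detect_contradiction` for surfacing drift.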

Source

Source-of-truth references.

src/main.rs:405-447 — quorum-write CLI flags (--quorum-writes, --quorum-peers, --quorum-timeout-ms, --quorum-client-cert/-key/-ca-cert, --catchup-interval-secs)
src/main.rs:380-393 — TLS / mTLS allowlist flags (--tls-cert, --tls-key, --mtls-allowlist)
src/handlers.rs:442-454 — broadcast_store_quorum + finalise_quorum ack-counting in the write path
src/federation.rs — 10 broadcast functions: broadcast_store_quorum, broadcast_delete_quorum, broadcast_archive_quorum, broadcast_restore_quorum, broadcast_link_quorum, broadcast_consolidate_quorum, broadcast_pending_quorum, broadcast_pending_decision_quorum, broadcast_namespace_meta_quorum, broadcast_namespace_meta_clear_quorum
src/replication.rs (422 lines) — QuorumWriter + AckTracker implementation
src/handlers.rs — POST /api/v1/sync/push ingress, GET /api/v1/sync/since causal catch-up
docs/ADR-0001-quorum-replication.md — the realised design
CHANGELOG.md — #325 link fanout, #326 consolidate fanout, #327 embedder readiness

Diagram: Tier 3, multi-node cluster (4 nodes · 5 agents each · sync_push mesh)