Twenty agents. Four nodes. One coherent knowledge mesh — with W-of-N quorum writes shipping today. Each node runs its own ai-memory + SQLite; writes, links, governance, and pending decisions fan out via sync_push with vector-clock causality.
This is the first distributed tier. Each node runs an ai-memory process backed by its own SQLite. Writes, links, archives, namespace metadata, and governance decisions fan out to every peer over POST /api/v1/sync/push. Vector-clock causality (sync/since) lets a peer that drops off the network catch up cleanly. The recall pipeline stays local on each node — fast hot reads, no cross-node round trips.
The big upgrade in current builds: the quorum-write contract is wired. Set --quorum-writes N --quorum-peers a,b,c and every HTTP write fans out to every peer and returns OK only after the local commit plus W-1 peer acks land within --quorum-timeout-ms. Falling short returns HTTP 503 quorum_not_met. This is real durability, not best-effort.
- Quorum writes — src/main.rs lines 405-411: --quorum-writes N --quorum-peers <list> is wired into the write path. src/handlers.rs:442-454 calls broadcast_store_quorum then finalise_quorum to count peer acks; 503 quorum_not_met returns when the deadline (--quorum-timeout-ms, default 2000) lapses without W-1 acks. ADR-0001 is realised, not aspirational. The same wiring applies to broadcast_link_quorum, broadcast_consolidate_quorum, broadcast_pending_decision_quorum, and broadcast_namespace_meta_quorum.
- sync_push fanout for everything — src/federation.rs ships 10 broadcast functions: store, delete, archive, restore, link, consolidate, pending action, pending decision, namespace metadata, namespace metadata clear. Each is invoked from the corresponding handler in handlers.rs after the local commit lands.
- Causal catch-up via SyncState — a peer that drops off polls GET /api/v1/sync/since?peer=node-a&clock=7400 and gets only the deltas after that point. The serve daemon also runs a --catchup-interval-secs loop (default 30s) that pulls peers proactively for any updates missed during a partition.
- TLS and mTLS — --tls-cert + --tls-key switches serve to TLS; adding --mtls-allowlist <path> enforces client-cert mTLS, where every connection's peer must present a cert whose fingerprint is on the allowlist. Quorum POSTs use --quorum-client-cert / --quorum-client-key / --quorum-ca-cert for the outbound side.
- Federated governance — broadcast_namespace_meta_quorum (src/federation.rs:1007-1075) propagates GovernancePolicy changes to every peer with quorum semantics. A new strict policy on agents/secops set on node-A is enforced on all 4 nodes within seconds, with the same W-of-N durability as memory writes.
- Distributed pending decisions — broadcast_pending_decision_quorum (src/federation.rs:918-995) means an approve/reject on one node turns into a committed (or audited) state on every peer with quorum-bounded latency.

This is real, durable, multi-node consistency for a knowledge mesh.
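From a client's point of view, the contract is just an HTTP status to branch on. A minimal curl sketch, assuming a placeholder write route and payload (the real write endpoint and body schema aren't shown here):

```bash
# Sketch only: /api/v1/memories and the JSON body are placeholders, not the
# confirmed write route. The contract is the documented part: a 2xx response
# means the local commit plus W-1 peer acks landed inside --quorum-timeout-ms;
# 503 quorum_not_met means the deadline lapsed first.
status=$(curl -s -o /tmp/write-resp.json -w '%{http_code}' \
  -X POST "https://node-a:9077/api/v1/memories" \
  -H 'Content-Type: application/json' \
  -d '{"content":"TLS rotation runbook v3","namespace":"org/secops"}')

if [ "$status" = "503" ]; then
  echo "quorum_not_met: write is not durable yet, retry or alert" >&2
fi
```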
The fine print:

- Link attestation — provenance today is plain metadata (metadata.agent_id, from clap or the AI_MEMORY_AGENT_ID env), not signed. The signature column on memory_links (v0.6.3 schema v15) is reserved for the v0.7 attestation track that signs every link with the originating agent's keypair.
- Postgres backend — still behind --features sal-postgres; correctness fixes shipped in v0.6.x (#294-#297). Performance maturation and GA are targeted for v0.7.
- Link deletion — DELETE /api/v1/links works today.

If your fleet needs strong cross-region governance consensus, that's v1.0+. Everything else listed is shipping.
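Two of those items are plain switches already; a quick sketch (the release profile is illustrative):

```bash
# Attribution today is an unsigned per-process agent id; no keypair is involved yet.
export AI_MEMORY_AGENT_ID=a1

# The Postgres storage backend stays opt-in behind a cargo feature while it matures.
cargo build --release --features sal-postgres
```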
The steady-state write path, step by step:

1. Agent a1 on node-A calls memory_store.
2. The local governance gate rules on the write: Allow → continue, Pending → queue, Deny → reject.
3. The local commit lands: the INSERT succeeds, the WAL is fsynced, the vector clock is bumped.
4. broadcast_store_quorum spawns one HTTP POST /api/v1/sync/push per configured peer (B, C, D). The local response returns to the agent immediately.

Node-C drops off the cluster for 4 hours. When it comes back, its supervisor calls:
```bash
curl "http://node-a:9077/api/v1/sync/since?peer=node-c&clock=7350"
```
Node-A returns the delta — every memory, link, namespace_meta change, and pending decision that happened after clock 7350. Node-C applies them in causal order, bumps its own state, and re-enters the steady-state mesh.
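The same catch-up, sketched from the supervisor side. Tracking the last applied clock in a shell variable is an assumption of this sketch; applying the returned deltas is internal to ai-memory:

```bash
# Illustrative only: poll each healthy peer for deltas newer than node-C's last
# applied clock. How node-C ingests the returned deltas is internal to ai-memory.
LAST_CLOCK=7350
for peer in node-a node-b node-d; do
  curl -sS \
    --cert   /etc/ai-memory/client.crt \
    --key    /etc/ai-memory/client.key \
    --cacert /etc/ai-memory/ca.crt \
    "https://${peer}:9077/api/v1/sync/since?peer=node-c&clock=${LAST_CLOCK}"
done
```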
Governance changes ride the same fanout:

```bash
# On node-A: tighten policy on a sensitive namespace
ai-memory --bind node-a:9077 namespace set-standard org/legal/contracts \
  --governance '{"write":"approve","delete":"approve"}'
```
Within seconds:
1. The namespace_meta row is updated on node-A.
2. broadcast_namespace_meta_quorum fires POST /api/v1/sync/push to B, C, D.
3. A write to org/legal/contracts on any node now hits the local governance gate, which queues the write as Pending.
4. An approver resolves it with memory_pending_approve; the decision broadcasts via broadcast_pending_decision_quorum, and all four nodes commit (or audit-reject) the queued write coherently.

That's the honest version of "federated governance": eventually consistent, but coherent.
All real flags, all in v0.6.3:
```bash
# node-a — quorum writes: W=2 (local commit + 1 peer ack) within 2s
ai-memory --db /var/lib/ai-memory/store.db serve \
  --bind 0.0.0.0:9077 \
  --tier semantic \
  --quorum-writes 2 \
  --quorum-peers https://node-b:9077,https://node-c:9077,https://node-d:9077 \
  --quorum-timeout-ms 2000 \
  --tls-cert /etc/ai-memory/tls.crt \
  --tls-key /etc/ai-memory/tls.key \
  --mtls-allowlist /etc/ai-memory/peer-fingerprints.txt \
  --quorum-client-cert /etc/ai-memory/client.crt \
  --quorum-client-key /etc/ai-memory/client.key \
  --quorum-ca-cert /etc/ai-memory/ca.crt \
  --catchup-interval-secs 30
```
The peer-fingerprints.txt file is a newline-delimited list of SHA-256 fingerprints (with or without : separators; comments start with #). serve refuses any peer whose cert fingerprint is not on the list — that's the peer-mesh identity gate.
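Populating that file can be as simple as the standard openssl fingerprint output; a sketch, with the peer-cert path assumed:

```bash
# Append node-b's certificate fingerprint to node-a's allowlist.
# Colon-separated SHA-256 output is accepted per the format described above.
openssl x509 -in /etc/ai-memory/peers/node-b.crt -noout -fingerprint -sha256 \
  | cut -d= -f2 >> /etc/ai-memory/peer-fingerprints.txt
```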
For long-running pull-based reconciliation, run a sync-daemon alongside serve:
```bash
ai-memory sync-daemon \
  --peers https://node-b:9077,https://node-c:9077,https://node-d:9077 \
  --interval 2 \
  --client-cert /etc/ai-memory/client.crt \
  --client-key /etc/ai-memory/client.key
```
Agents registered on any node (memory_agent_register) get the same scope visibility on every node. Auto-tagging on node-A produces tags that are stored as memory metadata and replicated to all peers; every node sees the same enriched view.

| Dimension | T3 ceiling | When it bites |
|---|---|---|
| Cluster size | ~10 nodes before fanout latency dominates | Beyond that, walk to T4 |
| Concurrent writes per node | ~10–20 (T2 ceiling, multiplied) | Each node is independently bottlenecked by its mutex |
| Write durability | W-of-N quorum-writes when --quorum-writes >= 1; local-WAL-only when 0 | Choose your contract per deployment |
| Cross-node consistency | Eventual; vector clocks resolve order; ties break last-writer-wins | Use memory_detect_contradiction to surface drift |
| Vector index | Per-node, independent | Each node holds its own HNSW; embedding cost paid N times |
| TLS | mTLS supported, not enforced by default | Enforce before joining real peers |
- src/main.rs:405-447 — quorum-write CLI flags (--quorum-writes, --quorum-peers, --quorum-timeout-ms, --quorum-client-cert/-key/-ca-cert, --catchup-interval-secs)
- src/main.rs:380-393 — TLS / mTLS allowlist flags (--tls-cert, --tls-key, --mtls-allowlist)
- src/handlers.rs:442-454 — broadcast_store_quorum + finalise_quorum ack-counting in the write path
- src/federation.rs — 10 broadcast functions: broadcast_store_quorum, broadcast_delete_quorum, broadcast_archive_quorum, broadcast_restore_quorum, broadcast_link_quorum, broadcast_consolidate_quorum, broadcast_pending_quorum, broadcast_pending_decision_quorum, broadcast_namespace_meta_quorum, broadcast_namespace_meta_clear_quorum
- src/replication.rs (422 lines) — QuorumWriter + AckTracker implementation
- src/handlers.rs — POST /api/v1/sync/push ingress, GET /api/v1/sync/since causal catch-up
- docs/ADR-0001-quorum-replication.md — the realised design
- CHANGELOG.md — #325 link fanout, #326 consolidate fanout, #327 embedder readiness