SHIPS TODAY · v0.6.3

Tier 3 — Multi-node cluster

Twenty agents. Four nodes. One coherent knowledge mesh — with W-of-N quorum writes shipping today. Each node runs its own ai-memory + SQLite; writes, links, governance, and pending decisions fan out via sync_push with vector-clock causality.

4 nodes · 20 agents · W-of-N quorum · mTLS allowlist · federated governance

This is the first distributed tier. Each node runs an ai-memory process backed by its own SQLite. Writes, links, archives, namespace metadata, and governance decisions fan out to every peer over POST /api/v1/sync/push. Vector-clock causality (sync/since) lets a peer that drops off the network catch up cleanly. The recall pipeline stays local on each node — fast hot reads, no cross-node round trips.

The big upgrade in current builds: the quorum-write contract is wired. Set --quorum-writes N --quorum-peers a,b,c and every HTTP write fans out to every peer, returning OK only after the local commit plus W-1 peer acks land within --quorum-timeout-ms. Falling short returns HTTP 503 quorum_not_met. This is real durability, not best effort.
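
A quick worked reading of that contract, using the same numbers as the deployment recipe further down (nothing here beyond what the flags above already state):

# W-of-N arithmetic on node-a with three configured peers
#   --quorum-writes 2                      → W = 2
#   --quorum-peers node-b,node-c,node-d    → fanout to 3 peers
#   success: local commit + at least W-1 = 1 peer ack inside --quorum-timeout-ms → OK
#   failure: fewer acks inside the window                                        → HTTP 503 quorum_not_met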

Architecture diagram

4 nodes · 5 agents each · sync_push mesh.

[Diagram: T3, 4 nodes, 5 agents each, sync_push mesh. node-a 10.0.1.10:9077, node-b 10.0.1.11:9077, node-c 10.0.1.12:9077, node-d 10.0.1.13:9077, each with its own SQLite + HNSW, a policy gate, and 5 namespace-isolated agents (vector clock 7421 across the mesh); 12-edge full peer mesh, sync_push fanout, vector-clock causality via sync/since.]
Legend: sync_push carries memories, links, namespace_meta, and pending decisions (federated governance = namespace_meta fanout + pending_decision sync); the governance gate is per-node; recall stays local on each node.
Each node holds its own SQLite + HNSW. Writes commit locally then fan out to peers. Recalls stay local. Governance policies, namespace metadata, and pending-action decisions propagate the same way memories do.
What ships today

The mesh is real. Source-cited.

This is real, durable, multi-node consistency for a knowledge mesh.

What's still roadmap

Honest about the remaining gaps.

If your fleet needs strong cross-region governance consensus, that's v1.0+. Everything else listed is shipping.

Walkthrough

What's actually happening.

Write path on node-A

  1. Agent a1 on node-A calls memory_store.
  2. Local governance gate runs against the federated policy. Allow → continue, Pending → queue, Deny → reject.
  3. Local INSERT succeeds. WAL fsynced. Vector clock bumped.
  4. broadcast_store_quorum spawns one HTTP POST /api/v1/sync/push per configured peer (B, C, D), as sketched after this list. With --quorum-writes 0 the response returns to the agent as soon as the local commit lands; with quorum writes enabled, it holds until W-1 peer acks arrive or --quorum-timeout-ms expires.
  5. Each peer receives the push, validates, applies, and bumps its own SyncState entry for node-A.
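
A sketch of what one of those step-4 pushes might look like when reproduced by hand with curl. Only the endpoint and the mTLS material come from this page; the JSON field names (origin, clock, kind, record) are illustrative assumptions, not the actual wire format.

# hypothetical payload: field names are assumptions; only POST /api/v1/sync/push is documented
curl -sS https://node-b:9077/api/v1/sync/push \
  --cert /etc/ai-memory/client.crt \
  --key /etc/ai-memory/client.key \
  --cacert /etc/ai-memory/ca.crt \
  -H 'Content-Type: application/json' \
  -d '{"origin":"node-a","clock":7422,"kind":"memory","record":{}}'
# broadcast_store_quorum issues one such POST per configured peer (node-b, node-c, node-d)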

A peer rejoins after a partition

Node-C drops off the cluster for 4 hours. When it comes back, its supervisor calls:

curl "http://node-a:9077/api/v1/sync/since?peer=node-c&clock=7350"

Node-A returns the delta — every memory, link, namespace_meta change, and pending decision that happened after clock 7350. Node-C applies them in causal order, bumps its own state, and re-enters the steady-state mesh.
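
The same catch-up call works against any peer. A minimal supervisor-side sketch, assuming node-c keeps one last-seen clock per peer (7350 is just the example value from above; plain HTTP mirrors the example, so add the client-cert flags from the deployment recipe when the mesh runs mTLS):

# hedged sketch: pull deltas from every surviving peer after a partition
for peer in node-a node-b node-d; do
  curl -sS "http://$peer:9077/api/v1/sync/since?peer=node-c&clock=7350"
done
# apply each delta in causal order, then bump node-c's own SyncState entries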

Federated governance — a real example

# On node-A: tighten policy on a sensitive namespace
ai-memory --bind node-a:9077 namespace set-standard org/legal/contracts \
  --governance '{"write":"approve","delete":"approve"}'

Within seconds, the namespace_meta change fans out to node-b, node-c, and node-d over sync_push, and any write or delete against org/legal/contracts on those nodes now hits the approval gate and queues as a pending decision instead of committing directly.

That's the honest version of "federated governance" — eventually consistent, but coherent.
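
One hedged way to check that the fanout actually landed, using only the sync/since endpoint described above: ask a peer for everything after the clock you noted before the change, and look for the org/legal/contracts namespace_meta entry in the delta. The peer=audit value is just a placeholder requester id, playing the same role as peer=node-c in the partition example.

# 7421 is the pre-change clock from the diagram; substitute your own
curl -sS "http://node-b:9077/api/v1/sync/since?peer=audit&clock=7421"
# the returned delta should include the namespace_meta update for org/legal/contracts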

Deployment recipe

Quorum writes + mTLS peer mesh.

All real flags, all in v0.6.3:

# node-a: quorum writes (W=2) require the local commit plus 1 peer ack within 2s
ai-memory --db /var/lib/ai-memory/store.db serve \
  --bind 0.0.0.0:9077 \
  --tier semantic \
  --quorum-writes 2 \
  --quorum-peers https://node-b:9077,https://node-c:9077,https://node-d:9077 \
  --quorum-timeout-ms 2000 \
  --tls-cert /etc/ai-memory/tls.crt \
  --tls-key /etc/ai-memory/tls.key \
  --mtls-allowlist /etc/ai-memory/peer-fingerprints.txt \
  --quorum-client-cert /etc/ai-memory/client.crt \
  --quorum-client-key /etc/ai-memory/client.key \
  --quorum-ca-cert /etc/ai-memory/ca.crt \
  --catchup-interval-secs 30

The peer-fingerprints.txt file is a newline-delimited list of SHA-256 fingerprints (with or without : separators; comments start with #). serve refuses any peer whose cert fingerprint is not on the list — that's the peer-mesh identity gate.
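
A sketch of that file; the fingerprint values are placeholders (truncated here), and the openssl line is the standard way to read a cert's SHA-256 fingerprint rather than an ai-memory subcommand (the peer-node-b.crt path is likewise just an example):

# /etc/ai-memory/peer-fingerprints.txt
# one SHA-256 cert fingerprint per line; ':' separators optional
# node-b (placeholder, truncated)
3F:A1:0B:...:9C:4E
# node-c (colon-free form also accepted)
3fa10b...9c4e

# compute a peer's fingerprint from its certificate
openssl x509 -in /etc/ai-memory/peer-node-b.crt -noout -fingerprint -sha256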

For long-running pull-based reconciliation, run a sync-daemon alongside serve:

ai-memory sync-daemon \
  --peers https://node-b:9077,https://node-c:9077,https://node-d:9077 \
  --interval 2 \
  --client-cert /etc/ai-memory/client.crt \
  --client-key /etc/ai-memory/client.key
Wiring

Governance, skills, and attestations at T3.

Limits

Honest ceilings.

Dimension · T3 ceiling · When it bites
Cluster size · ~10 nodes before fanout latency dominates · Beyond that, walk to T4
Concurrent writes per node · ~10–20 (T2 ceiling, multiplied) · Each node is independently bottlenecked by its mutex
Write durability · W-of-N quorum writes when --quorum-writes >= 1; local-WAL-only when 0 · Choose your contract per deployment
Cross-node consistency · Eventual; vector clocks resolve order; ties break last-writer-wins · Use memory_detect_contradiction to surface drift
Vector index · Per-node, independent · Each node holds its own HNSW; embedding cost paid N times
TLS · mTLS supported, not enforced by default · Enforce before joining real peers
Source

Source-of-truth references.