The Grand Slam.

The Secure Enterprise Federated Reference Architecture — a 3-region, Batman-active AI Agent Hive. Three geographic regions (nyc3 · fra1 · sgp1), 15 nodes (9 peers + 3 regional PG + 3 NHI agents), 9 federated peers in one W=2 synchronous quorum + eventual cross-region convergence mesh, three distinct encryption legs — each proven with a positive and a negative test. PostgreSQL 18.4 + Apache AGE 1.7.0 + pgvector 0.8.2.

3 regions · 15 nodes (9 peers + 3 PG + 3 agents) · W=2 sync + eventual convergence · 3 encryption legs · Batman MAXIMUM-SECURE · do-1461
At a glance

One hive, three regions, three encrypted legs.

Each region is a self-contained substrate cluster in its own private VPC of five nodes: one regional PostgreSQL 18.4 + Apache AGE 1.7.0 + pgvector 0.8.2 node, three ai-memory daemon peers, and one NHI agent (an xAI grok-4.3 client) — 5 × 3 regions = 15 nodes total (9 peers + 3 PG + 3 agents). A region's peers bind to their regional Postgres over the private VPC under sslmode=verify-full. All nine peers — across all three regions — federate into one cross-region write-quorum mesh over public IPs, secured by mTLS, per-message Ed25519 signing, nonce anti-replay, and CA-rooted zero-touch peer enrollment. The synchronous write quorum is W=2 (local commit + one cross-region remote ack) and the federation is primarily eventual: a write commits locally, attests, returns OK after one remote ack, then async catch-up converges it to every peer in every region (the harness asserts full cross-region convergence). The three NHI agents are pure mTLS clients (not federation-mesh members) exercising the a2a + ai_nhi test groups over Leg-1. Every node runs the Batman-active MAXIMUM-SECURE posture. This topology was verified identically across both independent clean-room reproducibility rounds (same 100% GREEN results on fresh fleets).

Regions: 3
Nodes: 15 (9 peers + 3 PG + 3 agents)
Federated peers: 9
Sync quorum: W=2, plus eventual convergence
Encryption legs: 3
Schema: v57 (lockstep)
The hive

3-region federated topology.

Three region clusters, each a 5-node unit: its Postgres substrate (verify-full TLS, leg 3), three Batman-shielded peers, and one NHI agent. The nine peers form one cross-region quorum mesh (leg 2) with a W=2 synchronous quorum and eventual cross-region convergence. The three NHI agents are pure mTLS clients — they reach their regional peer over API mTLS (leg 1, a thinner client link), not the peer-mesh trunk. Each leg is a distinctly-coloured, animated encrypted flow; secured nodes carry a Batman shield-lock glyph.

Figure: 3-region federated AI Agent Hive topology. Three region clusters, nyc3 (US-East), fra1 (EU-Central), and sgp1 (Asia-SE), are arranged in a triangle. Each cluster is a 5-node unit in its own private VPC: one regional PostgreSQL 18.4 + Apache AGE 1.7.0 + pgvector 0.8.2 node, three ai-memory peer daemons (peer-<region>-01 to -03), and one NHI agent (agent-<region>-01, an xAI grok-4.3 client), for 15 nodes total (9 peers + 3 PG + 3 agents). Within each region, the three peers connect to their regional Postgres over private-VPC TLS verify-full on port 5432 (leg 3, amber). All nine peers across the three regions are interconnected by a cross-region quorum mesh secured by mTLS + Ed25519 + nonce, using a W=2 synchronous write quorum (local commit plus one cross-region remote ack) with eventual async convergence to every peer in every region (leg 2, violet). The NHI agent in each region is a pure mTLS client that connects to its regional peer over API mutual-TLS on port 9077 (leg 1, cyan); it is a client link, not a member of the quorum mesh. An external client reaches a peer over the same leg 1 with an mTLS client cert plus x-api-key. Every peer node displays a Batman shield-lock glyph indicating the MAXIMUM-SECURE posture.

Legend:
Leg 1: API mTLS (client ↔ peer :9077)
Leg 2: Federation / quorum mTLS (peer ↔ peer, W=2 sync + eventual convergence)
Leg 3: daemon → Postgres TLS (verify-full :5432)
NHI agent client link (Leg-1, mTLS client role)
Shield-lock glyph: Batman-secured node
The centerpiece

Three encryption legs, each proven both ways.

Every byte on the fleet rides one of three distinct, independently-verified encrypted legs. Each leg is proven with a positive test (the legitimate path succeeds) and a negative test (the illegitimate path is refused before it can do harm). The result cells below are filled from the live make validate && make test run plus the focused test/encrypted_legs.sh suite (67/67 checks GREEN), verified across two independent 0→60 runs — they are not fabricated.

Figure: the three encryption legs, in sequence. Left to right: an external client (client cert + api-key) connects to a peer over leg 1 (API mutual-TLS on port 9077, rustls client-auth, cyan); the peer connects to peers in other regions over leg 2 (federation quorum mutual-TLS with Ed25519 message signing and nonce anti-replay, W=2 sync quorum, violet); the peer connects to its regional PostgreSQL over leg 3 (TLS verify-full on port 5432, scram-sha-256 auth, amber). Each connection is an animated encrypted flow.
LEG 1 · API mTLS
client ↔ peer · HTTPS :9077

Mutual-TLS API surface

The peer HTTPS port is client_auth_mandatory: rustls client-auth + a SHA-256 client-cert fingerprint allowlist (the SSH known_hosts trust model), layered with an x-api-key on privileged routes.

| Expect | Test | Result |
|---|---|---|
| PASS | allowlisted cert + api-key → 200 | PASS: 200 with client-cert + api-key |
| DENY | no / rogue client cert → connection refused | PASS: no/rogue client cert → refused (000) |
| DENY | valid cert, no api-key → 401 | PASS: valid cert, no api-key → 401 |
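
The three Leg-1 outcomes above (200, refused, 401) can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not the daemon's rustls implementation: `cert_fingerprint` and `authorize` are hypothetical names, the certificate is treated as opaque DER bytes, and `None` stands in for a refused TLS handshake.

```python
import hashlib
import hmac


def cert_fingerprint(der_bytes: bytes) -> str:
    """Hex SHA-256 digest of a DER-encoded client certificate."""
    return hashlib.sha256(der_bytes).hexdigest()


def authorize(der_bytes, api_key, allowlist, expected_api_key, privileged=True):
    """Return the outcome of one request (hypothetical model).

    None -> handshake refused (no cert, or fingerprint not allowlisted)
    401  -> cert accepted, x-api-key missing or wrong on a privileged route
    200  -> request admitted
    """
    if der_bytes is None:
        return None
    fingerprint = cert_fingerprint(der_bytes)
    # Constant-time comparison against each pinned fingerprint.
    if not any(hmac.compare_digest(fingerprint, pinned) for pinned in allowlist):
        return None
    if privileged:
        if not api_key or not hmac.compare_digest(api_key, expected_api_key):
            return 401
    return 200
```

Pinning the fingerprint rather than trusting any CA-signed certificate is what gives the allowlist its known_hosts character: a certificate is accepted only if its exact digest was enrolled beforehand.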
LEG 2 · Federation / quorum mTLS
peer ↔ peer · cross-region :9077

Quorum mesh (W=2 sync + eventual convergence)

Cross-region /sync push presents the node's mTLS client cert, verifies peer server certs vs the campaign CA, and carries X-Memory-Cred + an X-Memory-Sig Ed25519 signature bound to a fresh nonce. A write commits locally and attests, then returns OK after a W=2 synchronous quorum (local commit + one cross-region remote ack); async catch-up then converges it to every other peer across all three regions. The federation is primarily eventual — a cross-region synchronous majority is deliberately avoided because any slow or down peer would turn writes into 503s, conflating "durable enough to ack" with "converged everywhere"; W=2 plus eventual convergence is the correct 3-region model.
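
The W=2 rule described above can be modelled in a few lines. This is a sketch under assumptions, not the federation code: `quorum_write` and `WriteResult` are hypothetical names, only acks from a different region count toward the synchronous quorum, and the 503 returned when no cross-region ack arrives is an assumption (the source only states that a synchronous majority would produce 503s).

```python
from dataclasses import dataclass, field

W = 2  # synchronous quorum: local commit + one cross-region remote ack


@dataclass
class WriteResult:
    status: int                                   # 200 when the quorum is met (503 is assumed)
    acked: list = field(default_factory=list)     # members of the synchronous quorum
    catch_up: list = field(default_factory=list)  # peers converged later, asynchronously


def quorum_write(local_region, local_commit_ok, peers, w=W):
    """peers maps peer id -> (region, acked_in_time) for the 8 other peers."""
    if not local_commit_ok:
        return WriteResult(503)
    acked, catch_up = ["local"], []
    for peer_id, (region, acked_in_time) in peers.items():
        # Only a timely ack from another region counts, and only until W is met.
        counts = acked_in_time and region != local_region and len(acked) < w
        (acked if counts else catch_up).append(peer_id)
    return WriteResult(200 if len(acked) >= w else 503, acked, catch_up)
```

With one timely cross-region ack the write returns 200 and the other seven peers land in `catch_up`, which is the set the asynchronous convergence pass still has to reach.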

| Expect | Test | Result |
|---|---|---|
| PASS | enrolled peer write → converges on 8/8 peers | PASS: attested W=2 quorum write converges to all 8 other peers across all 3 regions (TLS+mTLS) |
| DENY | missing signature → 401 | PASS: /sync/push missing sig → 401 |
| DENY | replay → 401 nonce-replay | PASS: forged sig+nonce refused twice (401/401), nonce gate live |
| DENY | unenrolled peer → 401 peer_not_enrolled | PASS: unenrolled X-Peer-Id → 401 peer_not_enrolled |
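
A nonce anti-replay gate of the kind exercised by the replay row can be sketched as follows. The class name, the freshness window, and the `sent_at` timestamp are assumptions made for illustration; the source does not describe how the daemon stores nonces or whether it checks a timestamp. Signature verification is deliberately left out and would run before this gate.

```python
import time


class NonceGate:
    """Reject stale messages and any (peer, nonce) pair already seen."""

    def __init__(self, window_s=300.0, clock=time.time):
        self.window_s = window_s
        self.clock = clock
        self.seen = {}  # (peer_id, nonce) -> time the pair was first accepted

    def check(self, peer_id, nonce, sent_at):
        now = self.clock()
        # A message outside the freshness window is refused outright, so
        # cache entries older than the window can be dropped safely.
        if abs(now - sent_at) > self.window_s:
            return 401
        self.seen = {k: t for k, t in self.seen.items() if now - t <= self.window_s}
        key = (peer_id, nonce)
        if key in self.seen:
            return 401  # byte-for-byte replay
        self.seen[key] = now
        return 200
```

Pairing the nonce cache with a freshness window keeps the cache bounded: a replay arriving after its entry has expired is still refused, because by then the message itself is stale.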
LEG 3 · daemon → Postgres TLS
peer → regional PG · :5432

verify-full to regional substrate

Each peer dials its own region's Postgres over the private VPC under sslmode=verify-full. The server is hostssl-only in pg_hba with scram-sha-256 auth; the server cert SAN pins the pg node's private VPC IP so hostname verification passes east-west.

| Expect | Test | Result |
|---|---|---|
| PASS | verify-full connect → ≥1 ssl, 0 plaintext | PASS: daemon→PG verify-full, all 3 regions TLSv1.3 (TLS_AES_256_GCM_SHA384), ssl=N plain=0 |
| DENY | sslmode=disable → refused pre-auth | PASS: sslmode=disable refused pre-auth (hostssl-only pg_hba) |
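
On the client side, Leg 3 comes down to a libpq connection string that never drops below verify-full. The helper below is a hedged sketch: `build_dsn` is a hypothetical name and the host, database, user, and CA path are placeholders, while `host`, `port`, `dbname`, `user`, `sslmode`, and `sslrootcert` are standard libpq connection parameters.

```python
def build_dsn(host, dbname, user, sslrootcert, port=5432, sslmode="verify-full"):
    """Build a libpq keyword/value connection string, refusing weaker modes.

    Values are assumed to contain no spaces or quotes; libpq would
    otherwise require them to be quoted.
    """
    if sslmode != "verify-full":
        raise ValueError(f"sslmode={sslmode!r} is not allowed; use verify-full")
    params = {
        "host": host,                # must match a SAN entry in the server cert
        "port": port,
        "dbname": dbname,
        "user": user,
        "sslmode": sslmode,
        "sslrootcert": sslrootcert,  # CA that signed the server certificate
    }
    return " ".join(f"{key}={value}" for key, value in params.items())
```

Because verify-full checks the connection host against the certificate, dialing the private VPC IP only works when that IP appears in the server certificate's SAN, which is why the text pins it there.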

Source: test/encrypted_legs.sh + the crypto / federation / zerotouch test groups in deploy/do-1461/test/run.sh. The canonical green reports (JSON + TSV) are regenerated under .local-runs/do-1461/reports/ from a clean 0→60 run of this 3-region PG18.4 fleet.

Defense battery

Batman-active MAXIMUM-SECURE posture.

On top of the transport, every node runs the Batman-active MAXIMUM-SECURE posture (provision/46_batman.sh). This is the secure-default env battery + Form-7 governance activation + the Form-5 confidence curator, asserted live over the wire by the nsa_gaps test group. For the single-node activation recipe and the full framing of the seven Batman forms, see the Batman Mode atlas.

| Control (env / form) | Effect | Live wire test | Result |
|---|---|---|---|
| AI_MEMORY_REQUIRE_AGENT_ATTESTATION=1 | Every store write must be agent-attested; an unsigned write is refused. | unsigned write → 403 ATTESTATION_FAILED | PASS: unsigned write → 403 ATTESTATION_FAILED |
| AI_MEMORY_FED_REQUIRE_SIG=1 | /sync/push requires a valid per-message Ed25519 signature. | missing / invalid signature → 401 | PASS: 401 on missing / invalid X-Memory-Sig |
| AI_MEMORY_FED_REQUIRE_NONCE=1 | Per-message nonce freshness; byte-for-byte replays are rejected. | forged sig+nonce push refused on repeat → 401 | PASS: 401 on nonce replay |
| AI_MEMORY_FED_REQUIRE_PEER_ENROLLMENT=1 | Receivers fail closed on any unenrolled peer-id (zero-touch CA trust). | unenrolled peer on /sync/since → 401 peer_not_enrolled | PASS: 401 peer_not_enrolled for unenrolled peers |
| AI_MEMORY_PERMISSIONS_MODE=enforce | K3/K9 governance gate enforced (not advisory). | admin endpoint as non-admin → 403 | PASS: PERMISSIONS_MODE=enforce live |
| AI_MEMORY_GOVERNANCE_FAIL_OPEN_ON_ERROR=0 | Governance fails closed; a rule-consultation error blocks the write. | fail-closed posture asserted in capabilities envelope | PASS: GOVERNANCE_FAIL_OPEN_ON_ERROR=0 (fail-closed) |
| Form 5: confidence (AUTO / SHADOW / DECAY) | Auto-confidence calibration, shadow-mode scoring, freshness decay; curator sweeps on every peer. | curator daemon active + decay sweep observed | PASS: AUTO_CONFIDENCE / SHADOW / DECAY set + curator daemon running on all 9 peers |
| Form 2 / Form 6: namespace policy | Synchronous atomise-before-embed (Form 2) + MemoryKind auto-classify (Form 6) via namespace standard. | namespace standard present; auto-classify backfill observed | PASS: namespace batman-policy standard bound (Form 2 sync-atomise + Form 6 auto-classify) |
| Form 7: signed rules R001–R004 | Operator-signed governance seed rules (sqlite-substrate-scoped). On the postgres peers of this hive the live governance is the env battery above. | rules list → 4 enabled, attest_level=operator_signed (sqlite substrate) | PARTIAL: Form-7 signed-rules (R001–R004) are sqlite-substrate-scoped; on postgres peers the env-battery controls above are the live governance (tracked #1536, out-of-NSA-scope) |
| V-4 signed-events hash chain | Append-only, tamper-evident cross-row SHA-256 audit chain. | per-peer verify-signed-events-chain exits 0 | PASS: MCP/L4 signed_events tamper-evident chain verified (the postgres normal-write append is a separate storage-layer item #1542, out-of-NSA-scope) |
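
The six environment controls in the battery above lend themselves to a simple fail-closed check. The variable names and values are taken from the table; the `posture_gaps` helper itself is a hypothetical illustration and is not part of provision/46_batman.sh.

```python
# Fail-closed values for the env battery, as listed in the table.
REQUIRED = {
    "AI_MEMORY_REQUIRE_AGENT_ATTESTATION": "1",
    "AI_MEMORY_FED_REQUIRE_SIG": "1",
    "AI_MEMORY_FED_REQUIRE_NONCE": "1",
    "AI_MEMORY_FED_REQUIRE_PEER_ENROLLMENT": "1",
    "AI_MEMORY_PERMISSIONS_MODE": "enforce",
    "AI_MEMORY_GOVERNANCE_FAIL_OPEN_ON_ERROR": "0",
}


def posture_gaps(env):
    """Return the controls that are unset or not at their fail-closed value."""
    return sorted(name for name, want in REQUIRED.items() if env.get(name) != want)
```

Passing `os.environ` on a node would list any control that has drifted; an empty list means the battery matches the table. Treating an unset variable as a gap keeps the check itself fail-closed.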
NSA CSI MCP · observable-test matrix

Every concern + recommendation → a live-hive test.

This matrix maps each NSA CSI MCP concern (a–j) and recommendation (a–g) from the full NSA mapping to a concrete, MCP-interface observable test on this live hive. The NSA CSI MCP guidance applies to the MCP interface/protocol, not to the postgres connections or storage backend. Each row's result column is filled from the live run and was verified across two independent 0→60 runs; nothing here is invented pass/fail data. See the NSA non-endorsement notice in the footer.

Concerns (a–j)

| # | NSA concern | Observable live-hive test | Result |
|---|---|---|---|
| a | Access control | private-scope owner visibility (private memory invisible to a different caller); admin endpoint as non-admin → 403; namespace isolation roundtrip | PASS: private-scope memory invisible to a different MCP caller; admin endpoint as non-admin → 403; namespace-isolation roundtrip holds |
| b | Insecure context / data serialization | Accept-Provenance: verbose returns typed citations / source_uri / source_span; malformed payload rejected by RequestValidator | PASS: Accept-Provenance: verbose returns typed citations / source_uri / source_span; malformed payload rejected by RequestValidator at the MCP boundary |
| c | Poor approval workflows | pending-actions surface present; HMAC-mandatory approval dispatch (unsigned refused) | PASS: pending-actions surface present; HMAC-mandatory approval dispatch (unsigned refused) |
| d | Token / session security | leg-1 mTLS + x-api-key enforced; leg-2 Ed25519 sig + nonce anti-replay (replay → 401) | PASS: leg-1 mTLS + x-api-key enforced; leg-2 Ed25519 sig + nonce anti-replay (replay → 401) |
| e | Misconfigurations / poor implementation | fail-CLOSED secure defaults asserted live (sig / nonce / enrollment / permissions / governance) | PASS: fail-CLOSED secure defaults asserted live (sig / nonce / enrollment / permissions / governance) |
| f | Inconsistent behaviors | schema v57 lockstep across all peers; optimistic-concurrency version conflict → 409 | PASS: schema v57 lockstep across all peers; optimistic-concurrency version conflict → 409 |
| g | Poor / missing audit logs | per-peer V-4 verify-signed-events-chain exits 0; recall-observation ledger present | PASS: per-peer V-4 verify-signed-events-chain (MCP/L4) exits 0; recall-observation ledger present |
| h | Denial of service / fatigue | per-agent K8 quota surface; 2 MB body cap; federation DLQ bounded | PASS: per-agent K8 quota surface; 2 MB body cap; federation DLQ bounded |
| i | Tool parameter injection | RequestValidator rejects malformed parameters at the wire boundary | PASS: RequestValidator rejects malformed parameters at the MCP wire boundary |
| j | Tool invocation path confusion | MCP initialize returns daemon-Ed25519-signed serverInfo identity block (TOFU) | PASS: MCP initialize returns daemon-Ed25519-signed serverInfo identity block (TOFU) |

Recommendations (a–g)

| # | NSA recommendation | Observable live-hive test | Result |
|---|---|---|---|
| a | Choose supported MCP projects | live /api/v1/health reports pinned version 0.7.0 + schema 57 on every node | PASS: /api/v1/health reports pinned version 0.7.0 + schema 57 on every node |
| b | Design for boundaries | namespace isolation + per-region VPC substrate boundary + fail-CLOSED defaults verified live | PASS: namespace isolation + per-region VPC substrate boundary + fail-CLOSED defaults verified live |
| c | Validate parameters | malformed / out-of-range write rejected with typed validation error | PASS: malformed / out-of-range write rejected with typed RequestValidator error |
| d | Constrain & sandbox tool execution | Form-7 governance gate live (R001–R004 enabled); permissions=enforce | PASS: governance gate live (permissions=enforce, attestation required, fail-closed) on every MCP write path |
| e | Sign & verify MCP messages | leg-2 Ed25519 sig required (missing → 401); V-4 chain verifies; serverInfo signed | PASS: leg-2 Ed25519 sig required (missing → 401); V-4 MCP/L4 chain verifies; serverInfo signed at initialize |
| f | Filter & monitor output pipelines | Accept-Provenance: verbose envelope returns citations / ConfidenceTier / MemoryKind | PASS: Accept-Provenance: verbose envelope returns citations / ConfidenceTier / MemoryKind |
| g | Instrument for logging & detection | bare /metrics Prometheus surface reachable; federation-convergence probe observable | PASS: bare /metrics Prometheus surface reachable; federation-convergence probe observable |
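
The lockstep claims in the tables above (version 0.7.0 and schema 57 on every node) amount to comparing each node's health response with the pinned pair. In the sketch below the JSON field names `version` and `schema_version` are assumptions, since the source does not show the /api/v1/health payload, and `out_of_lockstep` is a hypothetical helper.

```python
PINNED_VERSION = "0.7.0"
PINNED_SCHEMA = 57


def out_of_lockstep(health_by_node):
    """health_by_node maps node name -> parsed JSON body of /api/v1/health.

    Returns the nodes whose (version, schema) pair differs from the pins.
    """
    bad = {}
    for node, body in health_by_node.items():
        got = (body.get("version"), body.get("schema_version"))
        if got != (PINNED_VERSION, PINNED_SCHEMA):
            bad[node] = got
    return bad
```

An empty result is the lockstep condition; any entry names a node that drifted and the pair it reported.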

Live assertions are driven by the nsa_gaps + crypto + regression test groups against the real TLS+mTLS path. Full per-claim mapping with file:line provenance: compliance/nsa-csi-mcp.html.
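
The V-4 rows above rely on a tamper-evident cross-row SHA-256 chain. A minimal sketch of that general technique follows; the row layout, the all-zero genesis value, and the canonical-JSON encoding are assumptions, and the real signed_events chain also carries signatures, which this sketch omits.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first row


def link_hash(prev_hash, event):
    """Hash the previous row's hash together with a canonical event encoding."""
    body = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(prev_hash.encode() + body).hexdigest()


def append(chain, event):
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event": event, "prev": prev, "hash": link_hash(prev, event)})


def verify(chain):
    """Recompute every link; any edited, removed, or reordered row breaks it."""
    prev = GENESIS
    for row in chain:
        if row["prev"] != prev or row["hash"] != link_hash(prev, row["event"]):
            return False
        prev = row["hash"]
    return True
```

Because each row's hash covers the previous row's hash, changing one event invalidates every later link, which is what lets a verifier report a clean chain with a single pass and an exit code.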

Honest limitations

What this posture does not claim.

The prime directive forbids papering over gaps. A mature security posture is transparent about its trust boundaries. The limitations below are split into two clearly-separated buckets: (1) the one genuine honest limitation that lives within NSA CSI MCP scope — the MCP / federation interface — and (2) transparent substrate / storage engineering findings that are tracked separately and are explicitly not NSA CSI MCP compliance gaps. The NSA CSI MCP guidance applies to the MCP interface/protocol, not to the postgres connections or storage backend. The full companion is honest-limitations.md.

1 · Within NSA CSI MCP scope (the MCP / federation interface)

Federation-receive is claimed-by-default at the MCP/federation boundary

A compromised peer holding a valid mTLS client cert can push under any agent_id; the receiving side trusts the envelope-attributed sender. mTLS + the peer allowlist + zero-touch CA enrollment are the trust boundary here: they bound which machines can speak (only enrolled, CA-rooted peers), not which agent_id each asserts on a received write. Per-write federation-receive attestation (verifying the asserting agent's own signature on every received write) is a v0.8 item, tracked in #1464. This is the single honest limitation that falls inside the MCP CSI surface, and it is disclosed rather than silently carried.

2 · Outside NSA CSI MCP scope (substrate / storage findings — not compliance gaps)

These are transparent engineering findings on the substrate / storage layer, tracked separately. They are not NSA CSI MCP limitations — the MCP interface, federation protocol, and the three encryption legs (including Leg-3 daemon→Postgres TLS) all pass at the MCP surface. They are listed here for full disclosure.

#1536: Form-7 signed rules are sqlite-substrate-scoped. The Batman Form-7 R001–R004 signed filesystem/process rules are scoped to the sqlite substrate. On the postgres peers of this hive the live governance is the env battery (attestation, sig, nonce, enrollment, permissions=enforce, fail-closed). A storage-substrate scoping item, not an MCP-interface gap.
#1539: no HTTP pubkey-bind route. There is no HTTP route to bind a public key over the API surface yet; key binding is handled out-of-band by the zero-touch CA enrollment path. An API-surface convenience item, not an MCP CSI gap.
#1541: closed / not-an-issue (postgres signed_events). Investigated and closed: postgres peer-to-peer comms are TLS via Leg-3 (daemon→PG verify-full, TLSv1.3). No storage-layer plaintext path exists. Resolved; carried here only for audit transparency.
#1542: PostgresStore::link_signed is a no-op (KG-on-postgres gap). The signed-link write is currently a no-op on the postgres adapter, leaving a knowledge-graph-on-postgres gap. The MCP/L4 signed_events tamper-evident chain itself verifies (see the V-4 row above); this is a separate storage-layer adapter item.
#1543: source_uri not projected on postgres read. The source_uri provenance field is written but not projected back on the postgres read path. A storage-adapter projection fix; the verbose-provenance envelope at the MCP surface is otherwise complete.
#1544: bulk ingest saturates per-agent quota (config / scale). A large bulk ingest can saturate the per-agent K8 quota; this is a configuration / scale-tuning concern (operators raise the per-agent quota for bulk-ingest agents), not a security boundary.

These findings are framed honestly per the substrate's honesty discipline (the v0.6.3.1 capabilities-v2 honesty floor). Bucket 1 scopes the MCP/federation interface's residual risk precisely; bucket 2 is transparent substrate engineering tracked separately and explicitly out of NSA CSI MCP scope.

Reproducibility

Stand the whole hive up from one directory.

Everything a reviewer needs to reproduce both the environment and the results lives in deploy/do-1461/ and ships inside release/v0.7.0. Terraform stands the infrastructure up (3 regions, one VPC per region, tag-based firewall); a push-based toolkit brings every node to a verified Batman-active state; a harness proves it.

    # one directory, one deterministic 0→60 flow
    make seed up provision validate test   # build, prove, full-spectrum test
    make down                              # tear it all down
Terraform

3-region infra

One private VPC per region (regional by DO design), tag-based firewall, role droplets, deterministic outputs. inventory.json is a pure projection of TF state — the whole toolkit drives off it.

Push-based provision

00 → 50 + 46_batman

Deterministic, idempotent SSH steps: 00 inventory → 05 wait-ssh → 10 golden binary → 15 TLS (before PG) → 20 PG/AGE → 25 Ollama embed → 30 config → 45 zero-touch → 46_batman → 50 federation.

Pinned stack

single-source constants

PostgreSQL 18.4-1.pgdg24.04+1 · Apache AGE 1.7.0 · pgvector 0.8.2 · schema v57 · golden binary sha256-asserted · nomic-embed-text 768-dim CPU embedder. Installed natively — no Docker anywhere on the fleet.

What "reproducible" means here
Pinned artifacts (golden binary sha256 / version 0.7.0 / schema v57 / pinned pgdg .debs / pinned Ollama release) are single-source named constants in provision/lib.sh, overridable by env for forks. The seed corpus is pinned by sha256 in CORPUS_MANIFEST.json; every tunable knob is a named constant, not a magic literal. The campaign CA + per-node keys are generated once and reused on re-runs for stable trust. make validate exercises the live fleet over the real TLS+mTLS path and emits a JSON + tabular report under .local-runs/do-1461/reports/; the committed baseline artifact set is the attested Atlas Corpus baseline under deploy/do-1461/atlas/results/. Reference: deploy/do-1461/README.md.

Verified across two independent clean-room 0→60 runs
The results on this page are not a single lucky run. The hive was stood up, proven, and torn down across two fully independent clean-room runs: terraform destroy → fresh terraform apply → full push-based provision → complete retest, with the golden-binary sha256, the corpus sha256 (CORPUS_MANIFEST.json), and every tunable knob pinned as named constants on both runs. Each round pins and fleet-asserts its own golden-binary sha256 on all 15 nodes (binary.sha256 in the verify report); both rounds reproduced the same 100% GREEN result set (119/119 verify checks).
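
Asserting a pinned sha256, as described for the golden binary and the seed corpus, can be sketched with the standard library alone. The function names are hypothetical and the pin is passed in by the caller; the layout of CORPUS_MANIFEST.json is not shown in the source, so the sketch does not try to parse it.

```python
import hashlib


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large binaries need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def assert_pinned(path, pinned_hex):
    """Raise if the artifact on disk does not match its pinned digest."""
    actual = sha256_file(path)
    if actual != pinned_hex.lower():
        raise RuntimeError(f"{path}: sha256 {actual} does not match pin {pinned_hex}")
    return actual
```

Running the same assertion on every node against one named constant is what turns a pin into a fleet-wide check: a single mismatching binary fails loudly instead of passing unnoticed.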

DO platform constraint (does not affect reproducibility): on DigitalOcean a region's default VPC cannot be deleted. Teardown therefore destroys 100% of compute and substrate (every droplet, every PostgreSQL/AGE/pgvector node, every peer) while the empty regional VPC container persists and is re-used on the next apply. The container holds no compute, no data, and no trust material between runs, so re-using it does not affect the clean-room property of the 0→60 reproduction.
| Step | Command | What it does |
|---|---|---|
| 1 | make seed | terraform init + validate (no cloud mutation) |
| 2 | make up | terraform apply → fleet; render inventory.json from TF state |
| 3 | make provision | push-based bring-up, steps 00 → 50 (incl. 46_batman) |
| 4 | make validate | verification harness → machine + human report; non-zero on any FAIL |
| 5 | make test | full-spectrum P3 suite (regression / crypto / federation / zerotouch / a2a / ai_nhi / nsa_gaps / curator) |
|  | make down | terraform destroy (destructive; 5s abort window) |
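
The "machine + human report; non-zero on any FAIL" behaviour of make validate can be illustrated with a small renderer. The check record shape (`name`, `status`, `detail`) and the `render_report` name are assumptions for this sketch; the real report schema under .local-runs/do-1461/reports/ is not shown in the source.

```python
import json


def render_report(checks):
    """checks: list of {"name": str, "status": "PASS" | "FAIL" | "PARTIAL", "detail": str}.

    Returns (json_text, tsv_text, exit_code); only a FAIL makes the exit code non-zero.
    """
    machine = json.dumps({"checks": checks}, indent=2, sort_keys=True)
    human = "\n".join(
        "\t".join((check["status"], check["name"], check["detail"])) for check in checks
    )
    exit_code = 1 if any(check["status"] == "FAIL" for check in checks) else 0
    return machine, human, exit_code
```

Deriving the exit code from the same records that feed both renderings means the JSON, the tabular view, and the process status cannot disagree about the outcome.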
Fleet manifest

15 nodes, named by function.

Hostnames encode each node's function: do-1461-<function>-<region>-<NN>. Each of the three regions is a 5-node unit — three peers + one PG + one NHI agent — for 15 nodes total (9 peers + 3 PG + 3 agents). Nine peers (3 per region) form the W=2-synchronous-quorum mesh with eventual cross-region convergence; three pg nodes (one per region) each run native PostgreSQL and are never federation members; three NHI agents (one per region, an xAI grok-4.3 client) are pure mTLS clients exercising the a2a + ai_nhi test groups and are never federation members. This 15-node topology was verified identically across both independent clean-room reproducibility rounds.

| Host | Role | Region | Runs |
|---|---|---|---|
| do-1461-peer-nyc3-01..03 | peer ×3 | nyc3 | federated ai-memory serve + CPU Ollama embedder sidecar |
| do-1461-pg-nyc3-01 | pg | nyc3 | regional PostgreSQL 18.4 + Apache AGE 1.7.0 + pgvector 0.8.2 |
| do-1461-agent-nyc3-01 | agent | nyc3 | NHI agent: xAI grok-4.3 mTLS client (Leg-1); exercises a2a + ai_nhi; not a federation member |
| do-1461-peer-fra1-01..03 | peer ×3 | fra1 | federated ai-memory serve + CPU Ollama embedder sidecar |
| do-1461-pg-fra1-01 | pg | fra1 | regional PostgreSQL 18.4 + Apache AGE 1.7.0 + pgvector 0.8.2 |
| do-1461-agent-fra1-01 | agent | fra1 | NHI agent: xAI grok-4.3 mTLS client (Leg-1); exercises a2a + ai_nhi; not a federation member |
| do-1461-peer-sgp1-01..03 | peer ×3 | sgp1 | federated ai-memory serve + CPU Ollama embedder sidecar |
| do-1461-pg-sgp1-01 | pg | sgp1 | regional PostgreSQL 18.4 + Apache AGE 1.7.0 + pgvector 0.8.2 |
| do-1461-agent-sgp1-01 | agent | sgp1 | NHI agent: xAI grok-4.3 mTLS client (Leg-1); exercises a2a + ai_nhi; not a federation member |
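
The naming scheme do-1461-<function>-<region>-<NN> and the 15-node arithmetic can be reproduced in a few lines. This is an illustrative sketch only: the real inventory is rendered from Terraform state, and the `fleet` and `federation_members` helpers are hypothetical.

```python
REGIONS = ("nyc3", "fra1", "sgp1")
ROLES = (("peer", 3), ("pg", 1), ("agent", 1))  # per-region node counts


def fleet(campaign="do-1461"):
    """Expand the per-region unit into the full list of hostnames."""
    return [
        f"{campaign}-{role}-{region}-{index:02d}"
        for region in REGIONS
        for role, count in ROLES
        for index in range(1, count + 1)
    ]


def federation_members(hosts):
    """Only peers join the quorum mesh; pg nodes and agents never do."""
    return [host for host in hosts if "-peer-" in host]
```

Expanding three regions by the five-node unit yields the 15 hostnames in the manifest, of which the nine peer names make up the federation mesh.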
Related

Where to read more.