CA-rooted, attestation-issued, short-lived credentials replace O(N²) manual key exchange with O(1) "trust the CA". Scale a federated fleet from 1 to ~1,000,000 AI agents — without touching a single peer to add the next node.
Adding a node reconfigures nobody. A node proves who it is via attestation, a CA mints it a short-lived credential, and receivers verify that credential against a small trust bundle holding only the CA's key. No peer-to-peer .pub copy. No restart fan-out. The trust domain grows by trusting the CA — not by introducing every node to every other node.
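To make the scaling claim concrete, here is a small arithmetic sketch in Python. It is illustrative only and not part of ai-memory; the function names are hypothetical. It assumes a full mesh in which every node holds every other node's .pub, versus a CA-rooted fleet in which receivers hold only the CA's key.

```python
def full_mesh_pub_copies(n: int) -> int:
    """Public-key files on disk when each of n nodes holds every other node's .pub."""
    return n * (n - 1)


def peers_reconfigured_on_join(n_existing: int, ca_rooted: bool) -> int:
    """Existing peers that must be touched to admit one more node."""
    # Full mesh: every existing peer needs the newcomer's .pub.
    # CA-rooted: receivers already trust the CA, so nobody is touched.
    return 0 if ca_rooted else n_existing


for n in (10, 1_000, 1_000_000):
    print(n, full_mesh_pub_copies(n), peers_reconfigured_on_join(n, ca_rooted=True))
```

At 1,000 nodes the full mesh already implies 999,000 key copies, while the CA-rooted join cost stays at zero peers touched.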
This is the shape of SPIFFE/SPIRE (CA-rooted, attestation-issued, short-lived, auto-rotating) but first-party: it composes the Ed25519 sign/verify and canonical-CBOR primitives already in ai-memory. No new dependency — no rcgen, no openssl, no X.509.
Full operator + configuration reference: docs/federation-identity.md. Transport hardening beneath it (mTLS, X-API-Key, peer attestation): docs/federation.md.
Before v0.7.0 a node's federation identity was its hostname, and trust meant copying every peer's Ed25519 .pub onto every other peer. Three hard limits:
Zero-touch trust replaces all three with one move: trust the CA, not the peers.
A FederationCredential is the node's Ed25519 public key bound to its agent-id and a short validity window, signed by a CA key (canonical-CBOR, the same encoder as link signing). It travels base64 next to X-Memory-Sig under x-memory-cred: v1=<base64>.
| Field | Meaning |
|---|---|
| subject_agent_id | The node identity (SPIFFE-style path allowed) |
| subject_pubkey | The node's Ed25519 verifying key (32 bytes) |
| issuer_id | The CA / intermediate that minted it |
| not_before / not_after | Unix-seconds validity window (short TTL; default 1h) |
| trust_domain | Fleet namespace for multi-tenant isolation |
| cred_version | Wire/format version (CRED_VERSION = 1) |
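The header framing and the validity window can be sketched in Python as follows. This is a minimal illustration, not the project's Rust code. It assumes the standard base64 alphabet and assumes the 30s clock-skew default is applied symmetrically to both ends of the window; the function names are hypothetical. Decoding the CBOR body and verifying the CA signature are deliberately left out.

```python
import base64
import binascii

CLOCK_SKEW_SECS = 30  # compiled default quoted in this document


def parse_cred_header(value: str) -> bytes:
    """Extract the raw credential bytes from an x-memory-cred header value."""
    prefix = "v1="
    if not value.startswith(prefix):
        raise ValueError("unsupported credential header version")
    try:
        raw = base64.b64decode(value[len(prefix):], validate=True)
    except binascii.Error as exc:
        raise ValueError("credential is not valid base64") from exc
    if not raw:
        raise ValueError("empty credential")
    return raw


def within_validity_window(not_before: int, not_after: int, now: int,
                           skew: int = CLOCK_SKEW_SECS) -> bool:
    """True when `now` (Unix seconds) falls inside the window, widened by skew."""
    # Assumption: skew is tolerated on both sides of the window.
    return not_before - skew <= now <= not_after + skew
```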
Negotiated, no partition: if a credential header is present and a trust bundle is configured, the receiver takes the credential path; otherwise it falls back to the legacy per-peer .pub path. An un-upgraded peer keeps working; an upgraded peer accepts both. Every phase ships independently and a mixed fleet stays safe.
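The negotiation rule above reduces to a two-input decision. A hedged Python sketch follows; the enum and function names are hypothetical, and it assumes header names are matched case-insensitively, as is usual for HTTP.

```python
from enum import Enum
from typing import Mapping


class VerifyPath(Enum):
    CREDENTIAL = "credential"   # CA-signed credential checked against the trust bundle
    LEGACY_PUB = "legacy_pub"   # per-peer .pub lookup, as before the credential path


def choose_verify_path(headers: Mapping[str, str],
                       trust_bundle_configured: bool) -> VerifyPath:
    """Pick the verification path for one inbound request."""
    has_cred = any(name.lower() == "x-memory-cred" for name in headers)
    if has_cred and trust_bundle_configured:
        return VerifyPath.CREDENTIAL
    # Either side not upgraded yet: keep working on the legacy path.
    return VerifyPath.LEGACY_PUB
```

Because the fallback is the pre-existing path, neither an un-upgraded sender nor an un-upgraded receiver is cut off.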
Federation identity is configured entirely through the AI_MEMORY_FED_* env-var family:
| Env var | Effect |
|---|---|
| AI_MEMORY_FED_IDENTITY | Overrides the node's federation identity. Highest precedence; blank is ignored. Default: host:<hostname>. |
| AI_MEMORY_FED_TRUST_DOMAIN | The trust domain a receiver's bundle is scoped to. Cross-domain creds are rejected. |
| AI_MEMORY_FED_TRUST_BUNDLE_DIR | Directory of trusted issuer verifying keys: the O(1) enrollment surface. |
| AI_MEMORY_FED_CRED_PATH | Path to this node's issued leaf credential (what it presents). |
| AI_MEMORY_FED_CRED_CHAIN_PATH | Path to the anchor-first intermediate chain (hierarchical trust). |
| AI_MEMORY_FED_INVENTORY_PATH | Path to the declarative GitOps inventory YAML. |
| AI_MEMORY_FED_REQUIRE_PEER_ENROLLMENT | Fail-closed gate. When =1, a receiver rejects any peer with no valid CA-signed credential with 401 peer_not_enrolled. The switch that makes enrollment mandatory. |
| AI_MEMORY_KEY_DIR | Directory holding this node's signing keypair at <key_dir>/<federation-identity>.{pub,priv}, loaded by the resolved (slashed) identity. |
| AI_MEMORY_FED_REQUIRE_SIG | Receivers reject unsigned posts (default-on; inherited enforcement gate). |
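The identity precedence described for AI_MEMORY_FED_IDENTITY can be sketched like this. It is an illustration under stated assumptions, not the daemon's resolver: it models only the override and the host:<hostname> default (any sources in between are not shown), and it assumes a whitespace-only value counts as blank. The function name is hypothetical.

```python
import os
import socket
from typing import Mapping, Optional


def resolve_federation_identity(env: Optional[Mapping[str, str]] = None,
                                hostname: Optional[str] = None) -> str:
    """Return the override if set and non-blank, else host:<hostname>."""
    env = os.environ if env is None else env
    override = env.get("AI_MEMORY_FED_IDENTITY", "")
    if override.strip():  # assumption: whitespace-only is treated as blank
        return override
    host = socket.gethostname() if hostname is None else hostname
    return f"host:{host}"
```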
Compiled defaults (SSOT constants): leaf TTL 1h, intermediate TTL 1d, clock skew 30s, max chain depth 2, renewal interval 60s, renewal lead 15m.
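How these defaults interact can be shown with a short worked example. The sketch below assumes the renewal worker wakes once per renewal interval and renews as soon as the remaining lifetime drops to the renewal lead; the function name is hypothetical and the real worker may differ.

```python
LEAF_TTL_SECS = 60 * 60         # leaf TTL 1h
RENEWAL_LEAD_SECS = 15 * 60     # renewal lead 15m
RENEWAL_INTERVAL_SECS = 60      # renewal interval 60s


def should_renew(now: int, not_after: int, lead: int = RENEWAL_LEAD_SECS) -> bool:
    """True once the credential is within `lead` seconds of expiry."""
    return now >= not_after - lead


# A credential issued at t=0 expires at t=3600. Ticking every 60s, the first
# tick that triggers renewal is the one at 45 minutes.
issued_at = 0
not_after = issued_at + LEAF_TTL_SECS
first_renewal_tick = next(
    t for t in range(issued_at, not_after + 1, RENEWAL_INTERVAL_SECS)
    if should_renew(t, not_after)
)
print(first_renewal_tick)  # 2700
```

With these numbers a node has 15 minutes, or roughly 15 renewal ticks, to obtain a fresh credential before the old one expires.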
A repo-reviewed YAML file is the source of truth for fleet membership, trust topology, and enforcement. Parsed strictly (deny_unknown_fields) so a typo is a load-time error, never a silently-weakened posture. Point the daemon at it with AI_MEMORY_FED_INVENTORY_PATH.
```yaml
trust_domain: fleet.example
root_ca: root/ca
regions:
  - name: nyc
    intermediate_ca: region/nyc/ca   # optional; omit for single-tier
    nodes:
      - id: region/nyc/node-1        # SPIFFE-style
        attestor: mtls-cert          # mtls-cert | node-plugin
        cred_ttl: 1h
        renew_before: 15m            # must be < cred_ttl
        roles: [writer]
quorum:
  width: 2                           # W-of-N; >= 1
enforcement:
  require_sig: true                  # maps to AI_MEMORY_FED_REQUIRE_SIG
```
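The constraints noted in the inventory comments can be sketched as load-time checks. This Python illustration works on already-parsed dictionaries and is not the project's parser. Three things are assumptions: the duration suffixes s/m/h/d, the exact set of node fields, and that every node field is required.

```python
import re

_UNIT_SECS = {"s": 1, "m": 60, "h": 3600, "d": 86400}  # assumed suffixes
NODE_FIELDS = {"id", "attestor", "cred_ttl", "renew_before", "roles"}
ATTESTORS = {"mtls-cert", "node-plugin"}


def parse_duration(text: str) -> int:
    """Turn '15m' or '1h' into seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text)
    if match is None:
        raise ValueError(f"bad duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECS[match.group(2)]


def validate_node(node: dict) -> None:
    """Reject typos and inconsistent TTLs instead of silently accepting them."""
    unknown = set(node) - NODE_FIELDS
    if unknown:  # mirrors the effect of deny_unknown_fields
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    missing = NODE_FIELDS - set(node)
    if missing:  # assumption: all fields required
        raise ValueError(f"missing fields: {sorted(missing)}")
    if node["attestor"] not in ATTESTORS:
        raise ValueError(f"unknown attestor: {node['attestor']!r}")
    if parse_duration(node["renew_before"]) >= parse_duration(node["cred_ttl"]):
        raise ValueError("renew_before must be < cred_ttl")


def validate_quorum(quorum: dict) -> None:
    if quorum["width"] < 1:
        raise ValueError("quorum width must be >= 1")
```

A misspelled key such as cred_tll fails the unknown-field check before any TTL logic runs, which is the point of strict parsing.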
Each link in the chain is checked for issuer linkage (child.issuer_id == parent.subject_agent_id) and domain binding.

The reconciler is a pure diff of desired inventory vs. observed state → a partition-safe ReconcilePlan (strict-enforcement actions are emitted last, gated on observed sign-capability). The side-effecting Apply half, scripts/federation-rollout.sh, does health-gated, mTLS-verified binary swaps with automatic rollback; if both the new and previous binaries fail health, it emits a loud MANUAL INTERVENTION block rather than leaving the fleet dark.
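The linkage rule (child.issuer_id == parent.subject_agent_id) and domain binding can be sketched structurally in Python. This is a simplified illustration: signature and validity-window checks are omitted, the Cred class carries only the fields needed here, and treating "max chain depth 2" as a limit on the number of credentials in the anchor-first chain is an assumption.

```python
from dataclasses import dataclass
from typing import List, Set

MAX_CHAIN_DEPTH = 2  # compiled default quoted in this document


@dataclass(frozen=True)
class Cred:
    subject_agent_id: str
    issuer_id: str
    trust_domain: str


def check_chain(chain: List[Cred], trusted_issuers: Set[str],
                trust_domain: str) -> None:
    """Raise ValueError unless the anchor-first chain links up inside one domain."""
    if not chain:
        raise ValueError("empty chain")
    if len(chain) > MAX_CHAIN_DEPTH:  # assumption: depth counts credentials
        raise ValueError("chain too deep")
    if chain[0].issuer_id not in trusted_issuers:
        raise ValueError("UnknownIssuer")
    for cred in chain:
        if cred.trust_domain != trust_domain:
            raise ValueError("cross-domain credential")
    # Walk adjacent pairs: each child must be issued by its parent's subject.
    for parent, child in zip(chain, chain[1:]):
        if child.issuer_id != parent.subject_agent_id:
            raise ValueError("broken linkage")
```

For the inventory example, the chain would be the intermediate (subject region/nyc/ca, issued by root/ca) followed by the leaf (subject region/nyc/node-1, issued by region/nyc/ca).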
Four Prometheus SLO series surface the trust path's health, refreshed every renewal tick:
| Metric | SLO |
|---|---|
| ai_memory_federation_cred_verify_total{result} | verify-failure-rate = fail / (ok + fail) |
| ai_memory_federation_inbound_cred_total{presence} | signed-vs-unsigned ratio (climbs to 1.0 as peers upgrade) |
| ai_memory_federation_cred_max_age_seconds | max-cred-age; alert as it nears the leaf TTL |
| ai_memory_federation_renewal_lag_seconds | renewal-lag; alert when it exceeds the refresh interval |
Every issue / renew / revoke is also recorded on the signed_events audit chain.
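The SLO formulas in the metrics table are simple ratios over counter values. A hedged Python sketch follows. The function names are hypothetical, returning 0.0 when no samples exist is an assumption, and the 60s threshold is the renewal interval quoted among the compiled defaults.

```python
RENEWAL_INTERVAL_SECS = 60  # compiled default quoted in this document


def verify_failure_rate(ok: int, fail: int) -> float:
    """fail / (ok + fail), defined as 0.0 before any verification has happened."""
    total = ok + fail
    return fail / total if total else 0.0


def signed_ratio(signed: int, unsigned: int) -> float:
    """Share of inbound requests that carried a credential."""
    total = signed + unsigned
    return signed / total if total else 0.0


def renewal_lag_alert(lag_seconds: float,
                      interval: float = RENEWAL_INTERVAL_SECS) -> bool:
    """Alert when renewal lag exceeds the refresh interval."""
    return lag_seconds > interval
```

In production these would more likely be expressed as Prometheus recording or alerting rules over the series themselves; the Python form only spells out the arithmetic.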
The daemon and the entire federation-identity core are pure Rust with no platform-bound logic in the trust path. The credential format, issuer, trust bundle, chain verification, inventory parsing, renewal worker, and reconciler behave identically on Linux, Windows, and macOS — the CI matrix runs the full suite on all three on every change. A credential minted on a Linux root CA verifies on a Windows node and a macOS node with byte-identical results. A federated fleet can be heterogeneous across operating systems in one trust domain.
Support focus is tiered by where production enterprise fleets actually run — a priority ranking, not a capability limit:
| Priority | Platform | Default shell | Notes |
|---|---|---|---|
| Primary | Linux (x86_64 / aarch64) | Bash | Reference enterprise target — systemd rollout, Unix key-mode enforcement, container / Kubernetes substrate. |
| Primary | Windows (x86_64) | PowerShell | First-class daemon + federation node. Key-directory hardening via NTFS ACLs (POSIX mode bits don't apply). |
| Tertiary | macOS (Apple Silicon / x86_64) | Zsh | Startup / small-enterprise niche — e.g. Mac Mini clusters. Fully functional node; Unix mode enforcement as on Linux. |
The ai-memory binary is shell-agnostic; only the env-var-setting syntax differs per platform's default shell. The same AI_MEMORY_FED_* knobs, set on each of the three platforms:

```bash
# Linux (Bash) / macOS (Zsh)
export AI_MEMORY_FED_TRUST_DOMAIN="fleet.example"
export AI_MEMORY_FED_IDENTITY="region/nyc/node-1"
```

```powershell
# Windows (PowerShell)
$env:AI_MEMORY_FED_TRUST_DOMAIN = "fleet.example"
$env:AI_MEMORY_FED_IDENTITY = "region/nyc/node-1"
```
The only OS-specific differences are operational plumbing — the service supervisor (systemd / launchd / Windows Service or WSL2) and the key-directory permission mechanism (POSIX mode bits on Unix, NTFS ACLs on Windows). The trust model itself is identical everywhere.
All CA / node / credential material is minted by examples/fed_issue.rs — a standalone first-party cargo example that composes the Ed25519 + canonical-CBOR primitives already in ai-memory. It is never linked into the golden binary: there is no Cargo.toml change and the pinned sha256 is unchanged. Build + run it on demand — the first call compiles, the rest are cache hits. Every verb is idempotent, so a re-run yields a stable trust topology.
| Verb | What it does |
|---|---|
| mint-ca | Generate the campaign CA keypair (--key-dir, --issuer-id). |
| export-bundle | Publish only the CA verifying key to the trust bundle dir, the sole anchor a receiver needs. |
| gen-node | Generate a node signing keypair (--agent-id may be a slashed federation identity). |
| issue | Mint a short-lived CA-signed credential binding the node's identity → its verifying key, scoped to the trust domain (--ttl-secs default 3600). |
One footgun: issuer-id MUST be slash-free. TrustBundle::from_dir is non-recursive, so a slashed id would nest the published <issuer-id>.pub in a skipped sub-dir → empty bundle → UnknownIssuer. The tool's validate_issuer_id rejects it at the boundary; node subject-agent-ids, by contrast, may be slashed.
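The footgun is easy to reproduce with any non-recursive directory scan. The Python sketch below stands in for the Rust behaviour described above and is not a port of TrustBundle::from_dir or validate_issuer_id; all function names are hypothetical, and the key bytes are a placeholder.

```python
import tempfile
from pathlib import Path
from typing import Set, Tuple


def is_slash_free(issuer_id: str) -> bool:
    """The boundary check: an issuer id must not contain a path separator."""
    return "/" not in issuer_id


def publish_pub(bundle_dir: Path, issuer_id: str) -> Path:
    """Write <issuer-id>.pub under bundle_dir; a slashed id nests it in a sub-dir."""
    target = bundle_dir / f"{issuer_id}.pub"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(bytes(32))  # placeholder, not a real verifying key
    return target


def scan_bundle(bundle_dir: Path) -> Set[str]:
    """Non-recursive scan: only .pub files directly inside bundle_dir count."""
    return {p.stem for p in bundle_dir.iterdir()
            if p.is_file() and p.suffix == ".pub"}


def demo() -> Tuple[Set[str], Set[str]]:
    with tempfile.TemporaryDirectory() as slashed, \
            tempfile.TemporaryDirectory() as flat:
        publish_pub(Path(slashed), "root/ca")   # lands in root/ca.pub
        publish_pub(Path(flat), "root-ca")      # lands in root-ca.pub
        return scan_bundle(Path(slashed)), scan_bundle(Path(flat))
```

Calling demo() returns an empty set for the slashed id and {"root-ca"} for the flat one, which is why rejecting the slash up front is the safer design.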
The push-based deploy/hive-1461/provision/45_zero_touch.sh drives these four verbs per peer: mint the CA once, then per peer generate a keypair, issue a credential, fan out keys/ + trust/ + credential to /etc/ai-memory (private key 0400, public 0444), and append the five daemon env vars. Agents and ctrl are pure mTLS API clients, not mesh members, and are skipped. Step-by-step: docs/zero-touch-quickstart.md.
The hive-1461 reproducible baseline runs this arm as provisioning step 45 and the full-spectrum P3 suite exercises it over the real TLS+mTLS path against the live mesh. The committed canonical report is TOTAL=26 PASS=26 FAIL=0, including four dedicated zerotouch checks:
| Check | Proves |
|---|---|
| enrolled_write | An enrolled peer writes a collective memory using only its CA-signed credential. |
| enrolled_converge | That memory converges on a federated peer via the CA credential, with no operator-pushed pubkey. |
| unenrolled_status | A peer with valid api-key + mTLS but no enrollment is refused 401 on /sync/since. |
| unenrolled_reason | The refusal is peer_not_enrolled, from the AI_MEMORY_FED_REQUIRE_PEER_ENROLLMENT=1 fail-closed gate. |
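The behaviour the two unenrolled checks assert can be summarised as a small decision function. This is a sketch of the gate's observable outcome only. The function names are hypothetical, and what happens to a request that passes the gate (credential path or legacy path) is not modelled.

```python
from typing import Mapping, Optional, Tuple


def enrollment_required(env: Mapping[str, str]) -> bool:
    """The gate is on only when the env var is exactly '1'."""
    return env.get("AI_MEMORY_FED_REQUIRE_PEER_ENROLLMENT") == "1"


def enrollment_gate(has_valid_ca_credential: bool,
                    require_peer_enrollment: bool) -> Optional[Tuple[int, str]]:
    """Return (status, reason) when the peer is refused, else None."""
    if require_peer_enrollment and not has_valid_ca_credential:
        # Fail closed: a valid api-key and mTLS alone are not enough.
        return (401, "peer_not_enrolled")
    return None
```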
Both the environment and the results are peer-reproducible: make seed up provision validate test stands the fleet up and re-proves all 26 checks. The report ships at deploy/hive-1461/results/test-full-spectrum.{json,tsv}.