Zero-touch trust

CA-rooted, attestation-issued, short-lived credentials replace O(N²) manual key exchange with O(1) "trust the CA". Scale a federated fleet from 1 to ~1,000,000 AI agents — without touching a single peer to add the next node.

O(1) enrollment · first-party CA · short-lived · auto-rotating · hierarchical trust · OS-agnostic
Adding a node reconfigures nobody. A node proves who it is via attestation, a CA mints it a short-lived credential, and receivers verify that credential against a small trust bundle holding only the CA's key. No peer-to-peer .pub copy. No restart fan-out. The trust domain grows by trusting the CA — not by introducing every node to every other node.

This is the shape of SPIFFE/SPIRE (CA-rooted, attestation-issued, short-lived, auto-rotating) but first-party: it composes the Ed25519 sign/verify and canonical-CBOR primitives already in ai-memory. No new dependency — no rcgen, no openssl, no X.509.

Full operator + configuration reference: docs/federation-identity.md. Transport hardening beneath it (mTLS, X-API-Key, peer attestation): docs/federation.md.

The problem

O(N²) doesn't scale.

Before v0.7.0 a node's federation identity was its hostname, and trust meant copying every peer's Ed25519 .pub onto every other peer. Three hard limits:

Zero-touch trust replaces all three with one move: trust the CA, not the peers.
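
To make the scaling gap concrete, the sketch below counts operator touches under each model. It is illustrative arithmetic only: the function names are hypothetical and are not part of ai-memory.

```rust
/// Total `.pub` copies in a full mesh: each of `n` nodes stores the
/// public key of the other `n - 1` nodes.
pub fn mesh_pubkey_copies(n: u64) -> u64 {
    n * n.saturating_sub(1)
}

/// Existing peers that must be reconfigured to admit one more node into a
/// mesh that already has `n` nodes (every peer needs the new key).
pub fn mesh_peers_touched_per_join(n: u64) -> u64 {
    n
}

/// Existing peers that must be reconfigured under CA-rooted trust: none,
/// because receivers already hold the CA key in their trust bundle.
pub fn ca_peers_touched_per_join(_n: u64) -> u64 {
    0
}
```

For a fleet of 1,000,000 nodes, `mesh_pubkey_copies` gives 999,999,000,000 key copies, while the CA model touches no existing peer when the next node joins.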

The credential

What a node presents.

A FederationCredential binds the node's Ed25519 public key to its agent-id and a short validity window, and is signed by a CA key over canonical CBOR (the same encoder used for link signing). It travels base64-encoded alongside X-Memory-Sig, in its own header: x-memory-cred: v1=<base64>.

| Field | Meaning |
| --- | --- |
| subject_agent_id | The node identity (SPIFFE-style path allowed) |
| subject_pubkey | The node's Ed25519 verifying key (32 bytes) |
| issuer_id | The CA / intermediate that minted it |
| not_before / not_after | Unix-seconds validity window (short TTL, default 1h) |
| trust_domain | Fleet namespace for multi-tenant isolation |
| cred_version | Wire/format version (CRED_VERSION = 1) |
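
As a rough picture of the fields above, the credential could be modelled as the struct below. This is a sketch, not the ai-memory definition: the Rust field types, the omission of the CA signature, and the symmetric clock-skew handling in `is_time_valid` are all assumptions.

```rust
/// Wire/format version named in the field list.
pub const CRED_VERSION: u8 = 1;

/// Sketch of the credential fields. Field types are assumptions; the CA
/// signature that covers these fields is omitted.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FederationCredential {
    pub subject_agent_id: String,
    pub subject_pubkey: [u8; 32],
    pub issuer_id: String,
    pub not_before: u64, // Unix seconds
    pub not_after: u64,  // Unix seconds
    pub trust_domain: String,
    pub cred_version: u8,
}

impl FederationCredential {
    /// True when `now` falls inside the validity window, widened by
    /// `skew_secs` on both ends (assumed tolerance rule).
    pub fn is_time_valid(&self, now: u64, skew_secs: u64) -> bool {
        now.saturating_add(skew_secs) >= self.not_before
            && now <= self.not_after.saturating_add(skew_secs)
    }
}
```

A receiver would run a check like this in addition to verifying the CA signature and the trust domain; the time check alone proves nothing about who issued the credential.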

Negotiated, no partition: if a credential header is present and a trust bundle is configured, the receiver takes the credential path; otherwise it falls back to the legacy per-peer .pub path. An un-upgraded peer keeps working; an upgraded peer accepts both. Every phase ships independently and a mixed fleet stays safe.
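
The negotiation rule reduces to a two-input decision, sketched here. `TrustPath`, `select_trust_path`, and `cred_payload` are hypothetical names chosen for illustration; only the `v1=<base64>` header shape comes from the text above.

```rust
/// Which verification path a receiver takes for one inbound request.
#[derive(Debug, PartialEq, Eq)]
pub enum TrustPath {
    /// Verify the presented credential against the trust bundle.
    Credential,
    /// Fall back to the legacy per-peer `.pub` lookup.
    LegacyPeerPub,
}

/// Credential path only when a header is present AND a bundle is configured.
pub fn select_trust_path(cred_header: Option<&str>, bundle_configured: bool) -> TrustPath {
    match cred_header {
        Some(_) if bundle_configured => TrustPath::Credential,
        _ => TrustPath::LegacyPeerPub,
    }
}

/// Extract the base64 payload from a header value of the form `v1=<base64>`.
/// Returns None for other versions or an empty payload.
pub fn cred_payload(header_value: &str) -> Option<&str> {
    header_value
        .trim()
        .strip_prefix("v1=")
        .filter(|payload| !payload.is_empty())
}
```

Because the legacy path is the fallback rather than an error, a peer that never sends the header is handled exactly as it was before the upgrade.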

Configuration

Every AI_MEMORY_FED_* knob.

Federation identity is configured entirely through the AI_MEMORY_FED_* env-var family:

| Env var | Effect |
| --- | --- |
| AI_MEMORY_FED_IDENTITY | Overrides the node's federation identity. Highest precedence; blank is ignored. Default: host:<hostname>. |
| AI_MEMORY_FED_TRUST_DOMAIN | The trust domain a receiver's bundle is scoped to. Cross-domain creds are rejected. |
| AI_MEMORY_FED_TRUST_BUNDLE_DIR | Directory of trusted issuer verifying keys; the O(1) enrollment surface. |
| AI_MEMORY_FED_CRED_PATH | Path to this node's issued leaf credential (what it presents). |
| AI_MEMORY_FED_CRED_CHAIN_PATH | Path to the anchor-first intermediate chain (hierarchical trust). |
| AI_MEMORY_FED_INVENTORY_PATH | Path to the declarative GitOps inventory YAML. |
| AI_MEMORY_FED_REQUIRE_PEER_ENROLLMENT | Fail-closed gate. When =1, a receiver rejects any peer with no valid CA-signed credential with 401 peer_not_enrolled. The switch that makes enrollment mandatory. |
| AI_MEMORY_KEY_DIR | Directory holding this node's signing keypair at <key_dir>/<federation-identity>.{pub,priv}, loaded by the resolved (slashed) identity. |
| AI_MEMORY_FED_REQUIRE_SIG | Receivers reject unsigned posts (default-on; inherited enforcement gate). |
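
The identity precedence described for AI_MEMORY_FED_IDENTITY can be sketched as follows. The function names are hypothetical, the sketch models only the env-var override and the hostname default, and treating a whitespace-only value as blank is an assumption.

```rust
/// Resolve the federation identity: a non-blank override wins, otherwise
/// fall back to `host:<hostname>`.
pub fn resolve_identity(override_value: Option<&str>, hostname: &str) -> String {
    match override_value.map(str::trim) {
        // Assumption: whitespace-only counts as blank and is ignored.
        Some(value) if !value.is_empty() => value.to_string(),
        _ => format!("host:{hostname}"),
    }
}

/// Read the override from the process environment, then resolve.
pub fn identity_from_env(hostname: &str) -> String {
    let raw = std::env::var("AI_MEMORY_FED_IDENTITY").ok();
    resolve_identity(raw.as_deref(), hostname)
}
```

Keeping the pure `resolve_identity` separate from the environment read makes the precedence rule easy to test without mutating process state.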

Compiled defaults (SSOT constants): leaf TTL 1h, intermediate TTL 1d, clock skew 30s, max chain depth 2, renewal interval 60s, renewal lead 15m.
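
Expressed as constants, those defaults might look like the sketch below. The constant names and the renewal rule in `needs_renewal` are assumptions for illustration; only the values come from the list above.

```rust
pub const LEAF_TTL_SECS: u64 = 60 * 60; // 1h
pub const INTERMEDIATE_TTL_SECS: u64 = 24 * 60 * 60; // 1d
pub const CLOCK_SKEW_SECS: u64 = 30;
pub const MAX_CHAIN_DEPTH: usize = 2;
pub const RENEWAL_INTERVAL_SECS: u64 = 60;
pub const RENEWAL_LEAD_SECS: u64 = 15 * 60; // 15m

/// Assumed renewal rule: renew once the remaining lifetime has dropped to
/// the renewal lead or below (an already-expired credential also renews).
pub fn needs_renewal(now: u64, not_after: u64) -> bool {
    not_after.saturating_sub(now) <= RENEWAL_LEAD_SECS
}
```

Under this reading, a 1h leaf becomes eligible for renewal 45 minutes after issue, and a 60s tick leaves roughly 15 attempts before the credential expires.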

GitOps

Declarative inventory.

A repo-reviewed YAML file is the source of truth for fleet membership, trust topology, and enforcement. Parsed strictly (deny_unknown_fields) so a typo is a load-time error, never a silently-weakened posture. Point the daemon at it with AI_MEMORY_FED_INVENTORY_PATH.

```yaml
trust_domain: fleet.example
root_ca: root/ca
regions:
  - name: nyc
    intermediate_ca: region/nyc/ca   # optional; omit for single-tier
    nodes:
      - id: region/nyc/node-1        # SPIFFE-style
        attestor: mtls-cert          # mtls-cert | node-plugin
        cred_ttl: 1h
        renew_before: 15m            # must be < cred_ttl
        roles: [writer]
quorum:
  width: 2                           # W-of-N; >= 1
enforcement:
  require_sig: true                  # maps to AI_MEMORY_FED_REQUIRE_SIG
```

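
The two constraints noted in the inventory comments (renew_before must be less than cred_ttl, and quorum width must be at least 1) could be checked at load time along these lines. This is a sketch: the function names and the accepted duration suffixes are assumptions, and the real parser's behaviour may differ.

```rust
/// Parse a duration such as "1h", "15m", or "30s" into seconds.
/// Accepted suffixes (s, m, h, d) are an assumption of this sketch.
pub fn parse_duration_secs(text: &str) -> Option<u64> {
    let text = text.trim();
    let unit = text.chars().last()?;
    let multiplier: u64 = match unit {
        's' => 1,
        'm' => 60,
        'h' => 60 * 60,
        'd' => 24 * 60 * 60,
        _ => return None,
    };
    let digits = &text[..text.len() - unit.len_utf8()];
    digits.parse::<u64>().ok()?.checked_mul(multiplier)
}

/// Enforce `renew_before < cred_ttl` for one node entry.
pub fn validate_node_timing(cred_ttl: &str, renew_before: &str) -> Result<(), String> {
    let ttl = parse_duration_secs(cred_ttl)
        .ok_or_else(|| format!("invalid cred_ttl: {cred_ttl}"))?;
    let lead = parse_duration_secs(renew_before)
        .ok_or_else(|| format!("invalid renew_before: {renew_before}"))?;
    if lead >= ttl {
        return Err(format!("renew_before ({renew_before}) must be < cred_ttl ({cred_ttl})"));
    }
    Ok(())
}

/// Enforce `quorum.width >= 1`.
pub fn validate_quorum_width(width: u32) -> Result<(), String> {
    if width >= 1 {
        Ok(())
    } else {
        Err("quorum.width must be >= 1".to_string())
    }
}
```

Returning an error instead of clamping the value matches the strict-parsing stance: a bad inventory fails to load rather than running with a weaker posture.
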
Lifecycle

Mint · rotate · revoke.

Operate

Reconciler, rollout & SLOs.

The reconciler is a pure diff of desired inventory vs. observed state → a partition-safe ReconcilePlan (strict-enforcement actions are emitted last, gated on observed sign-capability). The side-effecting Apply half — scripts/federation-rollout.sh — does health-gated, mTLS-verified binary swaps with automatic rollback; if both new and previous binaries fail health it emits a loud MANUAL INTERVENTION block rather than leaving the fleet dark.
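
A pure diff of this kind might be shaped like the sketch below. Only the ordering rule (strict enforcement last, gated on observed sign-capability) comes from the description above; the `Action` variants, the `Observed` struct, and the `reconcile` signature are hypothetical, and the real ReconcilePlan is likely richer.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical plan steps.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Action {
    IssueCredential(String),
    RevokeCredential(String),
    EnableStrictEnforcement,
}

/// Hypothetical observed per-node state.
#[derive(Debug, Clone, Copy)]
pub struct Observed {
    pub has_credential: bool,
    pub can_sign: bool,
}

/// Pure function: inputs in, plan out, no side effects.
pub fn reconcile(
    desired: &BTreeSet<String>,
    observed: &BTreeMap<String, Observed>,
    want_strict: bool,
) -> Vec<Action> {
    let mut plan = Vec::new();
    // Desired nodes without a credential get one issued.
    for id in desired {
        let enrolled = observed.get(id).map_or(false, |o| o.has_credential);
        if !enrolled {
            plan.push(Action::IssueCredential(id.clone()));
        }
    }
    // Observed nodes that left the inventory are revoked.
    for id in observed.keys() {
        if !desired.contains(id) {
            plan.push(Action::RevokeCredential(id.clone()));
        }
    }
    // Strict enforcement is emitted last, and only once every desired
    // node has been observed to sign.
    let all_can_sign = desired
        .iter()
        .all(|id| observed.get(id).map_or(false, |o| o.can_sign));
    if want_strict && all_can_sign {
        plan.push(Action::EnableStrictEnforcement);
    }
    plan
}
```

Gating the enforcement step on observed state means a half-upgraded fleet never receives a plan that would lock out its own unsigned peers.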

Four Prometheus SLO series surface the trust path's health, refreshed every renewal tick:

| Metric | SLO |
| --- | --- |
| ai_memory_federation_cred_verify_total{result} | verify-failure-rate = fail / (ok + fail) |
| ai_memory_federation_inbound_cred_total{presence} | signed-vs-unsigned ratio (climbs to 1.0 as peers upgrade) |
| ai_memory_federation_cred_max_age_seconds | max-cred-age; alert as it nears the leaf TTL |
| ai_memory_federation_renewal_lag_seconds | renewal-lag; alert when it exceeds the refresh interval |
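
The ratios in the table are simple to compute from the raw counters. The sketch below shows the arithmetic; the function names are hypothetical, and reading the signed-vs-unsigned ratio as signed / (signed + unsigned) is an assumption based on it climbing to 1.0.

```rust
/// verify-failure-rate = fail / (ok + fail). None when there are no samples.
pub fn verify_failure_rate(ok: u64, fail: u64) -> Option<f64> {
    let total = ok.checked_add(fail)?;
    if total == 0 {
        None
    } else {
        Some(fail as f64 / total as f64)
    }
}

/// Assumed reading of the signed-vs-unsigned ratio: the share of inbound
/// requests that carried a credential.
pub fn signed_share(signed: u64, unsigned: u64) -> Option<f64> {
    let total = signed.checked_add(unsigned)?;
    if total == 0 {
        None
    } else {
        Some(signed as f64 / total as f64)
    }
}

/// Renewal lag breaches its SLO once it exceeds the refresh interval.
pub fn renewal_lag_breached(lag_secs: f64, interval_secs: f64) -> bool {
    lag_secs > interval_secs
}
```

Returning `None` for an empty window avoids reporting a misleading 0% failure rate when no credentials were verified at all.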

Every issue / renew / revoke is also recorded on the signed_events audit chain.

Platform

OS-agnostic by design.

The daemon and the entire federation-identity core are pure Rust with no platform-bound logic in the trust path. The credential format, issuer, trust bundle, chain verification, inventory parsing, renewal worker, and reconciler behave identically on Linux, Windows, and macOS — the CI matrix runs the full suite on all three on every change. A credential minted on a Linux root CA verifies on a Windows node and a macOS node with byte-identical results. A federated fleet can be heterogeneous across operating systems in one trust domain.

Support focus is tiered by where production enterprise fleets actually run — a priority ranking, not a capability limit:

| Priority | Platform | Default shell | Notes |
| --- | --- | --- | --- |
| Primary | Linux (x86_64 / aarch64) | Bash | Reference enterprise target: systemd rollout, Unix key-mode enforcement, container / Kubernetes substrate. |
| Primary | Windows (x86_64) | PowerShell | First-class daemon + federation node. Key-directory hardening via NTFS ACLs (POSIX mode bits don't apply). |
| Tertiary | macOS (Apple Silicon / x86_64) | Zsh | Startup / small-enterprise niche, e.g. Mac Mini clusters. Fully functional node; Unix mode enforcement as on Linux. |

The ai-memory binary is shell-agnostic; only the env-var-setting syntax differs per platform's default shell. The same AI_MEMORY_FED_* knob, set three ways:

```bash
# Linux (Bash) / macOS (Zsh)
export AI_MEMORY_FED_TRUST_DOMAIN="fleet.example"
export AI_MEMORY_FED_IDENTITY="region/nyc/node-1"
```

```powershell
# Windows (PowerShell)
$env:AI_MEMORY_FED_TRUST_DOMAIN = "fleet.example"
$env:AI_MEMORY_FED_IDENTITY     = "region/nyc/node-1"
```

The only OS-specific differences are operational plumbing — the service supervisor (systemd / launchd / Windows Service or WSL2) and the key-directory permission mechanism (POSIX mode bits on Unix, NTFS ACLs on Windows). The trust model itself is identical everywhere.

Reproducible tooling

The first-party issuer.

All CA / node / credential material is minted by examples/fed_issue.rs — a standalone first-party cargo example that composes the Ed25519 + canonical-CBOR primitives already in ai-memory. It is never linked into the golden binary: there is no Cargo.toml change and the pinned sha256 is unchanged. Build + run it on demand — the first call compiles, the rest are cache hits. Every verb is idempotent, so a re-run yields a stable trust topology.

| Verb | What it does |
| --- | --- |
| mint-ca | Generate the campaign CA keypair (--key-dir, --issuer-id). |
| export-bundle | Publish only the CA verifying key to the trust bundle dir; the sole anchor a receiver needs. |
| gen-node | Generate a node signing keypair (--agent-id may be a slashed federation identity). |
| issue | Mint a short-lived CA-signed credential binding the node's identity to its verifying key, scoped to the trust domain (--ttl-secs default 3600). |

One footgun: issuer-id MUST be slash-free. TrustBundle::from_dir is non-recursive, so a slashed id would nest the published <issuer-id>.pub in a skipped sub-dir → empty bundle → UnknownIssuer. The tool's validate_issuer_id rejects it at the boundary; node subject-agent-ids, by contrast, may be slashed.
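
The nesting problem is easy to reproduce with plain path arithmetic. In the sketch below, `validate_issuer_id` borrows the name mentioned above, but its body is an assumption; `bundle_key_path` and `is_top_level` are hypothetical helpers.

```rust
use std::path::{Path, PathBuf};

/// Reject issuer ids that contain a slash (assumed implementation).
pub fn validate_issuer_id(issuer_id: &str) -> Result<(), String> {
    if issuer_id.contains('/') {
        return Err(format!("issuer-id must be slash-free: {issuer_id}"));
    }
    Ok(())
}

/// Where `<issuer-id>.pub` would land inside the bundle directory.
pub fn bundle_key_path(bundle_dir: &Path, issuer_id: &str) -> PathBuf {
    bundle_dir.join(format!("{issuer_id}.pub"))
}

/// True when the key file sits directly in `bundle_dir`, the only place a
/// non-recursive directory scan would look.
pub fn is_top_level(bundle_dir: &Path, key_path: &Path) -> bool {
    key_path.parent() == Some(bundle_dir)
}
```

With a bundle directory of `trust`, the id `root-ca` yields `trust/root-ca.pub`, which a flat scan finds, while `root/ca` yields `trust/root/ca.pub`, one level too deep to be seen.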

The push-based deploy/hive-1461/provision/45_zero_touch.sh drives these four verbs: it mints the CA once, then for each peer generates a keypair, issues a credential, fans out keys/, trust/, and the credential to /etc/ai-memory (private key 0400, public 0444), and appends the five daemon env vars. Agents and ctrl are pure mTLS API clients, not mesh members, so they are skipped. Step-by-step: docs/zero-touch-quickstart.md.

Validated

Proven on a live fleet.

The hive-1461 reproducible baseline runs this arm as provisioning step 45 and the full-spectrum P3 suite exercises it over the real TLS+mTLS path against the live mesh. The committed canonical report is TOTAL=26 PASS=26 FAIL=0, including four dedicated zerotouch checks:

| Check | Proves |
| --- | --- |
| enrolled_write | An enrolled peer writes a collective memory using only its CA-signed credential. |
| enrolled_converge | That memory converges on a federated peer via the CA credential; no operator-pushed pubkey. |
| unenrolled_status | A peer with valid api-key + mTLS but no enrollment is refused 401 on /sync/since. |
| unenrolled_reason | The refusal is peer_not_enrolled; the AI_MEMORY_FED_REQUIRE_PEER_ENROLLMENT=1 fail-closed gate. |
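
The behaviour the two unenrolled checks assert can be sketched as a small decision function. `Refusal`, `admit_peer`, and `gate_enabled` are hypothetical names, and treating only the literal value 1 as enabling the gate is an assumption; the 401 status and the peer_not_enrolled reason come from the checks above.

```rust
/// Why a request was refused.
#[derive(Debug, PartialEq, Eq)]
pub struct Refusal {
    pub status: u16,
    pub reason: &'static str,
}

/// Fail-closed admission: with the gate on, a peer without a valid
/// CA-signed credential is refused regardless of api-key or mTLS.
pub fn admit_peer(require_enrollment: bool, has_valid_credential: bool) -> Result<(), Refusal> {
    if require_enrollment && !has_valid_credential {
        return Err(Refusal {
            status: 401,
            reason: "peer_not_enrolled",
        });
    }
    Ok(())
}

/// Interpret AI_MEMORY_FED_REQUIRE_PEER_ENROLLMENT. Assumption: only the
/// literal "1" enables the gate.
pub fn gate_enabled(raw: Option<&str>) -> bool {
    raw.map(str::trim) == Some("1")
}
```

With the gate off, the same unenrolled peer is admitted here and would then be handled by the legacy per-peer path, which is what keeps a mixed fleet working during rollout.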

Both the environment and the results are peer-reproducible: make seed up provision validate test stands the fleet up and re-proves all 26 checks. The report ships at deploy/hive-1461/results/test-full-spectrum.{json,tsv}.
