Architecture — ai-memory A2A gate¶
How every piece fits together. For operators running campaigns, engineers extending the harness, and auditors verifying the evidence chain.
1. What this repository is¶
A reproducible test harness that provisions a 4-node DigitalOcean VPC, installs two heterogeneous AI agent frameworks (OpenClaw + Hermes, one per campaign), wires both to the same ai-memory federation mesh, and runs a suite of scenarios proving (or disproving) that agents can coordinate through shared memory as their only communication channel.
The output is committed evidence. No closed-box claims.
2. System layers¶
┌─────────────────────────────────────────────────────────────┐
│ GitHub (source of truth) │
│ ┌──────────────────┐ ┌────────────────────────────────┐ │
│ │ main branch │ │ repo secrets (encrypted) │ │
│ │ - scripts/ │ │ - DIGITALOCEAN_TOKEN │ │
│ │ - terraform/ │ │ - DIGITALOCEAN_SSH_PRIVATE_KEY │ │
│ │ - docs/ │ │ - DIGITALOCEAN_SSH_KEY_FP │ │
│ │ - .github/ │ │ - XAI_API_KEY │ │
│ └──────────────────┘ └────────────────────────────────┘ │
│ │ │ │
│ │ workflow_dispatch │ consumed by workflow │
│ ▼ ▼ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ .github/workflows/a2a-gate.yml (the gate) │ │
│ │ ubuntu-24.04 runner │ │
│ └───────────────────────────────────────────────────────┘ │
└──────────────────────────────┬──────────────────────────────┘
│ Terraform + ssh
▼
┌─────────────────────────────────────────────────────────────┐
│ DigitalOcean (ephemeral) │
│ ┌──────────────────┐ ┌────────────────────────────────┐ │
│ │ VPC 10.251/24 │ │ DO Cloud Firewall │ │
│ │ (openclaw) │ │ - 22 in (runner only) │ │
│ │ or 10.252/24 │ │ - 9077 in (same VPC only) │ │
│ │ (hermes) │ │ - * out (xAI, GitHub, etc) │ │
│ └────────┬─────────┘ └────────────────────────────────┘ │
│ │ │
│ ┌─────┼─────┬─────────┬─────────────┐ │
│ │ │ │ │ │ │
│ ▼ ▼ ▼ ▼ ▼ │
│ ┌─────┐┌─────┐┌─────┐┌─────────────────────────┐ │
│ │ node│ node│ node│ node-4 │ │
│ │ -1 │ -2 │ -3 │ memory-only │ │
│ │alice│ bob │charl│ (aggregator, no agent) │ │
│ └──┬──┘└──┬──┘└──┬──┘└───────────┬─────────────┘ │
│ │ │ │ │ │
│ └─────┴─────┴───────────────┘ │
│ federation W=2/N=4 over port 9077 │
└─────────────────────────────────────────────────────────────┘
Each droplet is Ubuntu 24.04 LTS x64 (ubuntu-24-04-x64) running:
ai-memory serve --quorum-writes 2 --quorum-peers <other-3>— the long-running federation daemon on0.0.0.0:9077ai-memory mcp— short-lived stdio child, spawned per MCP tool call by the agent CLI- Agent framework (
openclaworhermes) — takes prompts, invokes MCP tools shutdown -P +480— dead-man switch capping droplet life at 8h
3. Communication topology¶
The only A2A path permitted by the baseline:
Agent A on node-1 Agent B on node-2
│ ▲
│ 1. agent reasons (xAI Grok) │ 5. agent reads (MCP recall)
│ 2. selects memory_store tool │
▼ │
`ai-memory mcp` stdio child │
│ │
│ 3. writes to local serve HTTP │ 4. local serve returns row
│ │
▼ │
Local `ai-memory serve` on node-1 Local `ai-memory serve` on node-2
│ ▲
│ federation W=2 fanout │
└───────► other 3 peers ◄────────┘
(nodes 2, 3, 4)
Every other agent-to-agent communication channel that OpenClaw or Hermes natively supports is explicitly disabled in the per-droplet config (see baseline.md §6b.2 / §6b.3):
- OpenClaw:
agentToAgent: false, no sessions_send/spawn tools, no Telegram/Discord/Slack/Moltbook channels, no gateway remote mode, no node hosts, no subAgent, no agentTeams, no shared Postgres/Redis/Supabase - Hermes:
acp.enabled: false,messaging.gateway_enabled: false, no execution backends,mcp_server_mode: false, no subagent_delegation
This lock-down makes the A2A gate's thesis — "agents can coordinate through shared memory alone" — falsifiable: if a scenario passes, it passed via ai-memory, not via a back-channel.
4. The gate sequence¶
.github/workflows/a2a-gate.yml runs these steps. Each gate is sequential — the next step only executes on green.
| Step | Action | Gate | Exit on fail |
|---|---|---|---|
| 1 | actions/checkout@v5 |
Source retrievable | workflow abort |
| 2 | hashicorp/setup-terraform@v3 |
Terraform CLI available | workflow abort |
| 3 | terraform init + apply |
4 droplets + VPC + firewall created | abort (destroy runs at cleanup) |
| 4 | Wait for SSH (30 × 5s retries per node) | All 4 droplets SSH-reachable | abort |
| 5 | Provision all 4 nodes (setup_node.sh via ssh) |
UFW disabled + ai-memory serve live + agent installed + C1–C10 + F1 + F2a + F2b attested + baseline.json emitted; exit 2/3 on baseline fail | abort |
| 6 | Collect + enforce BASELINE | scp baseline.json from 3 agent nodes; all must show baseline_pass: true |
baseline_ok = false → next step skipped |
| 7 | Functional probe F3 (peer A2A canary) | Write on node-1, W=2 settle, verify on nodes 2, 3, 4 | f3_ok = false → scenarios skipped |
| 8 | Run scenarios (if: baseline_ok && f3_ok) |
Each scenario script emits scenario-N.json (stdout) + scenario-N.log (stderr) |
script exit code recorded; doesn't halt others |
| 9 | Emit campaign.meta.json (always) |
DO region + node roster + actor + workflow URL | soft error |
| 10 | Aggregate campaign summary (always) | collect_reports.sh → a2a-summary.json |
soft error |
| 11 | Regenerate evidence HTML (always) | generate_run_html.sh → index.html |
soft error |
| 12 | Redact secrets (always) | sed pass + grep verify; exit 4 if any known secret leaks | workflow fail |
| 13 | Commit campaign artifacts (always) | git add + commit + push with 5-retry rebase loop | soft error |
| 14 | Tear down (if: always()) |
terraform destroy -auto-approve |
soft error (dead-man switch is backstop) |
Every long-running subprocess inside setup_node.sh has a timeout (since commit 6face55 after r11 hang):
| Subprocess | Timeout |
|---|---|
openclaw install.sh |
600s |
hermes install.sh |
600s |
npm install -g openclaw fallback |
300s |
pip install python-dotenv |
180s |
| F2b agent CLI canary | 60s |
| F1 xAI curl | 20s |
| Base apt-get install | no explicit cap (rarely hangs) |
5. Evidence chain¶
Every run produces a deterministic evidence bundle under runs/<campaign-id>/:
runs/a2a-openclaw-v0.6.0-rN/
├── a2a-baseline.json # per-node attestation union
├── baselines/
│ ├── node-1.baseline.json # raw self-attestation from alice's droplet
│ ├── node-2.baseline.json # from bob's droplet
│ └── node-3.baseline.json # from charlie's droplet
├── f3-peer-a2a.json # cross-node A2A probe verdict
├── campaign.meta.json # DO region, node IPs (public + private), actor, workflow URL, harness SHA
├── a2a-summary.json # scenario rollup, overall_pass, reasons
├── scenario-1.json # per-scenario JSON verdict
├── scenario-1.log # per-scenario stderr trace
├── scenario-1b.json
├── scenario-1b.log
├── scenario-2.json
├── scenario-2.log
├── ... (per scenario in the dispatch)
└── index.html # human-readable dashboard page
The pages workflow (.github/workflows/pages.yml) then:
1. Regenerates every runs/*/index.html (so generator upgrades propagate)
2. Builds the runs landing table with Baseline / F3 / Scenarios / Group / ref / Spec / Actor columns
3. Mirrors runs/ → site/evidence/
4. Builds the mkdocs-material site
5. Deploys to GitHub Pages
Both the raw JSON artifacts and the rendered HTML land at https://alphaonedev.github.io/ai-memory-ai2ai-gate/evidence/<campaign-id>/. The runs landing is at https://alphaonedev.github.io/ai-memory-ai2ai-gate/runs/.
6. Security posture¶
- Secrets: live only in GitHub encrypted secrets + the ephemeral droplets that
terraform destroyerases. Redaction sed pass + post-redaction grep verify before commit;exit 4if any known secret value is found in any file about to be pushed. - Droplet IPs: committed to
campaign.meta.jsonintentionally (audit linkage; ephemeral). - No operator PII: the harness captures only the GitHub actor handle (public) and the workflow run URL.
- Firewall: DO Cloud Firewall enforces inbound; Ubuntu UFW explicitly disabled on every node to prevent interference with federation loopback traffic.
- Dead-man switch: every droplet schedules
shutdown -P +480immediately after SSH is reachable. 8h worst-case droplet lifetime regardless of CI state.
See baseline.md §9b for the full secret-handling contract.
7. Who authors what — the AI NHI collaboration model¶
This harness is built by multiple AI NHIs (non-human identities) collaborating through git commits + committed evidence. No live channels.
| Role | Who | How |
|---|---|---|
| Harness authoring | Claude Opus 4.7 (this instance) | Writes scripts + docs + workflow + analysis. Commits to main. Never runs on a droplet. |
| Agent reasoning on droplets | xAI Grok (grok-4-fast-non-reasoning) |
Invoked by openclaw run or hermes chat. Selects MCP tools. Reasoning NEVER read by Claude directly — only via committed scenario logs. |
| Substrate under test | ai-memory-mcp (v0.6.0) | The product being validated. Receives MCP tool calls from agents. |
| Narrative analysis | Claude Opus 4.7 | Reads committed evidence + writes tri-audience narratives into analysis/run-insights.json (engineering / product / executive framing). |
Claude observes the agent AIs only post-hoc via committed files. The A2A gate is therefore a test of AI agents that AI Claude has authored — a legitimate AI-NHI-builds-tests-for-AI-NHIs operation.
8. What the gate does NOT claim¶
- Not a chaos engineering harness — injects no faults beyond the partition-tolerance scenario (S14, deferred)
- Not a load-test harness — S4 exercises burst writes but ~100 rows/burst, not thousands
- Not a security penetration test — docs/security.md covers the security posture contract, not red-team coverage
- Not cross-framework A2A by default — current campaigns are homogeneous (all openclaw or all hermes); mixed campaigns are a planned future scenario
- Not a replacement for the ship-gate — which validates ai-memory as a product (CRUD, federation, migration, chaos). The A2A gate sits ON TOP of a green ship-gate to validate agent integration.
9. Read next¶
- Baseline configuration — the invariants gate
- Test book — the scenario catalog
- Reproducing — operator playbook
- Runbook — day-to-day operations + diagnostics
- Incidents — what we learned from each iteration
- Methodology — why these invariants
- Security — threat model + secret handling