Distributed agents · ai-memory orchestrating ai-memory.

17 agents at peak, each owning one phase or batch. Coordinated through ai-memory itself — the system being tested is the system that holds the test campaign's state. Every phase write becomes a memory; every completion becomes an A2A notification; the orchestrator drains its inbox and updates the evidence page in real time.

Tags: 17 agents at peak · ai-memory as coordination layer · eats own dogfood · orchestrator reusable v0.7+
The architecture

One orchestrator, fan-out workers, ai-memory as the bus.

[Architecture diagram] The Test Orchestrator (drains memory_inbox, updates test-hub evidence) sits atop the ai-memory daemon — this is the bus — exposing memory_store · memory_notify · memory_subscribe · memory_inbox under namespace release/v0.6.3/<phase>. Fanning out from the bus: Ship Phase 1 functional (1 droplet), Ship Phase 2 multi-agent (3 droplets), Ship Phase 3 migration (2 droplets), Ship Phase 4 chaos (3 droplets); A2A batches × 4 (scenarios 1-48); Smoke × 5 channels (brew · cargo · docker · apt · dnf). Workers report via memory_store + memory_notify on completion; the orchestrator listens via memory_subscribe and memory_inbox and pushes the run summary to the test-hub Pages site (releases/v0.6.3/index.html), which auto-updates as gates land.
Eat own dogfood

The system being tested is the system that holds the test campaign's state.

Every test agent's progress is a memory. Every completion is an A2A notification. The orchestrator subscribes to release/v0.6.3/* via memory_subscribe and drains the queue via memory_inbox every 30s. This is not a clever stunt — it's the strongest possible integration test. If hierarchical namespaces work, the campaign succeeds. If A2A messaging works, the orchestrator hears the workers. If governance gates work, the campaign's writes pass through them. The test campaign exercises the whole product surface under real workload, at a scale we'd otherwise have to simulate.

The agents

Who does what.

Test Orchestrator · 1 instance · always on
Provisions infrastructure for all gate, scenario, and smoke workers. Subscribes to release/v0.6.3/* via memory_subscribe. Drains memory_inbox every 30s. Aggregates results into the test-hub evidence page. Detects timeouts (worker silent for more than 2× its expected wall-clock time) and surfaces stuck workers.
# pseudo
ai-memory memory_subscribe --url https://test-hub.../webhook --events memory_store \
  --namespace_filter "release/v0.6.3/*"
while running:
    inbox = ai-memory memory_inbox --as_agent "orchestrator" --unread_only
    for notification in inbox:
        update_evidence_page(notification)
    if all_phases_landed():
        publish_final_verdict()
        break
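The stuck-worker rule (silent for more than 2× expected wall time) can be sketched as a pure function. The per-worker wall-clock budgets below are illustrative assumptions, not real phase budgets:

```python
import time

# Assumed expected wall-clock budgets per worker, in minutes (illustrative only).
EXPECTED_WALL_MIN = {"phase1": 30, "phase4": 120, "a2a_batch_3": 60}

def is_stuck(worker_id: str, last_heard_epoch: float, now_epoch: float) -> bool:
    """A worker is flagged stuck once it has been silent for more than
    twice its expected wall-clock time."""
    silent_min = (now_epoch - last_heard_epoch) / 60
    return silent_min > 2 * EXPECTED_WALL_MIN[worker_id]

now = time.time()
# phase1: budget 30m, threshold 60m, silent 70m -> stuck
print(is_stuck("phase1", now - 70 * 60, now))   # True
# phase4: budget 120m, threshold 240m, silent 70m -> fine
print(is_stuck("phase4", now - 70 * 60, now))   # False
```

The orchestrator would run this check on each inbox drain, using the timestamp of the last notification (or memory write) seen from each worker.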
Ship-gate Phase 1-4 workers · 4 workers · 1-3 droplets each
Each phase has its own worker. Worker provisions its phase's terraform target, runs the existing phaseN_*.sh script, captures artifacts in the gate repo's runs/ dir, and notifies the orchestrator on completion. Independent — phases don't share state, so a Phase 4 chaos failure doesn't block the Phase 1-3 gates from being marked green.
# Phase worker — Phase 4 example
cd ai-memory-ship-gate
terraform apply -target=module.phase4_chaos
./scripts/phase4_chaos.sh release/v0.6.3 --abbreviated
./scripts/collect_reports.sh phase4 v0.6.3
./scripts/generate_run_html.sh
# Notify orchestrator
ai-memory memory_notify \
  --target_agent_id "orchestrator" \
  --title "Phase 4 — chaos abbreviated complete" \
  --payload '{"verdict":"pass","run_url":"...","duration_min":118}' \
  --priority 8
A2A batch workers · 4 workers · 1 droplet each
A2A scenarios 1-48 split into 4 batches × 12 scenarios. Each batch worker spins one OpenClaw + Hermes pair on one droplet, runs its 12 scenarios sequentially within the batch (stateful), captures per-scenario JSON + stderr + provenance trace. Notifies on completion with per-scenario verdicts.
# Batch worker
cd ai-memory-ai2ai-gate
terraform apply -target=module.a2a_batch_3
python scripts/a2a_harness.py \
  --release v0.6.3 \
  --cell ironclaw-mtls \
  --scenarios 25-36 \
  --output runs/v0.6.3/batch-3/
ai-memory memory_notify \
  --target_agent_id "orchestrator" \
  --title "A2A batch 3 (25-36) complete" \
  --payload '{"passed":12,"failed":0,"run_url":"..."}'
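The 48-into-4×12 split is deterministic, so each batch worker can derive its scenario range from its batch index alone. A minimal sketch (the helper name is hypothetical, not part of the harness):

```python
def batch_range(batch_index: int, total: int = 48, batches: int = 4) -> tuple[int, int]:
    """Return the inclusive, 1-based scenario range for a batch.
    Assumes total divides evenly into the batch count (48 / 4 = 12)."""
    size = total // batches                 # 12 scenarios per batch
    start = (batch_index - 1) * size + 1
    return start, start + size - 1

print(batch_range(3))  # (25, 36), matching the --scenarios 25-36 flag
```

A worker could feed this straight into the harness invocation, keeping the batch-to-scenario mapping in one place instead of hardcoding ranges per droplet.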
Distribution smoke agents · 5 workers · GitHub Actions runners
Five channel validators fire after the release pipeline publishes. Each runs: install → --version assertion → start daemon → assert memory_capabilities returns schema_version=2. Independent — no shared state.
# Homebrew smoke agent — runs on macos-latest
brew install alphaonedev/tap/ai-memory
ai-memory --version | grep "0.6.3" || exit 1
ai-memory serve &
sleep 5
# jq -e exits non-zero unless the expression is truthy; accept number or string encoding
ai-memory capabilities | jq -e '.schema_version == 2 or .schema_version == "2"' >/dev/null || exit 1
ai-memory memory_notify \
  --target_agent_id "orchestrator" \
  --title "Homebrew smoke complete" \
  --payload '{"channel":"brew","verdict":"pass"}'
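The schema_version assertion can also live in a small Python helper inside the smoke harness rather than a shell pipeline. A hedged sketch; the capabilities JSON shape shown is an assumption based on the prose, and the helper name is hypothetical:

```python
import json

def assert_schema_v2(capabilities_json: str) -> None:
    """Fail loudly unless the daemon reports schema_version=2.
    Accepts the number 2 or the string "2" to tolerate either JSON encoding."""
    caps = json.loads(capabilities_json)
    version = caps.get("schema_version")
    if version not in (2, "2"):
        raise SystemExit(f"expected schema_version=2, got {version!r}")

# In the smoke agent this would wrap the captured output of `ai-memory capabilities`.
assert_schema_v2('{"schema_version": 2}')   # passes silently
```

Parsing the JSON instead of grepping it avoids false negatives when the field is a number rather than a quoted string.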
Why ai-memory as the bus

Five reasons.

1. Already deployed. One ai-memory daemon already runs on your laptop or a stable infrastructure node. We don't need to spin up RabbitMQ / Kafka / Redis — the bus is ready.
2. Already tested. Every primitive the orchestrator uses (memory_store, memory_notify, memory_subscribe, memory_inbox) is in scenarios 32-33 of the a2a-gate. If they don't work, the test campaign would fail anyway — we want to know that.
3. Strongest integration test. Real workload at scale, exercising every feature: hierarchical namespaces (release/v0.6.3/<phase>), governance gates (campaign writes pass through them), federation (orchestrator on node-1, workers reporting from node-2/3), A2A messaging (notify/inbox), capabilities introspection.
4. Audit trail by default. Every campaign event is a memory. memory_kg_timeline on the campaign-root memory returns the entire run as a chronological event sequence. Better than ad-hoc log files.
5. Reusable for v0.7+. Same orchestrator runs the v0.7 campaign without changes — schema_version=2 ensures forward-compat for the introspection it relies on. The investment amortizes over every future release.
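The release/v0.6.3/* filter used throughout behaves like a glob over hierarchical namespaces. A sketch of the matching the orchestrator relies on (fnmatch semantics are an assumption; the daemon's actual matcher may differ, e.g. in whether * crosses further / separators):

```python
from fnmatch import fnmatch

def matches_filter(namespace: str, pattern: str) -> bool:
    """True when a memory's namespace falls under the subscription filter.
    Note: fnmatch's * also matches '/', so deeper children match too."""
    return fnmatch(namespace, pattern)

print(matches_filter("release/v0.6.3/phase4", "release/v0.6.3/*"))  # True
print(matches_filter("release/v0.7.0/phase1", "release/v0.6.3/*"))  # False
```

This is why a single subscription covers every phase, batch, and smoke worker: they all write under the one release prefix.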
Failure modes + fallbacks

When the bus itself is what's being tested.

If memory_notify breaks: Orchestrator falls back to S3-style polling of the harness's run JSON dump (artifacts already written by ship-gate's collect_reports.sh). Workers still complete their work; orchestrator just sees results late. No loss of fidelity.
If memory_subscribe webhook fails: Orchestrator's memory_inbox polling loop catches everything. The webhook is a fast-path optimization, not a correctness requirement.
If the ai-memory bus daemon dies: Workers continue and write artifacts to their gate repos. Orchestrator polls those repos as fallback. The campaign degrades to "slow but correct" rather than failing.
If a worker goes silent: Orchestrator detects it (timer = expected wall × 2) and surfaces the stuck worker in the evidence page. Operator can re-run that single worker without re-running the campaign.
If a phase fails: Other phases continue. The failed phase's verdict surfaces in the evidence page. Operator decides whether to re-tag, fix forward, or accept the failure.
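The "slow but correct" degradation works because workers always write run JSON artifacts regardless of bus health. A minimal sketch of the fallback poller, assuming a runs/<worker>/run.json layout with a top-level "verdict" field (both assumptions; the real collect_reports.sh output may differ):

```python
import json
from pathlib import Path

def poll_run_artifacts(runs_dir: Path) -> dict[str, str]:
    """Scan gate-repo runs/ directories for completed run JSON and collect
    per-worker verdicts. Used only when memory_notify / the bus is unavailable."""
    verdicts = {}
    for report in runs_dir.glob("**/run.json"):
        data = json.loads(report.read_text())
        verdicts[report.parent.name] = data.get("verdict", "unknown")
    return verdicts

# Example layout: runs/v0.6.3/phase4/run.json -> {"verdict": "pass", ...}
```

A missing run.json simply means the worker hasn't finished, so repeated polling converges on the same verdict set the notification path would have delivered.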
Implementation status

What exists, what to build.

| Component | Status | Source |
| --- | --- | --- |
| Ship-gate phase scripts | ▸ exists | ai-memory-ship-gate |
| A2A scenarios 1-42 | ▸ exists | ai-memory-ai2ai-gate |
| A2A scenarios 43-48 (v0.6.3-only) | ▸ to author (~45m) | ai-memory-ai2ai-gate |
| memory_notify / memory_subscribe / memory_inbox | ▸ exists (v0.6.0+) | ai-memory daemon |
| Terraform per-phase targets | ▸ exists | both gate repos |
| Orchestrator script | ▸ to build (~3h) | this test-hub repo |
| Webhook endpoint (test-hub) | ▸ to build (~30m) | this test-hub repo |
| Evidence-page auto-update | ▸ to build (~1h) | this test-hub repo |
| Channel-smoke GitHub Actions workflow | ▸ to build (~30m) | ai-memory-mcp |

~5h total scaffolding for the orchestrator + webhook + auto-update + workflow. After that, the orchestrator is reusable for every future release. See the parallel-testing page for the full time math.