ai-memory A2A gate¶
v0.6.3 — 48/48 ironclaw-mtls green — 28m wall — 2026-04-27
Campaign 25021409589 returned overall_pass: true on 2026-04-27.
Validated against ai-memory commit 2cfcc18 (release/v0.6.3).
| Cell | Verdict | Scenarios | Wall | Notes |
|---|---|---|---|---|
ironclaw-mtls |
✅ | 48 / 48 | 28m | All 48 scenarios passed. Closed S18 (semantic expansion) and S39 (SSH STOP/CONT reliability) which were the residual blockers at v3r23/v0.6.2. |
Scenarios closed in this release:
- S18 (semantic expansion) — was open at v0.6.2 / v3r23; now resolved in v0.6.3.
- S39 (SSH STOP/CONT reliability) — was open at v0.6.2 / v3r23; now resolved in v0.6.3.
→ v0.6.3 evidence directory · → v0.6.3 offline HTML · → test-hub release page · → release notes
Distribution channels: crates.io, Docker GHCR, Homebrew, Fedora COPR.
Reproducible AI-to-AI integration testing for ai-memory-mcp. Where ai-memory-ship-gate validates the memory system itself, this repository validates what happens when real AI agents use ai-memory to communicate with each other — IronClaw (Rust), Hermes (Python), and OpenClaw (Python) agents running on separate DigitalOcean droplets (or, for the openclaw cell, the local Docker mesh that bypasses the DO General Purpose tier bump), sharing context through a central ai-memory authoritative store.
- Baseline configuration · the hard-gated standard every agent droplet must satisfy before any scenario runs — authentic frameworks, xAI Grok, ai-memory MCP, UFW off, functional probes
- Methodology · every invariant this campaign defends
- Topology · 4-node VPC architecture
- Agents · IronClaw (Rust), Hermes (Python), and OpenClaw (Python) integration details
- Local Docker mesh · Reproducible 4-node OpenClaw harness on a single workstation (no DO required)
- Scenarios · 8 test groups covering the full memory surface
- Campaign runs · live evidence dashboard
- v1.0 GA criteria · the forward-looking contract every 0.6.x/0.7.x/0.8.x release steps toward
- Reproducing · run it yourself on your own DO account
- Security · TLS, mTLS, dead-man switch, key custody
Certification threshold¶
A2A-gate certification requires three consecutive overall_pass = true runs at full scenario coverage (up to 36 scenarios under the testbook v3.0.0 × baseline v1.4.0 set — 36 at mtls, 35 at tls, 34 at off). Any single overall_pass = false resets the counter; there is no credit for partial green.
Current best (as of 2026-04-27): 48/48 on ironclaw-mtls at v0.6.3 — closing S18 (semantic expansion) and S39 (SSH STOP/CONT reliability), the residual blockers at v3r23 / v0.6.2. Cert-run head commit: release/v0.6.3 @ 2cfcc18. Prior best was 37/37 on mtls / 35/35 on tls / 35/35 on off at v0.6.2 across ironclaw (DO), hermes (DO), and openclaw (local-docker).
Consecutive green streak: 3 / 3 → v0.6.3 CERTIFIED (2026-04-27) on ironclaw-mtls (campaign run #25021409589). Hermes and openclaw cells continue to publish per-release runs under runs/; the v0.6.3 banner above tracks the headline ironclaw-mtls cell.
Testing is continuous; certification is forward-looking toward v1.0 GA. Every campaign run is published under runs/ regardless of outcome — a red run is data, not a setback. See v1.0 GA criteria for what has to be true across ai-memory-mcp, ship-gate, and this repo for the 1.0 tag to cut.
This replaces earlier release-notes language on v0.6.0 and v0.6.1. Those releases were validated against the A2A-gate (per-release, against live infrastructure) — not certified by it. v0.6.2 was the first release to land three consecutive green runs at 36/36 on the headline mtls cells. v0.6.3 extends that with 48/48 on ironclaw-mtls — the new baseline (35) plus 4 auto-append plus 9 new scenarios introduced for v0.6.3 (capabilities v2, KG, entity, lifecycle).
The 60-second pitch¶
ai-memory on its own is a persistent memory store. Its value lands only when agents actually use it to maintain context, hand off tasks, and share knowledge. The ship-gate campaign proves the substrate works under load, under chaos, under migration. The A2A gate proves that two heterogeneous AI agent frameworks — IronClaw (Rust) and Hermes (Python) — can use that substrate to talk to each other without private channels, without dedicated orchestration layers, without any shared code except the ai-memory MCP interface.
Every scenario in this campaign is either a concrete inter-agent use case or a safety invariant that protects those use cases. A green A2A gate run is evidence that the shared-memory story is not a slide deck — it runs every day on real droplets under real load.
What this means to you¶
Why should you trust that your AI agents can actually talk to each other through ai-memory?
Because on every release, three real AI agents — two IronClaw, one Hermes (or vice versa on the cross-framework campaign) — spin up on fresh cloud servers, write memories, read each other's memories, hand off tasks, detect contradictions, and propagate context exactly the way a real deployment would. Every handoff is measured. Every recall is checked. Every disagreement is surfaced to a third agent as evidence that the system notices when agents disagree.
If a release breaks the ability of Agent A to see what Agent B just wrote, we find out in fifteen minutes and block the tag. If a release breaks contradiction detection or scoping visibility, same. You never get the breakage.
Every campaign run is published as evidence. Every JSON artifact is in this repository and browsable from the runs dashboard. No closed-box attestations.
What business risk does the A2A gate buy down?
- Integration risk. Customers running multi-agent systems are the most demanding users of ai-memory. They need predictable, reproducible, safe agent-to-agent memory semantics. This campaign catches regressions in that surface before release.
- Vendor-lock-in objection, answered. We test two different AI agent stacks (OpenClaw, Hermes) on the same ai-memory store — evidence that our memory substrate is framework-agnostic.
- Audit posture. Every A2A test produces immutable JSON artifacts. A compliance reviewer asking "how do you know agents can't leak memories across scope boundaries?" gets a test artifact from this morning's campaign, not a narrative.
- Velocity. A full A2A campaign runs in approximately 20 minutes at ~$0.20 of DigitalOcean compute — a fourth droplet bumps spend slightly above the ship-gate's $0.10 baseline. Release signal stays under half an hour from dispatch.
- Release-gate stack. Ship-gate green + A2A gate green is the combined pre-release signal. Shipping with either red carries risk; shipping with both green carries evidence.
What invariants does the A2A gate defend?
| Invariant | Scenario | Pass criterion |
|---|---|---|
| Every agent's writes reach every agent's recall | 1 | recall on node-N returns memories written by node-M, exact payload equivalence |
agent_id metadata is immutable across the round-trip |
1, 5 | metadata.agent_id of recalled row equals writer's id; also preserved through consolidate |
| Shared-context handoff is synchronous enough for a request-response agent pattern | 2 | Agent B sees Agent A's handoff memory within the quorum-settle bound defined in ship-gate Phase 2 |
memory_share delivers subset sync when invoked |
3 | The specific ids/namespace/last-N set that A invoked lands on C with insert_if_newer semantics respected |
| Quorum writes with W=2 of N=3 survive writer-peer pairing | 4 | All writes ok; settle + convergence identical to ship-gate Phase 2 contract |
memory_consolidate preserves the consolidated-from-agents provenance |
5 | metadata.consolidated_from_agents is the set of authors, not overwritten |
memory_detect_contradiction surfaces to an uninvolved third agent |
6 | Agent C's recall on the topic returns both A and B's memories plus the contradicts link |
| Scope enforcement matrix holds across agents | 7 | Every (scope, caller_scope) pair produces the visibility specified in the Task 1.5 scope contract |
| Auto-tag round-trip (opt-in) | 8 | Agent writes without tags; auto-tag pipeline runs; another agent recalls by generated tag and gets the row |
Each scenario emits a structured JSON report with
{pass: bool, reasons: [...]}. The aggregator produces
a2a-summary.json with overall_pass = all-scenarios-pass.
The workflow fails the build on false.
See Methodology for the full mechanics and Topology for network + auth layout.
Goals of the A2A gate¶
- Prove that the shared-memory A2A story actually works end- to-end on real multi-agent-framework workloads, not just single-process harnesses.
- Frame-agnostic validation. Run two different agent stacks against the same memory; prove the interface is the contract, not the implementation.
- Publish evidence, not claims. Every scenario's artifact
lands in
runs/; every failure narrative lands inanalysis/run-insights.json. - Catch regressions before they ship. A red A2A gate blocks the customer-facing claim, regardless of ship-gate posture.
- Bound cost. 4 droplets × ~20 min wall clock = ~$0.20 per clean run. In-droplet dead-man switch caps worst case at 8 hours.
- Document what the A2A gate does NOT cover. Cross-cloud A2A, human-in-the-loop agent supervision, and adversarial-agent scenarios are out of scope; see Methodology § Out of scope.
Position in the release protocol¶
| Stage | Harness | Validates |
|---|---|---|
| Unit + integration | cargo test in ai-memory-mcp |
per-module correctness |
| Ship-gate Phases 1-4 | ai-memory-ship-gate | single-node, 3-node federation, migration, chaos |
| A2A gate (this repo) | ai-memory-ai2ai-gate | A2A communication through shared memory |
A2A gate dispatches after the ship-gate returns
overall_pass: true. Both green → customer-facing claims supported.
Either red → release blocked until fixed.
Cost per run¶
~$0.20 of DigitalOcean compute for a clean ~20-minute run. 4
droplets (3 × s-2vcpu-4gb for agents + 1 × s-2vcpu-4gb for the
authoritative store). Dead-man switch caps every droplet at 8 hours.
See Security.
Current status¶
Active on release/v0.6.3 (commit 2cfcc18, shipped 2026-04-27). The
ironclaw-mtls cell hit 48/48 green on campaign run #25021409589 —
closing S18 (semantic expansion) and S39 (SSH STOP/CONT reliability)
which were the residual blockers at v3r23 / v0.6.2. Headline banner above
is regenerated from releases/v0.6.3/summary.json.
Matrix (2 frameworks × 3 transport modes, updated per campaign):
| off | tls | mtls | |
|---|---|---|---|
| ironclaw | tracked under runs/ |
tracked under runs/ |
48 / 48 (v0.6.3) — CERT |
| hermes | tracked under runs/ |
tracked under runs/ |
tracked under runs/ |
| mixed | ⏸ topology | ⏸ topology | ⏸ topology |
Every campaign run — green, red, cancelled — is archived under runs/. The live README tracks the latest dispatch and any in-flight campaigns.
Release history¶
Every released vX.Y.Z ships a releases/<version>/summary.json artifact
that this page reads at build time. The highest-semver entry is the headline
banner at the top; the table below lists every published release in
reverse-chronological order.
| Version | Date | Verdict | Pass / Fail | Wall | Evidence |
|---|---|---|---|---|---|
| v0.6.3 | 2026-04-27 | PASS | 48 / 0 | 28m | evidence |
The schema for summary.json lives in
releases/schema.json.
Pushing a v* tag without a matching releases/<tag>/summary.json fails the
release-blocking release-summary-gate
workflow before any artifact is published.