Campaign a2a-ironclaw-v3r19-mtls-release-v0.6.2: FAIL

Agent group: ironclaw (homogeneous)
ai-memory ref: release/v0.6.2
Completed at: 2026-04-23T00:09:47Z
Overall pass: false
Skipped reports: 0

Infrastructure

Provider: digitalocean
Region: nyc3
Droplet size: s-2vcpu-4gb
Topology: 4-node federation mesh (W=2/N=4)
Scenarios started
Scenarios ended
Dispatched by: alphaonedev
Harness SHA: 28875c272ce8
Workflow run: https://github.com/alphaonedev/ai-memory-ai2ai-gate/actions/runs/24808690840

Node roster

#	Role	Agent ID	Public IP	Private IP
1	agent	`ai:alice`	`138.197.11.207`	`10.10.2.4`
2	agent	`ai:bob`	`104.131.90.230`	`10.10.2.5`
3	agent	`ai:charlie`	`164.90.135.59`	`10.10.2.3`
4	memory-only	`—`	`167.71.182.60`	`10.10.2.2`

Baseline attestation: BASELINE VIOLATION

Per the authoritative baseline spec, every agent node must emit a self-attestation before any scenario is permitted to run. This run's attestation:

Spec version: 1.0.0 — see authoritative baseline.

Node	Agent	Framework	Authentic	MCP ai-memory	xAI cfg	xAI default	Agent ID	Federation	UFW off	iptables	dead-man	F1 xAI	F2a substrate	F2b agent (non-gating)	Config SHA	Pass

a2a-baseline.json

{
	"baseline_pass": false,
	"per_node": [],
	"failure_mode": "baseline-absent"
}

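
The gating rule described above (no scenario may run until every agent node has attested) can be sketched as a small check over a2a-baseline.json. This is a minimal illustration, not the harness's actual code: the function name `scenarios_permitted` and the per-node key `agent_id` are assumptions, since this run's `per_node` list is empty and its entry schema is not shown.

```python
import json

# Hypothetical helper, not part of the harness. Top-level field names follow the
# a2a-baseline.json shown in this report; the per-node "agent_id" key is assumed.
def scenarios_permitted(baseline: dict, agent_ids: list[str]) -> tuple[bool, str]:
    """Return (permitted, reason) for running scenarios given a baseline document."""
    if not baseline.get("baseline_pass", False):
        # Surface the recorded failure mode, e.g. "baseline-absent".
        return False, baseline.get("failure_mode", "unknown")
    attested = {node.get("agent_id") for node in baseline.get("per_node", [])}
    missing = [a for a in agent_ids if a not in attested]
    if missing:
        return False, "missing attestation: " + ", ".join(missing)
    return True, "ok"

# The baseline recorded for this run, with the agent IDs from the node roster.
baseline = json.loads(
    '{"baseline_pass": false, "per_node": [], "failure_mode": "baseline-absent"}'
)
permitted, reason = scenarios_permitted(baseline, ["ai:alice", "ai:bob", "ai:charlie"])
print(permitted, reason)  # False baseline-absent
```

Under these assumptions, this run's baseline blocks all scenarios before any agent is consulted, which is consistent with the BASELINE VIOLATION status.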

Run focus

Campaign failed: no scenario reports recovered.

What this campaign tested: Nothing verifiable. The run requested 35 scenarios covering transport, framework, and primitive axes, but no scenario reports were recovered.

What it demonstrated: Nothing about agent memory sharing, because no results were collected.

AI NHI analysis · Claude Opus 4.7

FAIL — zero scenarios executed or reported.

For three audiences

Non-technical end users

This test run completely failed to produce any results. We have no information on whether agents can reliably share memories with each other. The problem seems to be that the system couldn't collect any data from the tests.

C-level decision makers

This run delivered zero test coverage, so risk remains high and the system cannot be considered ready for production deployment. No customer-facing claims about reliability can be supported. Compared to prior runs, this represents a regression in test harness stability.

Engineers & architects

None of the 35 requested scenarios (S1, S1b, S2, S4-S6, S9-S18, S22-S25, S28-S42) produced a report, likely because of a harness issue in report recovery. No primitives or failure modes were tested. The probable root cause is in the CI workflow at harness_sha 28875c272ce86d10b81466cec10ac4da7bf57b74. No specific F# probes were triggered because the run never reached the reporting stage.
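
As a quick check on the count above, the scenario list can be expanded and tallied. This is a minimal sketch: `expand_scenarios` is a hypothetical helper, and it assumes each `Sx-Sy` range is inclusive and contains only plain integer IDs (no lettered variants such as S1b inside a range).

```python
import re

# Hypothetical helper: expands a comma-separated scenario spec into individual IDs.
# Assumes "Sx-Sy" denotes an inclusive range of integer-numbered scenarios.
def expand_scenarios(spec: str) -> list[str]:
    ids = []
    for token in (t.strip() for t in spec.split(",")):
        m = re.fullmatch(r"S(\d+)-S(\d+)", token)
        if m:
            lo, hi = int(m.group(1)), int(m.group(2))
            ids.extend(f"S{n}" for n in range(lo, hi + 1))
        else:
            # Single IDs, including lettered variants like "S1b", pass through unchanged.
            ids.append(token)
    return ids

requested = expand_scenarios("S1, S1b, S2, S4-S6, S9-S18, S22-S25, S28-S42")
print(len(requested))  # 35
```

The expansion yields 3 + 3 + 10 + 4 + 15 = 35 IDs, matching the number of requested scenarios stated in the report.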

What changes going into the next campaign

Debug and resolve the report recovery failure in the test harness before the next campaign.

All artifacts

Generated by scripts/generate_run_html.sh. Methodology: alphaonedev.github.io/ai-memory-ai2ai-gate/methodology. Analysis source: analysis/run-insights.json.