Campaign a2a-ironclaw-v3r16-mtls-release-v0.6.2: FAIL

Agent group: ironclaw (homogeneous)
ai-memory ref: release/v0.6.2
Completed at: 2026-04-22T21:23:28Z
Overall pass: false
Skipped reports: 0

Infrastructure

Provider: digitalocean
Region: nyc3
Droplet size: s-2vcpu-4gb
Topology: 4-node federation mesh (W=2/N=4)
Scenarios started
Scenarios ended
Dispatched by: alphaonedev
Harness SHA: 4bf59f540624
Workflow run: https://github.com/alphaonedev/ai-memory-ai2ai-gate/actions/runs/24802395440
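
The topology line's `W=2/N=4` notation can be read as a quorum setting. The report does not define the notation, so the following is a minimal sketch that assumes `W` is the number of node acknowledgements required for a write and `N` is the total node count; the function name and the returned keys are hypothetical.

```python
def write_quorum_summary(w: int, n: int) -> dict:
    """Summarise a W-of-N write quorum.

    Assumption: W = acknowledgements required per write, N = nodes in the mesh.
    """
    if not 1 <= w <= n:
        raise ValueError("W must satisfy 1 <= W <= N")
    return {
        "acks_required": w,
        # Nodes that can be unreachable while a write can still reach W acks.
        "tolerated_unreachable_nodes": n - w,
        # A strict majority needs more than half of the nodes.
        "is_strict_majority": w > n // 2,
    }


summary = write_quorum_summary(2, 4)
print(summary)
```

Under that assumption, a write in this mesh needs two acknowledgements and can proceed with up to two nodes unreachable, but two of four is not a strict majority.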

Node roster

#	Role	Agent ID	Public IP	Private IP
1	agent	`ai:alice`	`104.131.179.148`	`10.10.2.3`
2	agent	`ai:bob`	`45.55.82.114`	`10.10.2.5`
3	agent	`ai:charlie`	`104.131.1.150`	`10.10.2.4`
4	memory-only	`—`	`159.203.67.119`	`10.10.2.2`

Baseline attestation: BASELINE VIOLATION

Per the authoritative baseline spec, every agent node must emit a self-attestation before any scenario is permitted to run. This run's attestation:

Spec version: 1.0.0 — see authoritative baseline.

Node	Agent	Framework	Authentic	MCP ai-memory	xAI cfg	xAI default	Agent ID	Federation	UFW off	iptables	dead-man	F1 xAI	F2a substrate	F2b agent (non-gating)	Config SHA	Pass

a2a-baseline.json

{
	"baseline_pass": false,
	"per_node": [],
	"failure_mode": "baseline-absent"
}
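
The gating rule described above (no scenario may run until every agent node has attested) could be checked along these lines. This is a sketch, not the harness's actual logic: the top-level keys come from the JSON shown here, but the per-node field `agent_id` and the function name `scenarios_permitted` are assumptions, since this run's `per_node` list is empty and the per-node schema is not visible.

```python
import json


def scenarios_permitted(baseline: dict, expected_agents: set) -> tuple:
    """Return (permitted, reason) for a parsed a2a-baseline.json document."""
    per_node = baseline.get("per_node", [])
    if not per_node:
        # No node attested at all: report the recorded failure mode.
        return False, baseline.get("failure_mode", "baseline-absent")
    if not baseline.get("baseline_pass", False):
        return False, baseline.get("failure_mode", "baseline-failed")
    # "agent_id" is a hypothetical per-node field name.
    attested = {node.get("agent_id") for node in per_node}
    missing = expected_agents - attested
    if missing:
        return False, "missing attestation: " + ", ".join(sorted(missing))
    return True, "ok"


# The document recorded for this run, with the agent IDs from the node roster.
baseline = json.loads(
    '{"baseline_pass": false, "per_node": [], "failure_mode": "baseline-absent"}'
)
permitted, reason = scenarios_permitted(baseline, {"ai:alice", "ai:bob", "ai:charlie"})
print(permitted, reason)
```

For this run the check refuses to proceed and reports `baseline-absent`, which matches the recorded failure mode.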


Run focus

Campaign failed: no scenario reports recovered.

What this campaign tested: Nothing. The campaign requested 35 scenarios across transport, framework, and primitive axes, but no scenario reports were recovered, so none can be counted as exercised.

What it demonstrated: The run exposed a failure in the testing harness's report recovery and produced no results on the reliability of agent memory sharing.

AI NHI analysis · Claude Opus 4.7


FAIL — zero scenarios executed or reported.

For three audiences

Non-technical end users

The test campaign didn't work at all because no results were collected from any of the planned scenarios. This means we have no information on whether AI agents can reliably share memories with each other. It's like setting up a big experiment but forgetting to record the outcomes.

C-level decision makers

This run exposes high operational risk from harness failures, blocking any assessment of production readiness for AI memory sharing. Customer-facing claims remain unvalidated, with no progress versus prior runs. Prioritize debugging the CI workflow to restore testing integrity.

Engineers & architects

The primary failure mode was the complete absence of scenario reports, so all 35 requested scenarios (e.g., S1, S1b, S2) were effectively skipped and no primitives were tested. The probable root cause is the harness (SHA: 4bf59f5406248cf8fd87fbccf96f0f537d850a7c) failing to recover outputs. No testbook or probe identifiers are available because no per-scenario data was generated. The infrastructure (4-node federation mesh) appeared nominal but went unused.

What changes going into the next campaign

Before the next campaign, debug and fix the CI harness so that scenario reports are reliably generated and recovered.
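
One way to catch this failure earlier is a recovery check that compares the requested scenario IDs against the report files actually collected and fails the workflow when any are missing. The sketch below assumes one `<scenario-id>.json` file per scenario in a single directory; that layout and the name `missing_reports` are hypothetical and not taken from the harness.

```python
import tempfile
from pathlib import Path


def missing_reports(report_dir: Path, requested: list) -> list:
    """Return requested scenario IDs with no non-empty <id>.json report."""
    missing = []
    for scenario_id in requested:
        report = report_dir / f"{scenario_id}.json"
        # Treat an absent or zero-byte file as "not recovered".
        if not report.is_file() or report.stat().st_size == 0:
            missing.append(scenario_id)
    return missing


# Usage: only S1 was recovered out of three requested scenarios.
with tempfile.TemporaryDirectory() as tmp:
    reports = Path(tmp)
    (reports / "S1.json").write_text("{}")
    gaps = missing_reports(reports, ["S1", "S1b", "S2"])
    print("missing:", gaps)
```

A CI step could exit non-zero whenever the returned list is non-empty, so a run with zero recovered reports is flagged at collection time rather than at analysis time.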

All artifacts

Generated by scripts/generate_run_html.sh. Methodology: alphaonedev.github.io/ai-memory-ai2ai-gate/methodology. Analysis source: analysis/run-insights.json.