Campaign a2a-ironclaw-v3r14-mtls-develop: FAIL

Agent group: ironclaw (homogeneous)
ai-memory ref: develop
Completed at: 2026-04-22T18:39:05Z
Overall pass: false
Skipped reports: 0

Infrastructure

Provider: digitalocean
Region: nyc3
Droplet size: s-2vcpu-4gb
Topology: 4-node federation mesh (W=2/N=4)
Scenarios started
Scenarios ended
Dispatched by: alphaonedev
Harness SHA: 472a414c9315
Workflow run: https://github.com/alphaonedev/ai-memory-ai2ai-gate/actions/runs/24795170219

Node roster

#	Role	Agent ID	Public IP	Private IP
1	agent	`ai:alice`	`159.203.161.193`	`10.10.2.5`
2	agent	`ai:bob`	`45.55.136.89`	`10.10.2.2`
3	agent	`ai:charlie`	`104.131.18.124`	`10.10.2.3`
4	memory-only	`—`	`104.236.222.25`	`10.10.2.4`

Baseline attestation: BASELINE VIOLATION

Per the authoritative baseline spec, every agent node must emit a self-attestation before any scenario is permitted to run. This run's attestation:

Spec version: 1.0.0 — see authoritative baseline.

Node	Agent	Framework	Authentic	MCP ai-memory	xAI cfg	xAI default	Agent ID	Federation	UFW off	iptables	dead-man	F1 xAI	F2a substrate	F2b agent (non-gating)	Config SHA	Pass

a2a-baseline.json

{
	"baseline_pass": false,
	"per_node": [],
	"failure_mode": "baseline-absent"
}

raw file
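
The gating rule stated above (no scenario may run until every agent node has attested) can be sketched as a small check over the a2a-baseline.json document. This is a minimal illustration, not the harness's actual logic: the function name `scenarios_permitted` is hypothetical, the expected agent count of 3 is taken from the node roster, and the per-node entry schema is unknown, so the sketch only counts entries.

```python
import json

def scenarios_permitted(baseline: dict, expected_agents: int) -> tuple[bool, str]:
    """Return (permitted, reason) for a parsed a2a-baseline.json document.

    Hypothetical helper: only the three keys visible in this report
    (baseline_pass, per_node, failure_mode) are assumed to exist.
    """
    if not baseline.get("baseline_pass", False):
        mode = baseline.get("failure_mode", "unspecified")
        return False, f"baseline_pass is false ({mode})"
    attested = len(baseline.get("per_node", []))
    if attested < expected_agents:
        return False, f"only {attested} of {expected_agents} agent nodes attested"
    return True, "all agent nodes attested"

# The document recorded for this run.
baseline = json.loads("""
{
    "baseline_pass": false,
    "per_node": [],
    "failure_mode": "baseline-absent"
}
""")

# Three agent nodes appear in the roster (alice, bob, charlie).
permitted, reason = scenarios_permitted(baseline, expected_agents=3)
print(permitted, reason)
```

Applied to this run's document, the check refuses to proceed because `baseline_pass` is false and `per_node` is empty.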

Run focus

Campaign failed with no scenario reports recovered.

What this campaign tested: The campaign requested 35 scenarios covering AI memory sharing across a federation mesh with mTLS. No reports were recovered, so coverage of the transport, framework, and primitives axes is zero.

What it demonstrated: The run showed that the harness failed to recover scenario reports. It demonstrated nothing about AI memory reliability or functionality.

AI NHI analysis · Claude Opus 4.7

FAIL — no reports recovered from 35 requested scenarios.

For three audiences

Non-technical end users

This test run didn't collect any data because the reports went missing. We can't tell if the AI agents are sharing memories reliably or not. It's like planning a bunch of experiments but losing all the notes.

C-level decision makers

Risk is high. The campaign failed completely because no scenario reports were recovered, which signals that the testing infrastructure is unstable and not ready for production. Customer claims about AI memory sharing remain unvalidated. No progress or change relative to prior runs can be detected.

Engineers & architects

The failure mode was a total absence of scenario reports, as indicated in the summary reasons. All 35 requested scenarios (S1, S1b, S2, S4-S6, S9-S18, S22-S25, S28-S42) were effectively skipped. The probable root cause is a bug in the CI harness's report aggregation or infra teardown phase, possibly related to harness_sha 472a414c9315b4d4455722503b45aa34ed6c2060. No primitives or probes were exercised.
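
As a quick cross-check of the scenario count, the range notation in the list above can be expanded and counted. This is a minimal sketch: the helper name `expand_scenarios` is hypothetical, and it assumes a token of the form `Sx-Sy` denotes every integer-numbered scenario from x to y inclusive.

```python
import re

def expand_scenarios(spec: str) -> list[str]:
    """Expand a comma-separated scenario list with Sx-Sy ranges into IDs."""
    ids = []
    for token in (t.strip() for t in spec.split(",")):
        match = re.fullmatch(r"S(\d+)-S(\d+)", token)
        if match:
            low, high = int(match.group(1)), int(match.group(2))
            ids.extend(f"S{n}" for n in range(low, high + 1))
        else:
            # Single IDs such as S1 or S1b are kept verbatim.
            ids.append(token)
    return ids

requested = expand_scenarios("S1, S1b, S2, S4-S6, S9-S18, S22-S25, S28-S42")
print(len(requested))  # 35
```

The expansion yields 35 IDs, which agrees with the requested-scenario count reported elsewhere in this run.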

What changes going into the next campaign

Implement logging and error handling in the harness to diagnose and prevent report recovery failures.
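
One possible shape for the recommendation above is sketched here. It is an illustration only, under loudly stated assumptions: the real harness's report layout is not described in this report, so the one-file-per-scenario naming (`<scenario id>.json`), the function name `recover_reports`, and the `pass` key in the sample data are all hypothetical.

```python
import json
import logging
import tempfile
from pathlib import Path

log = logging.getLogger("report_recovery")

def recover_reports(report_dir: Path, requested: list[str]) -> dict[str, dict]:
    """Load one JSON report per requested scenario, logging every failure."""
    recovered = {}
    for scenario_id in requested:
        path = report_dir / f"{scenario_id}.json"  # hypothetical naming scheme
        try:
            recovered[scenario_id] = json.loads(path.read_text())
        except FileNotFoundError:
            log.error("report missing for %s (expected %s)", scenario_id, path)
        except json.JSONDecodeError as exc:
            log.error("report for %s is not valid JSON: %s", scenario_id, exc)
    log.info("recovered %d of %d reports", len(recovered), len(requested))
    return recovered

logging.basicConfig(level=logging.INFO)
wanted = ["S1", "S1b", "S2"]
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    (tmp_path / "S1.json").write_text('{"pass": true}')  # well-formed report
    (tmp_path / "S2.json").write_text("{truncated")      # corrupt report
    reports = recover_reports(tmp_path, wanted)          # S1b is absent

missing = sorted(set(wanted) - reports.keys())
print(missing)  # ['S1b', 'S2']
```

Logging each missing or unreadable report by scenario ID, rather than failing silently, would let a future run distinguish "scenario never produced a report" from "report was produced but could not be parsed".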

All artifacts

Generated by scripts/generate_run_html.sh. Methodology: alphaonedev.github.io/ai-memory-ai2ai-gate/methodology. Analysis source: analysis/run-insights.json.