Campaign a2a-ironclaw-v3r16-mtls-release-v0.6.2 FAIL

Agent group: ironclaw (homogeneous)
ai-memory ref: release/v0.6.2
Completed at: 2026-04-22T21:23:28Z
Overall pass: false
Skipped reports: 0

Infrastructure

Provider: digitalocean
Region: nyc3
Droplet size: s-2vcpu-4gb
Topology: 4-node federation mesh (W=2/N=4)
Scenarios started: (not recorded)
Scenarios ended: (not recorded)
Dispatched by: alphaonedev
Harness SHA: 4bf59f540624
Workflow run: https://github.com/alphaonedev/ai-memory-ai2ai-gate/actions/runs/24802395440

Node roster

# | Role | Agent ID | Public IP | Private IP
1 | agent | ai:alice | 104.131.179.148 | 10.10.2.3
2 | agent | ai:bob | 45.55.82.114 | 10.10.2.5
3 | agent | ai:charlie | 104.131.1.150 | 10.10.2.4
4 | memory-only | (none) | 159.203.67.119 | 10.10.2.2

Baseline attestation BASELINE VIOLATION

Per the authoritative baseline spec, every agent node must emit a self-attestation before any scenario is permitted to run. This run's attestation:

Spec version: 1.0.0 — see authoritative baseline.

Node | Agent | Framework | Authentic | MCP ai-memory | xAI cfg | xAI default | Agent ID | Federation | UFW off | iptables | dead-man | F1 xAI | F2a substrate | F2b agent (non-gating) | Config SHA | Pass
a2a-baseline.json
{
	"baseline_pass": false,
	"per_node": [],
	"failure_mode": "baseline-absent"
}

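The gating rule described above can be sketched as a small check over a2a-baseline.json. This is a minimal sketch, not the harness's actual code: the function name `scenarios_permitted` and the rule that an empty `per_node` list fails the gate are assumptions; only the three field names come from the file shown above.

```python
import json
from typing import Tuple


def scenarios_permitted(baseline_text: str) -> Tuple[bool, str]:
    """Return (allowed, reason) for a baseline attestation document.

    Assumption: scenarios may run only if at least one node attested
    and the document reports baseline_pass == true.
    """
    doc = json.loads(baseline_text)
    per_node = doc.get("per_node", [])
    if not per_node:
        # No node emitted an attestation at all.
        return False, doc.get("failure_mode", "no per-node attestations")
    if not doc.get("baseline_pass", False):
        return False, doc.get("failure_mode", "baseline_pass is false")
    return True, "ok"


# The attestation recorded for this run, copied from a2a-baseline.json.
this_run = '{"baseline_pass": false, "per_node": [], "failure_mode": "baseline-absent"}'
allowed, reason = scenarios_permitted(this_run)
print(allowed, reason)
```

Under these assumptions, this run's attestation is rejected with the reason taken from its own `failure_mode` field, which is consistent with no scenarios being reported.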

Run focus

Campaign failed: no scenario reports recovered.

What this campaign tested: 35 scenarios were requested across transport, framework, and primitive axes, but no scenario reports were recovered, so none can be counted as exercised.

What it demonstrated: The run exposed a critical flaw in the testing harness and produced no results on the reliability of agent memory sharing.

AI NHI analysis · Claude Opus 4.7

FAIL — zero scenarios executed or reported.

For three audiences

Non-technical end users

The test campaign didn't work at all because no results were collected from any of the planned scenarios. This means we have no information on whether AI agents can reliably share memories with each other. It's like setting up a big experiment but forgetting to record the outcomes.

C-level decision makers

This run exposes high operational risk from harness failures, blocking any assessment of production readiness for AI memory sharing. Customer-facing claims remain unvalidated, with no progress versus prior runs. Prioritize debugging the CI workflow to restore testing integrity.

Engineers & architects

The primary failure mode was the complete absence of scenario reports: all 35 requested scenarios (S1, S1b, S2, and so on) were effectively skipped. No primitives were tested. The probable root cause is the harness (SHA: 4bf59f5406248cf8fd87fbccf96f0f537d850a7c) failing to recover scenario outputs. No testbook or probe identifiers are available because no per-scenario data was generated. The infrastructure (4-node federation mesh) appeared nominal but went unused.
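One way to make this failure mode visible early is to reconcile the requested scenario IDs against the reports actually recovered. The sketch below is illustrative only: the `<id>.json` report naming, the directory layout, and the helper names `recovered_ids` and `reconcile` are hypothetical, and only the three scenario IDs named in this report are used because the full list of 35 is not shown here.

```python
import tempfile
from pathlib import Path


def recovered_ids(report_dir: Path) -> set:
    """Collect scenario IDs from report files named <id>.json (assumed layout)."""
    return {p.stem for p in report_dir.glob("*.json")}


def reconcile(requested, recovered):
    """Split requested scenario IDs into reported and missing, keeping order."""
    recovered = set(recovered)
    reported = [s for s in requested if s in recovered]
    missing = [s for s in requested if s not in recovered]
    return reported, missing


# Only the IDs named in this report; the remaining requested IDs are not listed.
requested = ["S1", "S1b", "S2"]

# An empty directory stands in for a run where no reports were recovered.
with tempfile.TemporaryDirectory() as tmp:
    reported, missing = reconcile(requested, recovered_ids(Path(tmp)))

print(f"{len(reported)} reported, {len(missing)} missing: {missing}")
```

A check like this separates scenarios that were deliberately skipped from scenarios whose reports simply never arrived, which the "Skipped reports: 0" figure alone does not distinguish.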

What changes going into the next campaign

Debug and fix the CI harness to ensure scenario reports are properly generated and recovered before the next campaign.
