Campaign a2a-hermes-v0.6.2-patch2-r23-mtls: FAIL

Agent group: hermes (homogeneous)
ai-memory ref: release/v0.6.2
Completed at: 2026-04-23T16:48:20Z
Overall pass: false
Skipped reports: 0

Infrastructure

Provider: digitalocean
Region: nyc3
Droplet size: s-2vcpu-4gb
Topology: 4-node federation mesh (W=2/N=4)
Scenarios started
Scenarios ended
Dispatched by: alphaonedev
Harness SHA: 89ac1adba1de
Workflow run: https://github.com/alphaonedev/ai-memory-ai2ai-gate/actions/runs/24845949218

Node roster

#	Role	Agent ID	Public IP	Private IP
1	agent	`ai:alice`	`138.197.21.12`	`10.11.2.4`
2	agent	`ai:bob`	`138.197.126.11`	`10.11.2.3`
3	agent	`ai:charlie`	`104.131.95.255`	`10.11.2.5`
4	memory-only	`—`	`174.138.64.27`	`10.11.2.2`

Baseline attestation: BASELINE VIOLATION

Per the authoritative baseline spec, every agent node must emit a self-attestation before any scenario is permitted to run. This run's attestation:

Spec version: 1.4.0 — see authoritative baseline.

Node	Agent	Framework	Authentic	MCP ai-memory	xAI cfg	xAI default	Agent ID	Federation	UFW off	iptables	dead-man	F1 xAI	F2a substrate	F2b agent (non-gating)	Config SHA	Pass
node-2	`ai:bob`	`hermes Hermes Agent v0.10.0 (2026.4.16)`	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅	—	`21635cf63640`	FAIL
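
The attestation requirement stated above can be cross-checked against the node roster. The following is a minimal sketch, not the harness's actual gate: the `ROSTER` and `ATTESTED` structures and both function names are hypothetical, the values are copied from the roster and attestation tables in this report, and the rule "every agent node must have a passing attestation" is an assumption based on the sentence quoted from the baseline spec.

```python
# Roster rows copied from the "Node roster" table (hypothetical structure).
ROSTER = [
    {"node": 1, "role": "agent", "agent_id": "ai:alice"},
    {"node": 2, "role": "agent", "agent_id": "ai:bob"},
    {"node": 3, "role": "agent", "agent_id": "ai:charlie"},
    {"node": 4, "role": "memory-only", "agent_id": None},
]

# Attestation rows shown in this report: agent_id -> value of the Pass column.
ATTESTED = {"ai:bob": False}


def unattested_agents(roster, attested):
    """Return agent IDs with role 'agent' that have no attestation row."""
    return [
        row["agent_id"]
        for row in roster
        if row["role"] == "agent" and row["agent_id"] not in attested
    ]


def scenarios_permitted(roster, attested):
    """Assumed rule: every agent node attested, and every attestation passed."""
    agents = [row["agent_id"] for row in roster if row["role"] == "agent"]
    return all(attested.get(agent_id) is True for agent_id in agents)


print("missing attestations:", unattested_agents(ROSTER, ATTESTED))
print("scenarios permitted:", scenarios_permitted(ROSTER, ATTESTED))
```

Under these assumptions, only one of the three agent nodes appears in the attestation table, and that one did not pass. Whether the other two nodes attested but were not rendered here cannot be determined from this report.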

a2a-baseline.json

{
	"baseline_pass": false,
	"per_node": [
		{
			"spec_version": "1.4.0",
			"agent_type": "hermes",
			"agent_id": "ai:bob",
			"node_index": "2",
			"framework_version": "Hermes Agent v0.10.0 (2026.4.16)",
			"ai_memory_version": "v0.6.2",
			"peer_urls": "https://10.11.2.4:9077,https://10.11.2.5:9077,https://10.11.2.2:9077",
			"config_file_sha256": "21635cf6364057fd2a004d28aac89abf8438671d85f9fd2ed1e654d812d23ff1",
			"config_attestation": {
				"framework_is_authentic": true,
				"mcp_server_ai_memory_registered": true,
				"llm_backend_is_xai_grok": true,
				"llm_is_default_provider": true,
				"mcp_command_is_ai_memory": true,
				"agent_id_stamped": true,
				"federation_live": true,
				"ufw_disabled": true,
				"iptables_flushed": true,
				"dead_man_switch_scheduled": true
			},
			"negative_invariants": {
				"_description": "Alternative A2A channels must be OFF so a passing scenario is only passing via ai-memory shared memory. Any true here = thesis-preserving.",
				"a2a_protocol_off": true,
				"sub_agent_or_sessions_spawn_off": true,
				"alternative_channels_off": true,
				"tool_allowlist_is_memory_only": true,
				"a2a_gate_profile_locked": true
			},
			"functional_probes": {
				"xai_grok_chat_reachable": true,
				"xai_grok_sample_reply": "READY",
				"substrate_http_canary_f2a": true,
				"substrate_http_canary_uuid": "13ab798a-1777-4e7a-8a85-343d22e22c66",
				"agent_mcp_canary_f2b": false,
				"agent_mcp_canary_uuid": "3d744d16-1c2c-48e9-a50e-35d7853cacfc",
				"agent_canary_response_head": "Traceback (most recent call last):   File \"/usr/local/bin/hermes\", line 11, in <module>     main()   File \"/root/.hermes/hermes-agent/hermes_cli/main.py\", line 8859, in main     args.func(args)   File \"/root/.hermes/hermes-agent/hermes_cli/main.py\", line 1159, in cmd_chat     from cli import main as cli_main   File \"/root/.hermes/hermes-agent/cli.py\", line 43, in <module>     from prompt_toolkit.history import FileHistory ModuleNotFoundError: No module named 'prompt_toolkit' ",
				"_f2b_note": "F2b is LLM-dependent and non-blocking. F2a (deterministic HTTP substrate) gates baseline_pass.",
				"mesh_connectivity_f4": false,
				"mesh_edges_ok": 1,
				"mesh_edges_total": 3,
				"mesh_edges_detail": "10.11.2.4:9077:FAIL(health=false,sync=false),10.11.2.5:9077:FAIL(health=false,sync=false),10.11.2.2:9077:OK",
				"_f4_note": "F4 verifies this local nodes N-1 OUTBOUND mesh edges to every peer via both GET health and POST sync_push dry_run. Aggregator ANDs across N nodes to confirm full N*(N-1) bidirectional reachability. Gates baseline_pass.",
				"ai_memory_mcp_stdio_f5": true,
				"ai_memory_mcp_stdio_init_ok": true,
				"ai_memory_mcp_stdio_tools_ok": true,
				"ai_memory_mcp_stdio_tools_found": "memory_agent_list,memory_agent_register,memory_archive_list,memory_archive_purge,memory_archive_restore,memory_archive_stats,memory_auto_tag,memory_capabilities,memory_consolidate,memory_delete,memory_detect_contradiction,memory_expand_query,memory_forget,memory_gc,memory_get,memory_get_links,memory_inbox,memory_link,memory_list,memory_list_subscriptions,memory_namespace_clear_standard,memory_namespace_get_standard,memory_namespace_set_standard,memory_notify,memory_pending_approve,memory_pending_list,memory_pending_reject,memory_promote,memory_recall,memory_search,memory_session_start,memory_stats,memory_store,memory_subscribe,memory_unsubscribe,memory_update",
				"_f5_note": "F5 spawns the ai-memory stdio MCP subprocess using the framework-configured invocation and verifies initialize + tools/list return memory_store, memory_recall, memory_list. Deterministic (no LLM). Gates baseline_pass.",
				"tls_mode": "mtls",
				"tls_handshake_f6": true,
				"tls_handshake_f6_reason": "",
				"mtls_enforcement_f7": true,
				"mtls_enforcement_f7_reason": "",
				"_f6_f7_note": "F6 verifies the TLS 1.3 handshake against the local serve + CA chain. F7 verifies mTLS enforcement — anonymous client rejected, whitelisted client accepted. Both gate baseline_pass when tls_mode != off / mtls respectively.",
				"agent_mcp_ai_memory_canary": true,
				"canary_uuid": "13ab798a-1777-4e7a-8a85-343d22e22c66",
				"canary_namespace": "_baseline_canary_f2a"
			},
			"baseline_pass": false
		}
	],
	"failure_mode": "baseline-violation"
}
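
The `_f*_note` fields above describe which probes gate `baseline_pass`. As a rough illustration, the sketch below applies those notes to an excerpt of the `functional_probes` object and parses `mesh_edges_detail`. The function names and the regular expression are assumptions made for this example, and the gating list is inferred from the notes, not taken from the harness source.

```python
import json
import re

# Excerpt of "functional_probes" from a2a-baseline.json, values copied verbatim.
PROBES = json.loads("""
{
  "tls_mode": "mtls",
  "substrate_http_canary_f2a": true,
  "agent_mcp_canary_f2b": false,
  "mesh_connectivity_f4": false,
  "mesh_edges_detail": "10.11.2.4:9077:FAIL(health=false,sync=false),10.11.2.5:9077:FAIL(health=false,sync=false),10.11.2.2:9077:OK",
  "ai_memory_mcp_stdio_f5": true,
  "tls_handshake_f6": true,
  "mtls_enforcement_f7": true
}
""")


def gating_probes(probes):
    """Probe keys that gate baseline_pass, as inferred from the _f*_note fields."""
    keys = [
        "substrate_http_canary_f2a",
        "mesh_connectivity_f4",
        "ai_memory_mcp_stdio_f5",
    ]
    if probes.get("tls_mode") != "off":   # F6 gates whenever TLS is enabled
        keys.append("tls_handshake_f6")
    if probes.get("tls_mode") == "mtls":  # F7 gates only in mTLS mode
        keys.append("mtls_enforcement_f7")
    return keys                           # F2b is deliberately absent (non-gating)


def failed_gates(probes):
    """Gating probes whose value is anything other than true."""
    return [key for key in gating_probes(probes) if probes.get(key) is not True]


def failed_edges(detail):
    """Peers marked FAIL in a mesh_edges_detail string (assumed host:port:STATUS format)."""
    edges = re.findall(r"([\d.]+:\d+):(OK|FAIL)", detail)
    return [peer for peer, status in edges if status == "FAIL"]


print("failed gates:", failed_gates(PROBES))
print("failed edges:", failed_edges(PROBES["mesh_edges_detail"]))
```

Read this way, the only gating probe that failed on node-2 is `mesh_connectivity_f4`. The two failing edges, `10.11.2.4` and `10.11.2.5`, are the private IPs of `ai:alice` and `ai:charlie` in the node roster, while the edge to the memory-only node `10.11.2.2` is OK. The F2b failure (`ModuleNotFoundError: No module named 'prompt_toolkit'`) is non-gating per `_f2b_note`.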

Run focus

Campaign failed: no scenario reports recovered.

What this campaign tested: The campaign attempted to exercise 35 scenarios covering the transport (mTLS), the framework (federation mesh), and primitives such as memory sharing, but no scenario reports were recovered.

What it demonstrated: A critical failure in the testing harness; no scenario results were collected or reported.

AI NHI analysis · Claude Opus 4.7

FAIL — no scenario reports recovered

For three audiences

Non-technical end users

This test run failed: no results were collected from any of the planned checks, so we could not determine whether agents can reliably share memories with each other. The problem appears to lie in how the tests were set up or run.

C-level decision makers

Risk is high: the campaign failed completely because no scenario reports were recovered, which blocks validation of v0.6.2 under mTLS for production. Customer-facing claims about reliable AI memory federation remain unproven. The failure is a CI reliability regression compared to prior runs.

Engineers & architects

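No per-scenario reports were recovered, which points to a harness failure in artifact collection (harness_sha: 89ac1adba1de2a6909ed00d21d62f2c9d11ff051). All 35 requested scenarios (S1, S1b, S2, S4-S6, S9-S18, S22-S25, S28-S42) were effectively skipped. The probable root cause is a CI workflow issue in retrieving outputs from the 4-node federation mesh. Note, however, that a2a-baseline.json records failure_mode "baseline-violation" with mesh_connectivity_f4 false on node-2 (1 of 3 outbound edges OK), and the baseline spec permits no scenario to run before attestation passes, so the scenarios may never have started. No specific primitives or failure modes are observable.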
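
The scenario count can be verified by expanding the range notation used above. This is a small sketch: the helper name `expand_scenarios` is hypothetical, and treating `Sx-Sy` as an inclusive numeric range is an assumption about the notation.

```python
import re

# Scenario list exactly as written in the analysis above.
SCENARIO_SPEC = "S1, S1b, S2, S4-S6, S9-S18, S22-S25, S28-S42"


def expand_scenarios(spec):
    """Expand tokens like 'S4-S6' into ['S4', 'S5', 'S6']; keep others as-is."""
    ids = []
    for token in (part.strip() for part in spec.split(",")):
        match = re.fullmatch(r"S(\d+)-S(\d+)", token)
        if match:
            first, last = int(match.group(1)), int(match.group(2))
            ids.extend(f"S{n}" for n in range(first, last + 1))
        else:
            ids.append(token)  # single IDs, including suffixed ones like S1b
    return ids


scenarios = expand_scenarios(SCENARIO_SPEC)
print(len(scenarios))  # 35
```

The expansion yields 35 identifiers, which matches the "35 requested scenarios" figure.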

What changes going into the next campaign

Fix the report recovery and artifact upload logic in the CI workflow before re-running.

All artifacts

Generated by scripts/generate_run_html.sh. Methodology: alphaonedev.github.io/ai-memory-ai2ai-gate/methodology. Analysis source: analysis/run-insights.json.