Non-technical end users
The test run didn't work at all, and we got no information from it. We couldn't tell if the AI agents can reliably share memories with each other. The setup needs fixing so future tests can actually check this.
| # | Role | Agent ID | Public IP | Private IP |
|---|---|---|---|---|
| 1 | agent | ai:alice | 159.65.167.29 | 10.10.0.3 |
| 2 | agent | ai:bob | 45.55.78.250 | 10.10.0.2 |
| 3 | agent | ai:charlie | 159.203.131.121 | 10.10.0.5 |
| 4 | memory-only | — | 167.71.247.164 | 10.10.0.4 |
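The topology table can be cross-checked against the `peer_urls` values each agent node reports in its attestation. The sketch below is illustrative only: the `Node` class and `expected_peer_urls` helper are hypothetical names, and it assumes each node's peers are all other nodes, listed in node-index order, on port 9077 over plain HTTP (which is what this run's attestation shows).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    index: int
    role: str
    private_ip: str


# Private IPs copied from the topology table above.
NODES = [
    Node(1, "agent", "10.10.0.3"),
    Node(2, "agent", "10.10.0.2"),
    Node(3, "agent", "10.10.0.5"),
    Node(4, "memory-only", "10.10.0.4"),
]

PORT = 9077  # port seen in the attestation's peer_urls


def expected_peer_urls(nodes, self_index, port=PORT):
    """Build the comma-separated peer list for one node: every other node,
    ordered by node index (an assumption inferred from this run)."""
    peers = sorted(
        (n for n in nodes if n.index != self_index),
        key=lambda n: n.index,
    )
    return ",".join(f"http://{n.private_ip}:{port}" for n in peers)


print(expected_peer_urls(NODES, 1))
# http://10.10.0.2:9077,http://10.10.0.5:9077,http://10.10.0.4:9077
```

Comparing this string with the reported `peer_urls` for each agent node gives a quick check that the mesh was configured from the same topology the table describes.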
Per the authoritative baseline spec, every agent node must emit a self-attestation before any scenario is permitted to run. This run's attestation:
Spec version: 1.4.0 — see authoritative baseline.
| Node | Agent | Framework | Authentic | MCP ai-memory | xAI cfg | xAI default | Agent ID | Federation | UFW off | iptables | dead-man | F1 xAI | F2a substrate | F2b agent (non-gating) | Config SHA | Pass |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| node-1 | ai:alice | ironclaw 0.27.0 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | — | 317b7f4a1102 | FAIL |
| node-2 | ai:bob | ironclaw 0.27.0 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | — | d27582d7ab2c | PASS |
| node-3 | ai:charlie | ironclaw 0.27.0 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | — | 56edbd6c5c80 | PASS |
{
"baseline_pass": false,
"per_node": [
{
"spec_version": "1.4.0",
"agent_type": "ironclaw",
"agent_id": "ai:alice",
"node_index": "1",
"framework_version": "ironclaw 0.27.0",
"ai_memory_version": "0.6.3.1",
"peer_urls": "http://10.10.0.2:9077,http://10.10.0.5:9077,http://10.10.0.4:9077",
"config_file_sha256": "317b7f4a1102ba7db9fa570534f35695f7e8b2aa715196ca6c682dd4d007d6c9",
"config_attestation": {
"framework_is_authentic": true,
"mcp_server_ai_memory_registered": true,
"llm_backend_is_xai_grok": true,
"llm_is_default_provider": true,
"mcp_command_is_ai_memory": true,
"agent_id_stamped": true,
"federation_live": true,
"ufw_disabled": true,
"iptables_flushed": true,
"dead_man_switch_scheduled": true
},
"negative_invariants": {
"_description": "Alternative A2A channels must be OFF so a passing scenario is only passing via ai-memory shared memory. Any true here = thesis-preserving.",
"a2a_protocol_off": true,
"sub_agent_or_sessions_spawn_off": true,
"alternative_channels_off": true,
"tool_allowlist_is_memory_only": true,
"a2a_gate_profile_locked": true
},
"functional_probes": {
"xai_grok_chat_reachable": false,
"xai_grok_sample_reply": "",
"substrate_http_canary_f2a": true,
"substrate_http_canary_uuid": "5389dd0b-00d6-4abf-b7f6-ec63558eba93",
"agent_mcp_canary_f2b": false,
"agent_mcp_canary_uuid": "f3dd6bd6-fd60-46ee-9f56-461dbf44e4f9",
"agent_canary_response_head": "error: unrecognized subcommand 'chat' tip: a similar subcommand exists: 'channels' Usage: ironclaw [OPTIONS] [COMMAND] For more information, try '--help'. ",
"_f2b_note": "F2b is LLM-dependent and non-blocking. F2a (deterministic HTTP substrate) gates baseline_pass.",
"mesh_connectivity_f4": true,
"mesh_edges_ok": 3,
"mesh_edges_total": 3,
"mesh_edges_detail": "10.10.0.2:9077:OK,10.10.0.5:9077:OK,10.10.0.4:9077:OK",
"_f4_note": "F4 verifies this local nodes N-1 OUTBOUND mesh edges to every peer via both GET health and POST sync_push dry_run. Aggregator ANDs across N nodes to confirm full N*(N-1) bidirectional reachability. Gates baseline_pass.",
"ai_memory_mcp_stdio_f5": true,
"ai_memory_mcp_stdio_init_ok": true,
"ai_memory_mcp_stdio_tools_ok": true,
"ai_memory_mcp_stdio_tools_found": "memory_agent_list,memory_agent_register,memory_archive_list,memory_archive_purge,memory_archive_restore,memory_archive_stats,memory_auto_tag,memory_capabilities,memory_check_duplicate,memory_consolidate,memory_delete,memory_detect_contradiction,memory_entity_get_by_alias,memory_entity_register,memory_expand_query,memory_forget,memory_gc,memory_get,memory_get_links,memory_get_taxonomy,memory_inbox,memory_kg_invalidate,memory_kg_query,memory_kg_timeline,memory_link,memory_list,memory_list_subscriptions,memory_namespace_clear_standard,memory_namespace_get_standard,memory_namespace_set_standard,memory_notify,memory_pending_approve,memory_pending_list,memory_pending_reject,memory_promote,memory_recall,memory_search,memory_session_start,memory_stats,memory_store,memory_subscribe,memory_unsubscribe,memory_update",
"_f5_note": "F5 spawns the ai-memory stdio MCP subprocess using the framework-configured invocation and verifies initialize + tools/list return memory_store, memory_recall, memory_list. Deterministic (no LLM). Gates baseline_pass.",
"tls_mode": "off",
"tls_handshake_f6": true,
"tls_handshake_f6_reason": "",
"mtls_enforcement_f7": true,
"mtls_enforcement_f7_reason": "",
"_f6_f7_note": "F6 verifies the TLS 1.3 handshake against the local serve + CA chain. F7 verifies mTLS enforcement — anonymous client rejected, whitelisted client accepted. Both gate baseline_pass when tls_mode != off / mtls respectively.",
"embedder_loaded_f8": true,
"embedder_loaded_f8_reason": "",
"_f8_note": "F8 verifies /api/v1/capabilities reports features.embedder_loaded=true — i.e. the MiniLM embedder initialised at serve startup. Gates baseline_pass unconditionally. Without this, scenario-18 silently black-holes (semantic recall returns 0 rows).",
"agent_mcp_ai_memory_canary": true,
"canary_uuid": "5389dd0b-00d6-4abf-b7f6-ec63558eba93",
"canary_namespace": "_baseline_canary_f2a"
},
"baseline_pass": false
},
{
"spec_version": "1.4.0",
"agent_type": "ironclaw",
"agent_id": "ai:bob",
"node_index": "2",
"framework_version": "ironclaw 0.27.0",
"ai_memory_version": "0.6.3.1",
"peer_urls": "http://10.10.0.3:9077,http://10.10.0.5:9077,http://10.10.0.4:9077",
"config_file_sha256": "d27582d7ab2c42fc235842210b20ca55bee457b6494794a940e956af974a0a38",
"config_attestation": {
"framework_is_authentic": true,
"mcp_server_ai_memory_registered": true,
"llm_backend_is_xai_grok": true,
"llm_is_default_provider": true,
"mcp_command_is_ai_memory": true,
"agent_id_stamped": true,
"federation_live": true,
"ufw_disabled": true,
"iptables_flushed": true,
"dead_man_switch_scheduled": true
},
"negative_invariants": {
"_description": "Alternative A2A channels must be OFF so a passing scenario is only passing via ai-memory shared memory. Any true here = thesis-preserving.",
"a2a_protocol_off": true,
"sub_agent_or_sessions_spawn_off": true,
"alternative_channels_off": true,
"tool_allowlist_is_memory_only": true,
"a2a_gate_profile_locked": true
},
"functional_probes": {
"xai_grok_chat_reachable": true,
"xai_grok_sample_reply": "READY",
"substrate_http_canary_f2a": true,
"substrate_http_canary_uuid": "bfecbe4a-81be-4aa9-b757-8dc98738b881",
"agent_mcp_canary_f2b": false,
"agent_mcp_canary_uuid": "dd6da821-f975-4819-9ed8-0746684a5f60",
"agent_canary_response_head": "error: unrecognized subcommand 'chat' tip: a similar subcommand exists: 'channels' Usage: ironclaw [OPTIONS] [COMMAND] For more information, try '--help'. ",
"_f2b_note": "F2b is LLM-dependent and non-blocking. F2a (deterministic HTTP substrate) gates baseline_pass.",
"mesh_connectivity_f4": true,
"mesh_edges_ok": 3,
"mesh_edges_total": 3,
"mesh_edges_detail": "10.10.0.3:9077:OK,10.10.0.5:9077:OK,10.10.0.4:9077:OK",
"_f4_note": "F4 verifies this local nodes N-1 OUTBOUND mesh edges to every peer via both GET health and POST sync_push dry_run. Aggregator ANDs across N nodes to confirm full N*(N-1) bidirectional reachability. Gates baseline_pass.",
"ai_memory_mcp_stdio_f5": true,
"ai_memory_mcp_stdio_init_ok": true,
"ai_memory_mcp_stdio_tools_ok": true,
"ai_memory_mcp_stdio_tools_found": "memory_agent_list,memory_agent_register,memory_archive_list,memory_archive_purge,memory_archive_restore,memory_archive_stats,memory_auto_tag,memory_capabilities,memory_check_duplicate,memory_consolidate,memory_delete,memory_detect_contradiction,memory_entity_get_by_alias,memory_entity_register,memory_expand_query,memory_forget,memory_gc,memory_get,memory_get_links,memory_get_taxonomy,memory_inbox,memory_kg_invalidate,memory_kg_query,memory_kg_timeline,memory_link,memory_list,memory_list_subscriptions,memory_namespace_clear_standard,memory_namespace_get_standard,memory_namespace_set_standard,memory_notify,memory_pending_approve,memory_pending_list,memory_pending_reject,memory_promote,memory_recall,memory_search,memory_session_start,memory_stats,memory_store,memory_subscribe,memory_unsubscribe,memory_update",
"_f5_note": "F5 spawns the ai-memory stdio MCP subprocess using the framework-configured invocation and verifies initialize + tools/list return memory_store, memory_recall, memory_list. Deterministic (no LLM). Gates baseline_pass.",
"tls_mode": "off",
"tls_handshake_f6": true,
"tls_handshake_f6_reason": "",
"mtls_enforcement_f7": true,
"mtls_enforcement_f7_reason": "",
"_f6_f7_note": "F6 verifies the TLS 1.3 handshake against the local serve + CA chain. F7 verifies mTLS enforcement — anonymous client rejected, whitelisted client accepted. Both gate baseline_pass when tls_mode != off / mtls respectively.",
"embedder_loaded_f8": true,
"embedder_loaded_f8_reason": "",
"_f8_note": "F8 verifies /api/v1/capabilities reports features.embedder_loaded=true — i.e. the MiniLM embedder initialised at serve startup. Gates baseline_pass unconditionally. Without this, scenario-18 silently black-holes (semantic recall returns 0 rows).",
"agent_mcp_ai_memory_canary": true,
"canary_uuid": "bfecbe4a-81be-4aa9-b757-8dc98738b881",
"canary_namespace": "_baseline_canary_f2a"
},
"baseline_pass": true
},
{
"spec_version": "1.4.0",
"agent_type": "ironclaw",
"agent_id": "ai:charlie",
"node_index": "3",
"framework_version": "ironclaw 0.27.0",
"ai_memory_version": "0.6.3.1",
"peer_urls": "http://10.10.0.3:9077,http://10.10.0.2:9077,http://10.10.0.4:9077",
"config_file_sha256": "56edbd6c5c802b5de7afa1946fab63b22ead4cda86ad001901f31e52aedb3c0a",
"config_attestation": {
"framework_is_authentic": true,
"mcp_server_ai_memory_registered": true,
"llm_backend_is_xai_grok": true,
"llm_is_default_provider": true,
"mcp_command_is_ai_memory": true,
"agent_id_stamped": true,
"federation_live": true,
"ufw_disabled": true,
"iptables_flushed": true,
"dead_man_switch_scheduled": true
},
"negative_invariants": {
"_description": "Alternative A2A channels must be OFF so a passing scenario is only passing via ai-memory shared memory. Any true here = thesis-preserving.",
"a2a_protocol_off": true,
"sub_agent_or_sessions_spawn_off": true,
"alternative_channels_off": true,
"tool_allowlist_is_memory_only": true,
"a2a_gate_profile_locked": true
},
"functional_probes": {
"xai_grok_chat_reachable": true,
"xai_grok_sample_reply": "READY",
"substrate_http_canary_f2a": true,
"substrate_http_canary_uuid": "822d048f-e1f4-4807-9991-3e026e1affc4",
"agent_mcp_canary_f2b": false,
"agent_mcp_canary_uuid": "a8d6972e-ccb4-415a-9358-aa57f750daa4",
"agent_canary_response_head": "error: unrecognized subcommand 'chat' tip: a similar subcommand exists: 'channels' Usage: ironclaw [OPTIONS] [COMMAND] For more information, try '--help'. ",
"_f2b_note": "F2b is LLM-dependent and non-blocking. F2a (deterministic HTTP substrate) gates baseline_pass.",
"mesh_connectivity_f4": true,
"mesh_edges_ok": 3,
"mesh_edges_total": 3,
"mesh_edges_detail": "10.10.0.3:9077:OK,10.10.0.2:9077:OK,10.10.0.4:9077:OK",
"_f4_note": "F4 verifies this local nodes N-1 OUTBOUND mesh edges to every peer via both GET health and POST sync_push dry_run. Aggregator ANDs across N nodes to confirm full N*(N-1) bidirectional reachability. Gates baseline_pass.",
"ai_memory_mcp_stdio_f5": true,
"ai_memory_mcp_stdio_init_ok": true,
"ai_memory_mcp_stdio_tools_ok": true,
"ai_memory_mcp_stdio_tools_found": "memory_agent_list,memory_agent_register,memory_archive_list,memory_archive_purge,memory_archive_restore,memory_archive_stats,memory_auto_tag,memory_capabilities,memory_check_duplicate,memory_consolidate,memory_delete,memory_detect_contradiction,memory_entity_get_by_alias,memory_entity_register,memory_expand_query,memory_forget,memory_gc,memory_get,memory_get_links,memory_get_taxonomy,memory_inbox,memory_kg_invalidate,memory_kg_query,memory_kg_timeline,memory_link,memory_list,memory_list_subscriptions,memory_namespace_clear_standard,memory_namespace_get_standard,memory_namespace_set_standard,memory_notify,memory_pending_approve,memory_pending_list,memory_pending_reject,memory_promote,memory_recall,memory_search,memory_session_start,memory_stats,memory_store,memory_subscribe,memory_unsubscribe,memory_update",
"_f5_note": "F5 spawns the ai-memory stdio MCP subprocess using the framework-configured invocation and verifies initialize + tools/list return memory_store, memory_recall, memory_list. Deterministic (no LLM). Gates baseline_pass.",
"tls_mode": "off",
"tls_handshake_f6": true,
"tls_handshake_f6_reason": "",
"mtls_enforcement_f7": true,
"mtls_enforcement_f7_reason": "",
"_f6_f7_note": "F6 verifies the TLS 1.3 handshake against the local serve + CA chain. F7 verifies mTLS enforcement — anonymous client rejected, whitelisted client accepted. Both gate baseline_pass when tls_mode != off / mtls respectively.",
"embedder_loaded_f8": true,
"embedder_loaded_f8_reason": "",
"_f8_note": "F8 verifies /api/v1/capabilities reports features.embedder_loaded=true — i.e. the MiniLM embedder initialised at serve startup. Gates baseline_pass unconditionally. Without this, scenario-18 silently black-holes (semantic recall returns 0 rows).",
"agent_mcp_ai_memory_canary": true,
"canary_uuid": "822d048f-e1f4-4807-9991-3e026e1affc4",
"canary_namespace": "_baseline_canary_f2a"
},
"baseline_pass": true
}
],
"failure_mode": "baseline-violation"
}
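To see which probe caused a node to fail, the attestation JSON can be scanned for gating probes that are not true. This is a minimal sketch, not the harness's actual aggregator: the function names are hypothetical, the gating set for F2a, F4, F5, and F8 follows the `_note` fields above, the F6/F7 conditions follow `_f6_f7_note`, and treating F1 as gating is an assumption inferred from this run (node-1 is the only node where it is false and the only node that fails).

```python
# Gating probes, keyed by label. F2a/F4/F5/F8 are stated as gating in the
# attestation notes; F1 is an ASSUMPTION inferred from this run's results.
GATING_PROBES = {
    "F1": "xai_grok_chat_reachable",
    "F2a": "substrate_http_canary_f2a",
    "F4": "mesh_connectivity_f4",
    "F5": "ai_memory_mcp_stdio_f5",
    "F8": "embedder_loaded_f8",
}


def failed_gates(node: dict) -> list[str]:
    """Return labels of gating probes that are not true for one node."""
    probes = node.get("functional_probes", {})
    failed = [
        label for label, key in GATING_PROBES.items()
        if probes.get(key) is not True
    ]
    # F6 gates only when TLS is on; F7 only when mTLS is on.
    tls_mode = probes.get("tls_mode", "off")
    if tls_mode != "off" and probes.get("tls_handshake_f6") is not True:
        failed.append("F6")
    if tls_mode == "mtls" and probes.get("mtls_enforcement_f7") is not True:
        failed.append("F7")
    return failed


def summarize(attestation: dict) -> dict[str, list[str]]:
    """Map each agent_id to its failed gating probes (empty list = pass)."""
    return {n["agent_id"]: failed_gates(n) for n in attestation["per_node"]}


# Trimmed sample with the same values as node-1 and node-2 above.
sample = {
    "per_node": [
        {"agent_id": "ai:alice", "functional_probes": {
            "xai_grok_chat_reachable": False,
            "substrate_http_canary_f2a": True, "mesh_connectivity_f4": True,
            "ai_memory_mcp_stdio_f5": True, "embedder_loaded_f8": True,
            "tls_mode": "off"}},
        {"agent_id": "ai:bob", "functional_probes": {
            "xai_grok_chat_reachable": True,
            "substrate_http_canary_f2a": True, "mesh_connectivity_f4": True,
            "ai_memory_mcp_stdio_f5": True, "embedder_loaded_f8": True,
            "tls_mode": "off"}},
    ]
}

print(summarize(sample))  # {'ai:alice': ['F1'], 'ai:bob': []}
```

The non-gating F2b probe is deliberately left out, so its false value on all three nodes does not appear in the summary.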
Run focus
What this campaign tested: This run requested 35 scenarios but recovered none, exercising no transport, framework, or primitive coverage axes.
What it demonstrated: The run exposed a critical failure in the testing infrastructure: no scenario results could be collected.
AI NHI analysis · Claude Opus 4.7
FAIL — no scenario reports recovered
This run failed completely because of harness issues, so it produced no new validation data and leaves the current risk posture unchanged. Production readiness has not advanced, and customer claims about agent memory sharing remain unchanged. Compared with prior runs, it reveals a new CI reliability regression that requires immediate attention.
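The primary failure mode was the complete absence of scenario reports: none of the requested scenarios (S1, S1b, S2, etc.) returned a report, so no intended primitive was exercised. No scenario-level failure modes or probe (F#) results are available since nothing was recovered. The baseline attestation does record one probe failure: node-1 (ai:alice) failed F1 (xai_grok_chat_reachable is false), which sets baseline_pass to false with failure_mode baseline-violation, and the attestation is a precondition for any scenario run. Probable root cause is a bug in the CI harness at sha 3d8d8114968ba04f764121a7ba2180942b9c315e, possibly related to report aggregation or droplet communication in the 4-node mesh; the failed baseline gate is an alternative explanation worth ruling out.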
Debug and patch the CI harness to guarantee scenario report recovery before re-running.
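One way to make missing reports fail loudly is a recovery guard that compares the requested scenario IDs against what was actually recovered. This is a sketch under stated assumptions, not the harness's real interface: `missing_scenarios` and `require_full_recovery` are hypothetical helpers, and `recovered` is assumed to be a mapping from scenario ID to its parsed report.

```python
def missing_scenarios(requested, recovered):
    """Return requested scenario IDs with no recovered report, in request order."""
    return [sid for sid in requested if sid not in recovered]


def require_full_recovery(requested, recovered):
    """Raise if any requested scenario has no report, naming the missing IDs."""
    missing = missing_scenarios(requested, recovered)
    if missing:
        raise RuntimeError(
            f"{len(missing)} of {len(requested)} scenario reports missing: "
            + ", ".join(missing)
        )


# Example with three of the scenario IDs named in this report and no
# recovered reports, as in this run.
requested = ["S1", "S1b", "S2"]
recovered = {}
print(missing_scenarios(requested, recovered))  # ['S1', 'S1b', 'S2']
```

Calling `require_full_recovery` at the end of report aggregation would turn a silent empty result into an explicit error that lists exactly which scenarios are absent.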