../ runs index · rendered on Pages

Campaign v0.6.0.0-final-r11 FAIL

ai-memory ref
release/v0.6.0
Completed at
2026-04-20T04:06:13Z
Overall pass
FAIL

Run focus

Phase 2 crashed at line 102 — first diagnostic finding surfaces a harness bug

What this campaign set out to test: Four agent identities (`ai:agent-alice`, `-bob`, `-charlie`, `-dana`) each issuing 50 concurrent POSTs to node-a's `/api/v1/memories` endpoint under `--quorum-writes 2 --quorum-peers <other two>`, then tallying per-agent status codes.

What it demonstrated: The write burst HAPPENED — the code tried to tally responses, meaning traffic reached the server. But the tally itself crashed. What the server actually did with those 200 writes was invisible to the test this run. The run neither proved nor disproved anything about federation.

Detailed tri-audience analysis is below, followed by per-phase test results for all four phases of the protocol — including any phase that did not run in this campaign.

AI NHI analysis · Claude Opus 4.7

Phase 2 crashed at line 102 — first diagnostic finding surfaces a harness bug

The Phase 2 shell harness died on an arithmetic syntax error before it ever reached the federation assertion. Harness-only failure; no product signal yet.

What this campaign tested

Four agent identities (`ai:agent-alice`, `-bob`, `-charlie`, `-dana`) each issuing 50 concurrent POSTs to node-a's `/api/v1/memories` endpoint under `--quorum-writes 2 --quorum-peers <other two>`, then tallying per-agent status codes.

What it proved (or disproved)

The write burst HAPPENED — the code tried to tally responses, meaning traffic reached the server. But the tally itself crashed. What the server actually did with those 200 writes was invisible to the test this run. The run neither proved nor disproved anything about federation.

For three audiences

Non-technical end users

We tried to measure the cluster's health, and the measuring device itself malfunctioned. Like a thermometer with a broken display — the patient may or may not have a fever, but we can't tell yet. The fix was to replace the broken thermometer, not to conclude anything about the patient.

C-level decision makers

A test-infrastructure defect blocked the real product assertion. Materially different from a product bug: no customer impact, no release blocker beyond the cost of re-running the campaign. Notable for one reason though — the bug was subtle (a POSIX shell edge case where grep-with-zero-matches emits 0 on stdout AND exits non-zero) and survived review because the fault was latent until a genuinely-passing run hit the zero-match branch. A reminder that harness rigor matters as much as product rigor.

Engineers & architects

scripts/phase2_multiagent.sh line 102 computed `OK=$(grep -c '^201$' codes.txt || echo 0)`. When no line matched, `grep -c` emits `0\n` and exits 1; `|| echo 0` appends another `0`, producing a multi-line "0\n0" string. The subsequent `FAIL=$((TOTAL - OK - QNM))` arithmetic parsed that as invalid syntax. Fixed in commit 625ed66 by switching to `awk '/^201$/{n++} END{print n+0}'` which is idempotent for the empty-match case.

Bugs surfaced and where they were fixed

  1. grep -c + || echo 0 doubled stdout → arithmetic syntax error

    Impact: Phase 2 harness died before producing a verdict. Workflow reported FAIL, but the failure was test-infrastructure, not product.

    Root cause: Pipeline double-counted when grep found zero matches. awk is robust to the empty case because it unconditionally emits `n+0` once at END.

    Fixed in:

What changed going into the next campaign

r12 adopts the awk-based tally. The first cycle that actually reaches the federation assertion will surface the NEXT layer of problems — and it does.

Phase 1 — functional (per-node) PASS

What this phase proves: Single-node CRUD, backup, curator dry-run, and MCP handshake on each of the three peer droplets. Establishes that ai-memory starts and is functional at the one-node level before federation is exercised.

Test results

node-a

node-b

node-c

Raw evidence

phase1-node-a
{
	"phase": 1,
	"host": "aim-v0-6-0-0-final-r11-node-a",
	"version": "ai-memory 0.6.0",
	"pass": true,
	"reasons": [
		""
	],
	"stats": {
		"total": 1,
		"by_tier": [
			{
				"tier": "mid",
				"count": 1
			}
		],
		"by_namespace": [
			{
				"namespace": "ship-gate-phase1",
				"count": 1
			}
		],
		"expiring_soon": 0,
		"links_count": 0,
		"db_size_bytes": 139264
	},
	"curator": {
		"started_at": "2026-04-20T04:06:01.574280045+00:00",
		"completed_at": "2026-04-20T04:06:01.575046443+00:00",
		"cycle_duration_ms": 0,
		"memories_scanned": 1,
		"memories_eligible": 1,
		"auto_tagged": 0,
		"contradictions_found": 0,
		"operations_attempted": 0,
		"operations_skipped_cap": 0,
		"autonomy": {
			"clusters_formed": 0,
			"memories_consolidated": 0,
			"memories_forgotten": 0,
			"priority_adjustments": 0,
			"rollback_entries_written": 0,
			"errors": []
		},
		"errors": [
			"no LLM client configured"
		],
		"dry_run": true
	},
	"mcp_tool_count": 36,
	"recall_count": 1,
	"snapshot_count": 1,
	"manifest_count": 1
}

raw JSON

phase1-node-b
{
	"phase": 1,
	"host": "aim-v0-6-0-0-final-r11-node-b",
	"version": "ai-memory 0.6.0",
	"pass": true,
	"reasons": [
		""
	],
	"stats": {
		"total": 1,
		"by_tier": [
			{
				"tier": "mid",
				"count": 1
			}
		],
		"by_namespace": [
			{
				"namespace": "ship-gate-phase1",
				"count": 1
			}
		],
		"expiring_soon": 0,
		"links_count": 0,
		"db_size_bytes": 139264
	},
	"curator": {
		"started_at": "2026-04-20T04:06:01.229194260+00:00",
		"completed_at": "2026-04-20T04:06:01.229714144+00:00",
		"cycle_duration_ms": 0,
		"memories_scanned": 1,
		"memories_eligible": 1,
		"auto_tagged": 0,
		"contradictions_found": 0,
		"operations_attempted": 0,
		"operations_skipped_cap": 0,
		"autonomy": {
			"clusters_formed": 0,
			"memories_consolidated": 0,
			"memories_forgotten": 0,
			"priority_adjustments": 0,
			"rollback_entries_written": 0,
			"errors": []
		},
		"errors": [
			"no LLM client configured"
		],
		"dry_run": true
	},
	"mcp_tool_count": 36,
	"recall_count": 1,
	"snapshot_count": 1,
	"manifest_count": 1
}

raw JSON

phase1-node-c
{
	"phase": 1,
	"host": "aim-v0-6-0-0-final-r11-node-c",
	"version": "ai-memory 0.6.0",
	"pass": true,
	"reasons": [
		""
	],
	"stats": {
		"total": 1,
		"by_tier": [
			{
				"tier": "mid",
				"count": 1
			}
		],
		"by_namespace": [
			{
				"namespace": "ship-gate-phase1",
				"count": 1
			}
		],
		"expiring_soon": 0,
		"links_count": 0,
		"db_size_bytes": 139264
	},
	"curator": {
		"started_at": "2026-04-20T04:06:01.117530548+00:00",
		"completed_at": "2026-04-20T04:06:01.118120527+00:00",
		"cycle_duration_ms": 0,
		"memories_scanned": 1,
		"memories_eligible": 1,
		"auto_tagged": 0,
		"contradictions_found": 0,
		"operations_attempted": 0,
		"operations_skipped_cap": 0,
		"autonomy": {
			"clusters_formed": 0,
			"memories_consolidated": 0,
			"memories_forgotten": 0,
			"priority_adjustments": 0,
			"rollback_entries_written": 0,
			"errors": []
		},
		"errors": [
			"no LLM client configured"
		],
		"dry_run": true
	},
	"mcp_tool_count": 36,
	"recall_count": 1,
	"snapshot_count": 1,
	"manifest_count": 1
}

raw JSON

Phase 2 — multi-agent federation FAIL

What this phase proves: 4 agents × 50 writes against the 3-node federation with W=2 quorum, then 90s settle and convergence count on every peer. Plus two quorum probes (one-peer-down must 201, both-peers-down must 503). Catches silent-data-loss and quorum-misclassification regressions.

Test results

Raw evidence

phase2

          

raw JSON

Phase 3 — cross-backend migration NOT REACHED

What this phase proves: 1000-memory round-trip: SQLite → Postgres, re-run for idempotency, Postgres → SQLite. Asserts zero errors and counts match. Catches migration-correctness regressions in either direction of a production upgrade path.

This phase did not run because an earlier phase failed and the campaign aborted. Evidence from the phases that did run is above; the protocol would have exercised this phase next if the prior step had passed.

Phase 4 — chaos campaign NOT REACHED

What this phase proves: packaging/chaos/run-chaos.sh on the chaos-client droplet with 50 cycles × 100 writes per fault class. Measures convergence_bound = min(count_node1, count_node2) / total_ok. Catches fault-tolerance regressions under SIGKILL of the primary, brief network partition, and related fault models.

This phase did not run because an earlier phase failed and the campaign aborted. Evidence from the phases that did run is above; the protocol would have exercised this phase next if the prior step had passed.

All artifacts

Every JSON committed to this campaign directory. Raw, machine-readable, and stable.