Campaign v0.6.0.0-final-r12 FAIL

ai-memory ref: release/v0.6.0
Completed at: 2026-04-20T04:16:21Z
Overall pass: FAIL

Run focus

Counts landed cleanly; all 200 writes returned HTTP 400

What this campaign set out to test: Same burst + quorum probes, now with a counter tally that actually counts.

What it demonstrated: The product's input validation boundary is doing its job: `source` is an allowlisted field and unexpected values are rejected with 400 before reaching the federation layer. The harness was claiming to be `source="ship-gate"`, which isn't in VALID_SOURCES. This proved the allowlist works as designed — and that the test harness had to conform to it, same as any production caller.

Detailed tri-audience analysis is below, followed by per-phase test results for all four phases of the protocol — including any phase that did not run in this campaign.

AI NHI analysis · Claude Opus 4.7

Harness counter fixed. Writes were reaching the server — and being rejected at the validation boundary because the harness was labelling itself with an unknown source tag. Still no federation signal; the requests never got past input validation.
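The counter tally can be pictured with a minimal sketch. The names `tally_statuses` and `burst_ok`, and the pass rule that every write must return 201, are assumptions for illustration and not the ship-gate harness's actual code.

```python
from collections import Counter

def tally_statuses(statuses):
    """Count how many responses came back with each HTTP status code."""
    return Counter(statuses)

def burst_ok(tally, expected_writes):
    """Hypothetical pass rule: every write in the burst returned 201 Created."""
    return tally.get(201, 0) == expected_writes

# r12 as reported: 200 writes, all rejected with HTTP 400.
r12 = tally_statuses([400] * 200)
print(dict(r12))            # {400: 200}
print(burst_ok(r12, 200))   # False
```

A tally like this separates "the counter is broken" from "the server rejected everything": the counts add up to 200, and none of them is a success.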

For three audiences

Non-technical end users

We fixed the broken thermometer, tried to take the patient's temperature again, and discovered the patient was refusing to open their mouth because we hadn't shown our visitor badge. Once we showed a valid badge, they complied. Not a bug in the patient; a bug in how we approached them. Reassuringly, it meant random unauthorized visitors with fake badges also get turned away.

C-level decision makers

A recurring theme across this campaign: the product's input validation is strict enough that our own test harness has to play by its rules. That's the posture we want for production — test data treated with the same skepticism as real data. Cost of discovery: zero customer impact. Confidence in input-validation defense-in-depth: confirmed.

Engineers & architects

`ai-memory-mcp/src/validate.rs::VALID_SOURCES` allows `{user, claude, hook, api, cli, import, consolidation, system}`. The harness used `source="ship-gate"`, which `validate_source` rejected at the handler boundary, so every POST returned 400 Bad Request before federation fanout was invoked. Fixed by using `source="api"` in the harness (ship-gate commit 55e6b3a). Four runs later, the same class of bug appeared again for `source="chaos"` in the Phase 4 harness and was fixed by extending the allowlist instead (PR #310 to ai-memory-mcp).
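The reject-before-fanout behaviour described above can be sketched as follows. This is a Python illustration, not the Rust code in `validate.rs`: only the allowlist values come from this report, while `handle_write`, the `fanout` callback, and the error message text are hypothetical.

```python
# Allowlist values copied from the report; everything else is illustrative.
VALID_SOURCES = frozenset(
    {"user", "claude", "hook", "api", "cli", "import", "consolidation", "system"}
)

def validate_source(source):
    """Return an error message for a non-allowlisted source, else None."""
    if source not in VALID_SOURCES:
        return f"invalid source: {source!r}"
    return None

def handle_write(source, fanout):
    """Validate first; call the (hypothetical) fanout hook only on success."""
    error = validate_source(source)
    if error is not None:
        return 400, error
    fanout(source)
    return 201, "created"

calls = []
print(handle_write("ship-gate", calls.append)[0])  # 400
print(handle_write("api", calls.append)[0])        # 201
print(calls)                                       # ['api']
```

Because validation runs before the fanout call, a rejected request leaves no trace in the federation layer, which is why r12 produced no federation signal.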

Bugs surfaced and where they were fixed

  1. Harness used source="ship-gate" which is not in VALID_SOURCES

    Impact: 100% of Phase 2 writes rejected with HTTP 400; federation layer never exercised. No product data path was reached.

    Root cause: Validation allowlist is authoritative at the handler boundary; harness didn't conform.

    Fixed in: ship-gate commit 55e6b3a (harness now sends `source="api"`)

What changed going into the next campaign

r13 uses `source="api"` for burst + probes. Phase 2 finally reaches the federation assertion — and immediately surfaces the product-level convergence miss.

Phase 1 — functional (per-node) PASS

What this phase proves: Single-node CRUD, backup, curator dry-run, and MCP handshake on each of the three peer droplets. Establishes that ai-memory starts and is functional at the one-node level before federation is exercised.

Test results

node-a: PASS

node-b: PASS

node-c: PASS

Raw evidence

phase1-node-a
{
	"phase": 1,
	"host": "aim-v0-6-0-0-final-r12-node-a",
	"version": "ai-memory 0.6.0",
	"pass": true,
	"reasons": [
		""
	],
	"stats": {
		"total": 1,
		"by_tier": [
			{
				"tier": "mid",
				"count": 1
			}
		],
		"by_namespace": [
			{
				"namespace": "ship-gate-phase1",
				"count": 1
			}
		],
		"expiring_soon": 0,
		"links_count": 0,
		"db_size_bytes": 139264
	},
	"curator": {
		"started_at": "2026-04-20T04:15:35.981243225+00:00",
		"completed_at": "2026-04-20T04:15:35.981785837+00:00",
		"cycle_duration_ms": 0,
		"memories_scanned": 1,
		"memories_eligible": 1,
		"auto_tagged": 0,
		"contradictions_found": 0,
		"operations_attempted": 0,
		"operations_skipped_cap": 0,
		"autonomy": {
			"clusters_formed": 0,
			"memories_consolidated": 0,
			"memories_forgotten": 0,
			"priority_adjustments": 0,
			"rollback_entries_written": 0,
			"errors": []
		},
		"errors": [
			"no LLM client configured"
		],
		"dry_run": true
	},
	"mcp_tool_count": 36,
	"recall_count": 1,
	"snapshot_count": 1,
	"manifest_count": 1
}

phase1-node-b
{
	"phase": 1,
	"host": "aim-v0-6-0-0-final-r12-node-b",
	"version": "ai-memory 0.6.0",
	"pass": true,
	"reasons": [
		""
	],
	"stats": {
		"total": 1,
		"by_tier": [
			{
				"tier": "mid",
				"count": 1
			}
		],
		"by_namespace": [
			{
				"namespace": "ship-gate-phase1",
				"count": 1
			}
		],
		"expiring_soon": 0,
		"links_count": 0,
		"db_size_bytes": 139264
	},
	"curator": {
		"started_at": "2026-04-20T04:15:35.939286394+00:00",
		"completed_at": "2026-04-20T04:15:35.939826395+00:00",
		"cycle_duration_ms": 0,
		"memories_scanned": 1,
		"memories_eligible": 1,
		"auto_tagged": 0,
		"contradictions_found": 0,
		"operations_attempted": 0,
		"operations_skipped_cap": 0,
		"autonomy": {
			"clusters_formed": 0,
			"memories_consolidated": 0,
			"memories_forgotten": 0,
			"priority_adjustments": 0,
			"rollback_entries_written": 0,
			"errors": []
		},
		"errors": [
			"no LLM client configured"
		],
		"dry_run": true
	},
	"mcp_tool_count": 36,
	"recall_count": 1,
	"snapshot_count": 1,
	"manifest_count": 1
}

phase1-node-c
{
	"phase": 1,
	"host": "aim-v0-6-0-0-final-r12-node-c",
	"version": "ai-memory 0.6.0",
	"pass": true,
	"reasons": [
		""
	],
	"stats": {
		"total": 1,
		"by_tier": [
			{
				"tier": "mid",
				"count": 1
			}
		],
		"by_namespace": [
			{
				"namespace": "ship-gate-phase1",
				"count": 1
			}
		],
		"expiring_soon": 0,
		"links_count": 0,
		"db_size_bytes": 139264
	},
	"curator": {
		"started_at": "2026-04-20T04:15:35.932460854+00:00",
		"completed_at": "2026-04-20T04:15:35.932932978+00:00",
		"cycle_duration_ms": 0,
		"memories_scanned": 1,
		"memories_eligible": 1,
		"auto_tagged": 0,
		"contradictions_found": 0,
		"operations_attempted": 0,
		"operations_skipped_cap": 0,
		"autonomy": {
			"clusters_formed": 0,
			"memories_consolidated": 0,
			"memories_forgotten": 0,
			"priority_adjustments": 0,
			"rollback_entries_written": 0,
			"errors": []
		},
		"errors": [
			"no LLM client configured"
		],
		"dry_run": true
	},
	"mcp_tool_count": 36,
	"recall_count": 1,
	"snapshot_count": 1,
	"manifest_count": 1
}
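A reader checking these evidence files by script might pull out the headline fields as sketched below. The function name `summarize_phase1` is hypothetical, and the sample is a trimmed copy of the node-a document containing only the fields the sketch reads.

```python
import json

def summarize_phase1(raw):
    """Pull the headline fields out of one phase-1 evidence document."""
    doc = json.loads(raw)
    return {
        "host": doc["host"],
        "pass": doc["pass"],
        "total": doc["stats"]["total"],
        "curator_errors": doc["curator"]["errors"],
        "dry_run": doc["curator"]["dry_run"],
    }

# Trimmed copy of the node-a evidence above (only the fields read here).
sample = """
{
  "phase": 1,
  "host": "aim-v0-6-0-0-final-r12-node-a",
  "pass": true,
  "stats": {"total": 1},
  "curator": {"errors": ["no LLM client configured"], "dry_run": true}
}
"""

summary = summarize_phase1(sample)
print(summary["host"], summary["pass"], summary["curator_errors"])
```

The sketch surfaces the curator's `no LLM client configured` entry next to `pass: true` without deciding whether it should affect the verdict; that policy belongs to the harness.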

Phase 2 — multi-agent federation FAIL

What this phase proves: 4 agents × 50 writes (200 total) against the 3-node federation with W=2 quorum, then a 90s settle and a convergence count on every peer. Plus two quorum probes: a write with one peer down must return 201, and a write with both peers down must return 503. Catches silent-data-loss and quorum-misclassification regressions.
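The two quorum probes follow from simple arithmetic, sketched here under one stated assumption: the node receiving the write counts as one acknowledgement toward W, and each reachable peer adds one more. The function names are hypothetical, and the report does not confirm how ai-memory counts acknowledgements internally.

```python
def classify_write(acks, w=2):
    """Map the number of acknowledging nodes to the expected HTTP status."""
    return 201 if acks >= w else 503

def acks_with_peers_down(peers_down, nodes=3):
    """Assumption: the receiving node acks its own write; each live peer acks once."""
    return nodes - peers_down

AGENTS, WRITES_PER_AGENT = 4, 50
print(AGENTS * WRITES_PER_AGENT)                # 200
print(classify_write(acks_with_peers_down(1)))  # 201
print(classify_write(acks_with_peers_down(2)))  # 503
```

With one peer down, two acknowledgements remain and W=2 is met; with both peers down only the local node is left, so the write must be refused and not silently accepted.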

Test results

Raw evidence

phase2

Phase 3 — cross-backend migration NOT REACHED

What this phase proves: 1000-memory round-trip: SQLite → Postgres, re-run for idempotency, Postgres → SQLite. Asserts zero errors and counts match. Catches migration-correctness regressions in either direction of a production upgrade path.
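The shape of that round-trip assertion can be sketched without a Postgres server. In the sketch below, three in-memory SQLite databases stand in for both backends, and the one-table schema, the `migrate` helper, and the use of `INSERT OR IGNORE` for idempotency are all assumptions; the real migration tooling is not shown in this report.

```python
import sqlite3

def make_store():
    """Create an in-memory store with a hypothetical one-table schema."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, body TEXT)")
    return db

def migrate(src, dst):
    """Copy rows; INSERT OR IGNORE keeps a re-run from duplicating them."""
    rows = src.execute("SELECT id, body FROM memories").fetchall()
    dst.executemany(
        "INSERT OR IGNORE INTO memories (id, body) VALUES (?, ?)", rows
    )
    dst.commit()

def count(db):
    """Number of memories currently held by a store."""
    return db.execute("SELECT COUNT(*) FROM memories").fetchone()[0]

source, target, back = make_store(), make_store(), make_store()
source.executemany(
    "INSERT INTO memories (id, body) VALUES (?, ?)",
    [(i, f"memory-{i}") for i in range(1000)],
)
source.commit()

migrate(source, target)  # first pass (stands in for SQLite -> Postgres)
migrate(source, target)  # re-run for idempotency
migrate(target, back)    # return trip (stands in for Postgres -> SQLite)
print(count(source), count(target), count(back))  # 1000 1000 1000
```

The check that matters is that all three counts match after the re-run; a migration that is not idempotent would either raise on the second pass or inflate the target count.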

This phase did not run because an earlier phase failed and the campaign aborted. Evidence from the phases that did run is above; the protocol would have exercised this phase next if the prior step had passed.

Phase 4 — chaos campaign NOT REACHED

What this phase proves: `packaging/chaos/run-chaos.sh` on the chaos-client droplet with 50 cycles × 100 writes per fault class. Measures `convergence_bound = min(count_node1, count_node2) / total_ok`. Catches fault-tolerance regressions under SIGKILL of the primary, brief network partition, and related fault models.
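The metric is a direct ratio, shown here as a small sketch. The formula is the one stated above; the guard against `total_ok == 0` and the sample numbers are illustrative additions, not values from any campaign.

```python
def convergence_bound(count_node1, count_node2, total_ok):
    """min(count_node1, count_node2) / total_ok, as defined in the report."""
    if total_ok <= 0:
        raise ValueError("total_ok must be positive")
    return min(count_node1, count_node2) / total_ok

# Hypothetical numbers for one cycle of 100 acknowledged writes.
print(convergence_bound(100, 100, 100))  # 1.0
print(convergence_bound(100, 97, 100))   # 0.97
```

Taking the minimum means the worse-off node sets the score: a value below 1.0 says at least one node is missing writes that were acknowledged as successful.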

This phase did not run because an earlier phase failed and the campaign aborted. Evidence from the phases that did run is above; the protocol would have exercised this phase next if the prior step had passed.

All artifacts

Every JSON committed to this campaign directory. Raw, machine-readable, and stable.