Campaign v0.6.0.0-final-r10 FAIL

ai-memory ref: release/v0.6.0
Completed at: 2026-04-20T03:54:37Z
Overall pass: FAIL

Run focus

Baseline under the runner-driven SSH methodology

What this campaign set out to test: The same four phases, reorchestrated: GitHub Actions runner holds the SCP+SSH control plane, droplets hold only workload. Campaign dispatch → Terraform provision → binary-build-on-runner → push-binary-to-droplets → run phases remotely → destroy.

What it demonstrated: Runner-driven orchestration completes in roughly 10 minutes at under $0.15 of DigitalOcean compute per clean run. The cost/time figures now published on the dashboard (~$0.10 / ~15 min) are measurable facts, not aspirational targets. The earlier per-droplet cloud-init methodology cost 6× more.

Detailed tri-audience analysis is below, followed by per-phase test results for all four phases of the protocol — including any phase that did not run in this campaign.

AI NHI analysis · Claude Opus 4.7

Baseline under the runner-driven SSH methodology

First run exercising the faster, cheaper orchestration pattern that every campaign since has used. Methodology proven; still no product-level green yet.

What this campaign tested

The same four phases, reorchestrated: GitHub Actions runner holds the SCP+SSH control plane, droplets hold only workload. Campaign dispatch → Terraform provision → binary-build-on-runner → push-binary-to-droplets → run phases remotely → destroy.

What it proved (or disproved)

Runner-driven orchestration completes in roughly 10 minutes at under $0.15 of DigitalOcean compute per clean run. The cost/time figures now published on the dashboard (~$0.10 / ~15 min) are measurable facts, not aspirational targets. The earlier per-droplet cloud-init methodology cost 6× more.

For three audiences

Non-technical end users

The test pipeline got roughly six times faster and six times cheaper — from about an hour per run at ~60¢ to about fifteen minutes at ~10¢. Nothing about WHICH tests we run changed; only the orchestration around them. Faster tests mean faster releases, more releases, more frequent fixes reaching you.

C-level decision makers

Release-gate latency and spend both drop ~6×. At 100 release candidates per year that's ~$50 of direct compute savings AND a shift from multi-hour decision windows to minutes. The real ROI is not the compute line item — it's that a 15-minute release signal invites developers to treat it as a pre-commit check rather than a ceremonial quarterly review.

Engineers & architects

Commit f81bd76 moved SCP/SSH to the runner. Droplets still need local root for SIGKILL / iptables in Phase 4, but per-droplet cargo builds are gone. The Rust toolchain cache (Swatinem/rust-cache@v2) dominates the runner's minute count; binary build drops to ~2 min on cache hit, ~7 min cold.

What changed going into the next campaign

Runs 11 through 14 iterate on Phase 2's multi-agent burst test, ultimately uncovering the silent-data-loss federation fanout bug at the product layer.

Phase 1 — functional (per-node) PASS

What this phase proves: Single-node CRUD, backup, curator dry-run, and MCP handshake on each of the three peer droplets. Establishes that ai-memory starts and is functional at the one-node level before federation is exercised.

Test results

node-a

✓ Stats total ≥ 1 (store + list + stats round-trip) — 1 memories
✓ Recall returned ≥ 1 hit — 1 hits
✓ Backup snapshot file emitted — 1 snapshot(s)
✓ Backup manifest file emitted — 1 manifest(s)
✓ MCP handshake advertises ≥ 30 tools — 36 tools
✓ Curator dry-run clean (Ollama-not-configured is accepted) — 1 errors
✓ Overall phase-1 pass flag

node-b

✓ Stats total ≥ 1 (store + list + stats round-trip) — 1 memories
✓ Recall returned ≥ 1 hit — 1 hits
✓ Backup snapshot file emitted — 1 snapshot(s)
✓ Backup manifest file emitted — 1 manifest(s)
✓ MCP handshake advertises ≥ 30 tools — 36 tools
✓ Curator dry-run clean (Ollama-not-configured is accepted) — 1 errors
✓ Overall phase-1 pass flag

node-c

✓ Stats total ≥ 1 (store + list + stats round-trip) — 1 memories
✓ Recall returned ≥ 1 hit — 1 hits
✓ Backup snapshot file emitted — 1 snapshot(s)
✓ Backup manifest file emitted — 1 manifest(s)
✓ MCP handshake advertises ≥ 30 tools — 36 tools
✓ Curator dry-run clean (Ollama-not-configured is accepted) — 1 errors
✓ Overall phase-1 pass flag

Raw evidence

phase1-node-a

{
	"phase": 1,
	"host": "aim-v0-6-0-0-final-r10-node-a",
	"version": "ai-memory 0.6.0",
	"pass": true,
	"reasons": [
		""
	],
	"stats": {
		"total": 1,
		"by_tier": [
			{
				"tier": "mid",
				"count": 1
			}
		],
		"by_namespace": [
			{
				"namespace": "ship-gate-phase1",
				"count": 1
			}
		],
		"expiring_soon": 0,
		"links_count": 0,
		"db_size_bytes": 139264
	},
	"curator": {
		"started_at": "2026-04-20T03:53:53.532328794+00:00",
		"completed_at": "2026-04-20T03:53:53.532917887+00:00",
		"cycle_duration_ms": 0,
		"memories_scanned": 1,
		"memories_eligible": 1,
		"auto_tagged": 0,
		"contradictions_found": 0,
		"operations_attempted": 0,
		"operations_skipped_cap": 0,
		"autonomy": {
			"clusters_formed": 0,
			"memories_consolidated": 0,
			"memories_forgotten": 0,
			"priority_adjustments": 0,
			"rollback_entries_written": 0,
			"errors": []
		},
		"errors": [
			"no LLM client configured"
		],
		"dry_run": true
	},
	"mcp_tool_count": 36,
	"recall_count": 1,
	"snapshot_count": 1,
	"manifest_count": 1
}

raw JSON

phase1-node-b

{
	"phase": 1,
	"host": "aim-v0-6-0-0-final-r10-node-b",
	"version": "ai-memory 0.6.0",
	"pass": true,
	"reasons": [
		""
	],
	"stats": {
		"total": 1,
		"by_tier": [
			{
				"tier": "mid",
				"count": 1
			}
		],
		"by_namespace": [
			{
				"namespace": "ship-gate-phase1",
				"count": 1
			}
		],
		"expiring_soon": 0,
		"links_count": 0,
		"db_size_bytes": 139264
	},
	"curator": {
		"started_at": "2026-04-20T03:53:54.869284406+00:00",
		"completed_at": "2026-04-20T03:53:54.869847827+00:00",
		"cycle_duration_ms": 0,
		"memories_scanned": 1,
		"memories_eligible": 1,
		"auto_tagged": 0,
		"contradictions_found": 0,
		"operations_attempted": 0,
		"operations_skipped_cap": 0,
		"autonomy": {
			"clusters_formed": 0,
			"memories_consolidated": 0,
			"memories_forgotten": 0,
			"priority_adjustments": 0,
			"rollback_entries_written": 0,
			"errors": []
		},
		"errors": [
			"no LLM client configured"
		],
		"dry_run": true
	},
	"mcp_tool_count": 36,
	"recall_count": 1,
	"snapshot_count": 1,
	"manifest_count": 1
}

raw JSON

phase1-node-c

{
	"phase": 1,
	"host": "aim-v0-6-0-0-final-r10-node-c",
	"version": "ai-memory 0.6.0",
	"pass": true,
	"reasons": [
		""
	],
	"stats": {
		"total": 1,
		"by_tier": [
			{
				"tier": "mid",
				"count": 1
			}
		],
		"by_namespace": [
			{
				"namespace": "ship-gate-phase1",
				"count": 1
			}
		],
		"expiring_soon": 0,
		"links_count": 0,
		"db_size_bytes": 139264
	},
	"curator": {
		"started_at": "2026-04-20T03:53:54.107152666+00:00",
		"completed_at": "2026-04-20T03:53:54.107777486+00:00",
		"cycle_duration_ms": 0,
		"memories_scanned": 1,
		"memories_eligible": 1,
		"auto_tagged": 0,
		"contradictions_found": 0,
		"operations_attempted": 0,
		"operations_skipped_cap": 0,
		"autonomy": {
			"clusters_formed": 0,
			"memories_consolidated": 0,
			"memories_forgotten": 0,
			"priority_adjustments": 0,
			"rollback_entries_written": 0,
			"errors": []
		},
		"errors": [
			"no LLM client configured"
		],
		"dry_run": true
	},
	"mcp_tool_count": 36,
	"recall_count": 1,
	"snapshot_count": 1,
	"manifest_count": 1
}

raw JSON

Phase 2 — multi-agent federation FAIL

What this phase proves: 4 agents × 50 writes against the 3-node federation with W=2 quorum, then 90s settle and convergence count on every peer. Plus two quorum probes (one-peer-down must 201, both-peers-down must 503). Catches silent-data-loss and quorum-misclassification regressions.

Test results

✗ Burst writes returned 201 — ok=/ (qnm=, fail=)
✗ node-A convergence ≥ 95% of ok — a= / threshold 0
✗ node-B convergence ≥ 95% of ok — b= / threshold 0
✗ node-C convergence ≥ 95% of ok — c= / threshold 0
✗ Probe 1: one peer down → 201 (quorum met via remaining peer) — got
✗ Probe 2: both peers down → 503 (quorum_not_met) — got
✗ Overall phase-2 pass flag

Raw evidence

phase2

raw JSON

Phase 3 — cross-backend migration NOT REACHED

What this phase proves: 1000-memory round-trip: SQLite → Postgres, re-run for idempotency, Postgres → SQLite. Asserts zero errors and counts match. Catches migration-correctness regressions in either direction of a production upgrade path.

This phase did not run because an earlier phase failed and the campaign aborted. Evidence from the phases that did run is above; the protocol would have exercised this phase next if the prior step had passed.

Phase 4 — chaos campaign NOT REACHED

What this phase proves: packaging/chaos/run-chaos.sh on the chaos-client droplet with 50 cycles × 100 writes per fault class. Measures convergence_bound = min(count_node1, count_node2) / total_ok. Catches fault-tolerance regressions under SIGKILL of the primary, brief network partition, and related fault models.

All artifacts

Every JSON committed to this campaign directory. Raw, machine-readable, and stable.

Generated by scripts/generate_run_html.sh. Campaign directory: alphaonedev/ai-memory-ship-gate/runs/v0.6.0.0-final-r10 . Methodology: alphaonedev.github.io/ai-memory-ship-gate/methodology . Analysis data source: analysis/run-insights.json.