
Campaign v0.6.0.0-candidate-01 FAIL

ai-memory ref: release/v0.6.0
Completed at: 2026-04-20T02:15:39Z
Overall pass: FAIL

Run focus

Early RC — infrastructure scaffolding validation

What this campaign set out to test: the first end-to-end exercise of the four-phase protocol on fresh DigitalOcean droplets against the v0.6.0 candidate branch, using the original per-droplet Terraform-and-cloud-init methodology (before commit f81bd76's runner-driven refactor).

What it demonstrated: the Terraform module provisioned droplets, SSH keys propagated via DigitalOcean's metadata service, cloud-init ran cargo to build ai-memory from source, and the per-phase shell scripts executed in sequence. It neither proved nor disproved anything about the product itself: no phase 2, 3, or 4 assertion produced trustworthy data yet.

Detailed tri-audience analysis is below, followed by per-phase test results for all four phases of the protocol — including any phase that did not run in this campaign.

AI NHI analysis · Claude Opus 4.7

Early RC — infrastructure scaffolding validation

Infrastructure-shakeout run. Predates the detailed session log and the runner-driven SSH refactor. Informational value only; not a release signal.

For three audiences

Non-technical end users

Imagine a dress rehearsal before opening night. The stage got built, the lights came on, the cast found their marks, the curtain rose on schedule. Whether the play itself is any good is measured by later runs, with a different set of stage directions and critics in the audience.

C-level decision makers

This run establishes that we can provision and orchestrate the ship-gate test infrastructure end-to-end. It does not establish that any specific customer-facing behaviour of ai-memory is correct. Treat the PASS marker (if any) as a scaffolding receipt, not a release attestation. The 6× cost-and-latency reduction published for later runs is credible only because this candidate-era run establishes that the higher baseline was real.

Engineers & architects

Compare artifacts against post-f81bd76 runs for cost/time deltas (per-droplet boot ~15 min vs runner-driven ~2 min binary build). The candidate-era methodology is archived in git history; no campaigns on release/v0.6.0 have used it since r10.

What changed going into the next campaign

Commit f81bd76 moved orchestration onto the GitHub Actions runner, turning droplets into pure workload targets. The 60-minute per-droplet budget collapses to 13–15 minutes wall-clock, and cost drops from ~$0.65 to ~$0.10 per run.

Phase 1 — functional (per-node) PASS

What this phase proves: Single-node CRUD, backup, curator dry-run, and MCP handshake on each of the three peer droplets. Establishes that ai-memory starts and is functional at the one-node level before federation is exercised.
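Each node's evidence JSON (shown under Raw evidence below) can be gated mechanically. A minimal sketch, assuming jq is available and the artifact is saved as phase1-node-a.json — the filename is an assumption, but the field names match the evidence:

```shell
# Hypothetical phase-1 gate over the per-node evidence JSON.
# Fields (.pass, .recall_count, .snapshot_count, .manifest_count) match the
# raw evidence below; the artifact path is an assumption.
jq -e '.pass
       and .recall_count >= 1
       and .snapshot_count >= 1
       and .manifest_count >= 1' phase1-node-a.json >/dev/null \
  && echo "node-a: PASS" \
  || echo "node-a: FAIL"
```

The `-e` flag makes jq's exit status follow the boolean, so the gate composes cleanly with shell conditionals.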

Test results

node-a PASS

node-b PASS

node-c PASS

Raw evidence

phase1-node-a
{
	"phase": 1,
	"host": "aim-v0-6-0-0-candidate-01-node-a",
	"version": "ai-memory 0.6.0",
	"pass": true,
	"reasons": [
		""
	],
	"stats": {
		"total": 1,
		"by_tier": [
			{
				"tier": "mid",
				"count": 1
			}
		],
		"by_namespace": [
			{
				"namespace": "ship-gate-phase1",
				"count": 1
			}
		],
		"expiring_soon": 0,
		"links_count": 0,
		"db_size_bytes": 139264
	},
	"curator": {
		"started_at": "2026-04-20T02:15:38.013289540+00:00",
		"completed_at": "2026-04-20T02:15:38.013577255+00:00",
		"cycle_duration_ms": 0,
		"memories_scanned": 1,
		"memories_eligible": 1,
		"auto_tagged": 0,
		"contradictions_found": 0,
		"operations_attempted": 0,
		"operations_skipped_cap": 0,
		"autonomy": {
			"clusters_formed": 0,
			"memories_consolidated": 0,
			"memories_forgotten": 0,
			"priority_adjustments": 0,
			"rollback_entries_written": 0,
			"errors": []
		},
		"errors": [
			"no LLM client configured"
		],
		"dry_run": true
	},
	"mcp_tool_count": 36,
	"recall_count": 1,
	"snapshot_count": 1,
	"manifest_count": 1
}

phase1-node-b
{
	"phase": 1,
	"host": "aim-v0-6-0-0-candidate-01-node-b",
	"version": "ai-memory 0.6.0",
	"pass": true,
	"reasons": [
		""
	],
	"stats": {
		"total": 1,
		"by_tier": [
			{
				"tier": "mid",
				"count": 1
			}
		],
		"by_namespace": [
			{
				"namespace": "ship-gate-phase1",
				"count": 1
			}
		],
		"expiring_soon": 0,
		"links_count": 0,
		"db_size_bytes": 139264
	},
	"curator": {
		"started_at": "2026-04-20T02:15:37.587456310+00:00",
		"completed_at": "2026-04-20T02:15:37.587929262+00:00",
		"cycle_duration_ms": 0,
		"memories_scanned": 1,
		"memories_eligible": 1,
		"auto_tagged": 0,
		"contradictions_found": 0,
		"operations_attempted": 0,
		"operations_skipped_cap": 0,
		"autonomy": {
			"clusters_formed": 0,
			"memories_consolidated": 0,
			"memories_forgotten": 0,
			"priority_adjustments": 0,
			"rollback_entries_written": 0,
			"errors": []
		},
		"errors": [
			"no LLM client configured"
		],
		"dry_run": true
	},
	"mcp_tool_count": 36,
	"recall_count": 1,
	"snapshot_count": 1,
	"manifest_count": 1
}

phase1-node-c
{
	"phase": 1,
	"host": "aim-v0-6-0-0-candidate-01-node-c",
	"version": "ai-memory 0.6.0",
	"pass": true,
	"reasons": [
		""
	],
	"stats": {
		"total": 1,
		"by_tier": [
			{
				"tier": "mid",
				"count": 1
			}
		],
		"by_namespace": [
			{
				"namespace": "ship-gate-phase1",
				"count": 1
			}
		],
		"expiring_soon": 0,
		"links_count": 0,
		"db_size_bytes": 139264
	},
	"curator": {
		"started_at": "2026-04-20T02:15:37.985124931+00:00",
		"completed_at": "2026-04-20T02:15:37.985534353+00:00",
		"cycle_duration_ms": 0,
		"memories_scanned": 1,
		"memories_eligible": 1,
		"auto_tagged": 0,
		"contradictions_found": 0,
		"operations_attempted": 0,
		"operations_skipped_cap": 0,
		"autonomy": {
			"clusters_formed": 0,
			"memories_consolidated": 0,
			"memories_forgotten": 0,
			"priority_adjustments": 0,
			"rollback_entries_written": 0,
			"errors": []
		},
		"errors": [
			"no LLM client configured"
		],
		"dry_run": true
	},
	"mcp_tool_count": 36,
	"recall_count": 1,
	"snapshot_count": 1,
	"manifest_count": 1
}

Phase 2 — multi-agent federation FAIL

What this phase proves: 4 agents × 50 writes each against the 3-node federation with W=2 quorum, followed by a 90 s settle window and a convergence count on every peer, plus two quorum probes (with one peer down a write must return 201 Created; with both peers down it must return 503 Service Unavailable). Catches silent-data-loss and quorum-misclassification regressions.
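The convergence check reduces to a simple invariant: after the settle window, every peer must report all 4 × 50 = 200 writes. A sketch of just that assertion, with hypothetical per-peer counts standing in for the harness's real queries (these values are illustrative, not from this run):

```shell
#!/usr/bin/env bash
# Expected total: 4 agents × 50 writes each.
expected=$((4 * 50))

# Illustrative per-peer counts; the real harness queries each node after the
# 90 s settle window (hypothetical values, not from this campaign).
declare -A counts=( [node-a]=200 [node-b]=200 [node-c]=197 )

converged=true
for peer in node-a node-b node-c; do
  if [ "${counts[$peer]}" -ne "$expected" ]; then
    echo "$peer: ${counts[$peer]}/$expected writes visible (divergent)"
    converged=false
  fi
done
$converged && echo "converged" || echo "FAIL: possible silent data loss"
```

Any peer reporting fewer than the expected writes after settle is exactly the silent-data-loss signature this phase exists to catch.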

Test results

Raw evidence

phase2

Phase 3 — cross-backend migration NOT REACHED

What this phase proves: 1000-memory round-trip: SQLite → Postgres, re-run for idempotency, Postgres → SQLite. Asserts zero errors and counts match. Catches migration-correctness regressions in either direction of a production upgrade path.
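Had it run, the pass criterion reduces to count equality after each leg and after the idempotency re-run. A structural sketch only: the migrate step and count query below are stubs, not the real ai-memory CLI, which this report does not show.

```shell
#!/bin/sh
# Stubs stand in for commands not documented in this report.
migrate()        { :; }          # placeholder: one migration leg (e.g. sqlite -> postgres)
count_postgres() { echo 1000; }  # placeholder: count memories on the target backend

SRC_COUNT=1000                   # memories seeded on the SQLite side

migrate sqlite postgres          # first leg
migrate sqlite postgres          # re-run: must be idempotent (no duplicates)

DST_COUNT=$(count_postgres)
if [ "$DST_COUNT" -eq "$SRC_COUNT" ]; then
  echo "counts match after idempotent re-run: $DST_COUNT/$SRC_COUNT"
else
  echo "migration mismatch: $DST_COUNT != $SRC_COUNT" >&2
  exit 1
fi
```

The reverse leg (Postgres → SQLite) would repeat the same assertion with source and target swapped.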

This phase did not run because an earlier phase failed and the campaign aborted. Evidence from the phases that did run is above; the protocol would have exercised this phase next if the prior step had passed.

Phase 4 — chaos campaign NOT REACHED

What this phase proves: packaging/chaos/run-chaos.sh on the chaos-client droplet with 50 cycles × 100 writes per fault class. Measures convergence_bound = min(count_node1, count_node2) / total_ok. Catches fault-tolerance regressions under SIGKILL of the primary, brief network partition, and related fault models.
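The convergence_bound formula is plain arithmetic. For example, with hypothetical counts of 4980 and 4965 writes visible on the two nodes out of 5000 acknowledged writes:

```shell
# Hypothetical counts for one fault class (illustrative, not from this run).
count_node1=4980
count_node2=4965
total_ok=5000

# convergence_bound = min(count_node1, count_node2) / total_ok
convergence_bound=$(awk -v a="$count_node1" -v b="$count_node2" -v t="$total_ok" \
  'BEGIN { m = (a < b) ? a : b; printf "%.4f", m / t }')
echo "convergence_bound=$convergence_bound"   # prints convergence_bound=0.9930
```

Taking the minimum of the two per-node counts makes the metric a lower bound: it reflects the worst-converged replica, so a regression on either node drags the bound down.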

This phase did not run because an earlier phase failed and the campaign aborted. Evidence from the phases that did run is above; the protocol would have exercised this phase next if the prior step had passed.

All artifacts

Every JSON artifact from this campaign is committed to this directory: raw, machine-readable, and stable.
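Because the artifacts are machine-readable, a verdict can be re-derived from them directly. A sketch, assuming jq is available and that every per-phase artifact carries the top-level pass field shown in the phase-1 evidence (the phase*.json naming is an assumption):

```shell
# Re-derive an overall verdict from the committed evidence JSONs.
overall=PASS
for f in phase*.json; do
  [ -e "$f" ] || continue                  # no artifacts in this directory
  if jq -e '.pass == true' "$f" >/dev/null 2>&1; then
    echo "$f: PASS"
  else
    echo "$f: FAIL"
    overall=FAIL
  fi
done
echo "overall: $overall"
```

A missing or malformed artifact counts as FAIL here, which matches the conservative reading this report applies to phases that never ran.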