../ runs index · rendered on Pages

Campaign v0.6.0.0-final-r10 FAIL

ai-memory ref
release/v0.6.0
Completed at
2026-04-20T03:54:37Z
Overall pass
FAIL

Run focus

Baseline under the runner-driven SSH methodology

What this campaign set out to test: The same four phases, reorchestrated: GitHub Actions runner holds the SCP+SSH control plane, droplets hold only workload. Campaign dispatch → Terraform provision → binary-build-on-runner → push-binary-to-droplets → run phases remotely → destroy.

What it demonstrated: Runner-driven orchestration completes in roughly 10 minutes at under $0.15 of DigitalOcean compute per clean run. The cost/time figures now published on the dashboard (~$0.10 / ~15 min) are measurable facts, not aspirational targets. The earlier per-droplet cloud-init methodology cost 6× more.

Detailed tri-audience analysis is below, followed by per-phase test results for all four phases of the protocol — including any phase that did not run in this campaign.

AI NHI analysis · Claude Opus 4.7

Baseline under the runner-driven SSH methodology

First run exercising the faster, cheaper orchestration pattern that every campaign since has used. Methodology proven; still no product-level green yet.

What this campaign tested

The same four phases, reorchestrated: GitHub Actions runner holds the SCP+SSH control plane, droplets hold only workload. Campaign dispatch → Terraform provision → binary-build-on-runner → push-binary-to-droplets → run phases remotely → destroy.

What it proved (or disproved)

Runner-driven orchestration completes in roughly 10 minutes at under $0.15 of DigitalOcean compute per clean run. The cost/time figures now published on the dashboard (~$0.10 / ~15 min) are measurable facts, not aspirational targets. The earlier per-droplet cloud-init methodology cost 6× more.

For three audiences

Non-technical end users

The test pipeline got roughly six times faster and six times cheaper — from about an hour per run at ~60¢ to about fifteen minutes at ~10¢. Nothing about WHICH tests we run changed; only the orchestration around them. Faster tests mean faster releases, more releases, more frequent fixes reaching you.

C-level decision makers

Release-gate latency and spend both drop ~6×. At 100 release candidates per year that's ~$50 of direct compute savings AND a shift from multi-hour decision windows to minutes. The real ROI is not the compute line item — it's that a 15-minute release signal invites developers to treat it as a pre-commit check rather than a ceremonial quarterly review.

Engineers & architects

Commit f81bd76 moved SCP/SSH to the runner. Droplets still need local root for SIGKILL / iptables in Phase 4, but per-droplet cargo builds are gone. The Rust toolchain cache (Swatinem/rust-cache@v2) dominates the runner's minute count; binary build drops to ~2 min on cache hit, ~7 min cold.

What changed going into the next campaign

Runs 11 through 14 iterate on Phase 2's multi-agent burst test, ultimately uncovering the silent-data-loss federation fanout bug at the product layer.

Phase 1 — functional (per-node) PASS

What this phase proves: Single-node CRUD, backup, curator dry-run, and MCP handshake on each of the three peer droplets. Establishes that ai-memory starts and is functional at the one-node level before federation is exercised.

Test results

node-a

node-b

node-c

Raw evidence

phase1-node-a
{
	"phase": 1,
	"host": "aim-v0-6-0-0-final-r10-node-a",
	"version": "ai-memory 0.6.0",
	"pass": true,
	"reasons": [
		""
	],
	"stats": {
		"total": 1,
		"by_tier": [
			{
				"tier": "mid",
				"count": 1
			}
		],
		"by_namespace": [
			{
				"namespace": "ship-gate-phase1",
				"count": 1
			}
		],
		"expiring_soon": 0,
		"links_count": 0,
		"db_size_bytes": 139264
	},
	"curator": {
		"started_at": "2026-04-20T03:53:53.532328794+00:00",
		"completed_at": "2026-04-20T03:53:53.532917887+00:00",
		"cycle_duration_ms": 0,
		"memories_scanned": 1,
		"memories_eligible": 1,
		"auto_tagged": 0,
		"contradictions_found": 0,
		"operations_attempted": 0,
		"operations_skipped_cap": 0,
		"autonomy": {
			"clusters_formed": 0,
			"memories_consolidated": 0,
			"memories_forgotten": 0,
			"priority_adjustments": 0,
			"rollback_entries_written": 0,
			"errors": []
		},
		"errors": [
			"no LLM client configured"
		],
		"dry_run": true
	},
	"mcp_tool_count": 36,
	"recall_count": 1,
	"snapshot_count": 1,
	"manifest_count": 1
}

raw JSON

phase1-node-b
{
	"phase": 1,
	"host": "aim-v0-6-0-0-final-r10-node-b",
	"version": "ai-memory 0.6.0",
	"pass": true,
	"reasons": [
		""
	],
	"stats": {
		"total": 1,
		"by_tier": [
			{
				"tier": "mid",
				"count": 1
			}
		],
		"by_namespace": [
			{
				"namespace": "ship-gate-phase1",
				"count": 1
			}
		],
		"expiring_soon": 0,
		"links_count": 0,
		"db_size_bytes": 139264
	},
	"curator": {
		"started_at": "2026-04-20T03:53:54.869284406+00:00",
		"completed_at": "2026-04-20T03:53:54.869847827+00:00",
		"cycle_duration_ms": 0,
		"memories_scanned": 1,
		"memories_eligible": 1,
		"auto_tagged": 0,
		"contradictions_found": 0,
		"operations_attempted": 0,
		"operations_skipped_cap": 0,
		"autonomy": {
			"clusters_formed": 0,
			"memories_consolidated": 0,
			"memories_forgotten": 0,
			"priority_adjustments": 0,
			"rollback_entries_written": 0,
			"errors": []
		},
		"errors": [
			"no LLM client configured"
		],
		"dry_run": true
	},
	"mcp_tool_count": 36,
	"recall_count": 1,
	"snapshot_count": 1,
	"manifest_count": 1
}

raw JSON

phase1-node-c
{
	"phase": 1,
	"host": "aim-v0-6-0-0-final-r10-node-c",
	"version": "ai-memory 0.6.0",
	"pass": true,
	"reasons": [
		""
	],
	"stats": {
		"total": 1,
		"by_tier": [
			{
				"tier": "mid",
				"count": 1
			}
		],
		"by_namespace": [
			{
				"namespace": "ship-gate-phase1",
				"count": 1
			}
		],
		"expiring_soon": 0,
		"links_count": 0,
		"db_size_bytes": 139264
	},
	"curator": {
		"started_at": "2026-04-20T03:53:54.107152666+00:00",
		"completed_at": "2026-04-20T03:53:54.107777486+00:00",
		"cycle_duration_ms": 0,
		"memories_scanned": 1,
		"memories_eligible": 1,
		"auto_tagged": 0,
		"contradictions_found": 0,
		"operations_attempted": 0,
		"operations_skipped_cap": 0,
		"autonomy": {
			"clusters_formed": 0,
			"memories_consolidated": 0,
			"memories_forgotten": 0,
			"priority_adjustments": 0,
			"rollback_entries_written": 0,
			"errors": []
		},
		"errors": [
			"no LLM client configured"
		],
		"dry_run": true
	},
	"mcp_tool_count": 36,
	"recall_count": 1,
	"snapshot_count": 1,
	"manifest_count": 1
}

raw JSON

Phase 2 — multi-agent federation FAIL

What this phase proves: 4 agents × 50 writes against the 3-node federation with W=2 quorum, then 90s settle and convergence count on every peer. Plus two quorum probes (one-peer-down must 201, both-peers-down must 503). Catches silent-data-loss and quorum-misclassification regressions.

Test results

Raw evidence

phase2

          

raw JSON

Phase 3 — cross-backend migration NOT REACHED

What this phase proves: 1000-memory round-trip: SQLite → Postgres, re-run for idempotency, Postgres → SQLite. Asserts zero errors and counts match. Catches migration-correctness regressions in either direction of a production upgrade path.

This phase did not run because an earlier phase failed and the campaign aborted. Evidence from the phases that did run is above; the protocol would have exercised this phase next if the prior step had passed.

Phase 4 — chaos campaign NOT REACHED

What this phase proves: packaging/chaos/run-chaos.sh on the chaos-client droplet with 50 cycles × 100 writes per fault class. Measures convergence_bound = min(count_node1, count_node2) / total_ok. Catches fault-tolerance regressions under SIGKILL of the primary, brief network partition, and related fault models.

This phase did not run because an earlier phase failed and the campaign aborted. Evidence from the phases that did run is above; the protocol would have exercised this phase next if the prior step had passed.

All artifacts

Every JSON committed to this campaign directory. Raw, machine-readable, and stable.