Campaign v0.6.0.0-final-r13 FAIL

ai-memory ref: release/v0.6.0
Completed at: 2026-04-20T04:27:40Z
Overall pass: FAIL

Run focus

Federation convergence miss (B=89.5%, C=73%) + probe SSH 255 killed script

What this campaign set out to test: 4×50-write burst, 30-second settle, convergence assertion, two quorum probes (one-peer-down → 201, both-peers-down → 503).

What it demonstrated: Disproved that a 30s settle was sufficient for 200 burst writes to reach all three peers under the then-current federation implementation. On node-A (leader) every write landed. On node-B, 179 of 200. On node-C, 146 of 200. This is not an eventual-consistency "slow catch-up" pattern — it's a deterministic "each write is dropped on one peer" pattern. (Run 14 would later confirm this.) Also disproved that a shell-level `|| true` inside quoted remote ssh commands is enough to survive transport-level 255 exits.

Detailed tri-audience analysis is below, followed by per-phase test results for all four phases of the protocol — including any phase that did not run in this campaign.

AI NHI analysis · Claude Opus 4.7

Federation convergence miss (B=89.5%, C=73%) + probe SSH 255 killed script

First run where writes actually exercised federation. And federation was visibly wrong: non-leader peers sat well below the 95% convergence threshold even after 30 seconds of settle. Separately, the probe step exposed a harness-fragility sub-bug.

What this campaign tested

4×50-write burst, 30-second settle, convergence assertion, two quorum probes (one-peer-down → 201, both-peers-down → 503).

What it proved (or disproved)

Disproved that a 30s settle was sufficient for 200 burst writes to reach all three peers under the then-current federation implementation. On node-A (leader) every write landed. On node-B, 179 of 200. On node-C, 146 of 200. This is not an eventual-consistency "slow catch-up" pattern — it's a deterministic "each write is dropped on one peer" pattern. (Run 14 would later confirm this.) Also disproved that a shell-level `|| true` inside quoted remote ssh commands is enough to survive transport-level 255 exits.

For three audiences

Non-technical end users

Two problems at once: the back-of-house syncing wasn't caught up before we measured (could be explained by the test window being too short, or by something worse), and a cleanup step tripped over itself killing the test before it could report. Both turn out to be fixable. The second one is a test-rig issue; the first turns out to be a real bug in the product, uncovered fully in the NEXT run.

C-level decision makers

The campaign now has instrumented visibility into federation correctness. Two gaps identified: (a) test window may be too tight OR product convergence path is slower than expected (to be distinguished by r14), (b) probe harness needs transport-level fault tolerance. Neither is a product regression by itself yet — but the campaign is now exposing signal rather than hiding it. This is what a release-gate is supposed to do.

Engineers & architects

Quorum write (W=2/N=3) guarantees leader + one peer synchronously. Third peer catches up via sync-daemon. 30s was THREE sync-daemon cycles at the then-default 10s cadence. Observed pattern: B=89.5%, C=73% with a convergence ratio that doesn't match a simple settle-time explanation (if it were purely time, we'd expect B and C to be roughly equal, both lagging). Separately, `ssh root@X "pkill -f 'ai-memory serve' || true"` returns SSH exit 255 when the remote sshd closes after pkill kills a process in the same session. `|| true` inside quotes shields the remote exit code; the SSH client's own 255 propagates under `set -e`. Shield fix: wrap the SSH call itself, plus ConnectTimeout + ServerAliveInterval for good measure.

Bugs surfaced and where they were fixed

Phase 2 30s settle insufficient for 200-write convergence (symptom — not root cause)

Impact: node-B at 89.5%, node-C at 73% — below 95% pass threshold. Initial diagnosis was "settle too short"; real cause emerged in r14.

Root cause: Quorum's synchronous replication only covers W-1 peers; the third relies on an async path. At the time we believed the async path was slow. It was actually broken.

Fixed in:
- ship-gate commit 1a7de54 (90s settle)
- PR #309 (real root-cause fix)
Probe SSH returned 255 and killed Phase 2 under set -e

Impact: probe1 + probe2 never ran; Phase 2 produced no verdict JSON.

Root cause: `|| true` inside remote quotes only shields the remote command's exit code, not SSH transport failures.

Fixed in:
- ship-gate commit 1a7de54

What changed going into the next campaign

r14 bumps the settle to 90s AND wraps probe SSH in outer `|| true`. Convergence stayed below threshold — revealing that the real problem wasn't the test window, it was the product.

Phase 1 — functional (per-node) PASS

What this phase proves: Single-node CRUD, backup, curator dry-run, and MCP handshake on each of the three peer droplets. Establishes that ai-memory starts and is functional at the one-node level before federation is exercised.

Test results

node-a

✓ Stats total ≥ 1 (store + list + stats round-trip) — 1 memories
✓ Recall returned ≥ 1 hit — 1 hits
✓ Backup snapshot file emitted — 1 snapshot(s)
✓ Backup manifest file emitted — 1 manifest(s)
✓ MCP handshake advertises ≥ 30 tools — 36 tools
✓ Curator dry-run clean (Ollama-not-configured is accepted) — 1 errors
✓ Overall phase-1 pass flag

node-b

✓ Stats total ≥ 1 (store + list + stats round-trip) — 1 memories
✓ Recall returned ≥ 1 hit — 1 hits
✓ Backup snapshot file emitted — 1 snapshot(s)
✓ Backup manifest file emitted — 1 manifest(s)
✓ MCP handshake advertises ≥ 30 tools — 36 tools
✓ Curator dry-run clean (Ollama-not-configured is accepted) — 1 errors
✓ Overall phase-1 pass flag

node-c

✓ Stats total ≥ 1 (store + list + stats round-trip) — 1 memories
✓ Recall returned ≥ 1 hit — 1 hits
✓ Backup snapshot file emitted — 1 snapshot(s)
✓ Backup manifest file emitted — 1 manifest(s)
✓ MCP handshake advertises ≥ 30 tools — 36 tools
✓ Curator dry-run clean (Ollama-not-configured is accepted) — 1 errors
✓ Overall phase-1 pass flag

Raw evidence

phase1-node-a

{
	"phase": 1,
	"host": "aim-v0-6-0-0-final-r13-node-a",
	"version": "ai-memory 0.6.0",
	"pass": true,
	"reasons": [
		""
	],
	"stats": {
		"total": 1,
		"by_tier": [
			{
				"tier": "mid",
				"count": 1
			}
		],
		"by_namespace": [
			{
				"namespace": "ship-gate-phase1",
				"count": 1
			}
		],
		"expiring_soon": 0,
		"links_count": 0,
		"db_size_bytes": 139264
	},
	"curator": {
		"started_at": "2026-04-20T04:26:49.272066563+00:00",
		"completed_at": "2026-04-20T04:26:49.272521117+00:00",
		"cycle_duration_ms": 0,
		"memories_scanned": 1,
		"memories_eligible": 1,
		"auto_tagged": 0,
		"contradictions_found": 0,
		"operations_attempted": 0,
		"operations_skipped_cap": 0,
		"autonomy": {
			"clusters_formed": 0,
			"memories_consolidated": 0,
			"memories_forgotten": 0,
			"priority_adjustments": 0,
			"rollback_entries_written": 0,
			"errors": []
		},
		"errors": [
			"no LLM client configured"
		],
		"dry_run": true
	},
	"mcp_tool_count": 36,
	"recall_count": 1,
	"snapshot_count": 1,
	"manifest_count": 1
}

raw JSON

phase1-node-b

{
	"phase": 1,
	"host": "aim-v0-6-0-0-final-r13-node-b",
	"version": "ai-memory 0.6.0",
	"pass": true,
	"reasons": [
		""
	],
	"stats": {
		"total": 1,
		"by_tier": [
			{
				"tier": "mid",
				"count": 1
			}
		],
		"by_namespace": [
			{
				"namespace": "ship-gate-phase1",
				"count": 1
			}
		],
		"expiring_soon": 0,
		"links_count": 0,
		"db_size_bytes": 139264
	},
	"curator": {
		"started_at": "2026-04-20T04:26:49.581991284+00:00",
		"completed_at": "2026-04-20T04:26:49.582578735+00:00",
		"cycle_duration_ms": 0,
		"memories_scanned": 1,
		"memories_eligible": 1,
		"auto_tagged": 0,
		"contradictions_found": 0,
		"operations_attempted": 0,
		"operations_skipped_cap": 0,
		"autonomy": {
			"clusters_formed": 0,
			"memories_consolidated": 0,
			"memories_forgotten": 0,
			"priority_adjustments": 0,
			"rollback_entries_written": 0,
			"errors": []
		},
		"errors": [
			"no LLM client configured"
		],
		"dry_run": true
	},
	"mcp_tool_count": 36,
	"recall_count": 1,
	"snapshot_count": 1,
	"manifest_count": 1
}

raw JSON

phase1-node-c

{
	"phase": 1,
	"host": "aim-v0-6-0-0-final-r13-node-c",
	"version": "ai-memory 0.6.0",
	"pass": true,
	"reasons": [
		""
	],
	"stats": {
		"total": 1,
		"by_tier": [
			{
				"tier": "mid",
				"count": 1
			}
		],
		"by_namespace": [
			{
				"namespace": "ship-gate-phase1",
				"count": 1
			}
		],
		"expiring_soon": 0,
		"links_count": 0,
		"db_size_bytes": 139264
	},
	"curator": {
		"started_at": "2026-04-20T04:26:49.263137618+00:00",
		"completed_at": "2026-04-20T04:26:49.263610536+00:00",
		"cycle_duration_ms": 0,
		"memories_scanned": 1,
		"memories_eligible": 1,
		"auto_tagged": 0,
		"contradictions_found": 0,
		"operations_attempted": 0,
		"operations_skipped_cap": 0,
		"autonomy": {
			"clusters_formed": 0,
			"memories_consolidated": 0,
			"memories_forgotten": 0,
			"priority_adjustments": 0,
			"rollback_entries_written": 0,
			"errors": []
		},
		"errors": [
			"no LLM client configured"
		],
		"dry_run": true
	},
	"mcp_tool_count": 36,
	"recall_count": 1,
	"snapshot_count": 1,
	"manifest_count": 1
}

raw JSON

Phase 2 — multi-agent federation FAIL

What this phase proves: 4 agents × 50 writes against the 3-node federation with W=2 quorum, then 90s settle and convergence count on every peer. Plus two quorum probes (one-peer-down must 201, both-peers-down must 503). Catches silent-data-loss and quorum-misclassification regressions.

Test results

✗ Burst writes returned 201 — ok=/ (qnm=, fail=)
✗ node-A convergence ≥ 95% of ok — a= / threshold 0
✗ node-B convergence ≥ 95% of ok — b= / threshold 0
✗ node-C convergence ≥ 95% of ok — c= / threshold 0
✗ Probe 1: one peer down → 201 (quorum met via remaining peer) — got
✗ Probe 2: both peers down → 503 (quorum_not_met) — got
✗ Overall phase-2 pass flag

Raw evidence

phase2

raw JSON

Phase 3 — cross-backend migration NOT REACHED

What this phase proves: 1000-memory round-trip: SQLite → Postgres, re-run for idempotency, Postgres → SQLite. Asserts zero errors and counts match. Catches migration-correctness regressions in either direction of a production upgrade path.

This phase did not run because an earlier phase failed and the campaign aborted. Evidence from the phases that did run is above; the protocol would have exercised this phase next if the prior step had passed.

Phase 4 — chaos campaign NOT REACHED

What this phase proves: packaging/chaos/run-chaos.sh on the chaos-client droplet with 50 cycles × 100 writes per fault class. Measures convergence_bound = min(count_node1, count_node2) / total_ok. Catches fault-tolerance regressions under SIGKILL of the primary, brief network partition, and related fault models.

All artifacts

Every JSON committed to this campaign directory. Raw, machine-readable, and stable.

Generated by scripts/generate_run_html.sh. Campaign directory: alphaonedev/ai-memory-ship-gate/runs/v0.6.0.0-final-r13 . Methodology: alphaonedev.github.io/ai-memory-ship-gate/methodology . Analysis data source: analysis/run-insights.json.