
Campaign v0.6.0.0-final-r13 FAIL

ai-memory ref
release/v0.6.0
Completed at
2026-04-20T04:27:40Z
Overall pass
FAIL

Run focus

Federation convergence miss (B=89.5%, C=73%) + probe SSH 255 killed script

What this campaign set out to test: 4×50-write burst, 30-second settle, convergence assertion, two quorum probes (one-peer-down → 201, both-peers-down → 503).

What it demonstrated: Disproved that a 30s settle was sufficient for 200 burst writes to reach all three peers under the then-current federation implementation. On node-A (leader) every write landed. On node-B, 179 of 200. On node-C, 146 of 200. This is not an eventual-consistency "slow catch-up" pattern — it's a deterministic "each write is dropped on one peer" pattern. (Run 14 would later confirm this.) Also disproved that a shell-level `|| true` inside quoted remote ssh commands is enough to survive transport-level 255 exits.
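The convergence assertion boils down to percentage arithmetic. A minimal sketch, using this run's observed per-node counts as inputs (the check logic is illustrative, not the harness's actual code):

```shell
# Each peer must hold >= 95% of the 200 burst writes after settle.
# Counts below are the ones observed in this run.
TOTAL=200
THRESHOLD=95   # percent
check() {  # check <node> <count>
  pct=$((100 * $2 / TOTAL))
  if [ "$pct" -ge "$THRESHOLD" ]; then
    echo "$1 PASS ($2/$TOTAL, ${pct}%)"
  else
    echo "$1 FAIL ($2/$TOTAL, ${pct}%)"
  fi
}
check node-a 200   # leader: every write landed
check node-b 179   # 89% (integer-truncated from 89.5%)
check node-c 146   # 73%
```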

Detailed tri-audience analysis is below, followed by per-phase test results for all four phases of the protocol — including any phase that did not run in this campaign.

AI NHI analysis · Claude Opus 4.7

Federation convergence miss (B=89.5%, C=73%) + probe SSH 255 killed script

First run where writes actually exercised federation. And federation was visibly wrong: non-leader peers sat well below the 95% convergence threshold even after 30 seconds of settle. Separately, the probe step exposed a harness-fragility sub-bug.

For three audiences

Non-technical end users

Two problems at once: the back-of-house syncing hadn't caught up before we measured (which could mean the test window was too short, or something worse), and a cleanup step tripped over itself, killing the test before it could report. Both turned out to be fixable. The second is a test-rig issue; the first turned out to be a real bug in the product, uncovered fully in the NEXT run.

C-level decision makers

The campaign now has instrumented visibility into federation correctness. Two gaps identified: (a) test window may be too tight OR product convergence path is slower than expected (to be distinguished by r14), (b) probe harness needs transport-level fault tolerance. Neither is a product regression by itself yet — but the campaign is now exposing signal rather than hiding it. This is what a release-gate is supposed to do.

Engineers & architects

Quorum write (W=2/N=3) guarantees leader + one peer synchronously; the third peer catches up via the sync-daemon. 30s was THREE sync-daemon cycles at the then-default 10s cadence. Observed pattern: B=89.5%, C=73%, a spread that doesn't match a simple settle-time explanation (if it were purely time, we'd expect B and C to be roughly equal, both lagging).

Separately, `ssh root@X "pkill -f 'ai-memory serve' || true"` returns SSH exit 255 when the remote sshd closes after pkill kills a process in the same session. The `|| true` inside the quotes shields only the remote command's exit code; the SSH client's own 255 propagates under `set -e`. Shield fix: wrap the SSH call itself, plus ConnectTimeout + ServerAliveInterval for good measure.
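The failure mode can be reproduced without a network. A sketch using a `fake_ssh` stand-in (hypothetical, for illustration only) that mimics a transport-level 255:

```shell
# fake_ssh stands in for an ssh client whose remote command succeeds but
# whose transport exits 255 (as when sshd closes after pkill fires).
fake_ssh() {
  sh -c "$1"        # the "remote" command; inner `|| true` makes it exit 0
  return 255        # transport-level failure wins anyway
}

rc=0; fake_ssh "false || true" || rc=$?
echo "inner shield only: ssh exit $rc"      # 255: fatal under set -e

fake_ssh "false || true" || true            # outer shield: survives set -e
echo "outer shield added: script continues"

# The real fix wraps the actual ssh call the same way, plus
# -o ConnectTimeout=10 -o ServerAliveInterval=5 (values illustrative).
```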

Bugs surfaced and where they were fixed

  1. Phase 2 30s settle insufficient for 200-write convergence (symptom — not root cause)

    Impact: node-B at 89.5%, node-C at 73% — below 95% pass threshold. Initial diagnosis was "settle too short"; real cause emerged in r14.

    Root cause: Quorum's synchronous replication only covers W-1 peers; the third relies on an async path. At the time we believed the async path was slow. It was actually broken.

    Fixed in:

  2. Probe SSH returned 255 and killed Phase 2 under set -e

    Impact: probe1 + probe2 never ran; Phase 2 produced no verdict JSON.

    Root cause: `|| true` inside remote quotes only shields the remote command's exit code, not SSH transport failures.

    Fixed in:

What changed going into the next campaign

r14 bumps the settle to 90s AND wraps probe SSH in outer `|| true`. Convergence stayed below threshold — revealing that the real problem wasn't the test window, it was the product.

Phase 1 — functional (per-node) PASS

What this phase proves: Single-node CRUD, backup, curator dry-run, and MCP handshake on each of the three peer droplets. Establishes that ai-memory starts and is functional at the one-node level before federation is exercised.
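A sketch of how a gate could consume one of the verdict JSONs below, assuming jq is available; the field names match the phase-1 artifacts, and the inline document is a trimmed stand-in:

```shell
# Trimmed stand-in for a phase-1 verdict (full artifacts appear below).
verdict='{"pass":true,"mcp_tool_count":36,"recall_count":1,"snapshot_count":1}'

# jq -e exits nonzero unless the filter yields a truthy value, so this
# line can gate a set -e script on the verdict.
echo "$verdict" |
  jq -e '.pass and .mcp_tool_count == 36 and .recall_count >= 1 and .snapshot_count >= 1' \
    >/dev/null && echo "phase1 verdict: PASS"
```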

Test results

node-a

node-b

node-c

Raw evidence

phase1-node-a
{
	"phase": 1,
	"host": "aim-v0-6-0-0-final-r13-node-a",
	"version": "ai-memory 0.6.0",
	"pass": true,
	"reasons": [
		""
	],
	"stats": {
		"total": 1,
		"by_tier": [
			{
				"tier": "mid",
				"count": 1
			}
		],
		"by_namespace": [
			{
				"namespace": "ship-gate-phase1",
				"count": 1
			}
		],
		"expiring_soon": 0,
		"links_count": 0,
		"db_size_bytes": 139264
	},
	"curator": {
		"started_at": "2026-04-20T04:26:49.272066563+00:00",
		"completed_at": "2026-04-20T04:26:49.272521117+00:00",
		"cycle_duration_ms": 0,
		"memories_scanned": 1,
		"memories_eligible": 1,
		"auto_tagged": 0,
		"contradictions_found": 0,
		"operations_attempted": 0,
		"operations_skipped_cap": 0,
		"autonomy": {
			"clusters_formed": 0,
			"memories_consolidated": 0,
			"memories_forgotten": 0,
			"priority_adjustments": 0,
			"rollback_entries_written": 0,
			"errors": []
		},
		"errors": [
			"no LLM client configured"
		],
		"dry_run": true
	},
	"mcp_tool_count": 36,
	"recall_count": 1,
	"snapshot_count": 1,
	"manifest_count": 1
}

raw JSON

phase1-node-b
{
	"phase": 1,
	"host": "aim-v0-6-0-0-final-r13-node-b",
	"version": "ai-memory 0.6.0",
	"pass": true,
	"reasons": [
		""
	],
	"stats": {
		"total": 1,
		"by_tier": [
			{
				"tier": "mid",
				"count": 1
			}
		],
		"by_namespace": [
			{
				"namespace": "ship-gate-phase1",
				"count": 1
			}
		],
		"expiring_soon": 0,
		"links_count": 0,
		"db_size_bytes": 139264
	},
	"curator": {
		"started_at": "2026-04-20T04:26:49.581991284+00:00",
		"completed_at": "2026-04-20T04:26:49.582578735+00:00",
		"cycle_duration_ms": 0,
		"memories_scanned": 1,
		"memories_eligible": 1,
		"auto_tagged": 0,
		"contradictions_found": 0,
		"operations_attempted": 0,
		"operations_skipped_cap": 0,
		"autonomy": {
			"clusters_formed": 0,
			"memories_consolidated": 0,
			"memories_forgotten": 0,
			"priority_adjustments": 0,
			"rollback_entries_written": 0,
			"errors": []
		},
		"errors": [
			"no LLM client configured"
		],
		"dry_run": true
	},
	"mcp_tool_count": 36,
	"recall_count": 1,
	"snapshot_count": 1,
	"manifest_count": 1
}

raw JSON

phase1-node-c
{
	"phase": 1,
	"host": "aim-v0-6-0-0-final-r13-node-c",
	"version": "ai-memory 0.6.0",
	"pass": true,
	"reasons": [
		""
	],
	"stats": {
		"total": 1,
		"by_tier": [
			{
				"tier": "mid",
				"count": 1
			}
		],
		"by_namespace": [
			{
				"namespace": "ship-gate-phase1",
				"count": 1
			}
		],
		"expiring_soon": 0,
		"links_count": 0,
		"db_size_bytes": 139264
	},
	"curator": {
		"started_at": "2026-04-20T04:26:49.263137618+00:00",
		"completed_at": "2026-04-20T04:26:49.263610536+00:00",
		"cycle_duration_ms": 0,
		"memories_scanned": 1,
		"memories_eligible": 1,
		"auto_tagged": 0,
		"contradictions_found": 0,
		"operations_attempted": 0,
		"operations_skipped_cap": 0,
		"autonomy": {
			"clusters_formed": 0,
			"memories_consolidated": 0,
			"memories_forgotten": 0,
			"priority_adjustments": 0,
			"rollback_entries_written": 0,
			"errors": []
		},
		"errors": [
			"no LLM client configured"
		],
		"dry_run": true
	},
	"mcp_tool_count": 36,
	"recall_count": 1,
	"snapshot_count": 1,
	"manifest_count": 1
}

raw JSON

Phase 2 — multi-agent federation FAIL

What this phase proves: 4 agents × 50 writes against the 3-node federation with W=2 quorum, then 30s settle and a convergence count on every peer. Plus two quorum probes (one-peer-down must 201, both-peers-down must 503). Catches silent-data-loss and quorum-misclassification regressions.
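The probe expectations follow from quorum arithmetic: with one peer down, W=2 is still satisfiable (leader plus the surviving peer), so the write must return 201; with both peers down, quorum is impossible, so the node must return 503. A sketch of the verdict mapping (scenario labels are illustrative):

```shell
probe_verdict() {  # probe_verdict <scenario> <http_status>
  case "$1:$2" in
    one-peer-down:201)   echo PASS ;;  # quorum still reachable -> Created
    both-peers-down:503) echo PASS ;;  # quorum impossible -> Service Unavailable
    *)                   echo "FAIL (got $2)" ;;
  esac
}
probe_verdict one-peer-down 201
probe_verdict both-peers-down 503
```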

Test results

Raw evidence

phase2

(no verdict JSON: the probe SSH failure killed Phase 2 before it could report; see bug 2 above)

raw JSON

Phase 3 — cross-backend migration NOT REACHED

What this phase proves: 1000-memory round-trip: SQLite → Postgres, re-run for idempotency, Postgres → SQLite. Asserts zero errors and counts match. Catches migration-correctness regressions in either direction of a production upgrade path.
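The pass criteria reduce to a count-and-error check per migration leg; a minimal sketch (the migration commands themselves aren't reproduced here, and `leg_verdict` is a hypothetical helper):

```shell
leg_verdict() {  # leg_verdict <src_count> <dst_count> <errors>
  if [ "$3" -eq 0 ] && [ "$1" -eq "$2" ]; then echo PASS; else echo FAIL; fi
}
leg_verdict 1000 1000 0   # SQLite -> Postgres
leg_verdict 1000 1000 0   # re-run (idempotency: counts must not change)
leg_verdict 1000 1000 0   # Postgres -> SQLite
```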

This phase did not run because an earlier phase failed and the campaign aborted. Evidence from the phases that did run is above; the protocol would have exercised this phase next if the prior step had passed.

Phase 4 — chaos campaign NOT REACHED

What this phase proves: packaging/chaos/run-chaos.sh on the chaos-client droplet with 50 cycles × 100 writes per fault class. Measures convergence_bound = min(count_node1, count_node2) / total_ok. Catches fault-tolerance regressions under SIGKILL of the primary, brief network partition, and related fault models.
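The metric itself is a one-liner; a sketch with illustrative counts (not data from this run, since Phase 4 never executed; 50 cycles × 100 writes gives the 5000 total):

```shell
# convergence_bound = min(count_node1, count_node2) / total_ok, per the
# phase description above. Counts are illustrative placeholders.
count_node1=4980; count_node2=4995; total_ok=5000
awk -v a="$count_node1" -v b="$count_node2" -v t="$total_ok" \
  'BEGIN { m = (a < b) ? a : b; printf "convergence_bound=%.3f\n", m / t }'
# prints convergence_bound=0.996
```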

This phase did not run because an earlier phase failed and the campaign aborted. Evidence from the phases that did run is above; the protocol would have exercised this phase next if the prior step had passed.

All artifacts

Every JSON committed to this campaign directory. Raw, machine-readable, and stable.