ai-memory · Evidence Hub — testing, baseline env, results

Reference baseline

Baseline test environment (this node).

Unless a campaign states otherwise, results on this hub were produced on the following baseline. Each campaign page restates its own environment for self-containment.

Field	Value
Host	FROSTYi.local — Apple Silicon (arm64)
OS	macOS 26.5 (build 25F71) · Darwin 25.5.0
Toolchain	rustc 1.96.0 (ac68faa20 2026-05-25) · cargo 1.96.0
Binary	ai-memory v0.7.0
Schema version	v54 (sqlite + postgres)
Feature tier	autonomous
Embedder / Reranker	nomic-embed-text-v1.5 / ms-marco-MiniLM-L-6-v2
LLM backend	openrouter · google/gemma-4-26b-a4b-it
Branch	release/v0.7.0
Test isolation	AI_MEMORY_NO_CONFIG=1

The four QC gates run on every campaign:

1. `cargo fmt --check`
2. `cargo clippy --all-targets -- -D warnings -D clippy::all -D clippy::pedantic`
3. `AI_MEMORY_NO_CONFIG=1 cargo test`
4. `cargo audit`
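A minimal sketch of running the four gates in order and stopping at the first failure. The `Gate` struct and the `--run` flag are hypothetical helpers, not part of the ai-memory repo. Without `--run`, the program only prints the command lines (a dry run).

```rust
use std::process::Command;

/// One QC gate: a program, its arguments, and any extra environment variables.
struct Gate {
    name: &'static str,
    program: &'static str,
    args: &'static [&'static str],
    env: &'static [(&'static str, &'static str)],
}

/// The four gates, in the order this hub lists them.
fn qc_gates() -> Vec<Gate> {
    vec![
        Gate { name: "fmt", program: "cargo", args: &["fmt", "--check"], env: &[] },
        Gate {
            name: "clippy",
            program: "cargo",
            args: &[
                "clippy", "--all-targets", "--",
                "-D", "warnings", "-D", "clippy::all", "-D", "clippy::pedantic",
            ],
            env: &[],
        },
        // Test isolation: the suite runs with AI_MEMORY_NO_CONFIG=1.
        Gate { name: "test", program: "cargo", args: &["test"], env: &[("AI_MEMORY_NO_CONFIG", "1")] },
        Gate { name: "audit", program: "cargo", args: &["audit"], env: &[] },
    ]
}

/// Render a gate as the shell command line it corresponds to.
fn render(g: &Gate) -> String {
    let mut parts: Vec<String> = g.env.iter().map(|(k, v)| format!("{k}={v}")).collect();
    parts.push(g.program.to_string());
    parts.extend(g.args.iter().map(|a| a.to_string()));
    parts.join(" ")
}

/// Run the gates in order and stop at the first failure.
fn run_all(gates: &[Gate]) -> Result<(), String> {
    for g in gates {
        let status = Command::new(g.program)
            .args(g.args)
            .envs(g.env.iter().copied())
            .status()
            .map_err(|e| format!("{}: could not start: {e}", g.name))?;
        if !status.success() {
            return Err(format!("{}: gate failed ({status})", g.name));
        }
    }
    Ok(())
}

fn main() {
    let gates = qc_gates();
    if std::env::args().any(|a| a == "--run") {
        match run_all(&gates) {
            Ok(()) => println!("all four gates green"),
            Err(e) => {
                eprintln!("{e}");
                std::process::exit(1);
            }
        }
    } else {
        // Dry run: print the command lines without executing them.
        for g in &gates {
            println!("{}", render(g));
        }
    }
}
```

Ordering the gates from cheapest (`fmt`) to slowest (`test`, `audit`) means a formatting slip fails fast, before the full suite spends time running.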

Campaigns

Testing campaigns & closeouts.

Each card links to the full markdown evidence trail in the repo. Green = fixed/retested/closed; Amber = in progress.

Closed · GREEN

#1466 — TTL-leak immortal-rows fix

Mid/short rows stored with `expires_at: None` never expired (2,921 leaked). Fixed at the write-path chokepoint, plus a schema backfill on both backends. Four gates green; lib 5,105 / 0; audit clean; QUAL-10 green. Commit 91c032ce.
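A minimal sketch of the shape of the #1466 fix: one chokepoint function that fills a missing `expires_at` from the tier default, reused by a backfill pass over existing rows. The `Tier` enum, the TTL values, and the row tuple layout are all assumptions for illustration. The hub does not state the real defaults.

```rust
/// Memory tiers. Per the #1466 card, mid and short rows must expire;
/// treating `Long` as the only non-expiring tier is an assumption here.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Tier {
    Short,
    Mid,
    Long,
}

impl Tier {
    /// Hypothetical default TTLs in seconds (not the real ai-memory values).
    fn default_ttl_secs(self) -> Option<u64> {
        match self {
            Tier::Short => Some(6 * 60 * 60),    // assumed: 6 hours
            Tier::Mid => Some(7 * 24 * 60 * 60), // assumed: 7 days
            Tier::Long => None,                  // never expires
        }
    }
}

/// Write-path chokepoint: every write passes through here, so a missing
/// `expires_at` on a mid/short row is filled from the tier default.
fn normalize_expires_at(tier: Tier, created_at: u64, expires_at: Option<u64>) -> Option<u64> {
    expires_at.or_else(|| tier.default_ttl_secs().map(|ttl| created_at + ttl))
}

/// Backfill: apply the same rule to rows already on disk.
/// Returns how many rows were repaired.
fn backfill(rows: &mut [(Tier, u64, Option<u64>)]) -> usize {
    let mut fixed = 0;
    for (tier, created_at, expires_at) in rows.iter_mut() {
        if expires_at.is_none() {
            if let Some(ts) = normalize_expires_at(*tier, *created_at, None) {
                *expires_at = Some(ts);
                fixed += 1;
            }
        }
    }
    fixed
}

fn main() {
    let mut rows = vec![
        (Tier::Short, 1_000, None),
        (Tier::Mid, 1_000, None),
        (Tier::Long, 1_000, None),
        (Tier::Mid, 1_000, Some(5_000)),
    ];
    let fixed = backfill(&mut rows);
    println!("repaired {fixed} rows: {rows:?}");
}
```

Routing both the live write path and the backfill through the same function keeps the two from drifting apart. The card says the fix was applied at both points.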

Full evidence →

Closed · GREEN

#1182 Round 2 — A2A regression (reproducibility)

Independent re-run from a second pristine rig wipe. Domain 1 (sqlite) 7,458 / 0 / 16; Domain 2 (pg+AGE) 8,494 / 0 / 37. Combined 15,952 / 0 with zero new defects, confirming that the #1182 result is reproducible.
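The combined figure is the per-domain sum, which the sketch below makes explicit. Reading each "a / b / c" triple as pass / fail / skip is an assumption. The `Tally` type is hypothetical.

```rust
use std::ops::Add;

/// A test tally; interpreting the hub's triples as pass / fail / skip is assumed.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Tally {
    pass: u32,
    fail: u32,
    skip: u32,
}

impl Add for Tally {
    type Output = Tally;
    fn add(self, o: Tally) -> Tally {
        Tally { pass: self.pass + o.pass, fail: self.fail + o.fail, skip: self.skip + o.skip }
    }
}

impl Tally {
    /// A domain (or combination) is green only with zero failures.
    fn is_green(self) -> bool {
        self.fail == 0
    }
}

fn main() {
    let sqlite = Tally { pass: 7_458, fail: 0, skip: 16 };
    let pg_age = Tally { pass: 8_494, fail: 0, skip: 37 };
    let combined = sqlite + pg_age;
    println!("combined {} / {} (green: {})", combined.pass, combined.fail, combined.is_green());
}
```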

Full evidence →

Closed · GREEN

#1182 Round 1 — final-baseline regression

NO-FAIL-MISSION run from a pristine rig (`docker compose down -v`). Combined 15,951 / 0. Four 1:1 findings filed→fixed→closed (#1444/#1445/#1446/#1447); three codegraph QC audits returned ZERO-DEFECTS.

Full evidence →

Green · 5/5 pass

DO swarm campaign (T4) — 2026-06-02

Scaled-down T4 swarm on native DigitalOcean droplets (no Docker): 3× 4 GB quorum peers (N=3/W=2, mTLS) + 1× 8 GB Postgres 16 + Apache AGE 1.5.0 + pgvector. All 5 scenarios pass; zero defects; spend ≈ $0.18. Full evidence: campaign README.

Results below →

Hub currency note (#1938 Gate-0 W3, 2026-07-11). The cards above are every campaign entry this hub carried through v0.7.0; until this update it had no v0.8.0, v0.8.1, or v0.9.0 entry. Real evidence for those three releases DOES exist in-repo. It was produced under different campaign scaffolding (docs/v0.8.0/, docs/v0.8.1/, docs/v0.9.0/) and is indexed in the "Post-v0.7.0 campaigns" section below. It is not backfilled into the v0.7.0-shaped cards above, because backfilling would misrepresent when and how each result was produced. No 4-phase ship-gate/A2A-gate cell run exists for any of the three. See the Frozen Claims page's currency note for the full explanation and the ruling that covers it (wf_26d176ac, tracked on #1938).

Campaigns · v0.8.0 / v0.8.1 / v0.9.0

Post-v0.7.0 campaigns.

These three releases did not run the v0.7.0-style regression-run scaffolding above; their real evidence lives under docs/v0.8.0/, docs/v0.8.1/, docs/v0.9.0/. Linked here so this hub is a complete index, not just the v0.7.0 subset.

GREEN · zero product defects

v0.8.0 — IronClaw live A2A campaign, 2026-06-24

Live 2-daemon agent-to-agent mesh against release/v0.8.0 HEAD 3e57ec3c, Grok-4.3 brain over OpenRouter, shared pg-age Postgres backend. B.0–B.8 GREEN (HEAD-binary check, boot security-posture WARNs, mesh health, embedder fail-closed degrade, peer-enrollment fail-closed #1789, require-sig #29, push-DLQ resilience, admin-gate); B.9 enrolled-peer happy-path is CI-covered (8-green integration suite) but was NOT driven live in the bare compose — a test-infra gap, not a product defect.
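Several B-checks above are "fail-closed" checks: an unenrolled peer, or a missing signature when signatures are required, is rejected rather than waved through. The sketch below illustrates that default-deny shape under stated assumptions. `PeerRequest`, `admit`, and the rejection strings are hypothetical, not the ai-memory API.

```rust
/// Outcome of an admission check for an incoming peer request.
#[derive(Debug, PartialEq, Eq)]
enum Admission {
    Accept,
    Reject(&'static str),
}

/// What the daemon knows about a request. Field names are hypothetical.
struct PeerRequest<'a> {
    peer_id: &'a str,
    signature_valid: Option<bool>, // None = request was unsigned
}

/// Fail-closed admission: anything not positively established is rejected.
fn admit(req: &PeerRequest<'_>, enrolled: &[&str], require_sig: bool) -> Admission {
    if !enrolled.contains(&req.peer_id) {
        return Admission::Reject("peer not enrolled");
    }
    match (require_sig, req.signature_valid) {
        (true, None) => Admission::Reject("signature required"),
        (_, Some(false)) => Admission::Reject("bad signature"),
        _ => Admission::Accept,
    }
}

fn main() {
    let enrolled = ["peer-a", "peer-b"];
    let rogue = PeerRequest { peer_id: "rogue", signature_valid: Some(true) };
    let unsigned = PeerRequest { peer_id: "peer-a", signature_valid: None };
    println!("{:?}", admit(&rogue, &enrolled, true));
    println!("{:?}", admit(&unsigned, &enrolled, true));
}
```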

Full evidence →

PASS

v0.8.1 — DigitalOcean operational evidence, 2026-06-28

Live DigitalOcean droplet (NYC3), release/v0.8.1 binary (sqlite backend). W1 (G29 secret-screen refuse) PASS, W2 (G30 erasure: store/forget/recall-nothing) PASS, W3 (G12 honest durability: a quorum miss returns 202, not 503) PASS. The 3-leg mTLS federation encryption path (server TLS, client mTLS, quorum peer TLS) was verified live across two nodes. Four pre-existing do-hive infra defects surfaced while standing up the droplet: #1841 (firewall var) and two unfiled config bugs, all fixed inline; and #1842 (postgres/AGE bootstrap gap), worked around by smoke-testing sqlite instead.
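W1 checks that a store containing secret material is refused rather than persisted. The sketch below shows that refuse-before-write ordering. The marker list, `StoreOutcome`, and `screen_and_store` are hypothetical; the real G29 screen's rules are not listed on this hub. A production screen would also check token shape and length rather than a bare prefix such as `AKIA`.

```rust
/// Hypothetical markers; the real G29 rules are not listed on this hub.
const SECRET_MARKERS: &[&str] = &[
    "-----BEGIN PRIVATE KEY-----",
    "-----BEGIN RSA PRIVATE KEY-----",
    "AKIA", // AWS access-key-id prefix (naive: a real screen checks the full shape)
    "ghp_", // GitHub personal access token prefix
];

#[derive(Debug, PartialEq, Eq)]
enum StoreOutcome {
    Stored,
    Refused { marker: &'static str },
}

/// Screen first; only content that passes is ever written to the store.
fn screen_and_store(content: &str, store: &mut Vec<String>) -> StoreOutcome {
    if let Some(m) = SECRET_MARKERS.iter().find(|m| content.contains(**m)) {
        return StoreOutcome::Refused { marker: *m };
    }
    store.push(content.to_string());
    StoreOutcome::Stored
}

fn main() {
    let mut store = Vec::new();
    println!("{:?}", screen_and_store("standup notes", &mut store));
    println!("{:?}", screen_and_store("-----BEGIN PRIVATE KEY-----\n...", &mut store));
    println!("stored {} item(s)", store.len());
}
```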

Full evidence →

GREEN

v0.8.1 — AI-NHI dogfood evidence

Real-use verification of the v0.8.1 defect fixes by driving the actual release binary as an AI-NHI agent's memory layer over MCP stdio (fresh-subprocess probes). G29 secret-screen (5 tests), G30 erasure fan-out incl. live-PG (5+5 tests), G12 quorum-miss durability (17 tests, in-daemon), plus MCP governance + postgres L2 rehydration. Includes an interactive MCP dogfood transcript on the rebuilt release binary.
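The G30 check is "store, forget, recall nothing." That only holds if `forget` reaches every layer that holds a copy. If the fan-out skipped a lower layer such as a postgres L2, a later recall could rehydrate the row. The sketch below illustrates the fan-out with a hypothetical `Layer` trait and in-memory layers standing in for the real backends.

```rust
use std::collections::HashMap;

/// A storage layer holding copies of memories (hypothetical trait).
trait Layer {
    fn put(&mut self, id: u64, text: &str);
    fn erase(&mut self, id: u64);
    fn get(&self, id: u64) -> Option<String>;
}

/// In-memory stand-in for a real backend.
#[derive(Default)]
struct MapLayer(HashMap<u64, String>);

impl Layer for MapLayer {
    fn put(&mut self, id: u64, text: &str) {
        self.0.insert(id, text.to_string());
    }
    fn erase(&mut self, id: u64) {
        self.0.remove(&id);
    }
    fn get(&self, id: u64) -> Option<String> {
        self.0.get(&id).cloned()
    }
}

struct Memory {
    layers: Vec<Box<dyn Layer>>,
}

impl Memory {
    fn new(n_layers: usize) -> Self {
        let layers = (0..n_layers)
            .map(|_| Box::new(MapLayer::default()) as Box<dyn Layer>)
            .collect();
        Memory { layers }
    }
    fn store(&mut self, id: u64, text: &str) {
        for l in &mut self.layers {
            l.put(id, text);
        }
    }
    /// Erasure fans out to every layer, so no copy is left to rehydrate from.
    fn forget(&mut self, id: u64) {
        for l in &mut self.layers {
            l.erase(id);
        }
    }
    /// Recall returns the first hit across layers (a lower-layer hit = rehydration).
    fn recall(&self, id: u64) -> Option<String> {
        self.layers.iter().find_map(|l| l.get(id))
    }
}

fn main() {
    let mut m = Memory::new(2); // e.g. local store + L2
    m.store(1, "note");
    m.forget(1);
    println!("recall after forget: {:?}", m.recall(1));
}
```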

Full evidence →

3 CONSECUTIVE ROUNDS GREEN

v0.9.0 — DigitalOcean crypto-hardened operational evidence

Live DigitalOcean droplet (NYC3, s-1vcpu-2gb), fully hardened release/v0.9.0 tip 65772202 (all 49 hardening findings fixed, per this run), built --release --features sal,sal-postgres. 3 consecutive green rounds, 17/17 assertions each, across all 3 encryption legs (server TLS, client↔daemon, daemon↔Postgres; PostgreSQL 16.14 + Apache AGE 1.6.0 + pgvector 0.6.0). Every SSH/scp channel was encrypted; zero droplets were left running after teardown.

Full evidence →

3/3 GREEN, 0 RED

v0.9.0 — AI-NHI dogfood, real-data migration

The release/v0.9.0 binary (tip 728db5b2) was run against a COPY of the operator's live production memory database (1,667 real memories, schema v71) to exercise the real upgrade path. It migrated v71 → v78 in place on open, with zero memory loss and a clean doctor report. 3 consecutive behavioral rounds (store, recall, graph link, secret-screen refusal, forget) were all GREEN. This confirms that the v0.9.0 default-attestation secure-default flip (#1751) behaves exactly as documented.
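"Migrated v71 → v78 in place on open" implies applying each step from the stored version up to the binary's target, in order. The sketch below shows that loop together with a memory-count check matching the "zero memory loss" claim. `Db`, `Step`, `migrate_on_open`, and `schema_only_step` are hypothetical stand-ins, not the ai-memory migration code.

```rust
/// Stand-in for the database: a schema version plus the stored memories.
struct Db {
    schema_version: u32,
    memories: Vec<String>,
}

/// One migration step, taking the schema from `from` to `from + 1`.
type Step = fn(&mut Db);

/// Hypothetical schema-only step (e.g. adding a column) that must not touch rows.
fn schema_only_step(_db: &mut Db) {}

/// Apply every step from the current version up to `target`, in order.
/// Aborts on a missing step or if the memory count changes.
fn migrate_on_open(db: &mut Db, steps: &[(u32, Step)], target: u32) -> Result<(), String> {
    let before = db.memories.len();
    while db.schema_version < target {
        let v = db.schema_version;
        let step = steps
            .iter()
            .find(|(from, _)| *from == v)
            .map(|(_, s)| *s)
            .ok_or_else(|| format!("no migration from v{v}"))?;
        step(db);
        db.schema_version = v + 1;
    }
    if db.memories.len() != before {
        return Err(format!("memory count changed: {before} -> {}", db.memories.len()));
    }
    Ok(())
}

fn main() {
    // Seven steps (v71..v77) take a v71 database to v78.
    let steps: Vec<(u32, Step)> = (71..78).map(|v| (v, schema_only_step as Step)).collect();
    let mut db = Db { schema_version: 71, memories: vec!["m1".into(), "m2".into()] };
    match migrate_on_open(&mut db, &steps, 78) {
        Ok(()) => println!("now v{}, {} memories", db.schema_version, db.memories.len()),
        Err(e) => eprintln!("migration aborted: {e}"),
    }
}
```

Running this against a copy, as the campaign did, means an aborted migration never touches the live database.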

Full evidence →

No live cell run

v0.9.0 — A2A / ship-gate coverage gap (stated plainly)

No v0.9.0 A2A cell run and no v0.8.0/v0.8.1/v0.9.0 4-phase ship-gate re-run exist. This is a one-time recorded descope (ruling wf_26d176ac, #1938) — not a silent gap. The v1.0.0 Gate-3 full-spectrum campaign is scoped to re-run (or formally supersede) the 4-phase cell against the current release line.

Currency note on Frozen Claims →

Completed campaign · 2026-06-02

DO swarm — T4 topology (GREEN).

Scope: swarm testing only (no hive), scaled down significantly per operator directive. Native DigitalOcean droplets — no Docker — in region nyc3, VPC 10.108.0.0/24, all Ubuntu 24.04 x64. Binary ai-memory v0.7.0 (branch release/v0.7.0) built natively --release --features sal-postgres on the 8 GB node and scp'd to the peers. Hard budget cap $75; actual spend ≈ $0.18; all droplets destroyed at close. Full audit trail: 2026-06-02-do-swarm-t4/README.md.

Node	Size	Role
amx-pg	s-4vcpu-8gb	PostgreSQL 16 + Apache AGE 1.5.0 + pgvector 0.8.2 · amd64 build host
amx-peer-1/2/3	s-2vcpu-4gb ×3	quorum mesh — N=3 / W=2, mTLS fingerprint allowlist
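The peers gate membership on an mTLS fingerprint allowlist: a client certificate whose fingerprint is not listed, or no certificate at all, is refused at the handshake (scenario S3). The sketch below shows only the allowlist decision, assuming the TLS layer has already computed the certificate fingerprint as a hex string. `normalize_fp` and `allow_peer` are hypothetical helpers.

```rust
/// Normalise a fingerprint: drop ':' separators and lowercase the hex.
fn normalize_fp(fp: &str) -> String {
    fp.chars().filter(|c| *c != ':').flat_map(char::to_lowercase).collect()
}

/// Handshake-time decision. `presented` is None when the client sent no cert.
fn allow_peer(presented: Option<&str>, allowlist: &[&str]) -> bool {
    match presented {
        None => false, // no certificate: refused
        Some(fp) => {
            let fp = normalize_fp(fp);
            allowlist.iter().any(|a| normalize_fp(a) == fp)
        }
    }
}

fn main() {
    // Shortened fingerprints for readability; real SHA-256 fingerprints are 32 bytes.
    let allowlist = ["AB:CD:EF:01", "12:34:56:78"];
    for presented in [Some("abcdef01"), Some("deadbeef"), None] {
        let verdict = if allow_peer(presented, &allowlist) { "accept" } else { "refuse at handshake" };
        println!("{presented:?} -> {verdict}");
    }
}
```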

Scenario	Result
S1 · Quorum-write success (W=2, all peers up)	PASS
S2 · Quorum not met under partition → `503 quorum_not_met`	PASS
S3 · mTLS allowlist — rogue/no cert refused at handshake	PASS
S4 · Post-partition convergence (DLQ replay + catchup)	PASS
S5 · Postgres + Apache AGE backend (v54 migrate · CRUD · Cypher)	PASS

Verdict: GREEN, 5/5, zero defects. Peer-mesh quorum writes, quorum-shortfall semantics, mTLS identity gating, post-partition convergence, and the Postgres + Apache AGE backbone all behave to the T4 contract. The run doubled as a live regression on #1466 — the tier-default expires_at backfill fired on both the federated SQLite write path and the postgres CRUD path. No findings to file.
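S1 and S2 come down to counting acknowledgements against W. The sketch below models the N=3 / W=2 contract as exercised here. Treating the coordinating node's own durable write as one ack is an assumption, as is the 200 status on success; the hub only names the `503 quorum_not_met` failure. The v0.8.1 card above records that G12 later changed a quorum miss to return 202 instead, so this models the v0.7.0 behavior only.

```rust
#[derive(Debug, PartialEq, Eq)]
enum WriteResult {
    Committed { acks: usize },
    QuorumNotMet { acks: usize, needed: usize },
}

/// Count acks (local write assumed to count as one) and compare against W.
fn quorum_write(peer_acks: &[bool], w: usize) -> WriteResult {
    let acks = 1 + peer_acks.iter().filter(|ok| **ok).count();
    if acks >= w {
        WriteResult::Committed { acks }
    } else {
        WriteResult::QuorumNotMet { acks, needed: w }
    }
}

/// Map to the HTTP reply; 200 on success is an assumption.
fn http_status(r: &WriteResult) -> (u16, &'static str) {
    match r {
        WriteResult::Committed { .. } => (200, "ok"),
        WriteResult::QuorumNotMet { .. } => (503, "quorum_not_met"),
    }
}

fn main() {
    const W: usize = 2; // N=3 / W=2 as in the T4 run
    // All peers up, one peer partitioned, both peers partitioned.
    for acks in [[true, true], [true, false], [false, false]] {
        let r = quorum_write(&acks, W);
        println!("{acks:?} -> {r:?} -> {:?}", http_status(&r));
    }
}
```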
