ai-memory · Test Hub — quality gates aggregated

The three gates

Two existing harnesses + this hub.

The gate repos do the work — provision DigitalOcean infrastructure, run scenarios, capture artifacts. This hub aggregates and presents the results so a release evidence page is one click from the ai-memory homepage.

Ship Gate

Four-phase release testing harness on DigitalOcean. Each phase has its own script and result format.

Phase 1 — Functional smoke (single-node, all MCP tools)
Phase 2 — Multi-agent coordination (federation W of N)
Phase 3 — Migration (cross-version upgrade path)
Phase 4 — Chaos (peer kill, network partition, clock skew)

repo · live results

A2A Gate (umbrella spec)

4-node DigitalOcean harness exercising ai-memory through IronClaw / Hermes / OpenClaw agents communicating agent-to-agent. The umbrella keeps the specification (testbook, scenario contracts, v1-GA criteria); per-release execution lives in ai-memory-a2a-v<version> repos.

v0.6.2 certified — 3/3 consecutive green across 6 cells
Matrix: ironclaw + hermes + openclaw × off / TLS / mTLS
Per-scenario JSON + stderr + provenance trace
Mutation, Byzantine peer, clock skew, identity spoofing
v0.6.3.1 execution in flight at ai-memory-a2a-v0.6.3.1 (24 scenarios; S9–S22 are v0.6.3.1-specific).
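One way to read "3/3 consecutive green across 6 cells" is that every cell's three most recent runs must all pass. The sketch below encodes that reading only; the cell labels, the history layout, and the `is_certified` helper are hypothetical, and the real gate may apply a different rule.

```python
from typing import Dict, List

REQUIRED_STREAK = 3  # taken from "3/3 consecutive green"

def is_certified(runs_by_cell: Dict[str, List[bool]],
                 streak: int = REQUIRED_STREAK) -> bool:
    """Return True only if each cell's most recent `streak` runs are all green."""
    if not runs_by_cell:
        return False
    for results in runs_by_cell.values():
        # Too few runs, or any red run inside the trailing window, blocks certification.
        if len(results) < streak or not all(results[-streak:]):
            return False
    return True

# Hypothetical run histories, oldest run first (True = green).
history = {
    "ironclaw/off": [False, True, True, True],
    "hermes/mtls": [True, True, True],
}
print(is_certified(history))   # True

history["hermes/mtls"].append(False)  # a new red run breaks the streak
print(is_certified(history))   # False
```

An early red run does not block certification under this reading, as long as the trailing window is clean.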

umbrella repo · umbrella results · v0.6.3.1 repo · v0.6.3.1 live

This Hub

Per-release evidence pages aggregating the gates above + cross-references. No tests run here — it's the presentation layer.

Per-release verdict (one line)
Phase-by-phase status
Cross-links to gate repos' run artifacts
Machine-readable summary.json per release
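The shape of summary.json is not documented here, so the following is only a sketch of what a per-release summary could contain, built from the four items listed above. Every field name, the `build_summary` helper, and the placeholder link are assumptions, not the hub's actual schema.

```python
import json

def build_summary(release: str, phases: dict, links: dict) -> dict:
    """Assemble a per-release summary; the verdict passes only if every phase passed."""
    verdict = "pass" if phases and all(s == "pass" for s in phases.values()) else "fail"
    return {
        "release": release,
        "verdict": verdict,   # the one-line verdict
        "phases": phases,     # phase-by-phase status
        "links": links,       # cross-links to gate repos' run artifacts
    }

summary = build_summary(
    "v0.6.3",
    {
        "ship_gate_phase_1": "pass",
        "ship_gate_phase_2": "pass",
        "ship_gate_phase_3": "pass",
        "ship_gate_phase_4": "pass",
    },
    {"ship_gate_runs": "<url of the run artifacts>"},  # placeholder, not a real link
)
print(json.dumps(summary, indent=2))
```

Deriving the verdict from the phase statuses, rather than storing it separately, keeps the one-line verdict from drifting out of sync with the phase table.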

repo · v0.6.3 evidence

Per-release evidence

Pick a release.

Each release gets one evidence page with the gate-by-gate results and the verdict. The current in-flight release is at the top. The pattern going forward is one ai-memory-a2a-v<version> repo per release; the umbrella ai-memory-ai2ai-gate stays the spec.

v0.6.3.1

▶ testing in flight

Per-release A2A campaign on a 4-node DigitalOcean mesh. Subject under test: ai-memory v0.6.3.1 (tag pinned 2026-04-30, schema v19). The phase ladder runs 0→5: pre-flight, substrate cert (S1–S24), AI orchestration, autonomous NHI playbook (4 scenarios × 4 arms × n=3 = 48 runs), meta-analysis, verdict. S23 (#507 tilde-expansion) and S24 (#318 MCP stdio fanout) are expected RED on v0.6.3.1 and expected GREEN on Patch 2. Findings funnel into Patch 2 umbrella #511.
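The playbook run count follows from a full cross product of scenarios, arms, and repeats. A minimal sketch, using placeholder scenario and arm names because the real ones are not listed here:

```python
from itertools import product

# Placeholder labels; the actual scenario and arm names are not given in this page.
scenarios = [f"scenario-{i}" for i in range(1, 5)]  # 4 scenarios
arms = [f"arm-{i}" for i in range(1, 5)]            # 4 arms
repeats = range(1, 4)                               # n=3 repeats per (scenario, arm)

runs = list(product(scenarios, arms, repeats))
print(len(runs))  # 48
print(runs[0])    # ('scenario-1', 'arm-1', 1)
```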

live results →

v0.6.3

✓ shipped

Hierarchy + KG + Capabilities v2 + 93.08% coverage. Ship-gate 4 phases pass · a2a-gate 48 scenarios pass · all 5 distribution channels live (GitHub Release · Homebrew tap · ghcr.io · Fedora COPR · crates.io).

view evidence →

v0.6.2

✓ shipped

A2A-CERTIFIED. 214 passing scenarios across 6 cells. Federation fanout correctness + S40 catchup hardening.

a2a-gate runs →

v0.6.1

✓ shipped

SAL Postgres adapter + 5 pre-tag SAL blocker punchlist closed (#293).

a2a-gate runs →

v0.6.0

✓ shipped

16+ ship-gate soak runs (soak-v0.6.0-r1 .. r16+). World-class documentation sprint, SAL track-B PR1.

ship-gate runs →

Testing strategy — parallel + distributed

~6.5h parallel campaigns, not 16h sequential.

Existing harnesses already support per-run isolation. A release campaign can fan out across up to 17 agents at peak, coordinated through ai-memory itself (eating our own dogfood). Two pages explain the architecture:

Parallel testing strategy

What collapses inside each stage. Sequential vs parallel time math, line by line. Constraints, costs ($5-15/campaign), and the two execution options (sequential A vs orchestrated B). Net savings: ~9.5h per campaign.
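The headline numbers can be checked with simple arithmetic. The sketch below also shows why a stage "collapses" when its jobs are independent: in parallel it costs the longest job, not the sum. The 16h and 6.5h figures come from this page; the per-job durations are made up for illustration.

```python
def wall_clock(durations_h, parallel):
    """Independent jobs cost the longest one in parallel, the sum in sequence."""
    return max(durations_h) if parallel else sum(durations_h)

SEQUENTIAL_H = 16.0  # campaign length when run sequentially (from this page)
PARALLEL_H = 6.5     # campaign length when run in parallel (from this page)

savings_h = SEQUENTIAL_H - PARALLEL_H
speedup = SEQUENTIAL_H / PARALLEL_H
print(f"saved {savings_h:.1f}h, {speedup:.2f}x faster")  # saved 9.5h, 2.46x faster

# Hypothetical stage made of three independent jobs (hours).
jobs = [1.0, 2.0, 0.5]
print(wall_clock(jobs, parallel=False), wall_clock(jobs, parallel=True))  # 3.5 2.0
```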

Read →

Distributed-agent orchestration

The architecture that makes the parallelism work. ai-memory itself as the coordination bus. Workers fan out across DigitalOcean droplets + GitHub Actions runners. Orchestrator drains memory_inbox, updates evidence in real time. Reusable for v0.7+.
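The fan-out and drain pattern can be sketched without any ai-memory API at all. Below, a standard-library `queue.Queue` stands in for memory_inbox and threads stand in for workers on droplets or runners; the message fields and function names are hypothetical, and the real orchestrator talks to ai-memory instead.

```python
import queue
import threading

# Stand-in for memory_inbox; the real coordination bus is ai-memory itself.
inbox: "queue.Queue[dict]" = queue.Queue()

def worker(worker_id: int, scenario: str) -> None:
    """Pretend to run one scenario, then report the result to the inbox."""
    # A real worker would execute the scenario on a droplet or a CI runner here.
    inbox.put({"worker": worker_id, "scenario": scenario, "status": "pass"})

def drain(expected: int) -> dict:
    """Collect `expected` reports and fold them into an evidence map."""
    evidence = {}
    for _ in range(expected):
        msg = inbox.get(timeout=5)  # raises queue.Empty if a worker never reports
        evidence[msg["scenario"]] = msg["status"]
    return evidence

scenarios = ["S1", "S2", "S3"]
threads = [threading.Thread(target=worker, args=(i, s))
           for i, s in enumerate(scenarios)]
for t in threads:
    t.start()
evidence = drain(len(scenarios))
for t in threads:
    t.join()
print(sorted(evidence.items()))
```

The timeout on `inbox.get` is the piece worth keeping in any real version: a worker that dies silently should surface as a missing report rather than a hung campaign.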

Read →

How releases get tested

The two-gate pipeline.

A release tag must pass both gates before it ships. The ship-gate runs first: a regression in any phase blocks the release. The a2a-gate runs only after ship-gate phases 1-4 are green, and certification covers the matrix cells the operator selects (the full v0.6.2 certification covered 6 cells).

release-rcN tag pushed — operator declares a candidate.
ship-gate Phase 1 (functional) — single-node, all MCP tools exercised. Fast (~30 min).
ship-gate Phase 2 (multi-agent) — federation W of N, peer reconciliation. ~1 hour.
ship-gate Phase 3 (migration) — upgrade path from prior release tag. ~30 min.
ship-gate Phase 4 (chaos) — peer kill, network partition, clock skew. Production-quality soak windows run 3-21 days; the abbreviated run used for fast releases takes ~2 hours.
a2a-gate certification — selected cells (full = 6 cells). ~3-5 hours per cell.
final tag — release pipeline publishes to crates.io / GHCR / Homebrew / Fedora COPR.
post-publish smoke — install from each channel, verify the binary serves memory_capabilities.
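The steps above form a strictly ordered pipeline in which the first failure blocks everything after it. A minimal sketch of that gating logic, with hypothetical step callables standing in for the real harness scripts:

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]

def run_pipeline(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order; stop at the first failure and report what completed."""
    completed: List[str] = []
    for name, check in steps:
        if not check():
            return False, completed  # later steps never run
        completed.append(name)
    return True, completed

calls: List[str] = []  # records which steps were actually invoked

def step(name: str, result: bool) -> Step:
    """Build a fake step that logs its invocation and returns a fixed result."""
    def check() -> bool:
        calls.append(name)
        return result
    return name, check

# Hypothetical outcome: the migration phase regresses.
ok, completed = run_pipeline([
    step("ship-gate phase 1", True),
    step("ship-gate phase 2", True),
    step("ship-gate phase 3", False),
    step("ship-gate phase 4", True),
    step("a2a-gate certification", True),
])
print(ok, completed)  # False ['ship-gate phase 1', 'ship-gate phase 2']
print(calls)          # phase 4 and a2a-gate are never invoked
```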
