16-18 hours sequential → ~6.5 hours parallel.

A v0.6.3-class release campaign decomposes into 5 stages. The hard sequential chain (pre-tag → gates → tag/release → smoke → publish) sets the floor. Inside each stage, almost everything parallelizes — and the existing ship-gate + a2a-gate harnesses already support per-run isolation, so distributing across droplets is configuration, not engineering.

5 sequential stages · 8+ parallel agents · 17 droplets at peak · ~9.5h saved · ~$10-15 cost
The hard sequential chain

Five stages. Each gates the next.

These cannot collapse. The release tag has to land before the release pipeline runs. The release pipeline has to publish before the smoke can validate. Everything else inside each stage IS collapsible.

Timeline (wall-clock): 1. Pre-tag prep: 1h (with overlap) · 2. Gates (parallel): 2h · 3. Tag + release pipeline: 2h · 4. Smoke: 15m · 5. Sync + evidence: 30m

Total: 1h + 2h + 2h + 15m + 30m = ~5h45m. Add ~30-45m of buffer for terraform provisioning plus first-time orchestration debugging — call it ~6.5h wall.

Stage by stage

Where the time goes — and what collapses inside each stage.

1 · Pre-tag prep
2h sequential → 1h parallel
Sequential parts (must order): version bump → CHANGELOG promotion → commit. The parallel parts run concurrently on local CPU. Plus overlap: spin up DigitalOcean droplets for ship-gate AND author the 6 new a2a scenarios during this stage — both are ready the moment pre-tag finishes. A sketch of the local fan-out follows the table.
Sub-task | Time | Concurrency
Version bump (Cargo.toml) | 5m | blocking
CHANGELOG promotion | 10m | blocking
cargo fmt --check | 2m | parallel
cargo clippy --pedantic | 8m | parallel
cargo test --features sal | 15m | parallel
cargo llvm-cov --fail-under-lines 92 | 25m | parallel (longest)
[overlap] terraform apply ship-gate infra | 5m | concurrent
[overlap] author scenarios 43-48 | 45m | concurrent
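A minimal sketch of that stage-1 fan-out, assuming the commands in the table run from the repo root once the blocking commit has landed; the clippy flags and the infra/ship-gate terraform path are placeholders, not the project's confirmed invocations:

```bash
#!/usr/bin/env bash
# Stage 1 fan-out: runs after the blocking steps (version bump, CHANGELOG
# promotion, commit). Wall time = the slowest check (llvm-cov, ~25m).
set -euo pipefail

pids=()
cargo fmt --check                         & pids+=("$!")
cargo clippy --all-targets -- -D warnings & pids+=("$!")   # stand-in for the project's clippy gate
cargo test --features sal                 & pids+=("$!")
cargo llvm-cov --fail-under-lines 92      & pids+=("$!")

# Overlap: provision the ship-gate droplets while the checks run
# (the infra/ship-gate module path is a hypothetical name).
terraform -chdir=infra/ship-gate apply -auto-approve & pids+=("$!")

# Fail fast if any background job fails.
for pid in "${pids[@]}"; do wait "$pid"; done
echo "pre-tag checks green, ship-gate infra up"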
2 · Gates — ship-gate phases + a2a-gate scenarios
7-9h sequential → 2h parallel
All four ship-gate phases AND the a2a-gate scenarios run on independent DigitalOcean droplets. The existing ship-gate + a2a-gate harnesses already support per-run isolation via separate runs/<run-id>/ output dirs. Wall time = max of any single phase or batch; see the launch sketch after the table.
Phase / batch | Droplets | Time | Bottleneck
Ship-gate Phase 1 — Functional | 1 | 30m | single-node smoke
Ship-gate Phase 2 — Multi-agent (W=2 of N=3) | 3 | 1h | federation soak burst
Ship-gate Phase 3 — Migration | 2 | 30m | v0.6.2→v0.6.3 path
Ship-gate Phase 4 — Chaos abbreviated | 3 | 2h | longest leg
A2A batch 1 (scenarios 1-12) | 1 | 30m | scenario count
A2A batch 2 (scenarios 13-24) | 1 | 30m | scenario count
A2A batch 3 (scenarios 25-36) | 1 | 30m | scenario count
A2A batch 4 (scenarios 37-48, incl. new 43-48) | 1 | 45m | new scenarios pad it
Total | 13 | 2h | = max(Phase 4, anything)
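A minimal launch sketch for stage 2, assuming per-phase/per-batch wrapper scripts (run-phase.sh, run-batch.sh) and an --output-dir flag, both hypothetical names standing in for however the existing harnesses are actually invoked; the runs/<run-id>/ isolation is the part the harnesses already guarantee:

```bash
#!/usr/bin/env bash
# Stage 2 fan-out: every ship-gate phase and a2a batch gets its own run-id and
# its own droplet(s); wall time is the slowest job (Phase 4, ~2h).
set -euo pipefail

RUN_ROOT="runs/v0.6.3-$(date +%Y%m%d-%H%M)"
pids=()

launch() {
  local run_id="$1"; shift
  mkdir -p "${RUN_ROOT}/${run_id}"
  # Each harness invocation writes only under its own runs/<run-id>/ dir.
  "$@" --output-dir "${RUN_ROOT}/${run_id}" \
    > "${RUN_ROOT}/${run_id}/console.log" 2>&1 &
  pids+=("$!")
}

launch phase1-functional ./ship-gate/run-phase.sh 1 --droplets 1
launch phase2-multiagent ./ship-gate/run-phase.sh 2 --droplets 3
launch phase3-migration  ./ship-gate/run-phase.sh 3 --droplets 2
launch phase4-chaos      ./ship-gate/run-phase.sh 4 --droplets 3

for batch in 1 2 3 4; do
  launch "a2a-batch${batch}" ./a2a-gate/run-batch.sh "${batch}" --droplets 1
done

# Collect every result before deciding; one failed batch must not mask another.
fail=0
for pid in "${pids[@]}"; do wait "$pid" || fail=1; done
exit "$fail"
```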
3 · Tag + release pipeline
2h sequential → 2h parallel (already optimal)
The GitHub Actions matrix is already parallel: 5 platform builds run simultaneously, then 5 publishing jobs (Homebrew / PPA / COPR / Docker / crates.io) run in parallel. The 2h is mostly idle waiting for runners; the orchestrator only has to watch the workflow (a polling sketch follows the table). Bound by cargo build --release per platform — can't compress further without changing the build itself.
Job | Time | Concurrency
Build matrix × 5 platforms | ~45m | parallel (already)
Sign + SBOM + upload | ~15m | parallel (already)
Publish jobs × 5 channels | ~30m | parallel (already)
Pipeline overhead + waits | ~30m | idle
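Since the orchestrator only waits on this stage, a small polling sketch is enough; it assumes the gh CLI is available and that the release workflow file is named release.yml (an assumption, not a confirmed name):

```bash
#!/usr/bin/env bash
# Stage 3: push the tag, then block until the tag-triggered release run finishes.
set -euo pipefail

TAG="v0.6.3"
git tag -a "${TAG}" -m "${TAG}"
git push origin "${TAG}"

sleep 30   # give Actions a moment to register the tag-triggered run
run_id="$(gh run list --workflow=release.yml --limit 1 --json databaseId --jq '.[0].databaseId')"
gh run watch "${run_id}" --exit-status   # non-zero exit if the pipeline fails
echo "release pipeline finished for ${TAG}"
```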
4 · Distribution-channel smoke
2h sequential → 15m parallel
Four channel-validation agents fire in parallel. Each one: pull/install → version check → start daemon → call memory_capabilities and assert schema_version=2. They are independent — no shared state. A single-channel sketch follows the table.
Channel | Time | Agent runs on
Homebrew | 15m | macOS runner
crates.io | 15m | Linux runner
Docker GHCR | 10m | Linux runner
Fedora COPR | 15m | Linux runner (dnf)
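A sketch of one channel agent (crates.io flavor); the memoryd binary name, the capabilities subcommand, and the JSON shape are hypothetical stand-ins for however the daemon actually exposes memory_capabilities:

```bash
#!/usr/bin/env bash
# crates.io channel smoke: install → version check → start daemon → capabilities assert.
set -euo pipefail

EXPECTED_VERSION="0.6.3"

cargo install memoryd --version "${EXPECTED_VERSION}" --locked   # pull/install
installed="$(memoryd --version | awk '{print $NF}')"             # version check
[ "${installed}" = "${EXPECTED_VERSION}" ] || { echo "version mismatch: ${installed}"; exit 1; }

memoryd serve &                                                  # start daemon
daemon_pid=$!
trap 'kill "${daemon_pid}" 2>/dev/null || true' EXIT
sleep 5                                                          # crude readiness wait

# Call memory_capabilities and assert schema_version=2.
schema="$(memoryd capabilities --json | jq -r '.schema_version')"
[ "${schema}" = "2" ] || { echo "schema_version=${schema}, expected 2"; exit 1; }
echo "crates.io smoke OK: v${installed}, schema_version=2"
```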
5 · Sync to main + evidence publish
1h sequential → 30m parallel
Two independent things: merge the release branch back to main, and update the test-hub evidence page with the final verdict + per-gate results. They run concurrently.
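A minimal sketch of that stage, assuming the release branch is named release/v0.6.3 and that a publish-evidence.sh helper exists (both hypothetical):

```bash
#!/usr/bin/env bash
# Stage 5: the merge-back and the evidence publish touch different artifacts,
# so they run as two background jobs.
set -euo pipefail

(
  git checkout main
  git merge --no-ff release/v0.6.3 -m "merge v0.6.3 back to main"
  git push origin main
) &
merge_pid=$!

./scripts/publish-evidence.sh --release v0.6.3 --verdict-from "runs/" &
evidence_pid=$!

wait "${merge_pid}"
wait "${evidence_pid}"
```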
The math

Sequential vs parallel — line by line.

Time math · v0.6.3-class release campaign

Stage | Sequential | Parallel | Saved
1 · Pre-tag prep (with overlap) | 2h | 1h | −1h
2 · Ship-gate phases 1-4 | 4h | 2h | −2h
2 · A2A 48 scenarios | 5h | 45m | −4h 15m
3 · Tag + release pipeline | 2h | 2h | 0
4 · Distribution-channel smoke | 2h | 15m | −1h 45m
5 · Sync + evidence publish | 1h | 30m | −30m
Total wall-clock | 16h | ~6.5h | −9.5h

Note: Stage 2's a2a-gate runs concurrently with the ship-gate phases — its 45m wall happens within ship-gate's 2h envelope, so total Stage 2 wall = max(phases, a2a) = 2h. The "−4h 15m" saving for a2a counts what we'd otherwise pay if it ran serially after the phases.

Constraints

What the parallelism costs.

Constraint | Impact
DigitalOcean cost | 13-17 droplets at peak × ~$0.05-0.10/hr × ~3hr peak = ~$5-15 total campaign. Negligible vs the value of one extra release day.
Coordination overhead | The first time we run distributed orchestration, expect ~1-2h debugging the first phase. Amortizes over future releases — orchestrator scaffolding is reusable for v0.7+.
Terraform provisioning | terraform apply takes ~3-5min per phase. Built into the wall-time math above as part of stage 1 overlap.
Result aggregation | If any agent fails to memory_notify the orchestrator, fall back to S3-style polling of the harness's run JSON dump (see the polling sketch after this table). Existing harnesses already write structured artifacts.
Failure cascading | A failing scenario in batch 1 doesn't kill batch 2 — independent droplets, independent runs. Failures surface in the test-hub evidence page; the operator decides whether to re-tag or proceed.
DigitalOcean API rate limits | 13 droplets in a single terraform apply is well within limits. Fall back to sequential apply per phase if rate-limited.
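A sketch of that polling fallback; result.json and its verdict field are hypothetical names for whatever structured artifact the harness writes into its run directory:

```bash
#!/usr/bin/env bash
# Fallback result aggregation: if an agent never calls memory_notify, poll its
# run directory once a minute until a result appears or we time out.
set -euo pipefail

RUN_DIR="$1"            # e.g. runs/v0.6.3-20250101-0900/phase4-chaos
TIMEOUT_MIN="${2:-150}" # default: poll for 2.5h

for _ in $(seq "${TIMEOUT_MIN}"); do
  if [ -f "${RUN_DIR}/result.json" ]; then
    verdict="$(jq -r '.verdict' "${RUN_DIR}/result.json")"
    echo "${RUN_DIR}: ${verdict}"
    [ "${verdict}" = "pass" ] && exit 0 || exit 1
  fi
  sleep 60
done

echo "${RUN_DIR}: no result after ${TIMEOUT_MIN}m" >&2
exit 2
```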
Two execution options

A or B.

A — Sequential execution. Lower risk. Slower (~16h). Run one phase at a time and catch issues with full attention. Right call when the hardware budget is tight or the coordination orchestrator isn't built yet.

B — Parallel execution with orchestrator. Higher upfront cost (~3h of orchestrator scaffolding), then a ~6.5h campaign. Net saving of ~9.5h vs the ~16h baseline. Right call when shipping speed matters AND we can reuse the orchestrator for v0.7+ campaigns.

For v0.6.3 specifically: orchestrator scaffolding lives in this test-hub repo as a one-time investment. Every future release reuses it. Recommend B.
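To show how small the option-B scaffolding can stay, here is a top-level sketch; every per-stage script name is a hypothetical placeholder for the pieces sketched in the stage sections above:

```bash
#!/usr/bin/env bash
# Option B, top level: five hard-sequential stages; all parallelism lives
# inside each stage script.
set -euo pipefail

./orchestrator/stage1-pretag-prep.sh        # ~1h  (local fan-out + droplet overlap)
./orchestrator/stage2-gates.sh              # ~2h  (ship-gate phases + a2a batches)
./orchestrator/stage3-tag-and-release.sh    # ~2h  (tag, then wait on GitHub Actions)
./orchestrator/stage4-channel-smoke.sh      # ~15m (4 channel agents in parallel)
./orchestrator/stage5-sync-and-evidence.sh  # ~30m (merge-back + evidence publish)

echo "v0.6.3 campaign complete"
```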