Scope — v0.6.3.1 First-Principles campaign¶

What this campaign tests, what it deliberately does not test, and what each verdict in releases/v0.6.3.1/summary.json means.

This page is the docsite-rendered companion to repo-root SCOPE.md. The authoritative specification of how the campaign behaves is docs/governance.md — if anything below conflicts with governance.md, governance wins.

Subject under test¶

Field	Value
Tag	`v0.6.3.1` (cut 2026-04-30)
Schema	`v19` (migration ladder `v15` → `v17` → `v18` → `v19`)
Repo	`alphaonedev/ai-memory-mcp`
CLI surfaces added since v0.6.3	`ai-memory boot`, `install <agent>`, `wrap <agent>`, `logs`, `audit verify`, `doctor`
Config surfaces added	`[boot]`, `[logging]`, `[audit]`, `[audit.compliance.{soc2,hipaa,gdpr,fedramp}]`
Tooling claim being verified	1,886 lib tests, 49+ integration tests, 93.84% line coverage, 17 documented integrations × 10 platforms, 7-section doctor health dashboard
Campaign repo	`alphaonedev/ai-memory-a2a-v0.6.3.1`
Verdict surface	`releases/v0.6.3.1/summary.json` (schema v2)
Umbrella issue	`ai-memory-mcp#511`

The doctor surface is the v0.6.3.1 promotion of R7 from ROADMAP2 §7.2. The budget_tokens recall is the recovered commitment R1.

Agent scope¶

In scope (Principle 6 — scope discipline; see governance §2.6):

IronClaw — Rust framework, primary cert agent.
Hermes — counterpart agent for cross-framework substrate cells and Phase 3 NHI playbook.
OpenClaw — first-class third agent framework, in scope for v0.6.3.1 but dispatched separately from a higher-resource workstation via the local Docker mesh (16 GB+ per container — Basic-tier DO droplets used for IronClaw/Hermes don't have enough memory). Per Principle 6, OpenClaw runs produce their own scope-tagged artifacts; the Orchestrator joins all three frameworks via release=v0.6.3.1 linkage but never collapses them across framework boundaries (cross-scope contamination invalidates the artifact).

What carries forward from the v0.6.3 cert¶

Eight base scenarios S1 – S8 from the umbrella testbook run unchanged on the v0.6.3.1 binary:

Scenario	What it asserts
S1	Per-agent write + read
S2	Shared-context handoff
S3	Targeted share
S4	Federation-aware agents
S5	Consolidation + curation
S6	Contradiction detection
S7	Scoping visibility
S8	Auto-tagging

These must remain green. Any regression on S1 – S8 is a release-blocker for the campaign verdict — they are not Patch 2 candidates.

The certification cell is ironclaw / mTLS at the carry-forward sweep. Target: 48 / 48 on that cell.

What is new for v0.6.3.1¶

Sixteen scenarios S9 – S24 exercise the surfaces added in this release:

ID	Surface	Assertion
S9	`ai-memory boot` multi-node	Boot manifest agreement on `version`, `schema_version`, `tier` across 4 nodes.
S10	`ai-memory doctor` cross-node	Storage / Index / Recall / Sync sections agree (in-flight delta tolerated).
S11	`ai-memory install <agent>` recipe handoff	Recipes installed at different nodes share the same store cleanly.
S12	`ai-memory wrap <agent>` cross-vendor	Boot context delivered for codex / gemini / aider / ollama.
S13	`ai-memory audit verify` tamper-evident	Hash chain verifies independently per node; corruption detected.
S14	`ai-memory logs` operator surface	Filters work uniformly; privacy default OFF holds.
S15	R1 `budget_tokens` recall	Same query + budget → same ranked head on federated peers.
S16	Capabilities v2 honesty cross-mesh	Asymmetric capabilities surface, never absorb.
S17	G9 webhook fanout	link / promote / delete / consolidate fire from any originating node.
S18	G4 embedding-dim integrity	Mixed-dim writes refused at boundary.
S19	G5 archive/restore preserves embeddings	Archive at A → restore at B keeps embedding + tier + expiry.
S20	G6 `on_conflict` policy	Concurrent same-key writes deterministic.
S21	G13 endianness magic byte	x86_64 + arm64 nodes exchange f32 BLOBs without silent corruption.
S22	Schema v19 migration	Heterogeneous mesh; warn manifest variant fires on mismatch.
S23	Issue #507 `~` expansion	Tilde in `db` field expanded before SQLite open. Expected RED on v0.6.3.1.
S24	Issue #318 MCP stdio fanout	MCP stdio writes trigger federation fanout. Expected RED on v0.6.3.1.

S23 and S24 are intentionally expected to fail. They correspond to known-open OSS bugs that will close in Patch 2 (v0.6.3.2). Their inclusion is an integrity check on the harness itself: if either ever turned green on v0.6.3.1, the harness would be lying. Both flip to expected-green in the successor ai-memory-a2a-v0.6.3.2 repo.

What is deferred (out of scope)¶

Per governance Appendix B, this campaign deliberately does not cover:

Infrastructure provisioning, mTLS cert management, Terraform. Confirmed-good before Phase 0 by a separate process.
Auto-tagging via Ollama. Opt-in feature requiring s-4vcpu-16gb droplet; deferred per the ai2ai-gate README.
memory_share / #311 targeted-share scenario. Depends on a v0.6.0.1 capability upstream of v0.6.3.1; revisit for v0.6.4.
Patch 2 (v0.6.3.2) regression run. Separate campaign in ai-memory-a2a-v0.6.3.2, using this repo as a template.
v0.7 attestation (Ed25519 signing of memory_links). Covered by the v0.7 cert.
v0.8 typed cognition / CRDTs / curator daemon. Covered by the v0.8 cert.
Performance / scale benchmarking. Owned by the ship-gate, not this correctness campaign.

Deferred work is not a gap in the v0.6.3.1 cert — it is owned by a different per-release cert.

Verdict criteria — two truth-claims, two evidence streams¶

Per Principle 1, the campaign emits two independent verdicts in releases/v0.6.3.1/summary.json (schema v2). They are stored in separate top-level keys (substrate_verdict, nhi_verdict) and never collapsed.

Substrate verdict (Phase 1 — binary, reproducible)¶

Read this for ship / no-ship gating. Computed from S1 – S24 outcomes on the ironclaw / mTLS cert cell.

Value	Meaning
`CERT`	All `S1` – `S22` GREEN; `S23` and `S24` RED as expected; cert cell is 48 / 48. The release passes A2A certification on this tag.
`PARTIAL — pending Patch 2`	Same as `CERT` except `S23` and / or `S24` are still RED. This is the expected v0.6.3.1 outcome — see governance §2.2. The release is conditionally certified pending Patch 2.
`FAIL`	Any other red — including any regression on `S1` – `S8` or any unexpected failure on `S9` – `S22`. Each red opens or updates a `bug` + `v0.6.3.2-candidate` issue parent-linked to #511.

If S23 or S24 flips GREEN, the harness halts the campaign and files a harness-integrity issue. Substrate is not allowed to lie.

NHI verdict (Phase 3 — behavioral, statistical, n=3)¶

Read this to assess whether ai-memory measurably changes NHI behavior under realistic agent workloads. Computed from the four-arm × four-scenario × n=3 = 48-run Phase 3 playbook.

The NHI verdict reports per-scenario treatment effects (Arm-T vs each control arm) and a cross-layer consistency assessment, never collapsed into a single ship/no-ship bit. See governance §6 for the full scenario design and matrix for the cell-by-cell view.

A campaign that ships a green substrate badge but produces no Phase 3 behavioral evidence does not satisfy the v0.6.3.1 cert.

Cross-links¶

Governance (authoritative)
Verdict matrix — substrate cells + Phase 3 arms × scenarios
Findings funnel — defects routed into Patch 2
Operator runbook — phase-by-phase execution
Repo-root SCOPE.md
Repo-root README.md — phase structure overview
Umbrella spec: ai-memory-ai2ai-gate