The canonical list of every AI-to-AI integration test the a2a-gate
runs against ai-memory. Two tiers: baseline probes gate scenario
execution; scenarios exercise the full A2A surface.
Authoritative sources:
If this page and the testbook disagree, the testbook (v3.0.0+) is the
current contract; this page is the index / summary view. Every scenario
is implemented in Python 3 (testbook v3.0.0 convention); the shared
harness lives at scripts/a2a_harness.py.
1. Baseline probes (8)
Every agent droplet runs these before scenarios are allowed to execute.
baseline_pass=true requires all gate-marked probes green. The
three-node baseline (3 agent nodes × 8 probes) + the 4th memory-only
node gate every campaign.
| Probe |
What it exercises |
Gates baseline_pass? |
Failure indicates |
| F1 xAI Grok reachability |
Direct HTTPS POST to api.x.ai/v1/chat/completions; expects READY literal reply |
yes |
LLM backend down or API key revoked |
| F2a HTTP substrate canary |
Direct POST /api/v1/memories + GET on local ai-memory serve; write+read roundtrip |
yes |
ai-memory HTTP daemon not running or SQLite broken |
| F2b Agent-driven MCP canary |
Framework LLM prompted to use memory_store via MCP stdio; deterministic retrieval verification |
no (observation only — LLM-dependent) |
Framework-MCP loop broken; baseline still passes if F2a + F5 pass |
| F3 Peer A2A canary |
One node writes via MCP; other two nodes + node-4 aggregator must see it via HTTP |
yes (separate step in workflow) |
Federation fanout broken; S1b regression |
| F4 Mesh directional reachability |
Every node runs GET /health + POST /sync/push?dry_run=true against every peer. N-1 edges per node; aggregator ANDs across N nodes = full N*(N-1) bidirectional mesh |
yes |
VPC firewall/routing broken, ai-memory serve not listening, quorum peers misconfigured |
| F5 ai-memory MCP stdio handshake |
Spawns ai-memory mcp --tier semantic with the exact invocation each framework uses; sends MCP 2024-11-05 initialize + tools/list; verifies memory_store, memory_recall, memory_list in the response |
yes |
ai-memory binary missing, stdio protocol broken, or framework invocation mismatched |
| F6 TLS handshake (planned, tls_mode ≥ tls) |
Verify ai-memory serve presents a server cert and completes a TLS 1.3 handshake on every peer edge |
yes (when tls_mode ≥ tls) |
Cert material missing on disk or rustls config rejected |
| F7 mTLS enforcement (planned, tls_mode=mtls) |
Anonymous client (no client cert) MUST be rejected. Client with off-allowlist fingerprint MUST be rejected. Client with on-allowlist fingerprint MUST succeed |
yes (when tls_mode=mtls) |
--mtls-allowlist ignored or fingerprint verifier bypass |
Per-probe implementation
All probes live in scripts/setup_node.sh and emit a single line each
into /etc/ai-memory-a2a/baseline.json under functional_probes.
The baseline.json is scp'd back to the runner and aggregated into
runs/<campaign_id>/a2a-baseline.json (rendered on campaign run pages).
F6 and F7 ship with Tranche 2 — TLS/mTLS.
2. Config + negative invariants (10)
These are attestations, not probes — each one reads a config file
or runs a static check and records the result. All ten must be true
for baseline_pass=true.
| Invariant |
What it asserts |
Per-framework evidence |
framework_is_authentic |
Binary is the upstream one (not a same-named stub) |
readlink -f $(which <framework>) contains framework name |
mcp_server_ai_memory_registered |
The memory MCP server is registered with the framework |
IronClaw: ironclaw mcp list contains memory; Hermes: YAML mcp_servers.memory; OpenClaw: JSON mcpServers.memory |
llm_backend_is_xai_grok |
LLM provider is xAI Grok |
IronClaw: .env has LLM_BASE_URL=https://api.x.ai/v1; Hermes: XAI_API_KEY in hermes.env; OpenClaw: providers.xai in JSON |
llm_is_default_provider |
xAI Grok is the default (not a second fallback) |
Per-framework config default-provider field |
mcp_command_is_ai_memory |
MCP server command resolves to ai-memory binary |
Grep config for command: ai-memory (or equivalent) |
agent_id_stamped |
Every write carries this node's AI_MEMORY_AGENT_ID |
Env/config contains AI_MEMORY_AGENT_ID=ai:<alice|bob|charlie> |
federation_live |
Local ai-memory serve is listening + has ≥1 peer |
GET /api/v1/health returns {"status":"ok"} |
ufw_disabled |
Ubuntu UFW is OFF (ship-gate lesson — blocks intra-VPC) |
ufw status contains inactive or UFW not installed |
iptables_flushed |
iptables policies are ACCEPT on INPUT/OUTPUT/FORWARD |
iptables -S shows 3 -P … ACCEPT lines |
dead_man_switch_scheduled |
shutdown -P +480 scheduled at boot (8hr cap on spend) |
shutdown -c dry-run or /run/systemd/shutdown/scheduled check |
Thesis-preserving negative invariants (5)
| Invariant |
What it asserts |
Enforcement |
a2a_protocol_off |
No direct agent-to-agent RPC channel (ACP, sessions, etc.) |
Per-framework config flag(s) |
sub_agent_or_sessions_spawn_off |
No parent/child agent hierarchy or session-spawn tool |
Framework config + tool allowlist |
alternative_channels_off |
No Telegram / Discord / Slack / Moltbook / gateway / execution_backends |
Per-framework disable blocks |
tool_allowlist_is_memory_only |
Only memory_* tools available to the agent |
Hermes: tool_allowlist YAML list; IronClaw: only one MCP server registered (provisioning-control); OpenClaw: toolAllowlist JSON |
a2a_gate_profile_locked |
a2a_gate_profile: shared-memory-only tag present |
Per-framework config set |
Any false here = thesis preserved (good). The negative invariants only
pass baseline_pass when all true. Document sources: docs/baseline.md §6b.
3. Scenarios
3.1 Suite A — Core A2A (3 scenarios)
| # |
Name |
What it proves |
Primary primitives |
Runner |
| S1 |
Per-agent write + read (MCP stdio) |
Framework can accept prompt → choose memory_store tool → invoke via MCP stdio → memory lands with correct metadata.agent_id |
memory_store, memory_recall |
1_write_read_mcp.py |
| S1b |
Per-agent write + read (HTTP direct) |
Green-path counterpart: federation + substrate work independent of the MCP-stdio path |
memory_store, memory_list |
1b_write_read_http.py |
| S2 |
Shared-context handoff |
Agent A writes a handoff memory; agent B picks it up within quorum settle; round-trips back to A |
memory_store, memory_recall, memory_list |
2_handoff.py |
3.2 Suite B — A2A primitives (4 scenarios)
| # |
Name |
What it proves |
Primary primitives |
Runner |
| S3 |
Targeted memory_share |
Subset of memories lands on exactly the targeted peer (not broadcast) |
memory_share |
(deferred until v0.6.0.1 / #311) |
| S5 |
Consolidation + curation |
memory_consolidate preserves consolidated_from_agents metadata |
memory_consolidate |
5_consolidation.py |
| S6 |
Contradiction detection |
Contradicting memories produce a contradicts link visible to third agent |
memory_detect_contradiction, memory_link |
6_contradiction.py |
| S11 |
Link integrity |
Linked memories returned together on peer query |
memory_link, memory_get |
11_link_integrity.py |
3.3 Suite C — Mutation + lifecycle (3 scenarios)
| # |
Name |
What it proves |
Primary primitives |
Runner |
| S9 |
Mutation round-trip |
memory_update from agent A is visible with new content on agent B |
memory_update |
9_mutation.py |
| S10 |
Deletion propagation |
memory_delete / memory_forget propagates to all peers |
memory_delete, memory_forget |
10_deletion.py |
| S16 |
Tier promotion |
short → mid → long promotion visible to peers |
memory_promote |
16_tier_promotion.py |
3.4 Suite D — Scope + governance (3 scenarios)
| # |
Name |
What it proves |
Primary primitives |
Runner |
| S7 |
Scope visibility matrix |
Each (scope, caller_scope) pair produces correct visibility |
as_agent filter, scope metadata |
(partial — Task 1.5 ongoing) |
| S8 |
Auto-tagging round-trip |
Agent writes without tags; tags appear; recall-by-tag works |
memory_auto_tag |
(requires Ollama-backed droplets) |
| S12 |
Agent registration (Task 1.3) |
memory_agent_register on A visible to B's memory_agent_list |
memory_agent_register, memory_agent_list |
12_agent_register.py |
3.5 Suite E — Resilience + observability (5 scenarios)
| # |
Name |
What it proves |
Primary primitives |
Runner |
| S13 |
Concurrent write contention |
Two agents updating the same row converge to a consistent outcome |
memory_update, memory_store |
13_concurrent_contention.py |
| S14 |
Partition tolerance |
Temporary peer loss → recovery → convergence within bounded time |
federation sync |
14_partition_tolerance.py |
| S15 |
Read-your-writes |
Writing agent sees its own write immediately (no settle required) |
memory_store, memory_recall |
15_read_your_writes.py |
| S17 |
Stats consistency |
memory_stats returns equal counts across peers post-settle |
memory_stats |
17_stats_consistency.py |
| S18 |
Semantic query expansion |
Semantic recall surfaces memories written under synonyms, across writers |
memory_expand_query, memory_recall |
18_query_expansion.py |
3.6 Suite F — Topology variants (2 scenarios)
| # |
Name |
What it proves |
Primary primitives |
Runner |
| S4 |
Federation-aware concurrent writes (quorum burst) |
Quorum preserved under N-agent concurrent write burst |
federation quorum |
4_federation_burst.py |
| S19 |
Same-node A2A |
Two agents on ONE droplet share local ai-memory without federation |
memory_store, memory_recall (same-node) |
(planned, testbook v2.0.0) |
4. Tranche 2 — TLS / mTLS (shipped v3.0.0)
Enabled via tls_mode: tls | mtls workflow input. Adds F6/F7 baseline
probes and the scenarios below. See ai-memory integration
for the cert material layout.
| # |
Name |
What it proves |
Gates |
| F6 |
Server TLS handshake |
Every peer presents a valid server cert; rustls completes TLS 1.3 |
baseline (when tls_mode ≥ tls) |
| F7 |
mTLS client-cert enforcement |
Anonymous client must be rejected; off-allowlist fingerprint must be rejected; on-allowlist must succeed |
baseline (when tls_mode = mtls) |
| S20 |
mTLS happy-path |
Agent with valid client cert writes/reads across the federation |
scenario |
| S21 |
Anonymous client rejected |
POST /api/v1/memories without client cert → handshake rejected |
scenario |
5. Tranche 3 — Adversarial + cross-framework (shipped v3.0.0)
| # |
Name |
What it proves |
Category |
| S22 |
Identity spoofing |
X-Agent-Id vs body.metadata.agent_id precedence honored; stored identity is one of the declared values, never a silent third |
identity |
| S23 |
Malicious content fuzz |
SQL-like, XSS, NUL bytes, oversize (~1 MB), unicode+RTL: no crash, no injection, oversize cleanly rejected or round-trip faithful, others byte-for-byte preserved |
robustness |
| S24 |
Byzantine peer |
Node-2 crafts a sync_push claiming sender=ai:alice; node-3 preserves declared metadata.agent_id (no silent re-attribution) or rejects |
federation integrity |
| S25 |
Clock skew tolerance |
Node-3 offset +300 s; alice's write from node-1 still converges to node-3 via vector clocks |
time |
| S26 |
Mixed-framework campaign |
IronClaw + Hermes on same VPC; writes cross readable both directions |
cross-stack |
| S27 |
OpenClaw legacy regression |
openclaw-only campaign regression lane (skipped unless agent_group=openclaw) |
legacy |
6. Tranche 4 — Uncovered primitive coverage (shipped v3.0.0)
| # |
Name |
What it proves |
Primary primitives |
| S28 |
memory_search keyword A2A |
Keyword search (distinct from /recall semantic) consistent across peers |
memory_search |
| S29 |
memory_archive lifecycle |
archive → archive_list → archive_restore → archive_stats round-trip |
memory_archive_* |
| S30 |
memory_capabilities handshake |
Protocol version + tool surface match across peers |
memory_capabilities |
| S31 |
memory_gc quiescence |
After forget+gc, non-deleted rows remain readable on all peers |
memory_gc, memory_forget |
| S32 |
memory_inbox + memory_notify |
Notify delivers to target's inbox; non-target cannot read |
memory_notify, memory_inbox |
| S33 |
memory_subscribe pub/sub |
subscribe → write → deliver → unsubscribe → no-deliver |
memory_subscribe, memory_unsubscribe, memory_list_subscriptions |
| S34 |
memory_pending governance |
governance.write=approve → pending → approve/reject visibility |
memory_pending_{list,approve,reject} |
| S35 |
memory_namespace standards |
Parent-chain rules merged into namespace standard |
memory_namespace_{get,set,clear}_standard |
| S36 |
memory_session_start lifecycle |
Session-tagged writes recall by session_id only |
memory_session_start |
| S37 |
memory_get_links bidirectional |
Both forward and reverse traversal resolve the pair |
memory_get_links |
7. Tranche 5 — HTTP-only endpoint coverage (shipped v3.0.0)
| # |
Name |
What it proves |
Endpoint |
| S38 |
export + import round-trip |
Export one peer's namespace → import elsewhere → stats match |
/api/v1/export, /api/v1/import |
| S39 |
sync/since delta |
Post-partition delta returns exactly the missed rows |
/api/v1/sync/since |
| S40 |
bulk write |
500-row /bulk POST reaches every peer + aggregator |
/api/v1/memories/bulk |
| S41 |
metrics Prometheus |
Required counters present and monotonic post-activity |
/api/v1/metrics |
| S42 |
namespaces enumeration |
Namespace list (with counts) equivalent across peers |
/api/v1/namespaces |
8. Dispatch matrix (what runs in a given campaign)
The default dispatch runs the v3.0.0 always-on set (35 scenarios):
- Baseline probes F1, F2a, F2b, F3, F4, F5 on every agent node; + F6 when
tls_mode ≥ tls; + F7 when tls_mode = mtls.
- Scenarios: S1, S1b, S2, S4, S5, S6, S9, S10, S11, S12, S13, S14, S15, S16, S17, S18, S22, S23, S24, S25, S28, S29, S30, S31, S32, S33, S34, S35, S36, S37, S38, S39, S40, S41, S42.
Auto-appended conditionally by the workflow's Compute scenarios list step:
- S20 when
tls_mode ∈ {tls, mtls}
- S21 when
tls_mode = mtls
- S26 when
agent_group = mixed
- S27 when
agent_group = openclaw
agent_group selects the framework:
ironclaw (primary as of 2026-04-21)
hermes (primary)
openclaw (legacy — explicit dispatch only)
mixed — heterogeneous agents in one campaign (S26)
7. Read next