../ runs index

Campaign a2a-hermes-v0.6.0-r5 FAIL

Agent group
hermes (homogeneous)
ai-memory ref
v0.6.0
Completed at
2026-04-20T23:44:50Z
Overall pass
false
Skipped reports
1

Infrastructure

Provider
?
Region
?
Droplet size
?
Topology
?
Scenarios started
?
Scenarios ended
?
Dispatched by
a2a-gate-bot
Harness SHA
?

Back-filled by scripts/backfill_legacy_runs.sh — historical run predates campaign.meta.json emission.

Run focus

Hermes installs and runs — but `hermes -p` is PROFILE NAME, not prompt. Scenario Phase A rejected every write.

What this campaign tested: End-to-end Hermes provisioning + first Hermes prompt invocation from drive_agent.sh.

What it demonstrated: (1) Nous Research hermes-agent installs cleanly from the official install.sh on a fresh Ubuntu 24.04 DO droplet. (2) The MCP config at ~/.hermes/config.yaml is accepted by the agent process (no YAML schema errors). (3) The grok-CLI assumption that `-p` means prompt was WRONG for Hermes — it’s a global flag for `--profile <name>`. This is the kind of framework-mismatch bug that only surfaces on a real dispatch because the docs said `hermes chat` but drive_agent.sh was calling `hermes -p`.

AI NHI analysis · Claude Opus 4.7

Hermes installs and runs — but `hermes -p` is PROFILE NAME, not prompt. Scenario Phase A rejected every write.

INFRA GREEN / SCENARIO RED — 27 m 34 s wall (longer than openclaw’s 15 m because the hermes-agent Python install is heavier than the single-binary grok). Terraform + SSH + Provision all green. Scenario crashed on first invocation with `Error: Invalid profile name 'Store a memory: ...'. Must match [a-z0-9][a-z0-9_-]{0,63}`. Every write attempt failed identically. Zero memories stored on the Hermes side.

For three audiences

Non-technical end users

Good news: Hermes installed correctly on the cloud server. Bad news: we gave it the wrong kind of instruction. It was waiting for a short username, we gave it a sentence. Every memo got returned before it was even read. Same kind of bug as when a form asks for 'Last name' and you type your full sentence in the box — instant validation error. Quick fix: use the correct command.

C-level decision makers

The install surface works. The invocation bug is one line in a shell driver and costs under a minute to fix; next dispatch should produce the first Hermes-side scenario data. Cost of this detection: under $0.10 of DO compute. No product correctness issue — the memory substrate was never even touched by the Hermes side in this run. Framework-independence story still intact; this is a harness bug, not a substrate bug.

Engineers & architects

Root cause: scripts/drive_agent.sh called `hermes -p "<natural language prompt>"`. Per hermes-agent v0.10.0 CLI reference, `-p` is a GLOBAL flag aliasing `--profile <name>` with strict regex [a-z0-9][a-z0-9_-]{0,63} on the argument. The one-shot prompt flag is `hermes chat -q "<prompt>"` or `-Q` (programmatic/quiet mode). Hermes also supports xAI natively via `--provider xai` (alias grok) so the OpenAI-compatible shim isn't needed; XAI_API_KEY env is sufficient. Fix in this commit: drive_agent.sh now invokes `hermes chat -Q --provider xai --model grok-4-fast-non-reasoning -q "$prompt"`, setup_node.sh writes hermes.env with XAI_API_KEY only (drops OPENAI_API_BASE and friends).

What changes going into the next campaign

Dispatch r6 with correct `hermes chat -q` invocation + native xAI provider; expect Hermes Phase A to succeed and full parity with openclaw’s write path.

Tests performed in this run

Every scenario that produced a JSON report in this campaign, in testbook order. Click a row's scenario id to jump to its full report below. See the Every test performed page for the authoritative catalog.

IDTitleResultReason