Runs

Per-campaign verdict pages land here. The first baseline run will appear under `<YYYY-MM-DD>/verdict.md` after the v0.6.4 ship window.

Run history

| Date | Verdict | Notes |
|---|---|---|
| 2026-05-05 | GATE YELLOW (driver wired, awaiting key) | LLM driver `scripts/grok_cell.py` landed and was validated end-to-end with the mock-LLM harness. All four tier scoring paths were verified against the real v0.6.4 binary. The certifying real-Grok run will fire as soon as `XAI_API_KEY` is set in the operator environment. |
| 2026-05-04 | ✅ HARNESS-PIPELINE GREEN | First baseline run, with a broader scope before it was narrowed. Validated fixture restore, the v19→v20 migration, MCP over stdio, and capabilities parsing. |

Each campaign produces one markdown file per (LLM, harness, tier, profile) cell at:

`runs/<date>/cells/<llm>-<harness>-<tier>-<profile>.md`
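The layout above can be built from a cell's four coordinates. What follows is a minimal sketch, not the harness's actual code. It assumes `<date>` is rendered as ISO `YYYY-MM-DD`, as in the run-history table. `Cell`, `cell_report_path`, and the example values (`grok`, `mcp`, `t1`, `default`) are hypothetical. The sketch also rejects components containing `-` or `/`. This is an assumption, added so that a filename always splits back into exactly four parts.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Cell:
    """One (LLM, harness, tier, profile) cell of a campaign (hypothetical type)."""
    llm: str
    harness: str
    tier: str
    profile: str


def cell_report_path(run_date: date, cell: Cell) -> PurePosixPath:
    """Build runs/<date>/cells/<llm>-<harness>-<tier>-<profile>.md for one cell."""
    parts = (cell.llm, cell.harness, cell.tier, cell.profile)
    for part in parts:
        # Assumption: '-' is reserved as the separator, so components may not contain it.
        if not part or "-" in part or "/" in part:
            raise ValueError(f"cell component {part!r} must be non-empty and free of '-' and '/'")
    name = "-".join(parts) + ".md"
    # Assumption: <date> is the ISO date of the campaign, e.g. 2026-05-05.
    return PurePosixPath("runs", run_date.isoformat(), "cells", name)


if __name__ == "__main__":
    print(cell_report_path(date(2026, 5, 5), Cell("grok", "mcp", "t1", "default")))
```

Running it prints `runs/2026-05-05/cells/grok-mcp-t1-default.md`. `PurePosixPath` keeps `/` separators on every OS, which matches the layout as documented.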

Each campaign also writes a machine-readable `verdict.json` containing the aggregate verdict for the campaign.
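The aggregation rule and the `verdict.json` schema are not specified here. The following is a hedged sketch of one plausible approach: worst-of aggregation, where the campaign is only as good as its worst cell. The `Gate` enum, its `RED` level, the `"verdict"`/`"cells"` keys, and the function names are all hypothetical. Only GREEN and YELLOW appear in the run history above.

```python
import json
from enum import IntEnum


class Gate(IntEnum):
    """Hypothetical gate levels; a higher value means a worse outcome."""
    GREEN = 0
    YELLOW = 1
    RED = 2  # assumption: not shown in the run history


def aggregate_verdict(cell_verdicts: dict[str, Gate]) -> Gate:
    """Worst-of aggregation (assumed rule): return the worst cell verdict."""
    if not cell_verdicts:
        raise ValueError("a campaign needs at least one cell")
    return max(cell_verdicts.values())


def verdict_json(cell_verdicts: dict[str, Gate]) -> str:
    """Serialize the aggregate plus per-cell verdicts (hypothetical schema)."""
    payload = {
        "verdict": aggregate_verdict(cell_verdicts).name,
        "cells": {cell: v.name for cell, v in sorted(cell_verdicts.items())},
    }
    return json.dumps(payload, indent=2)


if __name__ == "__main__":
    print(verdict_json({
        "grok-mcp-t1-default": Gate.GREEN,
        "grok-mcp-t2-default": Gate.YELLOW,
    }))
```

Worst-of keeps a single failing cell from being averaged away. If the real gate uses a different rule, only `aggregate_verdict` would need to change.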