Runs

Per-campaign verdict pages land here. The first baseline run will appear under <YYYY-MM-DD>/verdict.md after the v0.6.4 ship window.

Run history

| Date | Verdict | Notes |
|---|---|---|
| 2026-05-05 | GATE YELLOW (driver wired, awaiting key) | LLM driver scripts/grok_cell.py landed and validated end-to-end via mock-LLM harness. All four tier scoring paths verified against the real v0.6.4 binary. Certifying real-Grok run fires the moment XAI_API_KEY reaches operator env. |
| 2026-05-04 | ✅ HARNESS-PIPELINE GREEN | First baseline (broader scope before narrowing). Validates fixture restore + v19→v20 migration + MCP stdio + capabilities parsing. |

Per-cell evidence

Each campaign produces one markdown file per (LLM, harness, tier, profile) cell at:

runs/<date>/cells/<llm>-<harness>-<tier>-<profile>.md

Each campaign also produces a machine-readable verdict.json containing the aggregate verdict for the campaign.
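The per-cell path pattern can be sketched in Python as follows. This is a minimal illustration, not the project's actual tooling: the helper names `cell_path` and `verdict_json_path`, the `runs_root` parameter, and the example cell values are all hypothetical, and placing verdict.json directly in the dated campaign directory (next to verdict.md) is an assumption the page does not confirm.

```python
from pathlib import Path


def cell_path(runs_root, date, llm, harness, tier, profile):
    """Build the evidence path for one (LLM, harness, tier, profile) cell.

    Follows the documented pattern:
    runs/<date>/cells/<llm>-<harness>-<tier>-<profile>.md
    """
    filename = f"{llm}-{harness}-{tier}-{profile}.md"
    return Path(runs_root) / date / "cells" / filename


def verdict_json_path(runs_root, date):
    """Build the path to the campaign's aggregate verdict file.

    Assumption: verdict.json sits in the dated campaign directory,
    alongside verdict.md. The page does not state its exact location.
    """
    return Path(runs_root) / date / "verdict.json"


# Hypothetical cell values, chosen only to show the resulting shape.
example = cell_path("runs", "2026-05-05", "grok", "mcp", "tier1", "default")
print(example.as_posix())
```

Because the four fields are joined with hyphens, a value that itself contains a hyphen would make the filename ambiguous to split back into its parts; a consumer that needs the fields is probably better served by reading them from the cell file or from verdict.json than by parsing the name.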