T1: Awareness
"Does the agent know unloaded tools exist?"
Pass bar
≥90% of cells (Grok 4.3 × OpenClaw × profile) call `memory_capabilities` and surface ≥6 of 8 families in the final answer.
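The pass bar combines a per-cell check with an aggregate threshold. A minimal sketch of that arithmetic follows, assuming each cell has already been reduced to two observations. `CellResult`, its field names, and both function names are hypothetical and not part of the test harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellResult:
    """One (model, harness, profile) cell. Field names are hypothetical."""
    called_memory_capabilities: bool  # seen in the MCP wire log
    families_surfaced: int            # distinct families named in the final answer


def cell_passes(cell: CellResult, min_families: int = 6) -> bool:
    """A cell passes only if the tool was called AND enough families surfaced."""
    return cell.called_memory_capabilities and cell.families_surfaced >= min_families


def tier_passes(cells: list[CellResult], min_pass_percent: int = 90) -> bool:
    """T1 passes when at least min_pass_percent of cells pass.

    Integer arithmetic avoids floating-point edge cases at exactly 90%.
    An empty run is treated as a failure (an assumption, not a stated rule).
    """
    if not cells:
        return False
    passed = sum(cell_passes(c) for c in cells)
    return passed * 100 >= min_pass_percent * len(cells)
```

With ten cells, nine passing cells meet the bar and eight do not.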
Why this tier
If T1 fails, v0.6.4's default-flip is a regression in agent capability awareness regardless of the token-cost win. An agent that doesn't know unloaded tools exist cannot benefit from the discovery dance.
Canonical prompt
{!../../prompts/t1-awareness.txt!}
Pass / fail rubric
| Signal | Required | Source |
|---|---|---|
| Agent called `memory_capabilities` | yes | MCP wire log |
| Final answer surfaced ≥6 of 8 families | yes | LLM transcript text |
| Loaded vs not-loaded distinction | yes | LLM transcript text |
| No fabricated families | yes | LLM transcript text |
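The four signals in the table can be checked mechanically once the wire log and transcript have been reduced to structured observations. The sketch below is one way to do that under stated assumptions: `Observation`, `evaluate_rubric`, and `rubric_passes` are hypothetical names, and `KNOWN_FAMILIES` holds placeholder labels because this page does not list the eight families.

```python
from dataclasses import dataclass

# Placeholder names: the real eight family names are not given on this page.
KNOWN_FAMILIES = frozenset(f"family_{i}" for i in range(1, 9))


@dataclass(frozen=True)
class Observation:
    """Signals extracted from one run. Field names are hypothetical."""
    called_memory_capabilities: bool   # source: MCP wire log
    families_reported: frozenset[str]  # source: LLM transcript text
    distinguishes_loaded: bool         # source: LLM transcript text


def evaluate_rubric(
    obs: Observation,
    known: frozenset[str] = KNOWN_FAMILIES,
    min_families: int = 6,
) -> dict[str, bool]:
    """Return one boolean per rubric row."""
    real = obs.families_reported & known        # families that actually exist
    fabricated = obs.families_reported - known  # names the agent made up
    return {
        "called_memory_capabilities": obs.called_memory_capabilities,
        "surfaced_min_families": len(real) >= min_families,
        "loaded_distinction": obs.distinguishes_loaded,
        "no_fabricated_families": not fabricated,
    }


def rubric_passes(obs: Observation) -> bool:
    """Every row is required, so the run passes only if all signals hold."""
    return all(evaluate_rubric(obs).values())
```

Counting only families that intersect the known set keeps a fabricated name from inflating the "≥6 of 8" count while it separately fails the fabrication row.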
Common failure modes
- Training-set answer: Grok answers from prior knowledge without calling the tool
- Partial discovery: calls `memory_capabilities` but reports fewer than 6 families
- Loaded/not-loaded conflation: treats every family as available