T1 — Awareness

"Does the agent know unloaded tools exist?"

Pass bar

≥90% of cells (Grok 4.3 × OpenClaw × profile) call memory_capabilities and surface ≥6 of 8 families in the final answer.
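The pass bar reduces to a per-cell check followed by a ratio over all cells. A minimal sketch of that arithmetic, assuming each cell has already been reduced to a small record; `CellResult`, its field names, and `tier_passes` are hypothetical names chosen for illustration, not part of the benchmark harness:

```python
from dataclasses import dataclass

PASS_PERCENT = 90   # pass bar: at least 90% of cells must pass
MIN_FAMILIES = 6    # pass bar: at least 6 of the 8 families surfaced


@dataclass(frozen=True)
class CellResult:
    """One cell of the matrix (hypothetical record shape)."""
    profile: str                # profile label for this cell
    called_capabilities: bool   # agent called memory_capabilities
    families_surfaced: int      # distinct families named in the final answer


def cell_passes(cell: CellResult) -> bool:
    """A cell passes only if the tool was called AND enough families surfaced."""
    return cell.called_capabilities and cell.families_surfaced >= MIN_FAMILIES


def tier_passes(cells: list[CellResult]) -> bool:
    """Apply the 90% bar using integer arithmetic to avoid float rounding."""
    if not cells:
        return False  # assumption: an empty run is not treated as a pass
    passed = sum(cell_passes(c) for c in cells)
    return passed * 100 >= PASS_PERCENT * len(cells)
```

With ten cells, nine passing cells clear the bar and eight do not. Note that a cell which names all 8 families without calling `memory_capabilities` still counts as a failure under this reading of the pass bar.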

Why this tier

If T1 fails, v0.6.4's default-flip is a regression in agent capability awareness regardless of the token-cost win. An agent that doesn't know unloaded tools exist cannot benefit from the discovery dance.

Canonical prompt

{!../../prompts/t1-awareness.txt!}

Pass / fail rubric

| Signal | Required | Source |
| --- | --- | --- |
| Agent called memory_capabilities | yes | MCP wire log |
| Final answer surfaced ≥6 of 8 families | yes | LLM transcript text |
| Loaded vs not-loaded distinction | yes | LLM transcript text |
| No fabricated families | yes | LLM transcript text |
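Because all four signals are required, grading a single cell is a conjunction over the rubric rows. The sketch below assumes the wire log and transcript have already been parsed into an `Observation`; that type, the `grade` function, and the signal keys are hypothetical, and the caller supplies the set of real family names since this page does not list them:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Evidence extracted for one cell (hypothetical shape)."""
    tool_calls: tuple[str, ...]       # tool names seen in the MCP wire log
    named_families: frozenset[str]    # families named in the final answer
    distinguishes_loaded: bool        # transcript separates loaded from not-loaded


def grade(obs: Observation, known_families: frozenset[str],
          min_families: int = 6) -> tuple[dict[str, bool], bool]:
    """Return each rubric signal plus the overall verdict for one cell."""
    real = obs.named_families & known_families  # ignore invented names when counting
    signals = {
        "called_memory_capabilities": "memory_capabilities" in obs.tool_calls,
        "surfaced_enough_families": len(real) >= min_families,
        "loaded_distinction": obs.distinguishes_loaded,
        # Any named family outside the known set counts as fabricated.
        "no_fabricated_families": obs.named_families <= known_families,
    }
    return signals, all(signals.values())
```

Returning the per-signal dictionary alongside the verdict keeps a failed cell diagnosable: the failing row of the rubric is visible without re-reading the transcript.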

Common failure modes

  • Training-set answer: Grok answers from prior knowledge without calling memory_capabilities
  • Partial discovery: the agent calls memory_capabilities but reports fewer than 6 families
  • Loaded/not-loaded conflation: the agent presents every family as loaded and available
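The three failure modes above can be told apart mechanically from the same signals the rubric uses. A rough sketch, assuming the per-cell inputs are already known; the function name, parameters, and label strings are hypothetical:

```python
def failure_modes(called_tool: bool, families_reported: int,
                  distinguishes_loaded: bool, min_families: int = 6) -> list[str]:
    """Label a cell with the common failure modes it exhibits (possibly none)."""
    modes = []
    if not called_tool:
        # Answered without the tool, so any families came from prior knowledge.
        modes.append("training-set answer")
    elif families_reported < min_families:
        # Tool was called, but too few families reached the final answer.
        modes.append("partial discovery")
    if not distinguishes_loaded:
        # Independent of the above: loaded and not-loaded families were merged.
        modes.append("loaded/not-loaded conflation")
    return modes
```

Conflation is checked separately because it can co-occur with either of the other two modes.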