From e8b446b98505e00ceb257eb7d040905d6254519b Mon Sep 17 00:00:00 2001
From: Vincent Koc <vincentkoc@ieee.org>
Date: Tue, 7 Apr 2026 10:38:36 +0100
Subject: [PATCH] docs(qa): expand frontier bakeoff runbook

---
 qa/README.md                |  5 +++++
 qa/frontier-harness-plan.md | 23 +++++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/qa/README.md b/qa/README.md
index f5d6621866d..3063c079026 100644
--- a/qa/README.md
+++ b/qa/README.md
@@ -8,4 +8,9 @@ Files:
 - `frontier-harness-plan.md` - big-model bakeoff and tuning loop for harness work.
 - `seed-scenarios.json` - repo-backed baseline QA scenarios.
 
+Key workflow:
+
+- `qa suite` runs the executable frontier subset and doubles as the regression loop.
+- `qa manual` is the scoped personality and style probe; run it only after the executable subset is green.
+
 Keep this folder in git. Add new scenarios here before wiring them into automation.
diff --git a/qa/frontier-harness-plan.md b/qa/frontier-harness-plan.md
index 0b1930dcbb9..164816f0a7b 100644
--- a/qa/frontier-harness-plan.md
+++ b/qa/frontier-harness-plan.md
@@ -84,6 +84,7 @@ Use the QA Lab runner catalog or `openclaw models list --all` to pick the curren
 - empty-promise rate
 - tool continuity after model switch
 - discovery report completeness and specificity
+- scope drift: unrelated scenario updates, grand wrap-ups, or invented completion tallies
 - latency / obvious stall behavior
 - token cost notes if a change makes the prompt materially heavier
 
@@ -95,11 +96,33 @@ Run this after the executable subset, not before:
 read QA_KICKOFF_TASK.md, tell me what feels half-baked about this qa mission, and keep it to two short sentences
 ```
 
+GPT manual lane:
+
+```bash
+pnpm openclaw qa manual \
+  --provider-mode live-frontier \
+  --model openai/gpt-5.4 \
+  --alt-model openai/gpt-5.4 \
+  --fast \
+  --message "read QA_KICKOFF_TASK.md, tell me what feels half-baked about this qa mission, and keep it to two short sentences"
+```
+
+Claude manual lane:
+
+```bash
+pnpm openclaw qa manual \
+  --provider-mode live-frontier \
+  --model anthropic/claude-sonnet-4-6 \
+  --alt-model anthropic/claude-opus-4-6 \
+  --message "read QA_KICKOFF_TASK.md, tell me what feels half-baked about this qa mission, and keep it to two short sentences"
+```
+
 Score it on:
 
 - did it read first
 - did it say something specific instead of generic fluff
 - did the agent still sound like itself while doing useful work
+- did it stay on the scoped ask instead of widening into a suite recap or fake completion claim
 
 ## Deferred