3.1 KiB
summary, read_when, title
| summary | read_when | title | |||
|---|---|---|---|---|---|
| Private QA automation shape for qa-lab, qa-channel, seeded scenarios, and protocol reports |
|
QA E2E Automation |
QA E2E Automation
The private QA stack is meant to exercise OpenClaw in a more realistic, channel-shaped way than a single unit test can.
Current pieces:
extensions/qa-channel: synthetic message channel with DM, channel, thread, reaction, edit, and delete surfaces.extensions/qa-lab: debugger UI and QA bus for observing the transcript, injecting inbound messages, and exporting a Markdown report.qa/: repo-backed seed assets for the kickoff task and baseline QA scenarios.
The current QA operator flow is a two-pane QA site:
- Left: Gateway dashboard (Control UI) with the agent.
- Right: QA Lab, showing the Slack-ish transcript and scenario plan.
Run it with:
pnpm qa:lab:up
That builds the QA site, starts the Docker-backed gateway lane, and exposes the QA Lab page where an operator or automation loop can give the agent a QA mission, observe real channel behavior, and record what worked, failed, or stayed blocked.
For faster QA Lab UI iteration without rebuilding the Docker image each time, start the stack with a bind-mounted QA Lab bundle:
pnpm openclaw qa docker-build-image
pnpm qa:lab:build
pnpm qa:lab:up:fast
pnpm qa:lab:watch
qa:lab:up:fast keeps the Docker services on a prebuilt image and bind-mounts
extensions/qa-lab/web/dist into the qa-lab container. qa:lab:watch
rebuilds that bundle on change, and the browser auto-reloads when the QA Lab
asset hash changes.
Repo-backed seeds
Seed assets live in qa/:
qa/scenarios/index.mdqa/scenarios/*.md
These are intentionally in git so the QA plan is visible to both humans and the agent. The baseline list should stay broad enough to cover:
- DM and channel chat
- thread behavior
- message action lifecycle
- cron callbacks
- memory recall
- model switching
- subagent handoff
- repo-reading and docs-reading
- one small build task such as Lobster Invaders
Reporting
qa-lab exports a Markdown protocol report from the observed bus timeline.
The report should answer:
- What worked
- What failed
- What stayed blocked
- What follow-up scenarios are worth adding
For character and style checks, run the same scenario across multiple live model refs and write a judged Markdown report:
pnpm openclaw qa character-eval \
--model openai/gpt-5.4 \
--model anthropic/claude-opus-4-6 \
--model minimax/MiniMax-M2.7 \
--judge-model openai/gpt-5.4
The command runs local QA gateway child processes, not Docker. It preserves each
full transcript, records basic run stats, then asks the judge model in fast mode
with xhigh reasoning to rank the runs by naturalness, vibe, and humor.
When no candidate --model is passed, the character eval defaults to
openai/gpt-5.4 and anthropic/claude-opus-4-6.