2.4 KiB
summary, read_when, title
| summary | read_when | title | |||
|---|---|---|---|---|---|
| Private QA automation shape for qa-lab, qa-channel, seeded scenarios, and protocol reports |
|
QA E2E Automation |
QA E2E Automation
The private QA stack is meant to exercise OpenClaw in a more realistic, channel-shaped way than a single unit test can.
Current pieces:
extensions/qa-channel: synthetic message channel with DM, channel, thread, reaction, edit, and delete surfaces.extensions/qa-lab: debugger UI and QA bus for observing the transcript, injecting inbound messages, and exporting a Markdown report.qa/: repo-backed seed assets for the kickoff task and baseline QA scenarios.
The current QA operator flow is a two-pane QA site:
- Left: Gateway dashboard (Control UI) with the agent.
- Right: QA Lab, showing the Slack-ish transcript and scenario plan.
Run it with:
pnpm qa:lab:up
That builds the QA site, starts the Docker-backed gateway lane, and exposes the QA Lab page where an operator or automation loop can give the agent a QA mission, observe real channel behavior, and record what worked, failed, or stayed blocked.
For faster QA Lab UI iteration without rebuilding the Docker image each time, start the stack with a bind-mounted QA Lab bundle:
pnpm openclaw qa docker-build-image
pnpm qa:lab:build
pnpm qa:lab:up:fast
pnpm qa:lab:watch
qa:lab:up:fast keeps the Docker services on a prebuilt image and bind-mounts
extensions/qa-lab/web/dist into the qa-lab container. qa:lab:watch
rebuilds that bundle on change, and the browser auto-reloads when the QA Lab
asset hash changes.
Repo-backed seeds
Seed assets live in qa/:
qa/scenarios.md
These are intentionally in git so the QA plan is visible to both humans and the agent. The baseline list should stay broad enough to cover:
- DM and channel chat
- thread behavior
- message action lifecycle
- cron callbacks
- memory recall
- model switching
- subagent handoff
- repo-reading and docs-reading
- one small build task such as Lobster Invaders
Reporting
qa-lab exports a Markdown protocol report from the observed bus timeline.
The report should answer:
- What worked
- What failed
- What stayed blocked
- What follow-up scenarios are worth adding