---
summary: "Private QA automation shape for qa-lab, qa-channel, seeded scenarios, and protocol reports"
read_when:
  - Extending qa-lab or qa-channel
  - Adding repo-backed QA scenarios
  - Building higher-realism QA automation around the Gateway dashboard
title: "QA E2E Automation"
---

# QA E2E Automation

The private QA stack is meant to exercise OpenClaw in a more realistic,
channel-shaped way than a single unit test can.

Current pieces:

- `extensions/qa-channel`: synthetic message channel with DM, channel, thread,
  reaction, edit, and delete surfaces.
- `extensions/qa-lab`: debugger UI and QA bus for observing the transcript,
  injecting inbound messages, and exporting a Markdown report.
- `qa/`: repo-backed seed assets for the kickoff task and baseline QA
  scenarios.

The current QA operator flow is a two-pane QA site:

- Left: Gateway dashboard (Control UI) with the agent.
- Right: QA Lab, showing the Slack-ish transcript and scenario plan.

Run it with:

```bash
pnpm qa:lab:up
```

That builds the QA site, starts the Docker-backed gateway lane, and exposes the
QA Lab page where an operator or automation loop can give the agent a QA
mission, observe real channel behavior, and record what worked, failed, or
stayed blocked.

For faster QA Lab UI iteration without rebuilding the Docker image each time,
start the stack with a bind-mounted QA Lab bundle:

```bash
pnpm openclaw qa docker-build-image
pnpm qa:lab:build
pnpm qa:lab:up:fast
pnpm qa:lab:watch
```

`qa:lab:up:fast` keeps the Docker services on a prebuilt image and bind-mounts
`extensions/qa-lab/web/dist` into the `qa-lab` container. `qa:lab:watch`
rebuilds that bundle on change, and the browser auto-reloads when the QA Lab
asset hash changes.

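The four fast-loop commands can be chained so that a failed step stops the
sequence. The following is a minimal sketch, not part of the repo: the function
names and the `QA_DRY_RUN` variable are hypothetical, and it assumes each of the
first three commands returns once its work is done while `qa:lab:watch` keeps
running in the foreground.

```bash
#!/usr/bin/env bash
# Sketch only: run_step, qa_lab_fast_loop, and QA_DRY_RUN are hypothetical names
# introduced for this example. The pnpm commands are the ones documented above.

# Print the command, then execute it unless QA_DRY_RUN=1 is set.
run_step() {
  echo "+ $*"
  if [ "${QA_DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Run the documented fast-loop commands in order, stopping at the first failure.
qa_lab_fast_loop() {
  run_step pnpm openclaw qa docker-build-image || return 1
  run_step pnpm qa:lab:build || return 1
  run_step pnpm qa:lab:up:fast || return 1
  run_step pnpm qa:lab:watch
}
```

Calling `QA_DRY_RUN=1 qa_lab_fast_loop` only prints the steps, which is a cheap
way to check the order before touching Docker.
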
## Repo-backed seeds

Seed assets live in `qa/`:

- `qa/scenarios/index.md`
- `qa/scenarios/*.md`

These are intentionally in git so the QA plan is visible to both humans and the
agent. The baseline list should stay broad enough to cover:

- DM and channel chat
- thread behavior
- message action lifecycle
- cron callbacks
- memory recall
- model switching
- subagent handoff
- repo-reading and docs-reading
- one small build task such as Lobster Invaders

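Because the seeds are plain files in git, a small check can flag scenario files
that the index never mentions. This is a hedged sketch: `list_unindexed_scenarios`
is a hypothetical helper, and it assumes `index.md` refers to each scenario by
its file name, which the source does not confirm.

```bash
#!/usr/bin/env bash
# Sketch only: list_unindexed_scenarios is a hypothetical helper, not a repo script.
# Assumption: index.md mentions each scenario by file name (for example in a link).

# Print every *.md file in the scenarios directory that index.md does not mention.
list_unindexed_scenarios() {
  local dir="${1:-qa/scenarios}"
  local index="$dir/index.md"
  local file name
  [ -f "$index" ] || { echo "missing $index" >&2; return 2; }
  for file in "$dir"/*.md; do
    [ -e "$file" ] || continue          # glob matched nothing
    name="$(basename "$file")"
    [ "$name" = "index.md" ] && continue
    # Fixed-string substring match, so "chat.md" would also match "dm-chat.md".
    grep -qF -- "$name" "$index" || echo "$name"
  done
}
```

Run from the repo root, `list_unindexed_scenarios` prints nothing when every
scenario file is mentioned in the index.
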
## Reporting

`qa-lab` exports a Markdown protocol report from the observed bus timeline.
The report should answer:

- What worked
- What failed
- What stayed blocked
- What follow-up scenarios are worth adding

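The four questions above can double as a checklist when reviewing a report by
hand. The sketch below prints an empty outline with one section per question.
It is an illustration only: `print_report_outline` is a hypothetical helper, and
the headings are not the actual layout of the `qa-lab` export.

```bash
#!/usr/bin/env bash
# Sketch only: print_report_outline is a hypothetical helper. The headings mirror
# the four questions in this doc, not the real qa-lab report format.

# Print a Markdown outline with one empty bullet under each question.
print_report_outline() {
  local title="${1:-QA protocol report}"
  local section
  printf '# %s\n\n' "$title"
  for section in "What worked" "What failed" "What stayed blocked" "Follow-up scenarios"; do
    printf '## %s\n\n- \n\n' "$section"
  done
}
```

For example, `print_report_outline "Thread behavior run" > notes.md` writes the
outline to a scratch file for manual notes.
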
For character and style checks, run the same scenario across multiple live model
refs and write a judged Markdown report:

```bash
pnpm openclaw qa character-eval \
  --model openai/gpt-5.4 \
  --model anthropic/claude-opus-4-6 \
  --model minimax/MiniMax-M2.7 \
  --judge-model openai/gpt-5.4
```

The command runs local QA gateway child processes, not Docker. It preserves each
full transcript, records basic run stats, then asks the judge model in fast mode
with `xhigh` reasoning to rank the runs by naturalness, vibe, and humor.
When no candidate `--model` is passed, the character eval defaults to
`openai/gpt-5.4` and `anthropic/claude-opus-4-6`.

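When the candidate list changes often, the repeated `--model` flags can be built
from a list instead of typed by hand. This is a minimal sketch using only the
`--model` and `--judge-model` flags shown above; `build_character_eval_cmd` is a
hypothetical helper that prints the command rather than running it.

```bash
#!/usr/bin/env bash
# Sketch only: build_character_eval_cmd is a hypothetical helper. It uses only
# the --model and --judge-model flags documented above.

# Usage: build_character_eval_cmd JUDGE_MODEL [CANDIDATE_MODEL...]
# Prints the resulting command line on one line.
build_character_eval_cmd() {
  local judge="$1"
  shift
  local cmd=(pnpm openclaw qa character-eval)
  local model
  for model in "$@"; do
    cmd+=(--model "$model")
  done
  cmd+=(--judge-model "$judge")
  printf '%s\n' "${cmd[*]}"
}
```

With no candidates, the printed command carries no `--model` flag, so the
documented defaults would apply.
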
## Related docs

- [Testing](/help/testing)
- [QA Channel](/channels/qa-channel)
- [Dashboard](/web/dashboard)