--- summary: "Private QA automation shape for qa-lab, qa-channel, seeded scenarios, and protocol reports" read_when: - Extending qa-lab or qa-channel - Adding repo-backed QA scenarios - Building higher-realism QA automation around the Gateway dashboard title: "QA E2E Automation" --- # QA E2E Automation The private QA stack is meant to exercise OpenClaw in a more realistic, channel-shaped way than a single unit test can. Current pieces: - `extensions/qa-channel`: synthetic message channel with DM, channel, thread, reaction, edit, and delete surfaces. - `extensions/qa-lab`: debugger UI and QA bus for observing the transcript, injecting inbound messages, and exporting a Markdown report. - `qa/`: repo-backed seed assets for the kickoff task and baseline QA scenarios. The current QA operator flow is a two-pane QA site: - Left: Gateway dashboard (Control UI) with the agent. - Right: QA Lab, showing the Slack-ish transcript and scenario plan. Run it with: ```bash pnpm qa:lab:up ``` That builds the QA site, starts the Docker-backed gateway lane, and exposes the QA Lab page where an operator or automation loop can give the agent a QA mission, observe real channel behavior, and record what worked, failed, or stayed blocked. For faster QA Lab UI iteration without rebuilding the Docker image each time, start the stack with a bind-mounted QA Lab bundle: ```bash pnpm openclaw qa docker-build-image pnpm qa:lab:build pnpm qa:lab:up:fast pnpm qa:lab:watch ``` `qa:lab:up:fast` keeps the Docker services on a prebuilt image and bind-mounts `extensions/qa-lab/web/dist` into the `qa-lab` container. `qa:lab:watch` rebuilds that bundle on change, and the browser auto-reloads when the QA Lab asset hash changes. ## Repo-backed seeds Seed assets live in `qa/`: - `qa/scenarios/index.md` - `qa/scenarios/*.md` These are intentionally in git so the QA plan is visible to both humans and the agent. The baseline list should stay broad enough to cover: - DM and channel chat - thread behavior - message action lifecycle - cron callbacks - memory recall - model switching - subagent handoff - repo-reading and docs-reading - one small build task such as Lobster Invaders ## Reporting `qa-lab` exports a Markdown protocol report from the observed bus timeline. The report should answer: - What worked - What failed - What stayed blocked - What follow-up scenarios are worth adding For character and style checks, run the same scenario across multiple live model refs and write a judged Markdown report: ```bash pnpm openclaw qa character-eval \ --model openai/gpt-5.4 \ --model anthropic/claude-opus-4-6 \ --model minimax/MiniMax-M2.7 \ --judge-model openai/gpt-5.4 ``` The command runs local QA gateway child processes, not Docker. It preserves each full transcript, records basic run stats, then asks the judge model in fast mode with `xhigh` reasoning to rank the runs by naturalness, vibe, and humor. When no candidate `--model` is passed, the character eval defaults to `openai/gpt-5.4` and `anthropic/claude-opus-4-6`. ## Related docs - [Testing](/help/testing) - [QA Channel](/channels/qa-channel) - [Dashboard](/web/dashboard)