docs: add Mantis Slack desktop runbook

Peter Steinberger
2026-05-05 23:48:48 +01:00
parent 92b04557a6
commit 430814ebc1
2 changed files with 226 additions and 20 deletions

@@ -0,0 +1,202 @@
---
summary: "Operator runbook for Mantis Slack desktop QA: GitHub dispatch, local CLI, warm VNC leases, hydrate modes, timing interpretation, artifacts, and failure handling."
read_when:
- Running Mantis Slack desktop QA from GitHub or locally
- Debugging slow Mantis Slack desktop runs
- Choosing source, prehydrated, or warm-lease mode
- Posting screenshot and video evidence to a PR
title: "Mantis Slack Desktop Runbook"
---
Mantis Slack desktop QA is the real-UI lane for Slack-class bugs that need a
Linux desktop, VNC rescue, Slack Web, a real OpenClaw gateway, screenshots,
videos, and a PR evidence comment.
Use it when unit tests or the headless Slack live lane cannot prove the bug.
## Storage Model
Mantis uses three different storage layers:
- Provider image: owned by Crabbox and stored in the cloud provider account.
It contains machine capabilities such as Chrome/Chromium, ffmpeg, scrot,
Node/corepack/pnpm, native build tools, and empty cache directories.
- Warm lease state: owned by the current operator session. It can contain a
logged-in browser profile, `/var/cache/crabbox/pnpm`, and a prepared source
checkout while the lease is alive.
- Mantis artifacts: owned by the OpenClaw run. They live under
`.artifacts/qa-e2e/mantis/...`, then GitHub Actions uploads them and the
Mantis GitHub App comments inline evidence on the PR.

Never put secrets, browser cookies, Slack login state, repository checkouts,
`node_modules`, or `dist/` into a prebaked provider image.
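
The rule above can be partly checked before promoting an image. The following is a minimal sketch, assuming the image contents are staged in a local directory; `audit_image_root` and `image_root_is_clean` are hypothetical helper names, not Crabbox or Mantis commands. It only detects checkouts, `node_modules`, and `dist/` by directory name, so secrets and browser profiles still need a manual review.

```bash
# Hypothetical pre-promotion audit (not part of Crabbox or Mantis).
# Prints every .git, node_modules, or dist path found under the staged root.
audit_image_root() {
  local root="$1"
  # -prune stops find from descending into a match, so each hit prints once.
  find "$root" \( -name .git -o -name node_modules -o -name dist \) -prune -print
}

# Returns 0 when nothing forbidden was found, 1 otherwise.
image_root_is_clean() {
  [ -z "$(audit_image_root "$1")" ]
}
```

Empty cache directories such as `/var/cache/crabbox/pnpm` pass this check, which matches the storage model described above.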
## GitHub Dispatch
Run the workflow from `main`:
```bash
gh workflow run mantis-slack-desktop-smoke.yml \
--ref main \
-f candidate_ref=<trusted-ref-or-sha> \
-f pr_number=<pr-number> \
-f scenario_id=slack-canary \
-f crabbox_provider=aws \
-f keep_vm=false \
-f hydrate_mode=source
```
Allowed `candidate_ref` values are intentionally narrow because the workflow
uses live credentials: current `main` ancestry, release tags, or an open PR head
from `openclaw/openclaw`.
The workflow writes:
- uploaded artifact: `mantis-slack-desktop-smoke-<run-id>-<attempt>`;
- inline PR comment from the Mantis GitHub App;
- `slack-desktop-smoke.png`;
- `slack-desktop-smoke.mp4`;
- `slack-desktop-smoke-preview.gif`;
- `slack-desktop-smoke-change.mp4`;
- `mantis-slack-desktop-smoke-summary.json`;
- `mantis-slack-desktop-smoke-report.md`;
- remote logs such as `slack-desktop-command.log`, `openclaw-gateway.log`,
`chrome.log`, and `ffmpeg.log`.

The PR comment is updated in place: the hidden
`<!-- mantis-slack-desktop-smoke -->` marker identifies the existing comment.
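
The update-in-place behavior can be pictured as a simple decision over existing comment bodies. This is an illustrative sketch only: the marker string comes from this runbook, while `has_mantis_marker` and `comment_action` are hypothetical names and the real Mantis GitHub App logic is not shown here.

```bash
# Marker string taken from the runbook; helper names are hypothetical.
MANTIS_MARKER='<!-- mantis-slack-desktop-smoke -->'

# Succeeds when a comment body carries the hidden marker.
has_mantis_marker() {
  case "$1" in
    *"$MANTIS_MARKER"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Takes one comment body per argument. Prints "update" if any body carries
# the marker, otherwise "create".
comment_action() {
  local body
  for body in "$@"; do
    if has_mantis_marker "$body"; then
      echo update
      return 0
    fi
  done
  echo create
}
```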
## Local CLI
Cold source proof:
```bash
pnpm openclaw qa mantis slack-desktop-smoke \
--provider aws \
--class standard \
--gateway-setup \
--credential-source convex \
--credential-role maintainer \
--provider-mode live-frontier \
--model openai/gpt-5.4 \
--alt-model openai/gpt-5.4 \
--scenario slack-canary \
--hydrate-mode source
```
Keep the VM for VNC rescue:
```bash
pnpm openclaw qa mantis slack-desktop-smoke \
--provider aws \
--class standard \
--gateway-setup \
--scenario slack-canary \
--keep-lease
```
Open VNC:
```bash
crabbox vnc --provider aws --id <cbx_id> --open
```
Reuse a warm lease:
```bash
pnpm openclaw qa mantis slack-desktop-smoke \
--provider aws \
--lease-id <cbx_id-or-slug> \
--gateway-setup \
--scenario slack-canary \
--hydrate-mode source
```
Use `--hydrate-mode prehydrated` only when the reused remote workspace already
has `node_modules` and a built `dist/`. Mantis fails closed if those are
missing.
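
Before choosing a mode for a reused lease, an operator could run a quick check inside the remote workspace. The sketch below is an assumption about how such a check might look; `workspace_hydrate_mode` is a hypothetical helper, and it only tests that the directories exist, not that `dist/` is a current build. Mantis performs its own validation regardless.

```bash
# Hypothetical preflight: print the hydrate mode a workspace can support.
# "prehydrated" needs both node_modules and dist/ to be present.
workspace_hydrate_mode() {
  local ws="$1"
  if [ -d "$ws/node_modules" ] && [ -d "$ws/dist" ]; then
    echo prehydrated
  else
    echo source
  fi
}
```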
## Hydrate Modes
| Mode | Use when | Remote behavior | Tradeoff |
| ------------- | ----------------------------------------- | ------------------------------------------------------------------------------------- | -------------------------------------------------------- |
| `source` | Normal PR proof, cold machines, CI | Runs `pnpm install --frozen-lockfile --prefer-offline` and `pnpm build` inside the VM | Slowest, strongest source-checkout proof |
| `prehydrated` | You intentionally prepared a reused lease | Requires existing `node_modules` and `dist/`; skips install/build | Fast, but only valid for operator-controlled warm leases |

GitHub Actions always prepares the candidate checkout before the VM run. The
Actions pnpm store cache is keyed by OS, Node version, and lockfile. The VM
source run also uses `/var/cache/crabbox/pnpm` when present.
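
To make the cache keying concrete, here is a sketch of a key built from the three inputs named above. The `pnpm-store-` prefix, the field order, and the `pnpm_store_cache_key` name are assumptions for illustration; the workflow's actual key format is not shown in this runbook. It relies on `sha256sum` from GNU coreutils.

```bash
# Hypothetical cache key: pnpm-store-<os>-node<version>-<lockfile sha256>.
pnpm_store_cache_key() {
  local os="$1" node_version="$2" lockfile="$3" digest
  # Refuse to build a key when the lockfile is missing.
  [ -f "$lockfile" ] || return 1
  digest="$(sha256sum "$lockfile" | cut -d' ' -f1)"
  echo "pnpm-store-${os}-node${node_version}-${digest}"
}
```

Any lockfile change produces a new digest, so a dependency bump starts from a fresh store instead of reusing a stale one.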
## Timing Interpretation
`mantis-slack-desktop-smoke-report.md` includes phase timings:
- `crabbox.warmup`: cloud provider boot, desktop/browser readiness, and SSH.
- `crabbox.inspect`: lease metadata lookup.
- `credentials.prepare`: Convex credential lease acquisition.
- `crabbox.remote_run`: sync, browser launch, OpenClaw install/build or
hydrate validation, gateway startup, screenshot, and video capture.
- `artifacts.copy`: rsync back from the VM.

`crabbox.remote_run` can be marked `accepted` when Crabbox returns a non-zero
remote status after Mantis has already copied back metadata proving that the
OpenClaw gateway is alive and setup completed. Treat `accepted` as
pass-with-explanation, not as a failed scenario.
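
For scripts that consume the report, the rule above amounts to a three-way mapping. This sketch is illustrative: `accepted` is the status described in this runbook, `pass` is an assumed name for the ordinary success status, and `phase_verdict` is a hypothetical helper.

```bash
# Hypothetical mapping from a phase status to an operator-facing verdict.
# "accepted" is documented above; "pass" is an assumed success value.
phase_verdict() {
  case "$1" in
    pass)     echo "pass" ;;
    accepted) echo "pass-with-explanation" ;;
    *)        echo "fail" ;;
  esac
}
```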

If the run is slow:
- `crabbox.warmup` dominates: prebake or promote a better Crabbox provider image;
- `crabbox.remote_run` dominates in `source` mode: use a warm lease, improve
pnpm store reuse, or move machine prerequisites into the provider image;
- `crabbox.remote_run` dominates in `prehydrated` mode: the remote workspace
was not actually ready, or the gateway/browser/Slack setup is slow;
- `artifacts.copy` dominates: inspect video size and artifact directory contents.
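
The triage above can be sketched as two small helpers. The input format (`<phase> <seconds>` per line) is an assumption, since the exact layout of the timing section in the report is not specified here, and both function names are hypothetical.

```bash
# Reads "<phase> <seconds>" lines on stdin and prints the slowest phase.
# The input layout is an assumption; adapt it to the real report.
dominant_phase() {
  awk '$2 + 0 > max { max = $2 + 0; name = $1 } END { if (name != "") print name }'
}

# Maps a phase (plus hydrate mode, for remote_run) to the runbook's advice.
slow_run_hint() {
  local phase="$1" mode="${2:-source}"
  case "$phase" in
    crabbox.warmup)
      echo "prebake or promote a better provider image" ;;
    crabbox.remote_run)
      if [ "$mode" = "prehydrated" ]; then
        echo "check workspace readiness and gateway/browser/Slack setup"
      else
        echo "use a warm lease or improve pnpm store reuse"
      fi ;;
    artifacts.copy)
      echo "inspect video size and artifact directory contents" ;;
    *)
      echo "no specific guidance" ;;
  esac
}
```

For example, piping the extracted timings into `dominant_phase` and passing the result to `slow_run_hint` together with the hydrate mode yields the matching line of advice.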
## Evidence Checklist
A good PR comment should show:
- scenario id and candidate SHA;
- GitHub Actions run URL;
- artifact URL;
- inline screenshot;
- inline animated preview when available;
- full MP4 and trimmed MP4 links;
- pass/fail status;
- timing summary in the attached report.

Do not commit screenshots or videos into the repository. Keep them in GitHub
Actions artifacts or the PR comment.
## Failure Handling
If the workflow fails before the VM run, inspect the Actions job first. Typical
causes are untrusted `candidate_ref`, missing environment secrets, or candidate
install/build failure.

If the VM run fails but artifacts were copied back, inspect the report, the
summary, and the remote logs:
```bash
cat mantis-slack-desktop-smoke-report.md
cat mantis-slack-desktop-smoke-summary.json
cat slack-desktop-command.log
cat openclaw-gateway.log
cat chrome.log
cat ffmpeg.log
```
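
When the copy-back was only partial, some of those files will be absent. A small wrapper can make the gaps visible at a glance. This is a sketch under the assumption that all six files sit in one artifact directory; `inspect_artifacts` is a hypothetical helper, not a Mantis command.

```bash
# Prints each expected artifact that exists and flags the ones that are
# missing. File names come from the runbook; the helper is hypothetical.
inspect_artifacts() {
  local dir="${1:-.}" name
  for name in \
    mantis-slack-desktop-smoke-report.md \
    mantis-slack-desktop-smoke-summary.json \
    slack-desktop-command.log \
    openclaw-gateway.log \
    chrome.log \
    ffmpeg.log
  do
    if [ -f "$dir/$name" ]; then
      printf '===== %s =====\n' "$name"
      cat "$dir/$name"
    else
      printf 'MISSING: %s\n' "$name"
    fi
  done
}
```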
If the run kept the lease, open VNC with the report's `crabbox vnc ...` command.
Stop the lease when done:
```bash
crabbox stop --provider aws <cbx_id-or-slug>
```
If the Slack login has expired, repair it in VNC on a kept lease and rerun with
`--lease-id`. Do not bake that browser profile into a provider image.

Related docs:
- [QA overview](qa-e2e-automation.md)
- [Slack channel](../channels/slack.md)
- [Testing](../help/testing.md)

@@ -29,26 +29,26 @@ Current pieces:
Every QA flow runs under `pnpm openclaw qa <subcommand>`. Many have `pnpm qa:*`
script aliases; both forms are supported.
| Command | Purpose |
| --------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `qa run` | Bundled QA self-check; writes a Markdown report. |
| `qa suite` | Run repo-backed scenarios against the QA gateway lane. Aliases: `pnpm openclaw qa suite --runner multipass` for a disposable Linux VM. |
| `qa coverage` | Print the markdown scenario-coverage inventory (`--json` for machine output). |
| `qa parity-report` | Compare two `qa-suite-summary.json` files and write the agentic parity report. |
| `qa character-eval` | Run the character QA scenario across multiple live models with a judged report. See [Reporting](#reporting). |
| `qa manual` | Run a one-off prompt against the selected provider/model lane. |
| `qa ui` | Start the QA debugger UI and local QA bus (alias: `pnpm qa:lab:ui`). |
| `qa docker-build-image` | Build the prebaked QA Docker image. |
| `qa docker-scaffold` | Write a docker-compose scaffold for the QA dashboard + gateway lane. |
| `qa up` | Build the QA site, start the Docker-backed stack, print the URL (alias: `pnpm qa:lab:up`; `:fast` variant adds `--use-prebuilt-image --bind-ui-dist --skip-ui-build`). |
| `qa aimock` | Start only the AIMock provider server. |
| `qa mock-openai` | Start only the scenario-aware `mock-openai` provider server. |
| `qa credentials doctor` / `add` / `list` / `remove` | Manage the shared Convex credential pool. |
| `qa matrix` | Live transport lane against a disposable Tuwunel homeserver. See [Matrix QA](/concepts/qa-matrix). |
| `qa telegram` | Live transport lane against a real private Telegram group. |
| `qa discord` | Live transport lane against a real private Discord guild channel. |
| `qa slack` | Live transport lane against a real private Slack channel. |
| `qa mantis` | Before and after verification runner for live transport bugs, with Discord status-reactions evidence, Crabbox desktop/browser smoke, and Slack-in-VNC smoke. See [Mantis](/concepts/mantis). |
| Command | Purpose |
| --------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `qa run` | Bundled QA self-check; writes a Markdown report. |
| `qa suite` | Run repo-backed scenarios against the QA gateway lane. Aliases: `pnpm openclaw qa suite --runner multipass` for a disposable Linux VM. |
| `qa coverage` | Print the markdown scenario-coverage inventory (`--json` for machine output). |
| `qa parity-report` | Compare two `qa-suite-summary.json` files and write the agentic parity report. |
| `qa character-eval` | Run the character QA scenario across multiple live models with a judged report. See [Reporting](#reporting). |
| `qa manual` | Run a one-off prompt against the selected provider/model lane. |
| `qa ui` | Start the QA debugger UI and local QA bus (alias: `pnpm qa:lab:ui`). |
| `qa docker-build-image` | Build the prebaked QA Docker image. |
| `qa docker-scaffold` | Write a docker-compose scaffold for the QA dashboard + gateway lane. |
| `qa up` | Build the QA site, start the Docker-backed stack, print the URL (alias: `pnpm qa:lab:up`; `:fast` variant adds `--use-prebuilt-image --bind-ui-dist --skip-ui-build`). |
| `qa aimock` | Start only the AIMock provider server. |
| `qa mock-openai` | Start only the scenario-aware `mock-openai` provider server. |
| `qa credentials doctor` / `add` / `list` / `remove` | Manage the shared Convex credential pool. |
| `qa matrix` | Live transport lane against a disposable Tuwunel homeserver. See [Matrix QA](/concepts/qa-matrix). |
| `qa telegram` | Live transport lane against a real private Telegram group. |
| `qa discord` | Live transport lane against a real private Discord guild channel. |
| `qa slack` | Live transport lane against a real private Slack channel. |
| `qa mantis` | Before and after verification runner for live transport bugs, with Discord status-reactions evidence, Crabbox desktop/browser smoke, and Slack-in-VNC smoke. See [Mantis](/concepts/mantis) and [Mantis Slack Desktop Runbook](/concepts/mantis-slack-desktop-runbook). |
## Operator flow
@@ -149,6 +149,10 @@ With `--gateway-setup`, Mantis leaves a persistent OpenClaw Slack gateway
running inside the VM on port `38973`; without it, the command runs the normal
bot-to-bot Slack QA lane and exits after artifact capture.
The operator checklist, GitHub workflow dispatch command, evidence-comment
contract, hydrate-mode decision table, timing interpretation, and failure
handling steps live in [Mantis Slack Desktop Runbook](/concepts/mantis-slack-desktop-runbook).
For an agent/CV style desktop task, run:
```bash