feat(qa): add Mantis desktop browser smoke

2026-05-06 11:30:43 +00:00 · 2026-05-04 01:30:08 +01:00
parent 9c37cfcbdb
commit 57b2d29761
7 changed files with 877 additions and 32 deletions
--- a/docs/concepts/mantis.md
+++ b/docs/concepts/mantis.md
@@ -89,6 +89,27 @@ directory, installs dependencies, builds each ref, runs the scenario with
 and `mantis-report.md`. For the first Discord scenario, a successful verification
 means baseline status is `fail` and candidate status is `pass`.

+The first VM/browser primitive is the desktop smoke:
+
+```bash
+pnpm openclaw qa mantis desktop-browser-smoke \
+  --output-dir .artifacts/qa-e2e/mantis/desktop-browser
+```
+
+It leases or reuses a Crabbox desktop machine, starts a visible browser inside the
+VNC session, captures the desktop, pulls artifacts back to the local output
+directory, and writes the reconnect command into the report. The command defaults
+to the Hetzner provider because it is the first provider with working desktop/VNC
+coverage in the Mantis lane. Override it with `--provider`, `--crabbox-bin`, or
+`OPENCLAW_MANTIS_CRABBOX_PROVIDER` when running against another Crabbox fleet.
+
+Useful desktop smoke flags:
+
+- `--lease-id <cbx_...>` or `OPENCLAW_MANTIS_CRABBOX_LEASE_ID` reuses a warmed desktop.
+- `--browser-url <url>` changes the page opened in the visible browser.
+- `--keep-lease` or `OPENCLAW_MANTIS_KEEP_VM=1` keeps a newly created passing lease open for VNC inspection. Failed runs keep the lease by default when one was created so an operator can reconnect.
+- `--class`, `--idle-timeout`, and `--ttl` tune machine size and lease lifetime.
+
 The GitHub smoke workflow is `Mantis Discord Smoke`. The before and after GitHub
 workflow for the first real scenario is `Mantis Discord Status Reactions`. It
 accepts:
@@ -132,18 +153,19 @@ ClawSweeper review findings.

 1. Acquire credentials.
 2. Allocate or reuse a VM.
-3. Prepare a clean checkout for the baseline ref.
-4. Install dependencies and build only what the scenario needs.
-5. Start a child OpenClaw Gateway with an isolated state directory.
-6. Configure the live transport, provider, model, and browser profile.
-7. Run the scenario and capture baseline evidence.
-8. Stop the gateway and preserve logs.
-9. Prepare the candidate ref in the same VM.
-10. Run the same scenario and capture candidate evidence.
-11. Compare the oracle results and visual evidence.
-12. Write Markdown, JSON, logs, screenshots, and optional trace artifacts.
-13. Upload GitHub Actions artifacts.
-14. Post a concise PR or Discord status message.
+3. Prepare the desktop/browser profile when the scenario needs UI evidence.
+4. Prepare a clean checkout for the baseline ref.
+5. Install dependencies and build only what the scenario needs.
+6. Start a child OpenClaw Gateway with an isolated state directory.
+7. Configure the live transport, provider, model, and browser profile.
+8. Run the scenario and capture baseline evidence.
+9. Stop the gateway and preserve logs.
+10. Prepare the candidate ref in the same VM.
+11. Run the same scenario and capture candidate evidence.
+12. Compare the oracle results and visual evidence.
+13. Write Markdown, JSON, logs, screenshots, and optional trace artifacts.
+14. Upload GitHub Actions artifacts.
+15. Post a concise PR or Discord status message.

 The scenario should be able to fail in two different ways:

--- a/docs/concepts/qa-e2e-automation.md
+++ b/docs/concepts/qa-e2e-automation.md
@@ -29,26 +29,26 @@ Current pieces:
 Every QA flow runs under `pnpm openclaw qa <subcommand>`. Many have `pnpm qa:*`
 script aliases; both forms are supported.

-| Command                                             | Purpose                                                                                                                                                                |
-| --------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `qa run`                                            | Bundled QA self-check; writes a Markdown report.                                                                                                                       |
-| `qa suite`                                          | Run repo-backed scenarios against the QA gateway lane. Aliases: `pnpm openclaw qa suite --runner multipass` for a disposable Linux VM.                                 |
-| `qa coverage`                                       | Print the markdown scenario-coverage inventory (`--json` for machine output).                                                                                          |
-| `qa parity-report`                                  | Compare two `qa-suite-summary.json` files and write the agentic parity report.                                                                                         |
-| `qa character-eval`                                 | Run the character QA scenario across multiple live models with a judged report. See [Reporting](#reporting).                                                           |
-| `qa manual`                                         | Run a one-off prompt against the selected provider/model lane.                                                                                                         |
-| `qa ui`                                             | Start the QA debugger UI and local QA bus (alias: `pnpm qa:lab:ui`).                                                                                                   |
-| `qa docker-build-image`                             | Build the prebaked QA Docker image.                                                                                                                                    |
-| `qa docker-scaffold`                                | Write a docker-compose scaffold for the QA dashboard + gateway lane.                                                                                                   |
-| `qa up`                                             | Build the QA site, start the Docker-backed stack, print the URL (alias: `pnpm qa:lab:up`; `:fast` variant adds `--use-prebuilt-image --bind-ui-dist --skip-ui-build`). |
-| `qa aimock`                                         | Start only the AIMock provider server.                                                                                                                                 |
-| `qa mock-openai`                                    | Start only the scenario-aware `mock-openai` provider server.                                                                                                           |
-| `qa credentials doctor` / `add` / `list` / `remove` | Manage the shared Convex credential pool.                                                                                                                              |
-| `qa matrix`                                         | Live transport lane against a disposable Tuwunel homeserver. See [Matrix QA](/concepts/qa-matrix).                                                                     |
-| `qa telegram`                                       | Live transport lane against a real private Telegram group.                                                                                                             |
-| `qa discord`                                        | Live transport lane against a real private Discord guild channel.                                                                                                      |
-| `qa slack`                                          | Live transport lane against a real private Slack channel.                                                                                                              |
-| `qa mantis`                                         | Before and after verification runner for live transport bugs, with the first Discord status-reactions scenario. See [Mantis](/concepts/mantis).                        |
+| Command                                             | Purpose                                                                                                                                                                   |
+| --------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `qa run`                                            | Bundled QA self-check; writes a Markdown report.                                                                                                                          |
+| `qa suite`                                          | Run repo-backed scenarios against the QA gateway lane. Aliases: `pnpm openclaw qa suite --runner multipass` for a disposable Linux VM.                                    |
+| `qa coverage`                                       | Print the markdown scenario-coverage inventory (`--json` for machine output).                                                                                             |
+| `qa parity-report`                                  | Compare two `qa-suite-summary.json` files and write the agentic parity report.                                                                                            |
+| `qa character-eval`                                 | Run the character QA scenario across multiple live models with a judged report. See [Reporting](#reporting).                                                              |
+| `qa manual`                                         | Run a one-off prompt against the selected provider/model lane.                                                                                                            |
+| `qa ui`                                             | Start the QA debugger UI and local QA bus (alias: `pnpm qa:lab:ui`).                                                                                                      |
+| `qa docker-build-image`                             | Build the prebaked QA Docker image.                                                                                                                                       |
+| `qa docker-scaffold`                                | Write a docker-compose scaffold for the QA dashboard + gateway lane.                                                                                                      |
+| `qa up`                                             | Build the QA site, start the Docker-backed stack, print the URL (alias: `pnpm qa:lab:up`; `:fast` variant adds `--use-prebuilt-image --bind-ui-dist --skip-ui-build`).    |
+| `qa aimock`                                         | Start only the AIMock provider server.                                                                                                                                    |
+| `qa mock-openai`                                    | Start only the scenario-aware `mock-openai` provider server.                                                                                                              |
+| `qa credentials doctor` / `add` / `list` / `remove` | Manage the shared Convex credential pool.                                                                                                                                 |
+| `qa matrix`                                         | Live transport lane against a disposable Tuwunel homeserver. See [Matrix QA](/concepts/qa-matrix).                                                                        |
+| `qa telegram`                                       | Live transport lane against a real private Telegram group.                                                                                                                |
+| `qa discord`                                        | Live transport lane against a real private Discord guild channel.                                                                                                         |
+| `qa slack`                                          | Live transport lane against a real private Slack channel.                                                                                                                 |
+| `qa mantis`                                         | Before and after verification runner for live transport bugs, with Discord status-reactions evidence and a Crabbox desktop/browser smoke. See [Mantis](/concepts/mantis). |

 ## Operator flow