chore(skills): fold testbox notes into crabbox

2026-05-06 10:30:44 +00:00 · 2026-05-03 16:35:40 +01:00
parent 1bfc66da33
commit d286620c03
2 changed files with 243 additions and 467 deletions
--- a/.agents/skills/blacksmith-testbox/SKILL.md
+++ b/.agents/skills/blacksmith-testbox/SKILL.md
@@ -1,420 +0,0 @@
----
-name: blacksmith-testbox
-description: Run Blacksmith Testbox for CI-parity checks, secrets, hosted services, migrations, or builds local cannot reproduce.
----
-
-# Blacksmith Testbox
-
-## Scope
-
-Use Testbox when you need remote CI parity, injected secrets, hosted services,
-or an OS/runtime image that your local machine cannot provide cheaply.
-
-For OpenClaw, Crabbox is a supported alternative when Blacksmith is unavailable
-or owned cloud capacity is preferable.
-
-Do not default to Testbox for every local test/build loop. If the repo has
-documented local commands for normal iteration, use those first so you keep
-warm caches, local build state, and fast feedback.
-
-Testbox is the expensive path. Reach for it deliberately.
-
-OpenClaw maintainers can opt into Testbox-first validation by setting
-`OPENCLAW_TESTBOX=1` in their environment or standing agent rules. This mode is
-maintainers-only and requires Blacksmith access.
-
-When `OPENCLAW_TESTBOX=1` is set in OpenClaw:
-
-- Pre-warm a Testbox early for longer, wider, or uncertain work.
-- Prefer Testbox for `pnpm` gates, e2e, package-like proof, and broad suites.
-- Reuse the same Testbox ID for every run command in the same task/session.
-- Use local commands only when the task explicitly sets
-  `OPENCLAW_LOCAL_CHECK_MODE=throttled|full`, or when the user asks for local
-  proof.
-
-## Install the CLI
-
-If `blacksmith` is not installed, install it:
-
-    curl -fsSL https://get.blacksmith.sh | sh
-
-For the canary channel (bleeding-edge):
-
-    BLACKSMITH_CHANNEL=canary sh -c 'curl -fsSL https://get.blacksmith.sh | sh'
-
-Then authenticate:
-
-    blacksmith auth login
-
-## Agent-triggered browser auth (non-interactive)
-
-When an agent needs to ensure the user is authenticated before running testbox
-commands (e.g. warmup, run), use browser-based auth with non-interactive mode.
-This opens the browser for the user to sign in; the agent does not interact with
-the browser. The org selector in the dashboard is skipped, so the user only sees
-the sign-in flow.
-
-**Required command** (`--organization` is required with `--non-interactive`):
-
-    blacksmith auth login --non-interactive --organization <org-slug>
-
-The org slug can come from `BLACKSMITH_ORG` env var or the `--org` global flag.
-If neither is set, the agent should use the project's known org (e.g. from repo
-config or user context). Example:
-
-    blacksmith auth login --non-interactive --organization acme-corp
-    blacksmith --org acme-corp auth login --non-interactive --organization acme-corp
-
-**Flow**: The CLI starts a local callback server, opens the browser to the
-dashboard auth page, and blocks for up to 2 minutes. The user completes sign-in
-and authorization in the browser. The dashboard redirects to localhost with the
-token; the CLI saves credentials and exits. The agent then proceeds.
-
-**Do not use** `--api-token` for this flow — that is for headless/token-based
-auth. This skill focuses on browser-based auth when the user prefers signing in
-via the web UI.
-
-Optional flags:
-
-- `--dashboard-url <url>` — Override dashboard URL (e.g. for staging)
-
-## Decide first: local or Testbox
-
-Before warming anything up, check the repo's own instructions.
-
-Prefer local commands when:
-
-- the repo documents a supported local test/build workflow
-- you are iterating on unit tests, lint, typecheck, formatting, or other
-  local-only validation
-- the value comes from warm local caches and fast repeat runs
-- the command does not need remote secrets, hosted services, or CI-only images
-
-Prefer Testbox when:
-
-- the repo explicitly requires CI-parity or remote validation
-- the command needs secrets, service containers, or provisioned infra
-- you are reproducing CI-only failures
-- you need the exact workflow image/job environment from GitHub Actions
-
-For OpenClaw specifically, normal local iteration stays local unless maintainer
-Testbox mode is enabled with `OPENCLAW_TESTBOX=1`:
-
-- `pnpm check:changed`
-- `pnpm test:changed`
-- `pnpm test <path-or-filter>`
-- `pnpm test:serial`
-- `pnpm build`
-
-If `OPENCLAW_TESTBOX=1` is enabled, run those same repo commands inside the
-warm Testbox. If the user wants laptop-friendly local proof for one command, use
-the explicit escape hatch `OPENCLAW_LOCAL_CHECK_MODE=throttled`.
-
-For installable-package product proof, prefer the GitHub `Package Acceptance`
-workflow over an ad hoc Testbox command. It resolves one package candidate
-(`source=npm`, `source=ref`, `source=url`, or `source=artifact`), uploads it as
-`package-under-test`, and runs the reusable Docker E2E lanes against that exact
-tarball on GitHub/Blacksmith runners. Use `workflow_ref` for the trusted
-workflow/harness code and `package_ref` for the source ref to pack when testing
-an older trusted branch, tag, or SHA.
-
-## Setup: Warmup before coding
-
-If you decided Testbox is warranted, warm one up early. This returns an ID
-instantly and boots the CI environment in the background while you work:
-
-    blacksmith testbox warmup ci-check-testbox.yml
-    # → tbx_01jkz5b3t9...
-
-Save this ID in the current session. You need it for every `run` command.
-Treat `blacksmith testbox list` as diagnostics, not a reusable work queue.
-Listed boxes can be visible at the org/repo level while still being unusable or
-stale for the current local agent lane.
-
-For OpenClaw maintainer Testbox mode, pre-warm at the start of longer or wider
-tasks:
-
-    blacksmith testbox warmup ci-check-testbox.yml --ref main --idle-timeout 90
-    pnpm testbox:claim --id <ID>
-
-Use the build-artifact warmup when e2e/package/build proof benefits from seeded
-`dist/`, `dist-runtime/`, and build-all caches:
-
-    blacksmith testbox warmup ci-build-artifacts-testbox.yml --ref main --idle-timeout 90
-    pnpm testbox:claim --id <ID>
-
-Warmup dispatches a GitHub Actions workflow that provisions a VM with the
-full CI environment: dependencies installed, services started, secrets
-injected, and a clean checkout of the repo at the default branch.
-
-In OpenClaw, raw commit SHAs are not reliable dispatch refs for `warmup --ref`;
-use a branch or tag. The build-artifact workflow resolves `openclaw@beta` and
-`openclaw@latest` to SHA cache keys internally.
-
-Options:
-
-    --ref <branch|tag>     Git ref to dispatch against (default: repo's default branch)
-    --job <name>           Specific job within the workflow (if it has multiple)
-    --idle-timeout <min>   Idle timeout in minutes (default: 30)
-
-## CRITICAL: Always run from the repo root
-
-ALWAYS invoke `blacksmith testbox` commands from the **root of the git
-repository**. The CLI syncs the current working directory to the testbox
-using rsync with `--delete`. If you run from a subdirectory (e.g.
-`cd backend && blacksmith testbox run ...`), rsync will mirror only that
-subdirectory and **delete everything else** on the testbox — wiping other
-directories like `dashboard/`, `cli/`, etc.
-
-    # CORRECT — run from repo root, use paths in the command
-    blacksmith testbox run --id <ID> "cd backend && php artisan test"
-    blacksmith testbox run --id <ID> "cd dashboard && npm test"
-
-    # WRONG — do NOT cd into a subdirectory before invoking the CLI
-    cd backend && blacksmith testbox run --id <ID> "php artisan test"
-
-If your shell is in a subdirectory, `cd` back to the repo root first:
-
-    cd "$(git rev-parse --show-toplevel)"
-    blacksmith testbox run --id <ID> "cd backend && php artisan test"
-
-## Running commands
-
-    blacksmith testbox run --id <ID> "<command>"
-
-The `run` command automatically waits for the testbox to become ready if
-it is still booting, so you can call `run` immediately after warmup without
-needing to check status first.
-
-In OpenClaw, prefer the guarded runner wrapper so stale/reused ids fail before
-the Blacksmith CLI spends time syncing or emits a confusing missing-key error:
-
-    pnpm testbox:run --id <ID> -- "OPENCLAW_TESTBOX=1 pnpm check:changed"
-
-The wrapper refuses to run when the local per-Testbox key is missing or when the
-id was not claimed by this OpenClaw checkout with `pnpm testbox:claim --id
-<ID>`. Treat that as the expected remediation, not as a GitHub account or
-normal SSH-key problem. A local key alone is not enough; a ready box may still
-carry stale rsync state from another lane.
-
-If the agent crashes, the remote box relies on Blacksmith's idle timeout. The
-local OpenClaw claim marker is not deleted automatically, so the wrapper treats
-claims older than 12 hours as stale. Override only for intentional long-running
-work with `OPENCLAW_TESTBOX_CLAIM_TTL_MINUTES=<minutes>`.
-
-Before spending a broad gate on a manually assembled command, you can also run:
-
-    pnpm testbox:sanity -- --id <ID>
-
-## Downloading files from a testbox
-
-Use the `download` command to retrieve files or directories from a running
-testbox to your local machine. This is useful for fetching build artifacts,
-test results, coverage reports, or any output generated on the testbox.
-
-    blacksmith testbox download --id <ID> <remote-path> [local-path]
-
-The remote path is relative to the testbox working directory (same as `run`).
-If no local path is specified, the file is saved to the current directory
-using the same base name.
-
-To download a directory, append a trailing `/` to the remote path — this
-triggers recursive mode:
-
-    # Download a single file
-    blacksmith testbox download --id <ID> coverage/report.html
-
-    # Download a file to a specific local path
-    blacksmith testbox download --id <ID> build/output.tar.gz ./output.tar.gz
-
-    # Download an entire directory
-    blacksmith testbox download --id <ID> test-results/ ./results/
-
-Options:
-
-    --ssh-private-key <path>   Path to SSH private key (if warmup used --ssh-public-key)
-
-## How file sync works
-
-Understanding this model is critical for using Testbox correctly.
-
-When you call `run`, the CLI performs a **delta sync** of your local changes
-to the remote testbox before executing your command:
-
-1. The testbox VM starts from a clean `actions/checkout` at the warmup ref.
-   The workflow's setup steps (e.g. `npm install`, `pip install`, `composer install`)
-   run during warmup and populate dependency directories on the remote VM.
-
-2. On each `run`, the CLI uses **git** to detect which files changed locally
-   since the last sync. It syncs ONLY tracked files and untracked non-ignored
-   files (i.e. files that `git ls-files` reports).
-
-3. **`.gitignore`'d directories are never synced.** This means directories
-   like `node_modules/`, `vendor/`, `.venv/`, `build/`, `dist/`, etc. are
-   NOT transferred from your local machine. The testbox uses its own copies
-   of those directories, populated during the warmup workflow steps.
-
-4. If nothing has changed since the last sync (same git commit and working
-   tree state), the sync is skipped entirely for speed.
-
-### Why this matters
-
-- **Changing dependencies**: If you modify `package.json`, `requirements.txt`,
-  `composer.json`, `go.mod`, or similar dependency manifests, the lock/manifest
-  file will be synced but the actual dependency directory will NOT. You must
-  re-run the install command on the testbox:
-
-      blacksmith testbox run --id <ID> "npm install && npm test"
-      blacksmith testbox run --id <ID> "pip install -r requirements.txt && pytest"
-      blacksmith testbox run --id <ID> "composer install && phpunit"
-
-- **Generated/build artifacts**: If your tests depend on a build step (e.g.
-  `npm run build`, `make`), and you changed source files that affect the build
-  output, re-run the build on the testbox before testing.
-
-- **New untracked files**: New files you create locally ARE synced (as long as
-  they are not gitignored). You do not need to `git add` them first.
-
-- **Deleted files**: Files you delete locally are also deleted on the remote
-  testbox. The sync model keeps the remote in lockstep with your local managed
-  file set.
-
-## CRITICAL: Do not ban local tests
-
-Do not assume local validation is forbidden. Many repos intentionally invest in
-fast, warm local loops, and forcing every run through Testbox destroys that
-advantage.
-
-Use Testbox for the checks that actually need it: remote parity, secrets,
-services, CI-only runners, or reproducibility against the workflow image.
-
-If the repo says local tests/builds are the normal path, follow the repo.
-
-OpenClaw maintainer exception: if `OPENCLAW_TESTBOX=1` is set by the user or
-agent environment, treat Testbox as the normal validation path for this repo.
-Use `OPENCLAW_LOCAL_CHECK_MODE=throttled|full` as the explicit local escape
-hatch.
-
-## When to use
-
-Use Testbox when:
-
-- running database migrations or destructive environment checks
-- running commands that depend on secrets or environment variables not present locally
-- reproducing CI-only failures or validating against the workflow image
-- validating behavior that needs provisioned services or remote runners
-- doing a final parity check before commit/push when the repo or user wants that
-
-Trim that list based on repo guidance. If the repo documents supported local
-tests/builds, prefer local for routine iteration and keep Testbox for the
-checks that need parity or remote state.
-
-## Workflow
-
-1. Decide whether the repo's local loop is the right default. For OpenClaw,
-   `OPENCLAW_TESTBOX=1` makes Testbox the maintainer default.
-2. If Testbox is warranted, warm up early:
-   `blacksmith testbox warmup ci-check-testbox.yml --ref main --idle-timeout 90` → save the ID,
-   then `pnpm testbox:claim --id <ID>`
-3. Write code while the testbox boots in the background.
-4. Run the remote command when needed:
-   `pnpm testbox:run --id <ID> -- "OPENCLAW_TESTBOX=1 pnpm check:changed"`
-5. If tests fail, fix code and re-run against the same warm box.
-6. If you changed dependency manifests (package.json, etc.), prepend
-   the install command: `blacksmith testbox run --id <ID> "npm install && npm test"`
-7. If a narrow PR reports a full sync or the box was reused/expired, sanity
-   check the remote copy before a slow gate:
-   `pnpm testbox:run --id <ID> -- "pnpm testbox:sanity"`.
-   If it reports missing root files or mass tracked deletions, stop the box and
-   warm a fresh one. Use `OPENCLAW_TESTBOX_ALLOW_MASS_DELETIONS=1` only for an
-   intentional large deletion PR.
-8. If you need artifacts (coverage reports, build outputs, etc.), download them:
-   `blacksmith testbox download --id <ID> coverage/ ./coverage/`
-9. Once green, commit and push.
-
-## OpenClaw full test suite
-
-For OpenClaw, use the repo package manager and the measured stable full-suite
-profile below. It keeps six Vitest project shards active while limiting each
-shard to one worker to avoid worker OOMs on Testbox:
-
-    blacksmith testbox run --id <ID> "env NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 pnpm test"
-
-Observed full-suite time on Blacksmith Testbox is about 3-4 minutes:
-
-- 173-180s on a warmed box
-- 219s on a fresh 32-vCPU box
-
-When validating before commit/push in maintainer Testbox mode, run
-`pnpm check:changed` inside the warmed box first when appropriate, then the full
-suite with the profile above if broad confidence is needed.
-
-Run `pnpm testbox:sanity` inside the warmed box before the broad command when
-the sync looks suspicious. It checks that root files such as `pnpm-lock.yaml`
-still exist and fails on 200 or more tracked deletions. That catches stale or
-corrupted rsync state before dependency install or Vitest failures hide the real
-problem.
-
-## Examples
-
-    blacksmith testbox warmup ci-check-testbox.yml
-    # → tbx_01jkz5b3t9...
-
-    # Run tests
-    blacksmith testbox run --id <ID> "npm test -- --testPathPattern=handler.test"
-    blacksmith testbox run --id <ID> "go test ./pkg/api/... -run TestHandler -v"
-    blacksmith testbox run --id <ID> "python -m pytest tests/test_api.py -k test_auth"
-
-    # Re-install deps after changing package.json, then test
-    blacksmith testbox run --id <ID> "npm install && npm test"
-
-    # Build and test
-    blacksmith testbox run --id <ID> "npm run build && npm test"
-
-    # Download artifacts from the testbox
-    blacksmith testbox download --id <ID> coverage/lcov-report/ ./coverage/
-    blacksmith testbox download --id <ID> build/output.tar.gz
-
-## Waiting for the testbox to be ready
-
-The `run` command automatically waits for the testbox, so explicit waiting is
-usually unnecessary. If you do need to check readiness separately (e.g. before
-a series of runs), use the `--wait` flag. Do NOT use a sleep-and-recheck loop.
-
-Correct: block until ready with a timeout:
-
-    blacksmith testbox status --id <ID> --wait [--wait-timeout 5m]
-
-Wrong: never use sleep + status in a loop:
-
-    # BAD — do not do this
-    sleep 30 && blacksmith testbox status --id <ID>
-    while ! blacksmith testbox status --id <ID> | grep ready; do sleep 5; done
-
-`--wait` polls the status and exits as soon as the testbox is ready (or when the
-timeout is reached). Default timeout is 5m; use `--wait-timeout` for longer
-(e.g. `10m`, `1h`).
-
-## Managing testboxes
-
-    # Check status of a specific testbox
-    blacksmith testbox status --id <ID>
-
-    # List all active testboxes for the current repo
-    blacksmith testbox list
-
-    # Stop a testbox when you're done (frees resources)
-    blacksmith testbox stop --id <ID>
-
-Testboxes automatically shut down after being idle (default: 30 minutes).
-If you need a longer session, increase the timeout at warmup time. For OpenClaw
-maintainer work, use 90 minutes for long-running sessions:
-
-    blacksmith testbox warmup ci-check-testbox.yml --idle-timeout 90
-    blacksmith testbox warmup ci-build-artifacts-testbox.yml --idle-timeout 90
-
-## With options
-
-    blacksmith testbox warmup ci-check-testbox.yml --ref main
-    blacksmith testbox warmup ci-check-testbox.yml --idle-timeout 90
-    blacksmith testbox run --id <ID> "go test ./..."
--- a/.agents/skills/crabbox/SKILL.md
+++ b/.agents/skills/crabbox/SKILL.md
@@ -1,54 +1,249 @@
 ---
 name: crabbox
-description: Use Crabbox for OpenClaw remote Linux validation, warmed reusable boxes, GitHub Actions hydration, sync timing, logs, results, caches, and lease cleanup.
+description: Use Crabbox for OpenClaw remote Linux validation. Default to Blacksmith Testbox; includes direct Blacksmith and owned AWS/Hetzner fallback notes when Crabbox fails.
 ---

 # Crabbox

-Use Crabbox when OpenClaw needs remote Linux proof on owned capacity, a large
-runner class, reusable warm state, or a Blacksmith alternative.
+Use Crabbox when OpenClaw needs remote Linux proof for broad tests, CI-parity
+checks, secrets, hosted services, Docker/E2E/package lanes, warmed reusable
+boxes, sync timing, logs/results, cache inspection, or lease cleanup.

-## Before Running
+Default backend: `blacksmith-testbox`. The separate `blacksmith-testbox` skill
+has been removed; this skill owns both the normal Crabbox path and the direct
+Blacksmith fallback playbook.
+
+## First Checks

 - Run from the repo root. Crabbox sync mirrors the current checkout.
-- Prefer local targeted tests for tight edit loops.
-- Prefer Blacksmith Testbox when the task explicitly asks for Blacksmith or a
-  Blacksmith-specific CI comparison.
-- Use Crabbox for broad OpenClaw gates when owned AWS/Hetzner capacity is the
-  right remote lane.
-- Check `.crabbox.yaml` for repo defaults before adding flags.
-- Sanity-check the selected binary before remote work. OpenClaw scripts prefer
-  `../crabbox/bin/crabbox` when present; the user PATH shim can be stale:
-  `command -v crabbox; ../crabbox/bin/crabbox --version; ../crabbox/bin/crabbox --help | sed -n '1,90p'`.
-- Install with `brew install openclaw/tap/crabbox`; auth is required before use:
-  `printf '%s' "$CRABBOX_COORDINATOR_TOKEN" | crabbox login --url https://crabbox.openclaw.ai --provider aws --token-stdin`.
-- On macOS the user config is `~/Library/Application Support/crabbox/config.yaml`;
-  it must include `broker.url`, `broker.token`, and usually `provider: aws`.
-
-## OpenClaw Flow
-
-AWS/owned-capacity flow for `pnpm` tests:
+- Check the wrapper and providers before remote work:

 ```sh
-pnpm crabbox:warmup -- --idle-timeout 90m
-pnpm crabbox:warmup -- --provider aws --class beast --market on-demand --idle-timeout 90m
-pnpm crabbox:hydrate -- --id <cbx_id-or-slug>
-pnpm crabbox:run -- --id <cbx_id-or-slug> --timing-json --shell -- "env NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test:changed"
+command -v crabbox
+../crabbox/bin/crabbox --version
+pnpm crabbox:run -- --help | sed -n '1,120p'
 ```

-Blacksmith-backed Crabbox flow can delegate setup to the Testbox workflow:
+- OpenClaw scripts prefer `../crabbox/bin/crabbox` when present. The user PATH
+  shim can be stale.
+- Check `.crabbox.yaml` for repo defaults, but override provider explicitly.
+  Even if config still says AWS, maintainer validation should normally pass
+  `--provider blacksmith-testbox`.
+- Prefer local targeted tests for tight edit loops. Broad gates belong remote.
+
+## Default Blacksmith Backend
+
+Use this for `pnpm check`, `pnpm check:changed`, `pnpm test`,
+`pnpm test:changed`, Docker/E2E/live/package gates, or anything likely to fan
+out across many Vitest projects.
+
+Changed gate:

 ```sh
-pnpm crabbox:run -- --provider blacksmith-testbox --blacksmith-org openclaw --blacksmith-workflow .github/workflows/ci-check-testbox.yml --blacksmith-job check --blacksmith-ref main --idle-timeout 90m --timing-json --shell -- "env NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test:changed"
+pnpm crabbox:run -- --provider blacksmith-testbox \
+  --blacksmith-org openclaw \
+  --blacksmith-workflow .github/workflows/ci-check-testbox.yml \
+  --blacksmith-job check \
+  --blacksmith-ref main \
+  --idle-timeout 90m \
+  --ttl 240m \
+  --timing-json \
+  --shell -- \
+  "env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test:changed"
+```
+
+Full suite:
+
+```sh
+pnpm crabbox:run -- --provider blacksmith-testbox \
+  --blacksmith-org openclaw \
+  --blacksmith-workflow .github/workflows/ci-check-testbox.yml \
+  --blacksmith-job check \
+  --blacksmith-ref main \
+  --idle-timeout 90m \
+  --ttl 240m \
+  --timing-json \
+  --shell -- \
+  "env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test"
+```
+
+Focused rerun:
+
+```sh
+pnpm crabbox:run -- --provider blacksmith-testbox \
+  --blacksmith-org openclaw \
+  --blacksmith-workflow .github/workflows/ci-check-testbox.yml \
+  --blacksmith-job check \
+  --blacksmith-ref main \
+  --idle-timeout 90m \
+  --ttl 240m \
+  --timing-json \
+  --shell -- \
+  "env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test <path-or-filter>"
+```
+
+Read the JSON summary. Useful fields:
+
+- `provider`: should be `blacksmith-testbox`
+- `leaseId`: `tbx_...`
+- `syncDelegated`: should be `true`
+- `commandMs` / `totalMs`
+- `exitCode`
+
+Crabbox should stop one-shot Blacksmith Testboxes automatically after the run.
+Verify cleanup when a run fails, is interrupted, or the command output is
+unclear:
+
+```sh
+blacksmith testbox list
+```
+
+## Reuse And Keepalive
+
+For most Blacksmith-backed Crabbox calls, one-shot is enough. Use reuse only
+when you need multiple manual commands on the same hydrated box.
+
+If Crabbox returns a reusable id or you intentionally keep a lease:
+
+```sh
+pnpm crabbox:run -- --provider blacksmith-testbox --id <tbx_id> --no-sync --timing-json --shell -- "pnpm test <path>"
 ```

 Stop boxes you created before handoff:

 ```sh
+pnpm crabbox:stop -- <id-or-slug>
+blacksmith testbox stop --id <tbx_id>
+```
+
+## If Crabbox Fails
+
+Keep the fallback narrow. First decide whether the failure is Crabbox itself,
+Blacksmith/Testbox, repo hydration, sync, or the test command.
+
+Fast checks:
+
+```sh
+command -v crabbox
+../crabbox/bin/crabbox --version
+crabbox run --provider blacksmith-testbox --help | sed -n '1,140p'
+command -v blacksmith
+blacksmith --version
+blacksmith testbox list
+```
+
+Common Crabbox-only failures:
+
+- Provider missing or old CLI: use `../crabbox/bin/crabbox` from the sibling
+  repo, or update/install Crabbox before retrying.
+- Bad local config: pass `--provider blacksmith-testbox` plus explicit
+  `--blacksmith-*` flags instead of relying on `.crabbox.yaml`.
+- Slug/claim confusion: use the raw `tbx_...` id, or run one-shot without
+  `--id`.
+- Sync/timing bug: add `--debug --timing-json`; capture the final JSON and the
+  printed Actions URL.
+- Cleanup uncertainty: run `blacksmith testbox list` and stop only boxes you
+  created.
+
+If Crabbox cannot dispatch, sync, attach, or stop but Blacksmith itself works,
+use direct Blacksmith from the repo root:
+
+```sh
+blacksmith testbox warmup ci-check-testbox.yml --ref main --idle-timeout 90
+blacksmith testbox run --id <tbx_id> "env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test:changed"
+blacksmith testbox stop --id <tbx_id>
+```
+
+Direct full suite:
+
+```sh
+blacksmith testbox run --id <tbx_id> "env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test"
+```
+
+Auth fallback, only when `blacksmith` says auth is missing:
+
+```sh
+blacksmith auth login --non-interactive --organization openclaw
+```
+
+Raw Blacksmith footguns:
+
+- Run from repo root. The CLI syncs the current directory.
+- Save the returned `tbx_...` id in the session.
+- Reuse that id for focused reruns; stop it before handoff.
+- Raw commit SHAs are not reliable `warmup --ref` refs; use a branch or tag.
+- Treat `blacksmith testbox list` as cleanup diagnostics, not a shared reusable
+  queue.
+
+Escalate to owned AWS/Hetzner only when Blacksmith is down, quota-limited,
+missing the needed environment, or owned capacity is the explicit goal. Use the
+Owned Cloud Fallback section below.
+
+## Blacksmith Backend Notes
+
+Crabbox Blacksmith backend delegates setup to:
+
+- org: `openclaw`
+- workflow: `.github/workflows/ci-check-testbox.yml`
+- job: `check`
+- ref: `main` unless testing a branch/tag intentionally
+
+The hydration workflow owns checkout, Node/pnpm setup, dependency install,
+secrets, ready marker, and keepalive. Crabbox owns dispatch, sync, SSH command
+execution, timing, logs/results, and cleanup.
+
+Minimal direct Blacksmith fallback, from repo root:
+
+```sh
+blacksmith testbox warmup ci-check-testbox.yml --ref main --idle-timeout 90
+blacksmith testbox run --id <tbx_id> "env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 pnpm test:changed"
+blacksmith testbox stop --id <tbx_id>
+```
+
+Use direct Blacksmith only when Crabbox is the broken layer and Blacksmith
+itself still works. Use `blacksmith testbox list` for cleanup diagnostics
+only; do not treat it as a reusable work queue.
+
+Important Blacksmith footguns:
+
+- Always run from repo root. The CLI syncs the current directory.
+- Raw commit SHAs are not reliable `warmup --ref` refs; use a branch or tag.
+- If auth is missing and browser auth is acceptable:
+
+```sh
+blacksmith auth login --non-interactive --organization openclaw
+```
+
+## Owned Cloud Fallback
+
+Use AWS/Hetzner only when Blacksmith is down, quota-limited, missing the needed
+environment, or owned capacity is explicitly the goal.
+
+```sh
+pnpm crabbox:warmup -- --provider aws --class beast --market on-demand --idle-timeout 90m
+pnpm crabbox:hydrate -- --id <cbx_id-or-slug>
+pnpm crabbox:run -- --id <cbx_id-or-slug> --timing-json --shell -- "env NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test:changed"
 pnpm crabbox:stop -- <cbx_id-or-slug>
 ```

-## Useful Commands
+Install/auth for owned Crabbox if needed:
+
+```sh
+brew install openclaw/tap/crabbox
+printf '%s' "$CRABBOX_COORDINATOR_TOKEN" | crabbox login --url https://crabbox.openclaw.ai --provider aws --token-stdin
+```
+
+macOS config lives at:
+
+```text
+~/Library/Application Support/crabbox/config.yaml
+```
+
+It should include `broker.url`, `broker.token`, and usually `provider: aws`
+for owned-cloud lanes. Do not let that config override the OpenClaw default
+when Blacksmith proof is requested; pass `--provider blacksmith-testbox`.
+
+## Diagnostics

 ```sh
 crabbox status --id <id-or-slug> --wait
@@ -59,29 +254,30 @@ crabbox logs <run_id>
 crabbox results <run_id>
 crabbox cache stats --id <id-or-slug>
 crabbox ssh --id <id-or-slug>
+blacksmith testbox list
 ```

 Use `--debug` on `run` when measuring sync timing.
-Use `--timing-json` on warmup, hydrate, and run when comparing AWS and
-blacksmith-testbox timings.
-Use `--market spot|on-demand` on AWS warmup or one-shot run when testing quota
-or capacity behavior without changing `.crabbox.yaml`.
+Use `--timing-json` on warmup, hydrate, and run when comparing backends.
+Use `--market spot|on-demand` only on AWS warmup/one-shot runs.

-## Hydration Boundary
+## Failure Triage

-`.github/workflows/crabbox-hydrate.yml` is repo-specific on purpose. It owns
-OpenClaw checkout, setup-node, pnpm setup, provider env hydration, ready marker,
-and keepalive. Crabbox owns runner registration, workflow dispatch, SSH sync,
-command execution, logs/results, local lease claims, and idle cleanup.
+- Crabbox cannot find provider: verify `../crabbox/bin/crabbox --help` lists
+  `blacksmith-testbox`; update Crabbox before falling back.
+- Hydration stuck or failed: open the printed GitHub Actions run URL and inspect
+  the hydration step.
+- Sync failed: rerun with `--debug`; check changed-file count and whether the
+  checkout is dirty.
+- Command failed: rerun only the failing shard/file first. Do not rerun a full
+  suite until the focused failure is understood.
+- Cleanup uncertain: `blacksmith testbox list`; stop owned `tbx_...` leases you
+  created.
+- Crabbox broken but Blacksmith works: use the direct Blacksmith fallback above,
+  then file/fix the Crabbox issue.

-Do not add OpenClaw-specific setup to Crabbox. Put repo setup in the hydration
-workflow and generic lease/sync behavior in Crabbox.
+## Boundary

-## Cleanup
-
-Crabbox has coordinator-owned idle expiry and local lease claims, so OpenClaw
-does not need a custom ledger. Default idle timeout is 30 minutes unless config
-or flags set a different value. Still stop boxes you created when done.
-If `crabbox list` prints `orphan=no-active-lease`, treat it as an operator
-review hint; do not delete `keep=true` machines without checking provider and
-coordinator state.
+Do not add OpenClaw-specific setup to Crabbox itself. Put repo setup in the
+hydration workflow and keep Crabbox generic around lease, sync, command
+execution, logs/results, timing, and cleanup.
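
Reviewer sketch (not part of the commit): the new skill text asks the reader to "Read the JSON summary" printed by `--timing-json` and names the fields `provider`, `leaseId`, `syncDelegated`, `commandMs`/`totalMs`, and `exitCode`. A minimal way to check those fields mechanically might look like the following. It assumes the summary is a single flat JSON object with exactly those key names and that `exitCode` is an integer; the diff does not confirm the full schema, and the helper name `check_timing_summary` and the sample values are hypothetical.

```python
import json


def check_timing_summary(raw: str) -> list[str]:
    """Return a list of problems found in a --timing-json summary.

    Assumes a flat JSON object using the field names listed in the skill
    text; the real Crabbox output may contain more fields or nest them.
    """
    summary = json.loads(raw)
    problems = []

    # The skill says provider "should be blacksmith-testbox".
    if summary.get("provider") != "blacksmith-testbox":
        problems.append(f"unexpected provider: {summary.get('provider')!r}")

    # Blacksmith-backed leases are described as tbx_... ids.
    lease = summary.get("leaseId")
    if not isinstance(lease, str) or not lease.startswith("tbx_"):
        problems.append(f"leaseId is not a tbx_ id: {lease!r}")

    # The skill says syncDelegated "should be true".
    if summary.get("syncDelegated") is not True:
        problems.append("syncDelegated is not true")

    # Treating any non-zero exit code as a failed command is an assumption.
    if summary.get("exitCode") != 0:
        problems.append(f"non-zero exitCode: {summary.get('exitCode')!r}")

    return problems


if __name__ == "__main__":
    # Hypothetical sample; real ids and timings will differ.
    sample = json.dumps({
        "provider": "blacksmith-testbox",
        "leaseId": "tbx_example",
        "syncDelegated": True,
        "commandMs": 1000,
        "totalMs": 2000,
        "exitCode": 0,
    })
    print(check_timing_summary(sample))
```

An empty list means the summary matches what the skill describes for a Blacksmith-backed run. Any entry is a cue to follow the "Failure Triage" bullets before rerunning a broad gate.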