From d286620c03d0d6c195f84e07d0e039db548f4564 Mon Sep 17 00:00:00 2001 From: Peter Steinberger Date: Sun, 3 May 2026 16:35:40 +0100 Subject: [PATCH] chore(skills): fold testbox notes into crabbox --- .agents/skills/blacksmith-testbox/SKILL.md | 420 --------------------- .agents/skills/crabbox/SKILL.md | 290 +++++++++++--- 2 files changed, 243 insertions(+), 467 deletions(-) delete mode 100644 .agents/skills/blacksmith-testbox/SKILL.md diff --git a/.agents/skills/blacksmith-testbox/SKILL.md b/.agents/skills/blacksmith-testbox/SKILL.md deleted file mode 100644 index 1827bcc475e..00000000000 --- a/.agents/skills/blacksmith-testbox/SKILL.md +++ /dev/null @@ -1,420 +0,0 @@ ---- -name: blacksmith-testbox -description: Run Blacksmith Testbox for CI-parity checks, secrets, hosted services, migrations, or builds local cannot reproduce. ---- - -# Blacksmith Testbox - -## Scope - -Use Testbox when you need remote CI parity, injected secrets, hosted services, -or an OS/runtime image that your local machine cannot provide cheaply. - -For OpenClaw, Crabbox is a supported alternative when Blacksmith is unavailable -or owned cloud capacity is preferable. - -Do not default to Testbox for every local test/build loop. If the repo has -documented local commands for normal iteration, use those first so you keep -warm caches, local build state, and fast feedback. - -Testbox is the expensive path. Reach for it deliberately. - -OpenClaw maintainers can opt into Testbox-first validation by setting -`OPENCLAW_TESTBOX=1` in their environment or standing agent rules. This mode is -maintainers-only and requires Blacksmith access. - -When `OPENCLAW_TESTBOX=1` is set in OpenClaw: - -- Pre-warm a Testbox early for longer, wider, or uncertain work. -- Prefer Testbox for `pnpm` gates, e2e, package-like proof, and broad suites. -- Reuse the same Testbox ID for every run command in the same task/session. 
-- Use local commands only when the task explicitly sets - `OPENCLAW_LOCAL_CHECK_MODE=throttled|full`, or when the user asks for local - proof. - -## Install the CLI - -If `blacksmith` is not installed, install it: - - curl -fsSL https://get.blacksmith.sh | sh - -For the canary channel (bleeding-edge): - - BLACKSMITH_CHANNEL=canary sh -c 'curl -fsSL https://get.blacksmith.sh | sh' - -Then authenticate: - - blacksmith auth login - -## Agent-triggered browser auth (non-interactive) - -When an agent needs to ensure the user is authenticated before running testbox -commands (e.g. warmup, run), use browser-based auth with non-interactive mode. -This opens the browser for the user to sign in; the agent does not interact with -the browser. The org selector in the dashboard is skipped, so the user only sees -the sign-in flow. - -**Required command** (`--organization` is required with `--non-interactive`): - - blacksmith auth login --non-interactive --organization - -The org slug can come from `BLACKSMITH_ORG` env var or the `--org` global flag. -If neither is set, the agent should use the project's known org (e.g. from repo -config or user context). Example: - - blacksmith auth login --non-interactive --organization acme-corp - blacksmith --org acme-corp auth login --non-interactive --organization acme-corp - -**Flow**: The CLI starts a local callback server, opens the browser to the -dashboard auth page, and blocks for up to 2 minutes. The user completes sign-in -and authorization in the browser. The dashboard redirects to localhost with the -token; the CLI saves credentials and exits. The agent then proceeds. - -**Do not use** `--api-token` for this flow — that is for headless/token-based -auth. This skill focuses on browser-based auth when the user prefers signing in -via the web UI. - -Optional flags: - -- `--dashboard-url ` — Override dashboard URL (e.g. for staging) - -## Decide first: local or Testbox - -Before warming anything up, check the repo's own instructions. 
- -Prefer local commands when: - -- the repo documents a supported local test/build workflow -- you are iterating on unit tests, lint, typecheck, formatting, or other - local-only validation -- the value comes from warm local caches and fast repeat runs -- the command does not need remote secrets, hosted services, or CI-only images - -Prefer Testbox when: - -- the repo explicitly requires CI-parity or remote validation -- the command needs secrets, service containers, or provisioned infra -- you are reproducing CI-only failures -- you need the exact workflow image/job environment from GitHub Actions - -For OpenClaw specifically, normal local iteration stays local unless maintainer -Testbox mode is enabled with `OPENCLAW_TESTBOX=1`: - -- `pnpm check:changed` -- `pnpm test:changed` -- `pnpm test ` -- `pnpm test:serial` -- `pnpm build` - -If `OPENCLAW_TESTBOX=1` is enabled, run those same repo commands inside the -warm Testbox. If the user wants laptop-friendly local proof for one command, use -the explicit escape hatch `OPENCLAW_LOCAL_CHECK_MODE=throttled`. - -For installable-package product proof, prefer the GitHub `Package Acceptance` -workflow over an ad hoc Testbox command. It resolves one package candidate -(`source=npm`, `source=ref`, `source=url`, or `source=artifact`), uploads it as -`package-under-test`, and runs the reusable Docker E2E lanes against that exact -tarball on GitHub/Blacksmith runners. Use `workflow_ref` for the trusted -workflow/harness code and `package_ref` for the source ref to pack when testing -an older trusted branch, tag, or SHA. - -## Setup: Warmup before coding - -If you decided Testbox is warranted, warm one up early. This returns an ID -instantly and boots the CI environment in the background while you work: - - blacksmith testbox warmup ci-check-testbox.yml - # → tbx_01jkz5b3t9... - -Save this ID in the current session. You need it for every `run` command. -Treat `blacksmith testbox list` as diagnostics, not a reusable work queue. 
-Listed boxes can be visible at the org/repo level while still being unusable or -stale for the current local agent lane. - -For OpenClaw maintainer Testbox mode, pre-warm at the start of longer or wider -tasks: - - blacksmith testbox warmup ci-check-testbox.yml --ref main --idle-timeout 90 - pnpm testbox:claim --id - -Use the build-artifact warmup when e2e/package/build proof benefits from seeded -`dist/`, `dist-runtime/`, and build-all caches: - - blacksmith testbox warmup ci-build-artifacts-testbox.yml --ref main --idle-timeout 90 - pnpm testbox:claim --id - -Warmup dispatches a GitHub Actions workflow that provisions a VM with the -full CI environment: dependencies installed, services started, secrets -injected, and a clean checkout of the repo at the default branch. - -In OpenClaw, raw commit SHAs are not reliable dispatch refs for `warmup --ref`; -use a branch or tag. The build-artifact workflow resolves `openclaw@beta` and -`openclaw@latest` to SHA cache keys internally. - -Options: - - --ref Git ref to dispatch against (default: repo's default branch) - --job Specific job within the workflow (if it has multiple) - --idle-timeout Idle timeout in minutes (default: 30) - -## CRITICAL: Always run from the repo root - -ALWAYS invoke `blacksmith testbox` commands from the **root of the git -repository**. The CLI syncs the current working directory to the testbox -using rsync with `--delete`. If you run from a subdirectory (e.g. -`cd backend && blacksmith testbox run ...`), rsync will mirror only that -subdirectory and **delete everything else** on the testbox — wiping other -directories like `dashboard/`, `cli/`, etc. 
- - # CORRECT — run from repo root, use paths in the command - blacksmith testbox run --id "cd backend && php artisan test" - blacksmith testbox run --id "cd dashboard && npm test" - - # WRONG — do NOT cd into a subdirectory before invoking the CLI - cd backend && blacksmith testbox run --id "php artisan test" - -If your shell is in a subdirectory, `cd` back to the repo root first: - - cd "$(git rev-parse --show-toplevel)" - blacksmith testbox run --id "cd backend && php artisan test" - -## Running commands - - blacksmith testbox run --id "" - -The `run` command automatically waits for the testbox to become ready if -it is still booting, so you can call `run` immediately after warmup without -needing to check status first. - -In OpenClaw, prefer the guarded runner wrapper so stale/reused ids fail before -the Blacksmith CLI spends time syncing or emits a confusing missing-key error: - - pnpm testbox:run --id -- "OPENCLAW_TESTBOX=1 pnpm check:changed" - -The wrapper refuses to run when the local per-Testbox key is missing or when the -id was not claimed by this OpenClaw checkout with `pnpm testbox:claim --id -`. Treat that as the expected remediation, not as a GitHub account or -normal SSH-key problem. A local key alone is not enough; a ready box may still -carry stale rsync state from another lane. - -If the agent crashes, the remote box relies on Blacksmith's idle timeout. The -local OpenClaw claim marker is not deleted automatically, so the wrapper treats -claims older than 12 hours as stale. Override only for intentional long-running -work with `OPENCLAW_TESTBOX_CLAIM_TTL_MINUTES=`. - -Before spending a broad gate on a manually assembled command, you can also run: - - pnpm testbox:sanity -- --id - -## Downloading files from a testbox - -Use the `download` command to retrieve files or directories from a running -testbox to your local machine. This is useful for fetching build artifacts, -test results, coverage reports, or any output generated on the testbox. 
- - blacksmith testbox download --id [local-path] - -The remote path is relative to the testbox working directory (same as `run`). -If no local path is specified, the file is saved to the current directory -using the same base name. - -To download a directory, append a trailing `/` to the remote path — this -triggers recursive mode: - - # Download a single file - blacksmith testbox download --id coverage/report.html - - # Download a file to a specific local path - blacksmith testbox download --id build/output.tar.gz ./output.tar.gz - - # Download an entire directory - blacksmith testbox download --id test-results/ ./results/ - -Options: - - --ssh-private-key Path to SSH private key (if warmup used --ssh-public-key) - -## How file sync works - -Understanding this model is critical for using Testbox correctly. - -When you call `run`, the CLI performs a **delta sync** of your local changes -to the remote testbox before executing your command: - -1. The testbox VM starts from a clean `actions/checkout` at the warmup ref. - The workflow's setup steps (e.g. `npm install`, `pip install`, `composer install`) - run during warmup and populate dependency directories on the remote VM. - -2. On each `run`, the CLI uses **git** to detect which files changed locally - since the last sync. It syncs ONLY tracked files and untracked non-ignored - files (i.e. files that `git ls-files` reports). - -3. **`.gitignore`'d directories are never synced.** This means directories - like `node_modules/`, `vendor/`, `.venv/`, `build/`, `dist/`, etc. are - NOT transferred from your local machine. The testbox uses its own copies - of those directories, populated during the warmup workflow steps. - -4. If nothing has changed since the last sync (same git commit and working - tree state), the sync is skipped entirely for speed. 
- -### Why this matters - -- **Changing dependencies**: If you modify `package.json`, `requirements.txt`, - `composer.json`, `go.mod`, or similar dependency manifests, the lock/manifest - file will be synced but the actual dependency directory will NOT. You must - re-run the install command on the testbox: - - blacksmith testbox run --id "npm install && npm test" - blacksmith testbox run --id "pip install -r requirements.txt && pytest" - blacksmith testbox run --id "composer install && phpunit" - -- **Generated/build artifacts**: If your tests depend on a build step (e.g. - `npm run build`, `make`), and you changed source files that affect the build - output, re-run the build on the testbox before testing. - -- **New untracked files**: New files you create locally ARE synced (as long as - they are not gitignored). You do not need to `git add` them first. - -- **Deleted files**: Files you delete locally are also deleted on the remote - testbox. The sync model keeps the remote in lockstep with your local managed - file set. - -## CRITICAL: Do not ban local tests - -Do not assume local validation is forbidden. Many repos intentionally invest in -fast, warm local loops, and forcing every run through Testbox destroys that -advantage. - -Use Testbox for the checks that actually need it: remote parity, secrets, -services, CI-only runners, or reproducibility against the workflow image. - -If the repo says local tests/builds are the normal path, follow the repo. - -OpenClaw maintainer exception: if `OPENCLAW_TESTBOX=1` is set by the user or -agent environment, treat Testbox as the normal validation path for this repo. -Use `OPENCLAW_LOCAL_CHECK_MODE=throttled|full` as the explicit local escape -hatch. 
- -## When to use - -Use Testbox when: - -- running database migrations or destructive environment checks -- running commands that depend on secrets or environment variables not present locally -- reproducing CI-only failures or validating against the workflow image -- validating behavior that needs provisioned services or remote runners -- doing a final parity check before commit/push when the repo or user wants that - -Trim that list based on repo guidance. If the repo documents supported local -tests/builds, prefer local for routine iteration and keep Testbox for the -checks that need parity or remote state. - -## Workflow - -1. Decide whether the repo's local loop is the right default. For OpenClaw, - `OPENCLAW_TESTBOX=1` makes Testbox the maintainer default. -2. If Testbox is warranted, warm up early: - `blacksmith testbox warmup ci-check-testbox.yml --ref main --idle-timeout 90` → save the ID, - then `pnpm testbox:claim --id ` -3. Write code while the testbox boots in the background. -4. Run the remote command when needed: - `pnpm testbox:run --id -- "OPENCLAW_TESTBOX=1 pnpm check:changed"` -5. If tests fail, fix code and re-run against the same warm box. -6. If you changed dependency manifests (package.json, etc.), prepend - the install command: `blacksmith testbox run --id "npm install && npm test"` -7. If a narrow PR reports a full sync or the box was reused/expired, sanity - check the remote copy before a slow gate: - `pnpm testbox:run --id -- "pnpm testbox:sanity"`. - If it reports missing root files or mass tracked deletions, stop the box and - warm a fresh one. Use `OPENCLAW_TESTBOX_ALLOW_MASS_DELETIONS=1` only for an - intentional large deletion PR. -8. If you need artifacts (coverage reports, build outputs, etc.), download them: - `blacksmith testbox download --id coverage/ ./coverage/` -9. Once green, commit and push. - -## OpenClaw full test suite - -For OpenClaw, use the repo package manager and the measured stable full-suite -profile below. 
It keeps six Vitest project shards active while limiting each -shard to one worker to avoid worker OOMs on Testbox: - - blacksmith testbox run --id "env NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 pnpm test" - -Observed full-suite time on Blacksmith Testbox is about 3-4 minutes: - -- 173-180s on a warmed box -- 219s on a fresh 32-vCPU box - -When validating before commit/push in maintainer Testbox mode, run -`pnpm check:changed` inside the warmed box first when appropriate, then the full -suite with the profile above if broad confidence is needed. - -Run `pnpm testbox:sanity` inside the warmed box before the broad command when -the sync looks suspicious. It checks that root files such as `pnpm-lock.yaml` -still exist and fails on 200 or more tracked deletions. That catches stale or -corrupted rsync state before dependency install or Vitest failures hide the real -problem. - -## Examples - - blacksmith testbox warmup ci-check-testbox.yml - # → tbx_01jkz5b3t9... - - # Run tests - blacksmith testbox run --id "npm test -- --testPathPattern=handler.test" - blacksmith testbox run --id "go test ./pkg/api/... -run TestHandler -v" - blacksmith testbox run --id "python -m pytest tests/test_api.py -k test_auth" - - # Re-install deps after changing package.json, then test - blacksmith testbox run --id "npm install && npm test" - - # Build and test - blacksmith testbox run --id "npm run build && npm test" - - # Download artifacts from the testbox - blacksmith testbox download --id coverage/lcov-report/ ./coverage/ - blacksmith testbox download --id build/output.tar.gz - -## Waiting for the testbox to be ready - -The `run` command automatically waits for the testbox, so explicit waiting is -usually unnecessary. If you do need to check readiness separately (e.g. before -a series of runs), use the `--wait` flag. Do NOT use a sleep-and-recheck loop. 
- -Correct: block until ready with a timeout: - - blacksmith testbox status --id --wait [--wait-timeout 5m] - -Wrong: never use sleep + status in a loop: - - # BAD — do not do this - sleep 30 && blacksmith testbox status --id - while ! blacksmith testbox status --id | grep ready; do sleep 5; done - -`--wait` polls the status and exits as soon as the testbox is ready (or when the -timeout is reached). Default timeout is 5m; use `--wait-timeout` for longer -(e.g. `10m`, `1h`). - -## Managing testboxes - - # Check status of a specific testbox - blacksmith testbox status --id - - # List all active testboxes for the current repo - blacksmith testbox list - - # Stop a testbox when you're done (frees resources) - blacksmith testbox stop --id - -Testboxes automatically shut down after being idle (default: 30 minutes). -If you need a longer session, increase the timeout at warmup time. For OpenClaw -maintainer work, use 90 minutes for long-running sessions: - - blacksmith testbox warmup ci-check-testbox.yml --idle-timeout 90 - blacksmith testbox warmup ci-build-artifacts-testbox.yml --idle-timeout 90 - -## With options - - blacksmith testbox warmup ci-check-testbox.yml --ref main - blacksmith testbox warmup ci-check-testbox.yml --idle-timeout 90 - blacksmith testbox run --id "go test ./..." diff --git a/.agents/skills/crabbox/SKILL.md b/.agents/skills/crabbox/SKILL.md index fe2419ecc8c..0950e625879 100644 --- a/.agents/skills/crabbox/SKILL.md +++ b/.agents/skills/crabbox/SKILL.md @@ -1,54 +1,249 @@ --- name: crabbox -description: Use Crabbox for OpenClaw remote Linux validation, warmed reusable boxes, GitHub Actions hydration, sync timing, logs, results, caches, and lease cleanup. +description: Use Crabbox for OpenClaw remote Linux validation. Default to Blacksmith Testbox; includes direct Blacksmith and owned AWS/Hetzner fallback notes when Crabbox fails. 
--- # Crabbox -Use Crabbox when OpenClaw needs remote Linux proof on owned capacity, a large -runner class, reusable warm state, or a Blacksmith alternative. +Use Crabbox when OpenClaw needs remote Linux proof for broad tests, CI-parity +checks, secrets, hosted services, Docker/E2E/package lanes, warmed reusable +boxes, sync timing, logs/results, cache inspection, or lease cleanup. -## Before Running +Default backend: `blacksmith-testbox`. The separate `blacksmith-testbox` skill +has been removed; this skill owns both the normal Crabbox path and the direct +Blacksmith fallback playbook. + +## First Checks - Run from the repo root. Crabbox sync mirrors the current checkout. -- Prefer local targeted tests for tight edit loops. -- Prefer Blacksmith Testbox when the task explicitly asks for Blacksmith or a - Blacksmith-specific CI comparison. -- Use Crabbox for broad OpenClaw gates when owned AWS/Hetzner capacity is the - right remote lane. -- Check `.crabbox.yaml` for repo defaults before adding flags. -- Sanity-check the selected binary before remote work. OpenClaw scripts prefer - `../crabbox/bin/crabbox` when present; the user PATH shim can be stale: - `command -v crabbox; ../crabbox/bin/crabbox --version; ../crabbox/bin/crabbox --help | sed -n '1,90p'`. -- Install with `brew install openclaw/tap/crabbox`; auth is required before use: - `printf '%s' "$CRABBOX_COORDINATOR_TOKEN" | crabbox login --url https://crabbox.openclaw.ai --provider aws --token-stdin`. -- On macOS the user config is `~/Library/Application Support/crabbox/config.yaml`; - it must include `broker.url`, `broker.token`, and usually `provider: aws`. 
- -## OpenClaw Flow - -AWS/owned-capacity flow for `pnpm` tests: +- Check the wrapper and providers before remote work: ```sh -pnpm crabbox:warmup -- --idle-timeout 90m -pnpm crabbox:warmup -- --provider aws --class beast --market on-demand --idle-timeout 90m -pnpm crabbox:hydrate -- --id -pnpm crabbox:run -- --id --timing-json --shell -- "env NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test:changed" +command -v crabbox +../crabbox/bin/crabbox --version +pnpm crabbox:run -- --help | sed -n '1,120p' ``` -Blacksmith-backed Crabbox flow can delegate setup to the Testbox workflow: +- OpenClaw scripts prefer `../crabbox/bin/crabbox` when present. The user PATH + shim can be stale. +- Check `.crabbox.yaml` for repo defaults, but override provider explicitly. + Even if config still says AWS, maintainer validation should normally pass + `--provider blacksmith-testbox`. +- Prefer local targeted tests for tight edit loops. Broad gates belong remote. + +## Default Blacksmith Backend + +Use this for `pnpm check`, `pnpm check:changed`, `pnpm test`, +`pnpm test:changed`, Docker/E2E/live/package gates, or anything likely to fan +out across many Vitest projects. 
+ +Changed gate: ```sh -pnpm crabbox:run -- --provider blacksmith-testbox --blacksmith-org openclaw --blacksmith-workflow .github/workflows/ci-check-testbox.yml --blacksmith-job check --blacksmith-ref main --idle-timeout 90m --timing-json --shell -- "env NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test:changed" +pnpm crabbox:run -- --provider blacksmith-testbox \ + --blacksmith-org openclaw \ + --blacksmith-workflow .github/workflows/ci-check-testbox.yml \ + --blacksmith-job check \ + --blacksmith-ref main \ + --idle-timeout 90m \ + --ttl 240m \ + --timing-json \ + --shell -- \ + "env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test:changed" +``` + +Full suite: + +```sh +pnpm crabbox:run -- --provider blacksmith-testbox \ + --blacksmith-org openclaw \ + --blacksmith-workflow .github/workflows/ci-check-testbox.yml \ + --blacksmith-job check \ + --blacksmith-ref main \ + --idle-timeout 90m \ + --ttl 240m \ + --timing-json \ + --shell -- \ + "env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test" +``` + +Focused rerun: + +```sh +pnpm crabbox:run -- --provider blacksmith-testbox \ + --blacksmith-org openclaw \ + --blacksmith-workflow .github/workflows/ci-check-testbox.yml \ + --blacksmith-job check \ + --blacksmith-ref main \ + --idle-timeout 90m \ + --ttl 240m \ + --timing-json \ + --shell -- \ + "env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test " +``` + +Read the JSON summary. 
Useful fields: + +- `provider`: should be `blacksmith-testbox` +- `leaseId`: `tbx_...` +- `syncDelegated`: should be `true` +- `commandMs` / `totalMs` +- `exitCode` + +Crabbox should stop one-shot Blacksmith Testboxes automatically after the run. +Verify cleanup when a run fails, is interrupted, or the command output is +unclear: + +```sh +blacksmith testbox list +``` + +## Reuse And Keepalive + +For most Blacksmith-backed Crabbox calls, one-shot is enough. Use reuse only +when you need multiple manual commands on the same hydrated box. + +If Crabbox returns a reusable id or you intentionally keep a lease: + +```sh +pnpm crabbox:run -- --provider blacksmith-testbox --id --no-sync --timing-json --shell -- "pnpm test " ``` Stop boxes you created before handoff: ```sh +pnpm crabbox:stop -- +blacksmith testbox stop --id +``` + +## If Crabbox Fails + +Keep the fallback narrow. First decide whether the failure is Crabbox itself, +Blacksmith/Testbox, repo hydration, sync, or the test command. + +Fast checks: + +```sh +command -v crabbox +../crabbox/bin/crabbox --version +crabbox run --provider blacksmith-testbox --help | sed -n '1,140p' +command -v blacksmith +blacksmith --version +blacksmith testbox list +``` + +Common Crabbox-only failures: + +- Provider missing or old CLI: use `../crabbox/bin/crabbox` from the sibling + repo, or update/install Crabbox before retrying. +- Bad local config: pass `--provider blacksmith-testbox` plus explicit + `--blacksmith-*` flags instead of relying on `.crabbox.yaml`. +- Slug/claim confusion: use the raw `tbx_...` id, or run one-shot without + `--id`. +- Sync/timing bug: add `--debug --timing-json`; capture the final JSON and the + printed Actions URL. +- Cleanup uncertainty: run `blacksmith testbox list` and stop only boxes you + created. 
+ +If Crabbox cannot dispatch, sync, attach, or stop but Blacksmith itself works, +use direct Blacksmith from the repo root: + +```sh +blacksmith testbox warmup ci-check-testbox.yml --ref main --idle-timeout 90 +blacksmith testbox run --id "env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test:changed" +blacksmith testbox stop --id +``` + +Direct full suite: + +```sh +blacksmith testbox run --id "env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test" +``` + +Auth fallback, only when `blacksmith` says auth is missing: + +```sh +blacksmith auth login --non-interactive --organization openclaw +``` + +Raw Blacksmith footguns: + +- Run from repo root. The CLI syncs the current directory. +- Save the returned `tbx_...` id in the session. +- Reuse that id for focused reruns; stop it before handoff. +- Raw commit SHAs are not reliable `warmup --ref` refs; use a branch or tag. +- Treat `blacksmith testbox list` as cleanup diagnostics, not a shared reusable + queue. + +Escalate to owned AWS/Hetzner only when Blacksmith is down, quota-limited, +missing the needed environment, or owned capacity is the explicit goal. Use the +Owned Cloud Fallback section below. + +## Blacksmith Backend Notes + +Crabbox Blacksmith backend delegates setup to: + +- org: `openclaw` +- workflow: `.github/workflows/ci-check-testbox.yml` +- job: `check` +- ref: `main` unless testing a branch/tag intentionally + +The hydration workflow owns checkout, Node/pnpm setup, dependency install, +secrets, ready marker, and keepalive. Crabbox owns dispatch, sync, SSH command +execution, timing, logs/results, and cleanup. 
+ +Minimal direct Blacksmith fallback, from repo root: + +```sh +blacksmith testbox warmup ci-check-testbox.yml --ref main --idle-timeout 90 +blacksmith testbox run --id "env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 pnpm test:changed" +blacksmith testbox stop --id +``` + +Use direct Blacksmith only when Crabbox is the broken layer and Blacksmith +itself still works. Prefer direct `blacksmith testbox list` for cleanup +diagnostics, not as a reusable work queue. + +Important Blacksmith footguns: + +- Always run from repo root. The CLI syncs the current directory. +- Raw commit SHAs are not reliable `warmup --ref` refs; use a branch or tag. +- If auth is missing and browser auth is acceptable: + +```sh +blacksmith auth login --non-interactive --organization openclaw +``` + +## Owned Cloud Fallback + +Use AWS/Hetzner only when Blacksmith is down, quota-limited, missing the needed +environment, or owned capacity is explicitly the goal. + +```sh +pnpm crabbox:warmup -- --provider aws --class beast --market on-demand --idle-timeout 90m +pnpm crabbox:hydrate -- --id +pnpm crabbox:run -- --id --timing-json --shell -- "env NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test:changed" pnpm crabbox:stop -- ``` -## Useful Commands +Install/auth for owned Crabbox if needed: + +```sh +brew install openclaw/tap/crabbox +printf '%s' "$CRABBOX_COORDINATOR_TOKEN" | crabbox login --url https://crabbox.openclaw.ai --provider aws --token-stdin +``` + +macOS config lives at: + +```text +~/Library/Application Support/crabbox/config.yaml +``` + +It should include `broker.url`, `broker.token`, and usually `provider: aws` +for owned-cloud lanes. Do not let that config override the OpenClaw default +when Blacksmith proof is requested; pass `--provider blacksmith-testbox`. 
+ +## Diagnostics ```sh crabbox status --id --wait @@ -59,29 +254,30 @@ crabbox logs crabbox results crabbox cache stats --id crabbox ssh --id +blacksmith testbox list ``` Use `--debug` on `run` when measuring sync timing. -Use `--timing-json` on warmup, hydrate, and run when comparing AWS and -blacksmith-testbox timings. -Use `--market spot|on-demand` on AWS warmup or one-shot run when testing quota -or capacity behavior without changing `.crabbox.yaml`. +Use `--timing-json` on warmup, hydrate, and run when comparing backends. +Use `--market spot|on-demand` only on AWS warmup/one-shot runs. -## Hydration Boundary +## Failure Triage -`.github/workflows/crabbox-hydrate.yml` is repo-specific on purpose. It owns -OpenClaw checkout, setup-node, pnpm setup, provider env hydration, ready marker, -and keepalive. Crabbox owns runner registration, workflow dispatch, SSH sync, -command execution, logs/results, local lease claims, and idle cleanup. +- Crabbox cannot find provider: verify `../crabbox/bin/crabbox --help` lists + `blacksmith-testbox`; update Crabbox before falling back. +- Hydration stuck or failed: open the printed GitHub Actions run URL and inspect + the hydration step. +- Sync failed: rerun with `--debug`; check changed-file count and whether the + checkout is dirty. +- Command failed: rerun only the failing shard/file first. Do not rerun a full + suite until the focused failure is understood. +- Cleanup uncertain: `blacksmith testbox list`; stop owned `tbx_...` leases you + created. +- Crabbox broken but Blacksmith works: use the direct Blacksmith fallback above, + then file/fix the Crabbox issue. -Do not add OpenClaw-specific setup to Crabbox. Put repo setup in the hydration -workflow and generic lease/sync behavior in Crabbox. +## Boundary -## Cleanup - -Crabbox has coordinator-owned idle expiry and local lease claims, so OpenClaw -does not need a custom ledger. Default idle timeout is 30 minutes unless config -or flags set a different value. 
Still stop boxes you created when done. -If `crabbox list` prints `orphan=no-active-lease`, treat it as an operator -review hint; do not delete `keep=true` machines without checking provider and -coordinator state. +Do not add OpenClaw-specific setup to Crabbox itself. Put repo setup in the +hydration workflow and keep Crabbox generic around lease, sync, command +execution, logs/results, timing, and cleanup.
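
Reviewer sketch, not part of the patch: the "Read the JSON summary" step in the new Crabbox skill names `provider`, `syncDelegated`, and `exitCode` as the fields to check. A minimal way to check them mechanically is sketched below. It assumes the `--timing-json` summary is a flat JSON object with exactly those key names; the real output shape may differ. `check_summary` is a hypothetical helper name, not a Crabbox command.

```sh
#!/bin/sh
# Hypothetical helper: sanity-check a Crabbox --timing-json summary from stdin.
# Assumption: the summary is a flat JSON object containing the fields named in
# the skill text (provider, syncDelegated, exitCode). Adjust if the shape differs.
check_summary() {
  summary=$(cat)

  # The provider should be the default Blacksmith backend.
  printf '%s' "$summary" \
    | grep -Eq '"provider"[[:space:]]*:[[:space:]]*"blacksmith-testbox"' \
    || { echo "unexpected provider"; return 1; }

  # Sync should have been delegated to the Testbox workflow.
  printf '%s' "$summary" \
    | grep -Eq '"syncDelegated"[[:space:]]*:[[:space:]]*true' \
    || { echo "sync was not delegated"; return 1; }

  # The remote command should have exited with status 0.
  printf '%s' "$summary" \
    | grep -Eq '"exitCode"[[:space:]]*:[[:space:]]*0([^0-9]|$)' \
    || { echo "command failed"; return 1; }

  echo "ok"
}
```

If the summary has been saved to a file (the name `summary.json` is only an example), `check_summary < summary.json` prints `ok` or the first failed check. A pattern match like this is deliberately crude; a real JSON parser would be more robust if the summary turns out to be nested.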