mirror of
https://github.com/openclaw/openclaw.git
synced 2026-05-06 10:30:44 +00:00
chore(skills): fold testbox notes into crabbox
This commit is contained in:
@@ -1,420 +0,0 @@
|
||||
---
|
||||
name: blacksmith-testbox
|
||||
description: Run Blacksmith Testbox for CI-parity checks, secrets, hosted services, migrations, or builds local cannot reproduce.
|
||||
---
|
||||
|
||||
# Blacksmith Testbox
|
||||
|
||||
## Scope
|
||||
|
||||
Use Testbox when you need remote CI parity, injected secrets, hosted services,
|
||||
or an OS/runtime image that your local machine cannot provide cheaply.
|
||||
|
||||
For OpenClaw, Crabbox is a supported alternative when Blacksmith is unavailable
|
||||
or owned cloud capacity is preferable.
|
||||
|
||||
Do not default to Testbox for every local test/build loop. If the repo has
|
||||
documented local commands for normal iteration, use those first so you keep
|
||||
warm caches, local build state, and fast feedback.
|
||||
|
||||
Testbox is the expensive path. Reach for it deliberately.
|
||||
|
||||
OpenClaw maintainers can opt into Testbox-first validation by setting
|
||||
`OPENCLAW_TESTBOX=1` in their environment or standing agent rules. This mode is
|
||||
maintainers-only and requires Blacksmith access.
|
||||
|
||||
When `OPENCLAW_TESTBOX=1` is set in OpenClaw:
|
||||
|
||||
- Pre-warm a Testbox early for longer, wider, or uncertain work.
|
||||
- Prefer Testbox for `pnpm` gates, e2e, package-like proof, and broad suites.
|
||||
- Reuse the same Testbox ID for every run command in the same task/session.
|
||||
- Use local commands only when the task explicitly sets
|
||||
`OPENCLAW_LOCAL_CHECK_MODE=throttled|full`, or when the user asks for local
|
||||
proof.
|
||||
|
||||
## Install the CLI
|
||||
|
||||
If `blacksmith` is not installed, install it:
|
||||
|
||||
curl -fsSL https://get.blacksmith.sh | sh
|
||||
|
||||
For the canary channel (bleeding-edge):
|
||||
|
||||
BLACKSMITH_CHANNEL=canary sh -c 'curl -fsSL https://get.blacksmith.sh | sh'
|
||||
|
||||
Then authenticate:
|
||||
|
||||
blacksmith auth login
|
||||
|
||||
## Agent-triggered browser auth (non-interactive)
|
||||
|
||||
When an agent needs to ensure the user is authenticated before running testbox
|
||||
commands (e.g. warmup, run), use browser-based auth with non-interactive mode.
|
||||
This opens the browser for the user to sign in; the agent does not interact with
|
||||
the browser. The org selector in the dashboard is skipped, so the user only sees
|
||||
the sign-in flow.
|
||||
|
||||
**Required command** (`--organization` is required with `--non-interactive`):
|
||||
|
||||
blacksmith auth login --non-interactive --organization <org-slug>
|
||||
|
||||
The org slug can come from `BLACKSMITH_ORG` env var or the `--org` global flag.
|
||||
If neither is set, the agent should use the project's known org (e.g. from repo
|
||||
config or user context). Example:
|
||||
|
||||
blacksmith auth login --non-interactive --organization acme-corp
|
||||
blacksmith --org acme-corp auth login --non-interactive --organization acme-corp
|
||||
|
||||
**Flow**: The CLI starts a local callback server, opens the browser to the
|
||||
dashboard auth page, and blocks for up to 2 minutes. The user completes sign-in
|
||||
and authorization in the browser. The dashboard redirects to localhost with the
|
||||
token; the CLI saves credentials and exits. The agent then proceeds.
|
||||
|
||||
**Do not use** `--api-token` for this flow — that is for headless/token-based
|
||||
auth. This skill focuses on browser-based auth when the user prefers signing in
|
||||
via the web UI.
|
||||
|
||||
Optional flags:
|
||||
|
||||
- `--dashboard-url <url>` — Override dashboard URL (e.g. for staging)
|
||||
|
||||
## Decide first: local or Testbox
|
||||
|
||||
Before warming anything up, check the repo's own instructions.
|
||||
|
||||
Prefer local commands when:
|
||||
|
||||
- the repo documents a supported local test/build workflow
|
||||
- you are iterating on unit tests, lint, typecheck, formatting, or other
|
||||
local-only validation
|
||||
- the value comes from warm local caches and fast repeat runs
|
||||
- the command does not need remote secrets, hosted services, or CI-only images
|
||||
|
||||
Prefer Testbox when:
|
||||
|
||||
- the repo explicitly requires CI-parity or remote validation
|
||||
- the command needs secrets, service containers, or provisioned infra
|
||||
- you are reproducing CI-only failures
|
||||
- you need the exact workflow image/job environment from GitHub Actions
|
||||
|
||||
For OpenClaw specifically, normal local iteration stays local unless maintainer
|
||||
Testbox mode is enabled with `OPENCLAW_TESTBOX=1`:
|
||||
|
||||
- `pnpm check:changed`
|
||||
- `pnpm test:changed`
|
||||
- `pnpm test <path-or-filter>`
|
||||
- `pnpm test:serial`
|
||||
- `pnpm build`
|
||||
|
||||
If `OPENCLAW_TESTBOX=1` is enabled, run those same repo commands inside the
|
||||
warm Testbox. If the user wants laptop-friendly local proof for one command, use
|
||||
the explicit escape hatch `OPENCLAW_LOCAL_CHECK_MODE=throttled`.
|
||||
|
||||
For installable-package product proof, prefer the GitHub `Package Acceptance`
|
||||
workflow over an ad hoc Testbox command. It resolves one package candidate
|
||||
(`source=npm`, `source=ref`, `source=url`, or `source=artifact`), uploads it as
|
||||
`package-under-test`, and runs the reusable Docker E2E lanes against that exact
|
||||
tarball on GitHub/Blacksmith runners. Use `workflow_ref` for the trusted
|
||||
workflow/harness code and `package_ref` for the source ref to pack when testing
|
||||
an older trusted branch, tag, or SHA.
|
||||
|
||||
## Setup: Warmup before coding
|
||||
|
||||
If you decided Testbox is warranted, warm one up early. This returns an ID
|
||||
instantly and boots the CI environment in the background while you work:
|
||||
|
||||
blacksmith testbox warmup ci-check-testbox.yml
|
||||
# → tbx_01jkz5b3t9...
|
||||
|
||||
Save this ID in the current session. You need it for every `run` command.
|
||||
Treat `blacksmith testbox list` as diagnostics, not a reusable work queue.
|
||||
Listed boxes can be visible at the org/repo level while still being unusable or
|
||||
stale for the current local agent lane.
|
||||
|
||||
For OpenClaw maintainer Testbox mode, pre-warm at the start of longer or wider
|
||||
tasks:
|
||||
|
||||
blacksmith testbox warmup ci-check-testbox.yml --ref main --idle-timeout 90
|
||||
pnpm testbox:claim --id <ID>
|
||||
|
||||
Use the build-artifact warmup when e2e/package/build proof benefits from seeded
|
||||
`dist/`, `dist-runtime/`, and build-all caches:
|
||||
|
||||
blacksmith testbox warmup ci-build-artifacts-testbox.yml --ref main --idle-timeout 90
|
||||
pnpm testbox:claim --id <ID>
|
||||
|
||||
Warmup dispatches a GitHub Actions workflow that provisions a VM with the
|
||||
full CI environment: dependencies installed, services started, secrets
|
||||
injected, and a clean checkout of the repo at the default branch.
|
||||
|
||||
In OpenClaw, raw commit SHAs are not reliable dispatch refs for `warmup --ref`;
|
||||
use a branch or tag. The build-artifact workflow resolves `openclaw@beta` and
|
||||
`openclaw@latest` to SHA cache keys internally.
|
||||
|
||||
Options:
|
||||
|
||||
--ref <branch|tag> Git ref to dispatch against (default: repo's default branch)
|
||||
--job <name> Specific job within the workflow (if it has multiple)
|
||||
--idle-timeout <min> Idle timeout in minutes (default: 30)
|
||||
|
||||
## CRITICAL: Always run from the repo root
|
||||
|
||||
ALWAYS invoke `blacksmith testbox` commands from the **root of the git
|
||||
repository**. The CLI syncs the current working directory to the testbox
|
||||
using rsync with `--delete`. If you run from a subdirectory (e.g.
|
||||
`cd backend && blacksmith testbox run ...`), rsync will mirror only that
|
||||
subdirectory and **delete everything else** on the testbox — wiping other
|
||||
directories like `dashboard/`, `cli/`, etc.
|
||||
|
||||
# CORRECT — run from repo root, use paths in the command
|
||||
blacksmith testbox run --id <ID> "cd backend && php artisan test"
|
||||
blacksmith testbox run --id <ID> "cd dashboard && npm test"
|
||||
|
||||
# WRONG — do NOT cd into a subdirectory before invoking the CLI
|
||||
cd backend && blacksmith testbox run --id <ID> "php artisan test"
|
||||
|
||||
If your shell is in a subdirectory, `cd` back to the repo root first:
|
||||
|
||||
cd "$(git rev-parse --show-toplevel)"
|
||||
blacksmith testbox run --id <ID> "cd backend && php artisan test"
|
||||
|
||||
## Running commands
|
||||
|
||||
blacksmith testbox run --id <ID> "<command>"
|
||||
|
||||
The `run` command automatically waits for the testbox to become ready if
|
||||
it is still booting, so you can call `run` immediately after warmup without
|
||||
needing to check status first.
|
||||
|
||||
In OpenClaw, prefer the guarded runner wrapper so stale/reused ids fail before
|
||||
the Blacksmith CLI spends time syncing or emits a confusing missing-key error:
|
||||
|
||||
pnpm testbox:run --id <ID> -- "OPENCLAW_TESTBOX=1 pnpm check:changed"
|
||||
|
||||
The wrapper refuses to run when the local per-Testbox key is missing or when the
|
||||
id was not claimed by this OpenClaw checkout with `pnpm testbox:claim --id
|
||||
<ID>`. Treat that as the expected remediation, not as a GitHub account or
|
||||
normal SSH-key problem. A local key alone is not enough; a ready box may still
|
||||
carry stale rsync state from another lane.
|
||||
|
||||
If the agent crashes, the remote box relies on Blacksmith's idle timeout. The
|
||||
local OpenClaw claim marker is not deleted automatically, so the wrapper treats
|
||||
claims older than 12 hours as stale. Override only for intentional long-running
|
||||
work with `OPENCLAW_TESTBOX_CLAIM_TTL_MINUTES=<minutes>`.
|
||||
|
||||
Before spending a broad gate on a manually assembled command, you can also run:
|
||||
|
||||
pnpm testbox:sanity -- --id <ID>
|
||||
|
||||
## Downloading files from a testbox
|
||||
|
||||
Use the `download` command to retrieve files or directories from a running
|
||||
testbox to your local machine. This is useful for fetching build artifacts,
|
||||
test results, coverage reports, or any output generated on the testbox.
|
||||
|
||||
blacksmith testbox download --id <ID> <remote-path> [local-path]
|
||||
|
||||
The remote path is relative to the testbox working directory (same as `run`).
|
||||
If no local path is specified, the file is saved to the current directory
|
||||
using the same base name.
|
||||
|
||||
To download a directory, append a trailing `/` to the remote path — this
|
||||
triggers recursive mode:
|
||||
|
||||
# Download a single file
|
||||
blacksmith testbox download --id <ID> coverage/report.html
|
||||
|
||||
# Download a file to a specific local path
|
||||
blacksmith testbox download --id <ID> build/output.tar.gz ./output.tar.gz
|
||||
|
||||
# Download an entire directory
|
||||
blacksmith testbox download --id <ID> test-results/ ./results/
|
||||
|
||||
Options:
|
||||
|
||||
--ssh-private-key <path> Path to SSH private key (if warmup used --ssh-public-key)
|
||||
|
||||
## How file sync works
|
||||
|
||||
Understanding this model is critical for using Testbox correctly.
|
||||
|
||||
When you call `run`, the CLI performs a **delta sync** of your local changes
|
||||
to the remote testbox before executing your command:
|
||||
|
||||
1. The testbox VM starts from a clean `actions/checkout` at the warmup ref.
|
||||
The workflow's setup steps (e.g. `npm install`, `pip install`, `composer install`)
|
||||
run during warmup and populate dependency directories on the remote VM.
|
||||
|
||||
2. On each `run`, the CLI uses **git** to detect which files changed locally
|
||||
since the last sync. It syncs ONLY tracked files and untracked non-ignored
|
||||
files (i.e. files that `git ls-files` reports).
|
||||
|
||||
3. **`.gitignore`'d directories are never synced.** This means directories
|
||||
like `node_modules/`, `vendor/`, `.venv/`, `build/`, `dist/`, etc. are
|
||||
NOT transferred from your local machine. The testbox uses its own copies
|
||||
of those directories, populated during the warmup workflow steps.
|
||||
|
||||
4. If nothing has changed since the last sync (same git commit and working
|
||||
tree state), the sync is skipped entirely for speed.
|
||||
|
||||
### Why this matters
|
||||
|
||||
- **Changing dependencies**: If you modify `package.json`, `requirements.txt`,
|
||||
`composer.json`, `go.mod`, or similar dependency manifests, the lock/manifest
|
||||
file will be synced but the actual dependency directory will NOT. You must
|
||||
re-run the install command on the testbox:
|
||||
|
||||
blacksmith testbox run --id <ID> "npm install && npm test"
|
||||
blacksmith testbox run --id <ID> "pip install -r requirements.txt && pytest"
|
||||
blacksmith testbox run --id <ID> "composer install && phpunit"
|
||||
|
||||
- **Generated/build artifacts**: If your tests depend on a build step (e.g.
|
||||
`npm run build`, `make`), and you changed source files that affect the build
|
||||
output, re-run the build on the testbox before testing.
|
||||
|
||||
- **New untracked files**: New files you create locally ARE synced (as long as
|
||||
they are not gitignored). You do not need to `git add` them first.
|
||||
|
||||
- **Deleted files**: Files you delete locally are also deleted on the remote
|
||||
testbox. The sync model keeps the remote in lockstep with your local managed
|
||||
file set.
|
||||
|
||||
## CRITICAL: Do not ban local tests
|
||||
|
||||
Do not assume local validation is forbidden. Many repos intentionally invest in
|
||||
fast, warm local loops, and forcing every run through Testbox destroys that
|
||||
advantage.
|
||||
|
||||
Use Testbox for the checks that actually need it: remote parity, secrets,
|
||||
services, CI-only runners, or reproducibility against the workflow image.
|
||||
|
||||
If the repo says local tests/builds are the normal path, follow the repo.
|
||||
|
||||
OpenClaw maintainer exception: if `OPENCLAW_TESTBOX=1` is set by the user or
|
||||
agent environment, treat Testbox as the normal validation path for this repo.
|
||||
Use `OPENCLAW_LOCAL_CHECK_MODE=throttled|full` as the explicit local escape
|
||||
hatch.
|
||||
|
||||
## When to use
|
||||
|
||||
Use Testbox when:
|
||||
|
||||
- running database migrations or destructive environment checks
|
||||
- running commands that depend on secrets or environment variables not present locally
|
||||
- reproducing CI-only failures or validating against the workflow image
|
||||
- validating behavior that needs provisioned services or remote runners
|
||||
- doing a final parity check before commit/push when the repo or user wants that
|
||||
|
||||
Trim that list based on repo guidance. If the repo documents supported local
|
||||
tests/builds, prefer local for routine iteration and keep Testbox for the
|
||||
checks that need parity or remote state.
|
||||
|
||||
## Workflow
|
||||
|
||||
1. Decide whether the repo's local loop is the right default. For OpenClaw,
|
||||
`OPENCLAW_TESTBOX=1` makes Testbox the maintainer default.
|
||||
2. If Testbox is warranted, warm up early:
|
||||
`blacksmith testbox warmup ci-check-testbox.yml --ref main --idle-timeout 90` → save the ID,
|
||||
then `pnpm testbox:claim --id <ID>`
|
||||
3. Write code while the testbox boots in the background.
|
||||
4. Run the remote command when needed:
|
||||
`pnpm testbox:run --id <ID> -- "OPENCLAW_TESTBOX=1 pnpm check:changed"`
|
||||
5. If tests fail, fix code and re-run against the same warm box.
|
||||
6. If you changed dependency manifests (package.json, etc.), prepend
|
||||
the install command: `blacksmith testbox run --id <ID> "npm install && npm test"`
|
||||
7. If a narrow PR reports a full sync or the box was reused/expired, sanity
|
||||
check the remote copy before a slow gate:
|
||||
`pnpm testbox:run --id <ID> -- "pnpm testbox:sanity"`.
|
||||
If it reports missing root files or mass tracked deletions, stop the box and
|
||||
warm a fresh one. Use `OPENCLAW_TESTBOX_ALLOW_MASS_DELETIONS=1` only for an
|
||||
intentional large deletion PR.
|
||||
8. If you need artifacts (coverage reports, build outputs, etc.), download them:
|
||||
`blacksmith testbox download --id <ID> coverage/ ./coverage/`
|
||||
9. Once green, commit and push.
|
||||
|
||||
## OpenClaw full test suite
|
||||
|
||||
For OpenClaw, use the repo package manager and the measured stable full-suite
|
||||
profile below. It keeps six Vitest project shards active while limiting each
|
||||
shard to one worker to avoid worker OOMs on Testbox:
|
||||
|
||||
blacksmith testbox run --id <ID> "env NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 pnpm test"
|
||||
|
||||
Observed full-suite time on Blacksmith Testbox is about 3-4 minutes:
|
||||
|
||||
- 173-180s on a warmed box
|
||||
- 219s on a fresh 32-vCPU box
|
||||
|
||||
When validating before commit/push in maintainer Testbox mode, run
|
||||
`pnpm check:changed` inside the warmed box first when appropriate, then the full
|
||||
suite with the profile above if broad confidence is needed.
|
||||
|
||||
Run `pnpm testbox:sanity` inside the warmed box before the broad command when
|
||||
the sync looks suspicious. It checks that root files such as `pnpm-lock.yaml`
|
||||
still exist and fails on 200 or more tracked deletions. That catches stale or
|
||||
corrupted rsync state before dependency install or Vitest failures hide the real
|
||||
problem.
|
||||
|
||||
## Examples
|
||||
|
||||
blacksmith testbox warmup ci-check-testbox.yml
|
||||
# → tbx_01jkz5b3t9...
|
||||
|
||||
# Run tests
|
||||
blacksmith testbox run --id <ID> "npm test -- --testPathPattern=handler.test"
|
||||
blacksmith testbox run --id <ID> "go test ./pkg/api/... -run TestHandler -v"
|
||||
blacksmith testbox run --id <ID> "python -m pytest tests/test_api.py -k test_auth"
|
||||
|
||||
# Re-install deps after changing package.json, then test
|
||||
blacksmith testbox run --id <ID> "npm install && npm test"
|
||||
|
||||
# Build and test
|
||||
blacksmith testbox run --id <ID> "npm run build && npm test"
|
||||
|
||||
# Download artifacts from the testbox
|
||||
blacksmith testbox download --id <ID> coverage/lcov-report/ ./coverage/
|
||||
blacksmith testbox download --id <ID> build/output.tar.gz
|
||||
|
||||
## Waiting for the testbox to be ready
|
||||
|
||||
The `run` command automatically waits for the testbox, so explicit waiting is
|
||||
usually unnecessary. If you do need to check readiness separately (e.g. before
|
||||
a series of runs), use the `--wait` flag. Do NOT use a sleep-and-recheck loop.
|
||||
|
||||
Correct: block until ready with a timeout:
|
||||
|
||||
blacksmith testbox status --id <ID> --wait [--wait-timeout 5m]
|
||||
|
||||
Wrong: never use sleep + status in a loop:
|
||||
|
||||
# BAD — do not do this
|
||||
sleep 30 && blacksmith testbox status --id <ID>
|
||||
while ! blacksmith testbox status --id <ID> | grep ready; do sleep 5; done
|
||||
|
||||
`--wait` polls the status and exits as soon as the testbox is ready (or when the
|
||||
timeout is reached). Default timeout is 5m; use `--wait-timeout` for longer
|
||||
(e.g. `10m`, `1h`).
|
||||
|
||||
## Managing testboxes
|
||||
|
||||
# Check status of a specific testbox
|
||||
blacksmith testbox status --id <ID>
|
||||
|
||||
# List all active testboxes for the current repo
|
||||
blacksmith testbox list
|
||||
|
||||
# Stop a testbox when you're done (frees resources)
|
||||
blacksmith testbox stop --id <ID>
|
||||
|
||||
Testboxes automatically shut down after being idle (default: 30 minutes).
|
||||
If you need a longer session, increase the timeout at warmup time. For OpenClaw
|
||||
maintainer work, use 90 minutes for long-running sessions:
|
||||
|
||||
blacksmith testbox warmup ci-check-testbox.yml --idle-timeout 90
|
||||
blacksmith testbox warmup ci-build-artifacts-testbox.yml --idle-timeout 90
|
||||
|
||||
## With options
|
||||
|
||||
blacksmith testbox warmup ci-check-testbox.yml --ref main
|
||||
blacksmith testbox warmup ci-check-testbox.yml --idle-timeout 90
|
||||
blacksmith testbox run --id <ID> "go test ./..."
|
||||
@@ -1,54 +1,249 @@
|
||||
---
|
||||
name: crabbox
|
||||
description: Use Crabbox for OpenClaw remote Linux validation, warmed reusable boxes, GitHub Actions hydration, sync timing, logs, results, caches, and lease cleanup.
|
||||
description: Use Crabbox for OpenClaw remote Linux validation. Default to Blacksmith Testbox; includes direct Blacksmith and owned AWS/Hetzner fallback notes when Crabbox fails.
|
||||
---
|
||||
|
||||
# Crabbox
|
||||
|
||||
Use Crabbox when OpenClaw needs remote Linux proof on owned capacity, a large
|
||||
runner class, reusable warm state, or a Blacksmith alternative.
|
||||
Use Crabbox when OpenClaw needs remote Linux proof for broad tests, CI-parity
|
||||
checks, secrets, hosted services, Docker/E2E/package lanes, warmed reusable
|
||||
boxes, sync timing, logs/results, cache inspection, or lease cleanup.
|
||||
|
||||
## Before Running
|
||||
Default backend: `blacksmith-testbox`. The separate `blacksmith-testbox` skill
|
||||
has been removed; this skill owns both the normal Crabbox path and the direct
|
||||
Blacksmith fallback playbook.
|
||||
|
||||
## First Checks
|
||||
|
||||
- Run from the repo root. Crabbox sync mirrors the current checkout.
|
||||
- Prefer local targeted tests for tight edit loops.
|
||||
- Prefer Blacksmith Testbox when the task explicitly asks for Blacksmith or a
|
||||
Blacksmith-specific CI comparison.
|
||||
- Use Crabbox for broad OpenClaw gates when owned AWS/Hetzner capacity is the
|
||||
right remote lane.
|
||||
- Check `.crabbox.yaml` for repo defaults before adding flags.
|
||||
- Sanity-check the selected binary before remote work. OpenClaw scripts prefer
|
||||
`../crabbox/bin/crabbox` when present; the user PATH shim can be stale:
|
||||
`command -v crabbox; ../crabbox/bin/crabbox --version; ../crabbox/bin/crabbox --help | sed -n '1,90p'`.
|
||||
- Install with `brew install openclaw/tap/crabbox`; auth is required before use:
|
||||
`printf '%s' "$CRABBOX_COORDINATOR_TOKEN" | crabbox login --url https://crabbox.openclaw.ai --provider aws --token-stdin`.
|
||||
- On macOS the user config is `~/Library/Application Support/crabbox/config.yaml`;
|
||||
it must include `broker.url`, `broker.token`, and usually `provider: aws`.
|
||||
|
||||
## OpenClaw Flow
|
||||
|
||||
AWS/owned-capacity flow for `pnpm` tests:
|
||||
- Check the wrapper and providers before remote work:
|
||||
|
||||
```sh
|
||||
pnpm crabbox:warmup -- --idle-timeout 90m
|
||||
pnpm crabbox:warmup -- --provider aws --class beast --market on-demand --idle-timeout 90m
|
||||
pnpm crabbox:hydrate -- --id <cbx_id-or-slug>
|
||||
pnpm crabbox:run -- --id <cbx_id-or-slug> --timing-json --shell -- "env NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test:changed"
|
||||
command -v crabbox
|
||||
../crabbox/bin/crabbox --version
|
||||
pnpm crabbox:run -- --help | sed -n '1,120p'
|
||||
```
|
||||
|
||||
Blacksmith-backed Crabbox flow can delegate setup to the Testbox workflow:
|
||||
- OpenClaw scripts prefer `../crabbox/bin/crabbox` when present. The user PATH
|
||||
shim can be stale.
|
||||
- Check `.crabbox.yaml` for repo defaults, but override provider explicitly.
|
||||
Even if config still says AWS, maintainer validation should normally pass
|
||||
`--provider blacksmith-testbox`.
|
||||
- Prefer local targeted tests for tight edit loops. Broad gates belong remote.
|
||||
|
||||
## Default Blacksmith Backend
|
||||
|
||||
Use this for `pnpm check`, `pnpm check:changed`, `pnpm test`,
|
||||
`pnpm test:changed`, Docker/E2E/live/package gates, or anything likely to fan
|
||||
out across many Vitest projects.
|
||||
|
||||
Changed gate:
|
||||
|
||||
```sh
|
||||
pnpm crabbox:run -- --provider blacksmith-testbox --blacksmith-org openclaw --blacksmith-workflow .github/workflows/ci-check-testbox.yml --blacksmith-job check --blacksmith-ref main --idle-timeout 90m --timing-json --shell -- "env NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test:changed"
|
||||
pnpm crabbox:run -- --provider blacksmith-testbox \
|
||||
--blacksmith-org openclaw \
|
||||
--blacksmith-workflow .github/workflows/ci-check-testbox.yml \
|
||||
--blacksmith-job check \
|
||||
--blacksmith-ref main \
|
||||
--idle-timeout 90m \
|
||||
--ttl 240m \
|
||||
--timing-json \
|
||||
--shell -- \
|
||||
"env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test:changed"
|
||||
```
|
||||
|
||||
Full suite:
|
||||
|
||||
```sh
|
||||
pnpm crabbox:run -- --provider blacksmith-testbox \
|
||||
--blacksmith-org openclaw \
|
||||
--blacksmith-workflow .github/workflows/ci-check-testbox.yml \
|
||||
--blacksmith-job check \
|
||||
--blacksmith-ref main \
|
||||
--idle-timeout 90m \
|
||||
--ttl 240m \
|
||||
--timing-json \
|
||||
--shell -- \
|
||||
"env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test"
|
||||
```
|
||||
|
||||
Focused rerun:
|
||||
|
||||
```sh
|
||||
pnpm crabbox:run -- --provider blacksmith-testbox \
|
||||
--blacksmith-org openclaw \
|
||||
--blacksmith-workflow .github/workflows/ci-check-testbox.yml \
|
||||
--blacksmith-job check \
|
||||
--blacksmith-ref main \
|
||||
--idle-timeout 90m \
|
||||
--ttl 240m \
|
||||
--timing-json \
|
||||
--shell -- \
|
||||
"env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test <path-or-filter>"
|
||||
```
|
||||
|
||||
Read the JSON summary. Useful fields:
|
||||
|
||||
- `provider`: should be `blacksmith-testbox`
|
||||
- `leaseId`: `tbx_...`
|
||||
- `syncDelegated`: should be `true`
|
||||
- `commandMs` / `totalMs`
|
||||
- `exitCode`
|
||||
|
||||
Crabbox should stop one-shot Blacksmith Testboxes automatically after the run.
|
||||
Verify cleanup when a run fails, is interrupted, or the command output is
|
||||
unclear:
|
||||
|
||||
```sh
|
||||
blacksmith testbox list
|
||||
```
|
||||
|
||||
## Reuse And Keepalive
|
||||
|
||||
For most Blacksmith-backed Crabbox calls, one-shot is enough. Use reuse only
|
||||
when you need multiple manual commands on the same hydrated box.
|
||||
|
||||
If Crabbox returns a reusable id or you intentionally keep a lease:
|
||||
|
||||
```sh
|
||||
pnpm crabbox:run -- --provider blacksmith-testbox --id <tbx_id> --no-sync --timing-json --shell -- "pnpm test <path>"
|
||||
```
|
||||
|
||||
Stop boxes you created before handoff:
|
||||
|
||||
```sh
|
||||
pnpm crabbox:stop -- <id-or-slug>
|
||||
blacksmith testbox stop --id <tbx_id>
|
||||
```
|
||||
|
||||
## If Crabbox Fails
|
||||
|
||||
Keep the fallback narrow. First decide whether the failure is Crabbox itself,
|
||||
Blacksmith/Testbox, repo hydration, sync, or the test command.
|
||||
|
||||
Fast checks:
|
||||
|
||||
```sh
|
||||
command -v crabbox
|
||||
../crabbox/bin/crabbox --version
|
||||
crabbox run --provider blacksmith-testbox --help | sed -n '1,140p'
|
||||
command -v blacksmith
|
||||
blacksmith --version
|
||||
blacksmith testbox list
|
||||
```
|
||||
|
||||
Common Crabbox-only failures:
|
||||
|
||||
- Provider missing or old CLI: use `../crabbox/bin/crabbox` from the sibling
|
||||
repo, or update/install Crabbox before retrying.
|
||||
- Bad local config: pass `--provider blacksmith-testbox` plus explicit
|
||||
`--blacksmith-*` flags instead of relying on `.crabbox.yaml`.
|
||||
- Slug/claim confusion: use the raw `tbx_...` id, or run one-shot without
|
||||
`--id`.
|
||||
- Sync/timing bug: add `--debug --timing-json`; capture the final JSON and the
|
||||
printed Actions URL.
|
||||
- Cleanup uncertainty: run `blacksmith testbox list` and stop only boxes you
|
||||
created.
|
||||
|
||||
If Crabbox cannot dispatch, sync, attach, or stop but Blacksmith itself works,
|
||||
use direct Blacksmith from the repo root:
|
||||
|
||||
```sh
|
||||
blacksmith testbox warmup ci-check-testbox.yml --ref main --idle-timeout 90
|
||||
blacksmith testbox run --id <tbx_id> "env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test:changed"
|
||||
blacksmith testbox stop --id <tbx_id>
|
||||
```
|
||||
|
||||
Direct full suite:
|
||||
|
||||
```sh
|
||||
blacksmith testbox run --id <tbx_id> "env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test"
|
||||
```
|
||||
|
||||
Auth fallback, only when `blacksmith` says auth is missing:
|
||||
|
||||
```sh
|
||||
blacksmith auth login --non-interactive --organization openclaw
|
||||
```
|
||||
|
||||
Raw Blacksmith footguns:
|
||||
|
||||
- Run from repo root. The CLI syncs the current directory.
|
||||
- Save the returned `tbx_...` id in the session.
|
||||
- Reuse that id for focused reruns; stop it before handoff.
|
||||
- Raw commit SHAs are not reliable `warmup --ref` refs; use a branch or tag.
|
||||
- Treat `blacksmith testbox list` as cleanup diagnostics, not a shared reusable
|
||||
queue.
|
||||
|
||||
Escalate to owned AWS/Hetzner only when Blacksmith is down, quota-limited,
|
||||
missing the needed environment, or owned capacity is the explicit goal. Use the
|
||||
Owned Cloud Fallback section below.
|
||||
|
||||
## Blacksmith Backend Notes
|
||||
|
||||
Crabbox Blacksmith backend delegates setup to:
|
||||
|
||||
- org: `openclaw`
|
||||
- workflow: `.github/workflows/ci-check-testbox.yml`
|
||||
- job: `check`
|
||||
- ref: `main` unless testing a branch/tag intentionally
|
||||
|
||||
The hydration workflow owns checkout, Node/pnpm setup, dependency install,
|
||||
secrets, ready marker, and keepalive. Crabbox owns dispatch, sync, SSH command
|
||||
execution, timing, logs/results, and cleanup.
|
||||
|
||||
Minimal direct Blacksmith fallback, from repo root:
|
||||
|
||||
```sh
|
||||
blacksmith testbox warmup ci-check-testbox.yml --ref main --idle-timeout 90
|
||||
blacksmith testbox run --id <tbx_id> "env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 pnpm test:changed"
|
||||
blacksmith testbox stop --id <tbx_id>
|
||||
```
|
||||
|
||||
Use direct Blacksmith only when Crabbox is the broken layer and Blacksmith
|
||||
itself still works. Prefer direct `blacksmith testbox list` for cleanup
|
||||
diagnostics, not as a reusable work queue.
|
||||
|
||||
Important Blacksmith footguns:
|
||||
|
||||
- Always run from repo root. The CLI syncs the current directory.
|
||||
- Raw commit SHAs are not reliable `warmup --ref` refs; use a branch or tag.
|
||||
- If auth is missing and browser auth is acceptable:
|
||||
|
||||
```sh
|
||||
blacksmith auth login --non-interactive --organization openclaw
|
||||
```
|
||||
|
||||
## Owned Cloud Fallback
|
||||
|
||||
Use AWS/Hetzner only when Blacksmith is down, quota-limited, missing the needed
|
||||
environment, or owned capacity is explicitly the goal.
|
||||
|
||||
```sh
|
||||
pnpm crabbox:warmup -- --provider aws --class beast --market on-demand --idle-timeout 90m
|
||||
pnpm crabbox:hydrate -- --id <cbx_id-or-slug>
|
||||
pnpm crabbox:run -- --id <cbx_id-or-slug> --timing-json --shell -- "env NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test:changed"
|
||||
pnpm crabbox:stop -- <cbx_id-or-slug>
|
||||
```
|
||||
|
||||
## Useful Commands
|
||||
Install/auth for owned Crabbox if needed:
|
||||
|
||||
```sh
|
||||
brew install openclaw/tap/crabbox
|
||||
printf '%s' "$CRABBOX_COORDINATOR_TOKEN" | crabbox login --url https://crabbox.openclaw.ai --provider aws --token-stdin
|
||||
```
|
||||
|
||||
macOS config lives at:
|
||||
|
||||
```text
|
||||
~/Library/Application Support/crabbox/config.yaml
|
||||
```
|
||||
|
||||
It should include `broker.url`, `broker.token`, and usually `provider: aws`
|
||||
for owned-cloud lanes. Do not let that config override the OpenClaw default
|
||||
when Blacksmith proof is requested; pass `--provider blacksmith-testbox`.
|
||||
|
||||
## Diagnostics
|
||||
|
||||
```sh
|
||||
crabbox status --id <id-or-slug> --wait
|
||||
@@ -59,29 +254,30 @@ crabbox logs <run_id>
|
||||
crabbox results <run_id>
|
||||
crabbox cache stats --id <id-or-slug>
|
||||
crabbox ssh --id <id-or-slug>
|
||||
blacksmith testbox list
|
||||
```
|
||||
|
||||
Use `--debug` on `run` when measuring sync timing.
|
||||
Use `--timing-json` on warmup, hydrate, and run when comparing AWS and
|
||||
blacksmith-testbox timings.
|
||||
Use `--market spot|on-demand` on AWS warmup or one-shot run when testing quota
|
||||
or capacity behavior without changing `.crabbox.yaml`.
|
||||
Use `--timing-json` on warmup, hydrate, and run when comparing backends.
|
||||
Use `--market spot|on-demand` only on AWS warmup/one-shot runs.
|
||||
|
||||
## Hydration Boundary
|
||||
## Failure Triage
|
||||
|
||||
`.github/workflows/crabbox-hydrate.yml` is repo-specific on purpose. It owns
|
||||
OpenClaw checkout, setup-node, pnpm setup, provider env hydration, ready marker,
|
||||
and keepalive. Crabbox owns runner registration, workflow dispatch, SSH sync,
|
||||
command execution, logs/results, local lease claims, and idle cleanup.
|
||||
- Crabbox cannot find provider: verify `../crabbox/bin/crabbox --help` lists
|
||||
`blacksmith-testbox`; update Crabbox before falling back.
|
||||
- Hydration stuck or failed: open the printed GitHub Actions run URL and inspect
|
||||
the hydration step.
|
||||
- Sync failed: rerun with `--debug`; check changed-file count and whether the
|
||||
checkout is dirty.
|
||||
- Command failed: rerun only the failing shard/file first. Do not rerun a full
|
||||
suite until the focused failure is understood.
|
||||
- Cleanup uncertain: `blacksmith testbox list`; stop owned `tbx_...` leases you
|
||||
created.
|
||||
- Crabbox broken but Blacksmith works: use the direct Blacksmith fallback above,
|
||||
then file/fix the Crabbox issue.
|
||||
|
||||
Do not add OpenClaw-specific setup to Crabbox. Put repo setup in the hydration
|
||||
workflow and generic lease/sync behavior in Crabbox.
|
||||
## Boundary
|
||||
|
||||
## Cleanup
|
||||
|
||||
Crabbox has coordinator-owned idle expiry and local lease claims, so OpenClaw
|
||||
does not need a custom ledger. Default idle timeout is 30 minutes unless config
|
||||
or flags set a different value. Still stop boxes you created when done.
|
||||
If `crabbox list` prints `orphan=no-active-lease`, treat it as an operator
|
||||
review hint; do not delete `keep=true` machines without checking provider and
|
||||
coordinator state.
|
||||
Do not add OpenClaw-specific setup to Crabbox itself. Put repo setup in the
|
||||
hydration workflow and keep Crabbox generic around lease, sync, command
|
||||
execution, logs/results, timing, and cleanup.
|
||||
|
||||
Reference in New Issue
Block a user