chore(agents): prefer local validation over testbox

This commit is contained in:
Vincent Koc
2026-04-21 22:36:25 -07:00
parent 1a7dfbbaba
commit 7c13a48e49
2 changed files with 77 additions and 26 deletions

View File

@@ -1,14 +1,26 @@
---
name: blacksmith-testbox
description: >
Validate code changes against real CI. Use for all tests, builds,
migrations, and any command that depends on secrets or services.
Use when testing, validating, checking, verifying, or before any
commit or push.
Validate code changes against real CI when local execution is not
enough. Use for CI-parity checks, secrets/services, migrations, or
builds/tests that cannot run reliably on the local machine. Do not
replace repo-documented local test/build loops just because this
skill exists.
---
# Blacksmith Testbox
## Scope
Use Testbox when you need remote CI parity, injected secrets, hosted services,
or an OS/runtime image that your local machine cannot provide cheaply.
Do not default to Testbox for every local test/build loop. If the repo has
documented local commands for normal iteration, use those first so you keep
warm caches, local build state, and fast feedback.
Testbox is the expensive path. Reach for it deliberately.
## Install the CLI
If `blacksmith` is not installed, install it:
@@ -55,10 +67,41 @@ Optional flags:
- `--dashboard-url <url>` — Override dashboard URL (e.g. for staging)
## Decide first: local or Testbox
Before warming anything up, check the repo's own instructions.
Prefer local commands when:
- the repo documents a supported local test/build workflow
- you are iterating on unit tests, lint, typecheck, formatting, or other
local-only validation
- the value comes from warm local caches and fast repeat runs
- the command does not need remote secrets, hosted services, or CI-only images
Prefer Testbox when:
- the repo explicitly requires CI-parity or remote validation
- the command needs secrets, service containers, or provisioned infra
- you are reproducing CI-only failures
- you need the exact workflow image/job environment from GitHub Actions
For OpenClaw specifically, normal local iteration should stay local:
- `pnpm check:changed`
- `pnpm test:changed`
- `pnpm test <path-or-filter>`
- `pnpm test:serial`
- `pnpm build`
Only use Testbox in OpenClaw when the user explicitly wants CI-parity or the
check truly depends on remote secrets/services that the local repo loop cannot
provide.
## Setup: Warmup before coding
Before writing any code, warm up a testbox. This returns an ID instantly
and boots the CI environment in the background while you work:
If you decided Testbox is actually warranted, warm one up early. This returns
an ID instantly and boots the CI environment in the background while you work:
blacksmith testbox warmup ci-check-testbox.yml
# → tbx_01jkz5b3t9...
@@ -177,39 +220,45 @@ to the remote testbox before executing your command:
testbox. The sync model keeps the remote in lockstep with your local managed
file set.
## CRITICAL: Never run tests locally
## CRITICAL: Do not ban local tests
Your local machine does not have the secrets, services, or environment
parity needed to validate changes reliably. Every test run, migration,
build, or integration check MUST go through Testbox.
Do not assume local validation is forbidden. Many repos intentionally invest in
fast, warm local loops, and forcing every run through Testbox destroys that
advantage.
Use Testbox for the checks that actually need it: remote parity, secrets,
services, CI-only runners, or reproducibility against the workflow image.
If the repo says local tests/builds are the normal path, follow the repo.
## When to use
ALWAYS use Testbox when:
Use Testbox when:
- Running any test (unit, integration, e2e)
- Running database migrations
- Building the project to check for compile errors
- Running any command that depends on secrets or environment variables
- Validating changes before committing
- running database migrations or destructive environment checks
- running commands that depend on secrets or environment variables not present locally
- reproducing CI-only failures or validating against the workflow image
- validating behavior that needs provisioned services or remote runners
- doing a final parity check before commit/push when the repo or user wants that
The ONLY exception is trivial checks with zero external dependencies
(e.g., running a linter or formatter locally).
Trim that list based on repo guidance. If the repo documents supported local
tests/builds, prefer local for routine iteration and keep Testbox for the
checks that need parity or remote state.
## Workflow
1. Warm up immediately when you receive a coding task:
1. Decide whether the repo's local loop is the right default.
2. Only if Testbox is warranted, warm up early:
`blacksmith testbox warmup ci-check-testbox.yml` → save the ID
2. Write code while the testbox boots in the background.
3. Run tests (the CLI auto-waits if the testbox isn't ready yet):
3. Write code while the testbox boots in the background.
4. Run the remote command when needed:
`blacksmith testbox run --id <ID> "npm test"`
4. If tests fail, fix code and re-run (fast — same warm testbox, only
changed files are synced).
5. If you changed dependency manifests (package.json, etc.), prepend
5. If tests fail, fix code and re-run against the same warm box.
6. If you changed dependency manifests (package.json, etc.), prepend
the install command: `blacksmith testbox run --id <ID> "npm install && npm test"`
6. If you need artifacts (coverage reports, build outputs, etc.), download them:
7. If you need artifacts (coverage reports, build outputs, etc.), download them:
`blacksmith testbox download --id <ID> coverage/ ./coverage/`
7. Once green, commit and push.
8. Once green, commit and push.
## OpenClaw full test suite

View File

@@ -65,6 +65,7 @@ Scoped guides:
- Normal full prod sweep: `pnpm check` (prod typecheck/lint/guards, no tests)
- Full tests: `pnpm test`
- Changed tests only: `pnpm test:changed`
- Local serial loop: `pnpm test:serial`
- Extension tests: `pnpm test:extensions` or `pnpm test extensions` = all extension shards; `pnpm test extensions/<id>` = one extension lane. Heavy channels/OpenAI have dedicated shards.
- Shard timing artifact: `.artifacts/vitest-shard-timings.json`; auto-used for balanced shard ordering. Disable with `OPENCLAW_TEST_PROJECTS_TIMINGS=0`.
- Targeted tests: `pnpm test <path-or-filter> [vitest args...]`; do not call raw `vitest`.
@@ -84,6 +85,7 @@ Scoped guides:
- `pnpm lint:apps`: Swift/app surface, separate from TS lint
- `pnpm lint:all`: legacy comparison lane
- Local heavy-check behavior: `OPENCLAW_LOCAL_CHECK=1` default; `OPENCLAW_LOCAL_CHECK_MODE=throttled|full`; `OPENCLAW_LOCAL_CHECK=0` for CI/shared runs.
- Local validation is local-first. Do not default to Blacksmith/Testbox for routine OpenClaw iteration; it burns warm caches and startup time. Use repo `pnpm` lanes first, then reach for remote CI/Testbox only for parity-only failures, secrets/services, or when explicitly requested.
## Gates