chore(agents): prefer local validation over testbox

2026-05-06 05:30:42 +00:00 · 2026-04-21 22:36:25 -07:00
parent 1a7dfbbaba
commit 7c13a48e49
2 changed files with 77 additions and 26 deletions
--- a/.agents/skills/blacksmith-testbox/SKILL.md
+++ b/.agents/skills/blacksmith-testbox/SKILL.md
@@ -1,14 +1,26 @@
 ---
 name: blacksmith-testbox
 description: >
-  Validate code changes against real CI. Use for all tests, builds,
-  migrations, and any command that depends on secrets or services.
-  Use when testing, validating, checking, verifying, or before any
-  commit or push.
+  Validate code changes against real CI when local execution is not
+  enough. Use for CI-parity checks, secrets/services, migrations, or
+  builds/tests that cannot run reliably on the local machine. Do not
+  replace repo-documented local test/build loops just because this
+  skill exists.
 ---

 # Blacksmith Testbox

+## Scope
+
+Use Testbox when you need remote CI parity, injected secrets, hosted services,
+or an OS/runtime image that your local machine cannot provide cheaply.
+
+Do not default to Testbox for every local test/build loop. If the repo has
+documented local commands for normal iteration, use those first so you keep
+warm caches, local build state, and fast feedback.
+
+Testbox is the expensive path. Reach for it deliberately.
+
 ## Install the CLI

 If `blacksmith` is not installed, install it:
@@ -55,10 +67,41 @@ Optional flags:

 - `--dashboard-url <url>` — Override dashboard URL (e.g. for staging)

+## Decide first: local or Testbox
+
+Before warming anything up, check the repo's own instructions.
+
+Prefer local commands when:
+
+- the repo documents a supported local test/build workflow
+- you are iterating on unit tests, lint, typecheck, formatting, or other
+  local-only validation
+- the value comes from warm local caches and fast repeat runs
+- the command does not need remote secrets, hosted services, or CI-only images
+
+Prefer Testbox when:
+
+- the repo explicitly requires CI-parity or remote validation
+- the command needs secrets, service containers, or provisioned infra
+- you are reproducing CI-only failures
+- you need the exact workflow image/job environment from GitHub Actions
+
+For OpenClaw specifically, normal local iteration should stay local:
+
+- `pnpm check:changed`
+- `pnpm test:changed`
+- `pnpm test <path-or-filter>`
+- `pnpm test:serial`
+- `pnpm build`
+
+Only use Testbox in OpenClaw when the user explicitly wants CI-parity or the
+check truly depends on remote secrets/services that the local repo loop cannot
+provide.
+
 ## Setup: Warmup before coding

-Before writing any code, warm up a testbox. This returns an ID instantly
-and boots the CI environment in the background while you work:
+If you decided Testbox is actually warranted, warm one up early. This returns
+an ID instantly and boots the CI environment in the background while you work:

    blacksmith testbox warmup ci-check-testbox.yml
    # → tbx_01jkz5b3t9...
@@ -177,39 +220,45 @@ to the remote testbox before executing your command:
  testbox. The sync model keeps the remote in lockstep with your local managed
  file set.

-## CRITICAL: Never run tests locally
+## CRITICAL: Do not ban local tests

-Your local machine does not have the secrets, services, or environment
-parity needed to validate changes reliably. Every test run, migration,
-build, or integration check MUST go through Testbox.
+Do not assume local validation is forbidden. Many repos intentionally invest in
+fast, warm local loops, and forcing every run through Testbox destroys that
+advantage.
+
+Use Testbox for the checks that actually need it: remote parity, secrets,
+services, CI-only runners, or reproducibility against the workflow image.
+
+If the repo says local tests/builds are the normal path, follow the repo.

 ## When to use

-ALWAYS use Testbox when:
+Use Testbox when:

- Running any test (unit, integration, e2e)
- Running database migrations
- Building the project to check for compile errors
- Running any command that depends on secrets or environment variables
- Validating changes before committing
+- running database migrations or destructive environment checks
+- running commands that depend on secrets or environment variables not present locally
+- reproducing CI-only failures or validating against the workflow image
+- validating behavior that needs provisioned services or remote runners
+- doing a final parity check before commit/push when the repo or user wants that

-The ONLY exception is trivial checks with zero external dependencies
-(e.g., running a linter or formatter locally).
+Trim that list based on repo guidance. If the repo documents supported local
+tests/builds, prefer local for routine iteration and keep Testbox for the
+checks that need parity or remote state.

 ## Workflow

-1. Warm up immediately when you receive a coding task:
+1. Decide whether the repo's local loop is the right default.
+2. Only if Testbox is warranted, warm up early:
   `blacksmith testbox warmup ci-check-testbox.yml` → save the ID
-2. Write code while the testbox boots in the background.
-3. Run tests (the CLI auto-waits if the testbox isn't ready yet):
+3. Write code while the testbox boots in the background.
+4. Run the remote command when needed:
   `blacksmith testbox run --id <ID> "npm test"`
-4. If tests fail, fix code and re-run (fast — same warm testbox, only
-   changed files are synced).
-5. If you changed dependency manifests (package.json, etc.), prepend
+5. If tests fail, fix code and re-run against the same warm box.
+6. If you changed dependency manifests (package.json, etc.), prepend
   the install command: `blacksmith testbox run --id <ID> "npm install && npm test"`
-6. If you need artifacts (coverage reports, build outputs, etc.), download them:
+7. If you need artifacts (coverage reports, build outputs, etc.), download them:
   `blacksmith testbox download --id <ID> coverage/ ./coverage/`
-7. Once green, commit and push.
+8. Once green, commit and push.

 ## OpenClaw full test suite

--- a/AGENTS.md
+++ b/AGENTS.md
@@ -65,6 +65,7 @@ Scoped guides:
 - Normal full prod sweep: `pnpm check` (prod typecheck/lint/guards, no tests)
 - Full tests: `pnpm test`
 - Changed tests only: `pnpm test:changed`
+- Local serial loop: `pnpm test:serial`
 - Extension tests: `pnpm test:extensions` or `pnpm test extensions` = all extension shards; `pnpm test extensions/<id>` = one extension lane. Heavy channels/OpenAI have dedicated shards.
 - Shard timing artifact: `.artifacts/vitest-shard-timings.json`; auto-used for balanced shard ordering. Disable with `OPENCLAW_TEST_PROJECTS_TIMINGS=0`.
 - Targeted tests: `pnpm test <path-or-filter> [vitest args...]`; do not call raw `vitest`.
@@ -84,6 +85,7 @@ Scoped guides:
  - `pnpm lint:apps`: Swift/app surface, separate from TS lint
  - `pnpm lint:all`: legacy comparison lane
 - Local heavy-check behavior: `OPENCLAW_LOCAL_CHECK=1` default; `OPENCLAW_LOCAL_CHECK_MODE=throttled|full`; `OPENCLAW_LOCAL_CHECK=0` for CI/shared runs.
+- Local validation is local-first. Do not default to Blacksmith/Testbox for routine OpenClaw iteration; it burns warm caches and startup time. Use repo `pnpm` lanes first, then reach for remote CI/Testbox only for parity-only failures, secrets/services, or when explicitly requested.

 ## Gates