diff --git a/.agents/skills/blacksmith-testbox/SKILL.md b/.agents/skills/blacksmith-testbox/SKILL.md index 60546311d03..ef53f45c78b 100644 --- a/.agents/skills/blacksmith-testbox/SKILL.md +++ b/.agents/skills/blacksmith-testbox/SKILL.md @@ -10,9 +10,8 @@ description: Run Blacksmith Testbox for CI-parity checks, secrets, hosted servic Use Testbox when you need remote CI parity, injected secrets, hosted services, or an OS/runtime image that your local machine cannot provide cheaply. -Do not default to Testbox for every local test/build loop unless the repo or -the user's personal maintainer rules explicitly say Testbox-first. If the repo -has documented local commands for normal iteration, use those first so you keep +Do not default to Testbox for every local test/build loop. If the repo has +documented local commands for normal iteration, use those first so you keep warm caches, local build state, and fast feedback. Testbox is the expensive path. Reach for it deliberately. @@ -82,8 +81,7 @@ Prefer Testbox when: - you are reproducing CI-only failures - you need the exact workflow image/job environment from GitHub Actions -For OpenClaw specifically, contributor and routine local iteration should stay -local: +For OpenClaw specifically, normal local iteration should stay local: - `pnpm check:changed` - `pnpm test:changed` @@ -91,11 +89,9 @@ local: - `pnpm test:serial` - `pnpm build` -OpenClaw maintainer mode is different. If the user has Blacksmith access and -sets `OPENCLAW_TESTBOX=1`, or their personal agent rules say Testbox-first, -route broad, slow, Docker, live, E2E, full-suite, and CI-parity validation -through Testbox by default. `OPENCLAW_LOCAL_CHECK_MODE=throttled` remains the -escape hatch for laptop-friendly local proof. +Only use Testbox in OpenClaw when the user explicitly wants CI-parity or the +check truly depends on remote secrets/services that the local repo loop cannot +provide. For installable-package product proof, prefer the GitHub `Package Acceptance` workflow over an ad hoc Testbox command. It resolves one package candidate @@ -115,35 +111,13 @@ an ID instantly and boots the CI environment in the background while you work: Save this ID. You need it for every `run` command. -For long-ish OpenClaw maintainer tasks in Testbox mode, pre-warm at the start -with a longer idle timeout: - - blacksmith testbox warmup ci-check-testbox.yml --idle-timeout 90 - # → tbx_01jkz5b3t9... - -The CLI and current docs expose `--idle-timeout ` and document the -default as 30 minutes, but do not publish a universal maximum. OpenClaw policy: -use `90` for normal long-ish tasks, `240` for multi-hour work, `720` for -all-day work, and `1440` for overnight work. Anything above `1440` minutes -requires explicit user intent and an end-of-task cleanup check. - -Observed on 2026-04-27: Blacksmith accepted `90`, `240`, `720`, `1440`, -`4320`, `10080`, `43200`, and even `525600` minutes, with every probe box -stopped immediately. Treat that as "no sane visible cap", not permission to -leave giant-idle boxes around. - -Choose the warmup ref deliberately. `--ref ` can point at a -branch, tag, or SHA. For cache seeding, prefer exact current branch/SHA for -correctness; use the latest `beta` or `latest` release SHA only as a warm cache -seed, then still run the build/check that proves local synced changes. - Warmup dispatches a GitHub Actions workflow that provisions a VM with the full CI environment: dependencies installed, services started, secrets injected, and a clean checkout of the repo at the default branch. Options: - --ref Git ref to dispatch against (default: repo's default branch) + --ref Git ref to dispatch against (default: repo's default branch) --job Specific job within the workflow (if it has multiple) --idle-timeout Idle timeout in minutes (default: 30) @@ -276,27 +250,18 @@ checks that need parity or remote state. ## Workflow -1. Decide whether the repo's local loop or maintainer Testbox mode is the right - default. +1. Decide whether the repo's local loop is the right default. 2. Only if Testbox is warranted, warm up early: - `blacksmith testbox warmup ci-check-testbox.yml --idle-timeout 90` → save the ID. - Use `--idle-timeout 240`, `720`, or `1440` only when the task duration - justifies it. + `blacksmith testbox warmup ci-check-testbox.yml` → save the ID 3. Write code while the testbox boots in the background. 4. Run the remote command when needed: `blacksmith testbox run --id "npm test"` -5. If tests fail, fix code and re-run against the same warm box. Reuse this - same `tbx_...` for every run/download in the task unless it expires, the - workflow/ref/env must change, or the user asks for a fresh box. +5. If tests fail, fix code and re-run against the same warm box. 6. If you changed dependency manifests (package.json, etc.), prepend the install command: `blacksmith testbox run --id "npm install && npm test"` 7. If you need artifacts (coverage reports, build outputs, etc.), download them: `blacksmith testbox download --id coverage/ ./coverage/` 8. Once green, commit and push. -9. If you used a long timeout or created probe boxes, clean up with - `blacksmith testbox list` and `blacksmith testbox stop --id `. Stop only - boxes from the current task unless the user asks you to clean up other active - boxes. ## OpenClaw full test suite @@ -369,24 +334,10 @@ timeout is reached). Default timeout is 5m; use `--wait-timeout` for longer Testboxes automatically shut down after being idle (default: 30 minutes). If you need a longer session, increase the timeout at warmup time: - blacksmith testbox warmup ci-check-testbox.yml --idle-timeout 90 - -For OpenClaw maintainer work, use coarse timeout bins instead of probing many -small values: - -- `90` minutes: default long-ish task -- `240` minutes: multi-hour task -- `720` minutes: all-day task -- `1440` minutes: overnight task; max without explicit user intent - -Because the service currently accepts much larger values, cleanup is part of -the workflow, not a nice-to-have: - - blacksmith testbox list - blacksmith testbox stop --id + blacksmith testbox warmup ci-check-testbox.yml --idle-timeout 60 ## With options blacksmith testbox warmup ci-check-testbox.yml --ref main - blacksmith testbox warmup ci-check-testbox.yml --idle-timeout 240 + blacksmith testbox warmup ci-check-testbox.yml --idle-timeout 60 blacksmith testbox run --id "go test ./..." diff --git a/AGENTS.md b/AGENTS.md index c5d54ac309b..faca52035ae 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -54,10 +54,7 @@ Telegraph style. Root rules only. Read scoped `AGENTS.md` before subtree work. - Formatting: use `oxfmt`, not Prettier. Prefer `pnpm format:check` / `pnpm format`; for targeted files use `pnpm exec oxfmt --check --threads=1 ` or `pnpm exec oxfmt --write --threads=1 `. - Linting: use repo wrappers (`pnpm lint:*`, `scripts/run-oxlint.mjs`); do not invoke generic JS formatters/lints unless a repo script uses them. - Heavy checks: `OPENCLAW_LOCAL_CHECK=1`, mode `OPENCLAW_LOCAL_CHECK_MODE=throttled|full`; CI/shared use `OPENCLAW_LOCAL_CHECK=0`. -- Default contributor path: local repo `pnpm` lanes first. Maintainer-only Testbox path: when Blacksmith access is configured and `OPENCLAW_TESTBOX=1` or personal rules request Testbox-first, use Blacksmith for broad, slow, Docker, live, E2E, full-suite, or CI-parity validation. `OPENCLAW_LOCAL_CHECK_MODE=throttled` is the local escape hatch. -- Testbox pre-warm: for long-ish OpenClaw tasks in Testbox mode, run from repo root early: `blacksmith testbox warmup ci-check-testbox.yml --idle-timeout 90`. Use `240`, `720`, or `1440` only for multi-hour, all-day, or overnight work; above `1440` requires explicit user intent. Save the returned `tbx_...` and reuse it for every `blacksmith testbox run --id ...` in that task unless the box expires, the workflow/ref/env must change, or the user asks for a fresh box. -- Testbox cleanup: track every created `tbx_...`; use `blacksmith testbox list` to inspect active boxes and `blacksmith testbox stop --id ` to stop boxes from the current task. Do not stop pre-existing boxes unless they are clearly yours or the user asks. -- Testbox cache seed: `--ref ` may point at the current branch/SHA for correctness or a latest `beta`/`latest` SHA for warm cache state. A seeded box is not proof by itself; still run the build/check after local sync. +- Local first. Use repo `pnpm` lanes before Blacksmith/Testbox. Remote only for parity-only failures, secrets/services, or explicit ask. ## GitHub / CI