From 1c0b02a297035bd2f7cd4f41994867a73bbd6667 Mon Sep 17 00:00:00 2001 From: Vincent Koc Date: Wed, 29 Apr 2026 22:02:46 -0700 Subject: [PATCH] docs(ci): rewrite for structure, deduplication, and findability MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Splits the previous wall-of-prose docs/ci.md into discoverable sections while preserving every operator-relevant detail: - Lead orientation paragraph kept; cross-links to umbrella and prerelease - Pipeline overview anchors the job table at the top - Fail-fast order tightened; superseded-run/concurrency notes folded in - Scope and routing surfaces ci-changed-scope.mjs, the routing-only fast path, the Windows scope rule, Vitest shard balancing, the Android dual-flavor rule, and the check-dependencies (Knip + unused-file allowlist) pass that was buried in the lead - Manual dispatches groups examples + include_android + target_ref - Runners and Local equivalents tables/blocks preserved - Full Release Validation: release_profile and rerun_group bulleted; verifier-only rerun guidance and the shared release-package-under-test artifact called out - Live and E2E shards: native-live shard names listed, live-media-runner image and openclaw-live-test: with OPENCLAW_SKIP_DOCKER_BUILD=1 broken out - Package Acceptance split into Jobs / Candidate sources / Suite profiles / Legacy compatibility windows / Examples / debugging - Install smoke: fast vs full paths, main-push policy, Bun gate - Local Docker E2E: scheduler tunables in a table, reusable workflow flow, release-path chunks list, rerun helpers - Plugin Prerelease, QA Lab, CodeQL each get their own discoverable sections; CodeQL uses tables for security and quality categories instead of paragraph walls (kept the new provider-runtime-boundary shard in the PR-quality-guard list) - Maintenance workflows groups Docs Agent, Test Performance Agent, and Duplicate PRs After Merge - Local check gates and changed routing turn boundary lane 
rules into bullets and keep the explicit-mapping prose - Testbox validation kept; Related links preserved Audited every workflow name and CodeQL category against .github/workflows/ — no stale references. File goes from 527 to 413 lines while preserving shard names, env vars, profiles, chunks, and legacy-compat windows. Layout obeys oxfmt. --- docs/ci.md | 758 +++++++++++++++++++++++------------------------------ 1 file changed, 321 insertions(+), 437 deletions(-) diff --git a/docs/ci.md b/docs/ci.md index 3de254be543..5b8dd765b55 100644 --- a/docs/ci.md +++ b/docs/ci.md @@ -1,181 +1,196 @@ --- -summary: "CI job graph, scope gates, and local command equivalents" -title: CI pipeline +summary: "CI job graph, scope gates, release umbrellas, and local command equivalents" +title: "CI pipeline" read_when: - You need to understand why a CI job did or did not run - - You are debugging failing GitHub Actions checks + - You are debugging a failing GitHub Actions check + - You are coordinating a release validation run or rerun --- -The CI runs on every push to `main` and every pull request. It uses smart scoping to skip expensive jobs when only unrelated areas changed. Manual `workflow_dispatch` runs intentionally bypass smart scoping and fan out the full normal CI graph for release candidates or broad validation, with Android lanes opt-in through `include_android` for standalone manual runs. Release-only plugin prerelease lanes live in the separate `Plugin Prerelease` workflow and run only from `Full Release Validation` or an explicit manual dispatch. +OpenClaw CI runs on every push to `main` and every pull request. The `preflight` job classifies the diff and turns expensive lanes off when only unrelated areas changed. Manual `workflow_dispatch` runs intentionally bypass smart scoping and fan out the full graph for release candidates and broad validation. Android lanes stay opt-in through `include_android`. 
Release-only plugin coverage lives in the separate [`Plugin Prerelease`](#plugin-prerelease) workflow and only runs from [`Full Release Validation`](#full-release-validation) or an explicit manual dispatch. -The `check-dependencies` shard runs `pnpm deadcode:dependencies`, a production Knip dependency-only pass pinned to the latest Knip version used by that script, with pnpm's minimum release age disabled for the `dlx` install. It also runs `pnpm deadcode:unused-files`, which compares Knip's production unused-file findings against `scripts/deadcode-unused-files.allowlist.mjs`. That guard fails when a PR adds a new unreviewed unused file or leaves a stale allowlist entry after cleanup, while preserving intentional dynamic plugin, generated, build, live-test, and package bridge surfaces that Knip cannot resolve statically. +## Pipeline overview -`Full Release Validation` is the manual umbrella workflow for "run everything -before release." It accepts a branch, tag, or full commit SHA, dispatches the -manual `CI` workflow with that target, dispatches `Plugin Prerelease` for -release-only plugin/package/static/Docker proof, and dispatches -`OpenClaw Release Checks` for install smoke, package acceptance, Docker -release-path suites, live/E2E, OpenWebUI, QA Lab parity, Matrix, and Telegram -lanes. It can also run the post-publish `NPM Telegram Beta E2E` workflow when a -published package spec is provided. `release_profile=minimum|stable|full` controls the live/provider -breadth passed into release checks: `minimum` keeps the fastest OpenAI/core -release-critical lanes, `stable` adds the stable provider/backend set, and -`full` runs the broad advisory provider/media matrix. The umbrella records the -dispatched child run ids, and the final `Verify full validation` job re-checks -the current child run conclusions and appends slowest-job tables for each child -run. 
If a child workflow is rerun and turns green, rerun only the parent -verifier job to refresh the umbrella result and timing summary. +| Job | Purpose | When it runs | +| -------------------------------- | -------------------------------------------------------------------------------------------- | ---------------------------------- | +| `preflight` | Detect docs-only changes, changed scopes, changed extensions, and build the CI manifest | Always on non-draft pushes and PRs | +| `security-scm-fast` | Private key detection and workflow audit via `zizmor` | Always on non-draft pushes and PRs | +| `security-dependency-audit` | Dependency-free production lockfile audit against npm advisories | Always on non-draft pushes and PRs | +| `security-fast` | Required aggregate for the fast security jobs | Always on non-draft pushes and PRs | +| `check-dependencies` | Production Knip dependency-only pass plus the unused-file allowlist guard | Node-relevant changes | +| `build-artifacts` | Build `dist/`, Control UI, built-artifact checks, and reusable downstream artifacts | Node-relevant changes | +| `checks-fast-core` | Fast Linux correctness lanes such as bundled/plugin-contract/protocol checks | Node-relevant changes | +| `checks-fast-contracts-channels` | Sharded channel contract checks with a stable aggregate check result | Node-relevant changes | +| `checks-node-core-test` | Core Node test shards, excluding channel, bundled, contract, and extension lanes | Node-relevant changes | +| `check` | Sharded main local gate equivalent: prod types, lint, guards, test types, and strict smoke | Node-relevant changes | +| `check-additional` | Architecture, boundary, extension-surface guards, package-boundary, and gateway-watch shards | Node-relevant changes | +| `build-smoke` | Built-CLI smoke tests and startup-memory smoke | Node-relevant changes | +| `checks` | Verifier for built-artifact channel tests | Node-relevant changes | +| `checks-node-compat-node22` | Node 22 compatibility 
build and smoke lane | Manual CI dispatch for releases | +| `check-docs` | Docs formatting, lint, and broken-link checks | Docs changed | +| `skills-python` | Ruff + pytest for Python-backed skills | Python-skill-relevant changes | +| `checks-windows` | Windows-specific process/path tests plus shared runtime import specifier regressions | Windows-relevant changes | +| `macos-node` | macOS TypeScript test lane using the shared built artifacts | macOS-relevant changes | +| `macos-swift` | Swift lint, build, and tests for the macOS app | macOS-relevant changes | +| `android` | Android unit tests for both flavors plus one debug APK build | Android-relevant changes | +| `test-performance-agent` | Daily Codex slow-test optimization after trusted activity | Main CI success or manual dispatch | -For recovery, `Full Release Validation` and `OpenClaw Release Checks` both -accept `rerun_group`. Use `all` for a release candidate, `ci` for only the -normal full CI child, `release-checks` for every release child, or a narrower -release group: `install-smoke`, `cross-os`, `live-e2e`, `package`, `qa`, -`qa-parity`, `qa-live`, or `npm-telegram` on the umbrella. This keeps a failed -release box rerun bounded after a focused fix. +## Fail-fast order -The release live/E2E child keeps broad native `pnpm test:live` coverage, but it -runs it as named shards (`native-live-src-agents`, -`native-live-src-gateway-core`, provider-filtered -`native-live-src-gateway-profiles` jobs, -`native-live-src-gateway-backends`, `native-live-test`, -`native-live-extensions-a-k`, `native-live-extensions-l-n`, -`native-live-extensions-openai`, `native-live-extensions-o-z-other`, -`native-live-extensions-xai`, split media audio/video shards, and -provider-filtered music shards) through `scripts/test-live-shard.mjs` instead -of one serial job. That keeps the same file coverage while making slow live -provider failures easier to rerun and diagnose. 
The aggregate -`native-live-extensions-o-z`, `native-live-extensions-media`, and -`native-live-extensions-media-music` shard names remain valid for manual -one-shot reruns. +1. `preflight` decides which lanes exist at all. The `docs-scope` and `changed-scope` logic are steps inside this job, not standalone jobs. +2. `security-scm-fast`, `security-dependency-audit`, `security-fast`, `check`, `check-additional`, `check-docs`, and `skills-python` fail quickly without waiting on the heavier artifact and platform matrix jobs. +3. `build-artifacts` overlaps with the fast Linux lanes so downstream consumers can start as soon as the shared build is ready. +4. Heavier platform and runtime lanes fan out after that: `checks-fast-core`, `checks-fast-contracts-channels`, `checks-node-core-test`, `checks`, `checks-windows`, `macos-node`, `macos-swift`, and `android`. -The native live media shards run in -`ghcr.io/openclaw/openclaw-live-media-runner:ubuntu-24.04`, built by the -`Live Media Runner Image` workflow. That image preinstalls `ffmpeg` and -`ffprobe`; media jobs only verify the binaries before setup. Keep Docker-backed -live suites on normal Blacksmith runners, because container jobs are the wrong -place to launch nested Docker tests. +GitHub may mark superseded jobs as `cancelled` when a newer push lands on the same PR or `main` ref. Treat that as CI noise unless the newest run for the same ref is also failing. Aggregate shard checks use `!cancelled() && always()` so they still report normal shard failures but do not queue after the whole workflow has already been superseded. The automatic CI concurrency key is versioned (`CI-v7-*`) so a GitHub-side zombie in an old queue group cannot indefinitely block newer main runs. Manual full-suite runs use `CI-manual-v1-*` and do not cancel in-progress runs. -Docker-backed live model/backend shards use a separate shared -`ghcr.io/openclaw/openclaw-live-test:` image per selected commit. 
The live -release workflow builds and pushes that image once, then the Docker live model, -gateway, CLI backend, ACP bind, and Codex harness shards run with -`OPENCLAW_SKIP_DOCKER_BUILD=1`. If those shards rebuild the full source Docker -target independently, the release run is misconfigured and will waste the wall -clock on duplicate image builds. +## Scope and routing -`OpenClaw Release Checks` uses the trusted workflow ref to resolve the selected -ref once into a `release-package-under-test` tarball, then passes that artifact -to both the live/E2E release-path Docker workflow and the package acceptance -shard. That keeps the package bytes consistent across release boxes and avoids -repacking the same candidate in multiple child jobs. +Scope logic lives in `scripts/ci-changed-scope.mjs` and is covered by unit tests in `src/scripts/ci-changed-scope.test.ts`. Manual dispatch skips changed-scope detection and makes the preflight manifest act as if every scoped area changed. -`Package Acceptance` is the side-run workflow for validating a package artifact -without blocking the release workflow. It resolves one candidate from a -published npm spec, a trusted `package_ref` built with the selected -`workflow_ref` harness, an HTTPS tarball URL with SHA-256, or a tarball artifact -from another GitHub Actions run, uploads it as `package-under-test`, then reuses -the Docker release/E2E scheduler with that tarball instead of repacking the -workflow checkout. Profiles cover smoke, package, product, full, and custom -Docker lane selections. The `package` profile uses offline plugin coverage so -published-package validation is not gated on live ClawHub availability. The -optional Telegram lane reuses the -`package-under-test` artifact in the `NPM Telegram Beta E2E` workflow, with the -published npm spec path kept for standalone dispatches. 
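Before pushing, the changed-path set the classifier reads can be inspected locally. A minimal read-only sketch (assumes `origin/main` is fetched; `pnpm changed:lanes` is the supported local entry point):

```bash
# The changed-path set preflight classifies for this branch.
git diff --name-only origin/main...HEAD
# The local changed-lane classifier over the same range.
pnpm changed:lanes
```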
+- **CI workflow edits** validate the Node CI graph plus workflow linting, but do not force Windows, Android, or macOS native builds by themselves; those platform lanes stay scoped to platform source changes. +- **CI routing-only edits, selected cheap core-test fixture edits, and narrow plugin contract helper/test-routing edits** use a fast Node-only manifest path: `preflight`, security, and a single `checks-fast-core` task. That path skips build artifacts, Node 22 compatibility, channel contracts, full core shards, bundled-plugin shards, and additional guard matrices when the change is limited to the routing or helper surfaces the fast task exercises directly. +- **Windows Node checks** are scoped to Windows-specific process/path wrappers, npm/pnpm/UI runner helpers, package manager config, and the CI workflow surfaces that execute that lane; unrelated source, plugin, install-smoke, and test-only changes stay on the Linux Node lanes. -## Package acceptance +The slowest Node test families are split or balanced so each job stays small without over-reserving runners: channel contracts run as three weighted shards, small core unit lanes are paired, auto-reply runs as four balanced workers (with the reply subtree split into agent-runner, dispatch, and commands/state-routing shards), and agentic gateway/plugin configs are spread across the existing source-only agentic Node jobs instead of waiting on built artifacts. Broad browser, QA, media, and miscellaneous plugin tests use their dedicated Vitest configs instead of the shared plugin catch-all. Include-pattern shards record timing entries using the CI shard name, so `.artifacts/vitest-shard-timings.json` can distinguish a whole config from a filtered shard. `check-additional` keeps package-boundary compile/canary work together and separates runtime topology architecture from gateway watch coverage; the boundary guard shard runs its small independent guards concurrently inside one job. 
Gateway watch, channel tests, and the core support-boundary shard run concurrently inside `build-artifacts` after `dist/` and `dist-runtime/` are already built. -Use `Package Acceptance` when the question is "does this installable OpenClaw -package work as a product?" It is different from normal CI: normal CI validates -the source tree, while package acceptance validates a single tarball through the -same Docker E2E harness users exercise after install or update. +Android CI runs both `testPlayDebugUnitTest` and `testThirdPartyDebugUnitTest` and then builds the Play debug APK. The third-party flavor has no separate source set or manifest; its unit-test lane still compiles the flavor with the SMS/call-log BuildConfig flags, while avoiding a duplicate debug APK packaging job on every Android-relevant push. -The workflow has four jobs: +The `check-dependencies` shard runs `pnpm deadcode:dependencies` (a production Knip dependency-only pass pinned to the latest Knip version, with pnpm's minimum release age disabled for the `dlx` install) and `pnpm deadcode:unused-files`, which compares Knip's production unused-file findings against `scripts/deadcode-unused-files.allowlist.mjs`. The unused-file guard fails when a PR adds a new unreviewed unused file or leaves a stale allowlist entry, while preserving intentional dynamic plugin, generated, build, live-test, and package bridge surfaces that Knip cannot resolve statically. -1. `resolve_package` checks out `workflow_ref`, resolves one package candidate, - writes `.artifacts/docker-e2e-package/openclaw-current.tgz`, writes - `.artifacts/docker-e2e-package/package-candidate.json`, uploads both as the - `package-under-test` artifact, and prints the source, workflow ref, package - ref, version, SHA-256, and profile in the GitHub step summary. -2. `docker_acceptance` calls - `openclaw-live-and-e2e-checks-reusable.yml` with `ref=workflow_ref` and - `package_artifact_name=package-under-test`. 
The reusable workflow downloads - that artifact, validates the tarball inventory, prepares package-digest - Docker images when needed, and runs the selected Docker lanes against that - package instead of packing the workflow checkout. When a profile selects - multiple targeted `docker_lanes`, the reusable workflow prepares the package - and shared images once, then fans those lanes out as parallel targeted Docker - jobs with unique artifacts. -3. `package_telegram` optionally calls `NPM Telegram Beta E2E`. It runs when - `telegram_mode` is not `none` and installs the same `package-under-test` - artifact when Package Acceptance resolved one; standalone Telegram dispatch - can still install a published npm spec. -4. `summary` fails the workflow if package resolution, Docker acceptance, or - the optional Telegram lane failed. +## Manual dispatches -Candidate sources: +Manual CI dispatches run the same job graph as normal CI but force every non-Android scoped lane on: Linux Node shards, bundled-plugin shards, channel contracts, Node 22 compatibility, `check`, `check-additional`, build smoke, docs checks, Python skills, Windows, macOS, and Control UI i18n. Standalone manual CI dispatches run Android only with `include_android=true`; the full release umbrella enables Android by passing `include_android=true`. Plugin prerelease static checks, the release-only `agentic-plugins` shard, the full extension batch sweep, and plugin prerelease Docker lanes are excluded from CI. The Docker prerelease suite runs only when `Full Release Validation` dispatches the separate `Plugin Prerelease` workflow with the release-validation gate enabled. -- `source=npm`: accepts only `openclaw@beta`, `openclaw@latest`, or an exact - OpenClaw release version such as `openclaw@2026.4.27-beta.2`. Use this for - published beta/stable acceptance. -- `source=ref`: packs a trusted `package_ref` branch, tag, or full commit SHA. 
- The resolver fetches OpenClaw branches/tags, verifies the selected commit is - reachable from repository branch history or a release tag, installs deps in a - detached worktree, and packs it with `scripts/package-openclaw-for-docker.mjs`. -- `source=url`: downloads an HTTPS `.tgz`; `package_sha256` is required. -- `source=artifact`: downloads one `.tgz` from `artifact_run_id` and - `artifact_name`; `package_sha256` is optional but should be supplied for - externally shared artifacts. +Manual runs use a unique concurrency group so a release-candidate full suite is not cancelled by another push or PR run on the same ref. The optional `target_ref` input lets a trusted caller run that graph against a branch, tag, or full commit SHA while using the workflow file from the selected dispatch ref. -Keep `workflow_ref` and `package_ref` separate. `workflow_ref` is the trusted -workflow/harness code that runs the test. `package_ref` is the source commit -that gets packed when `source=ref`. This lets the current test harness validate -older trusted source commits without running old workflow logic. 
+```bash +gh workflow run ci.yml --ref release/YYYY.M.D +gh workflow run ci.yml --ref main -f target_ref= -f include_android=true +gh workflow run full-release-validation.yml --ref main -f ref= +``` -Profiles map to Docker coverage: +## Runners -- `smoke`: `npm-onboard-channel-agent`, `gateway-network`, `config-reload` -- `package`: `npm-onboard-channel-agent`, `doctor-switch`, - `update-channel-switch`, `bundled-channel-deps-compat`, `plugins-offline`, - `plugin-update` -- `product`: `package` plus `mcp-channels`, `cron-mcp-cleanup`, - `openai-web-search-minimal`, `openwebui` -- `full`: full Docker release-path chunks with OpenWebUI -- `custom`: exact `docker_lanes`; required when `suite_profile=custom` +| Runner | Jobs | +| -------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `ubuntu-24.04` | `preflight`, fast security jobs and aggregates (`security-scm-fast`, `security-dependency-audit`, `security-fast`), fast protocol/contract/bundled checks, sharded channel contract checks, `check` shards except lint, `check-additional` shards and aggregates, Node test aggregate verifiers, docs checks, Python skills, workflow-sanity, labeler, auto-response; install-smoke preflight also uses GitHub-hosted Ubuntu so the Blacksmith matrix can queue earlier | +| `blacksmith-4vcpu-ubuntu-2404` | `CodeQL Critical Quality`, lower-weight extension shards, `checks-fast-core`, `checks-node-compat-node22`, `check-prod-types`, and `check-test-types` | +| `blacksmith-8vcpu-ubuntu-2404` | `build-artifacts`, build-smoke, Linux Node test shards, bundled 
plugin test shards, `android` | +| `blacksmith-16vcpu-ubuntu-2404` | `check-lint` (CPU-sensitive enough that 8 vCPU cost more than they saved); install-smoke Docker builds (32-vCPU queue time cost more than it saved) | +| `blacksmith-16vcpu-windows-2025` | `checks-windows` | +| `blacksmith-6vcpu-macos-latest` | `macos-node` on `openclaw/openclaw`; forks fall back to `macos-latest` | +| `blacksmith-12vcpu-macos-latest` | `macos-swift` on `openclaw/openclaw`; forks fall back to `macos-latest` | -Release checks call Package Acceptance with `source=ref`, -`package_ref=`, `workflow_ref=`, -`suite_profile=custom`, -`docker_lanes='bundled-channel-deps-compat plugins-offline'`, and -`telegram_mode=mock-openai`. The release-path Docker -chunks cover the overlapping package/update/plugin lanes, while Package -Acceptance keeps the artifact-native bundled-channel compat, offline plugin, and -Telegram proof against the same resolved package tarball. -Cross-OS release checks still cover OS-specific onboarding, installer, and -platform behavior; package/update product validation should start with Package -Acceptance. The Windows packaged and installer fresh lanes also verify that an -installed package can import a browser-control override from a raw absolute -Windows path. The OpenAI cross-OS agent-turn smoke defaults to -`OPENCLAW_CROSS_OS_OPENAI_MODEL` when set, otherwise `openai/gpt-5.4-mini`, so -the install and gateway proof stays fast and deterministic. Dedicated live -provider/model lanes still cover broader model routing, including slower -frontier defaults. +## Local equivalents -Package Acceptance has bounded legacy-compatibility windows for already -published packages. 
Packages through `2026.4.25`, including `2026.4.25-beta.*`, -may use the compatibility path for known private QA entries in -`dist/postinstall-inventory.json` that point at tarball-omitted files, -`doctor-switch` may skip the `gateway install --wrapper` persistence subcase -when the package does not expose that flag, `update-channel-switch` may prune -missing `pnpm.patchedDependencies` from the tarball-derived fake git fixture and -may log missing persisted `update.channel`, plugin smokes may read legacy -install-record locations or accept missing marketplace install-record -persistence, and `plugin-update` may allow config metadata migration while still -requiring the install record and no-reinstall behavior to stay unchanged. The -published `2026.4.26` package may also warn for local build metadata stamp files -that were already shipped. Later packages must satisfy the modern contracts; the -same conditions fail instead of warn or skip. +```bash +pnpm changed:lanes # inspect the local changed-lane classifier for origin/main...HEAD +pnpm check:changed # smart local check gate: changed typecheck/lint/guards by boundary lane +pnpm check # fast local gate: prod tsgo + sharded lint + parallel fast guards +pnpm check:test-types +pnpm check:timed # same gate with per-stage timings +pnpm build:strict-smoke +pnpm check:architecture +pnpm test:gateway:watch-regression +pnpm test # vitest tests +pnpm test:changed # cheap smart changed Vitest targets +pnpm test:channels +pnpm test:contracts:channels +pnpm check:docs # docs format + lint + broken links +pnpm build # build dist when CI artifact/build-smoke lanes matter +pnpm ci:timings # summarize the latest origin/main push CI run +pnpm ci:timings:recent # compare recent successful main CI runs +node scripts/ci-run-timings.mjs # summarize wall time, queue time, and slowest jobs +node scripts/ci-run-timings.mjs --latest-main # ignore issue/comment noise and choose origin/main push CI +node scripts/ci-run-timings.mjs --recent 
10 # compare recent successful main CI runs +pnpm test:perf:groups --full-suite --allow-failures --output .artifacts/test-perf/baseline-before.json +pnpm test:perf:groups:compare .artifacts/test-perf/baseline-before.json .artifacts/test-perf/after-agent.json +``` -Examples: +## Full Release Validation + +`Full Release Validation` is the manual umbrella workflow for "run everything before release." It accepts a branch, tag, or full commit SHA, dispatches the manual `CI` workflow with that target, dispatches `Plugin Prerelease` for release-only plugin/package/static/Docker proof, and dispatches `OpenClaw Release Checks` for install smoke, package acceptance, Docker release-path suites, live/E2E, OpenWebUI, QA Lab parity, Matrix, and Telegram lanes. It can also run the post-publish `NPM Telegram Beta E2E` workflow when a published package spec is provided. + +`release_profile` controls live/provider breadth passed into release checks: + +- `minimum` keeps the fastest OpenAI/core release-critical lanes. +- `stable` adds the stable provider/backend set. +- `full` runs the broad advisory provider/media matrix. + +The umbrella records the dispatched child run ids, and the final `Verify full validation` job re-checks current child run conclusions and appends slowest-job tables for each child run. If a child workflow is rerun and turns green, rerun only the parent verifier job to refresh the umbrella result and timing summary. + +For recovery, both `Full Release Validation` and `OpenClaw Release Checks` accept `rerun_group`. Use `all` for a release candidate, `ci` for only the normal full CI child, `release-checks` for every release child, or a narrower group: `install-smoke`, `cross-os`, `live-e2e`, `package`, `qa`, `qa-parity`, `qa-live`, or `npm-telegram` on the umbrella. This keeps a failed release box rerun bounded after a focused fix. 
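As a sketch, a bounded recovery dispatch after a package-lane fix could look like the following; the release ref value is a placeholder, and passing `release_profile` and `rerun_group` as `-f` inputs assumes the umbrella exposes them as dispatch inputs as described above:

```bash
# Rerun only the package release group against the candidate ref.
gh workflow run full-release-validation.yml --ref main \
  -f ref=release/YYYY.M.D \
  -f release_profile=stable \
  -f rerun_group=package
```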
+ +`OpenClaw Release Checks` uses the trusted workflow ref to resolve the selected ref once into a `release-package-under-test` tarball, then passes that artifact to both the live/E2E release-path Docker workflow and the package acceptance shard. That keeps the package bytes consistent across release boxes and avoids repacking the same candidate in multiple child jobs. + +## Live and E2E shards + +The release live/E2E child keeps broad native `pnpm test:live` coverage, but it runs it as named shards through `scripts/test-live-shard.mjs` instead of one serial job: + +- `native-live-src-agents` +- `native-live-src-gateway-core` +- provider-filtered `native-live-src-gateway-profiles` jobs +- `native-live-src-gateway-backends` +- `native-live-test` +- `native-live-extensions-a-k` +- `native-live-extensions-l-n` +- `native-live-extensions-openai` +- `native-live-extensions-o-z-other` +- `native-live-extensions-xai` +- split media audio/video shards and provider-filtered music shards + +That keeps the same file coverage while making slow live provider failures easier to rerun and diagnose. The aggregate `native-live-extensions-o-z`, `native-live-extensions-media`, and `native-live-extensions-media-music` shard names remain valid for manual one-shot reruns. + +The native live media shards run in `ghcr.io/openclaw/openclaw-live-media-runner:ubuntu-24.04`, built by the `Live Media Runner Image` workflow. That image preinstalls `ffmpeg` and `ffprobe`; media jobs only verify the binaries before setup. Keep Docker-backed live suites on normal Blacksmith runners — container jobs are the wrong place to launch nested Docker tests. + +Docker-backed live model/backend shards use a separate shared `ghcr.io/openclaw/openclaw-live-test:` image per selected commit. The live release workflow builds and pushes that image once, then the Docker live model, gateway, CLI backend, ACP bind, and Codex harness shards run with `OPENCLAW_SKIP_DOCKER_BUILD=1`. 
If those shards rebuild the full source Docker target independently, the release run is misconfigured and will waste wall clock on duplicate image builds. + +## Package Acceptance + +Use `Package Acceptance` when the question is "does this installable OpenClaw package work as a product?" It is different from normal CI: normal CI validates the source tree, while package acceptance validates a single tarball through the same Docker E2E harness users exercise after install or update. + +### Jobs + +1. `resolve_package` checks out `workflow_ref`, resolves one package candidate, writes `.artifacts/docker-e2e-package/openclaw-current.tgz`, writes `.artifacts/docker-e2e-package/package-candidate.json`, uploads both as the `package-under-test` artifact, and prints the source, workflow ref, package ref, version, SHA-256, and profile in the GitHub step summary. +2. `docker_acceptance` calls `openclaw-live-and-e2e-checks-reusable.yml` with `ref=workflow_ref` and `package_artifact_name=package-under-test`. The reusable workflow downloads that artifact, validates the tarball inventory, prepares package-digest Docker images when needed, and runs the selected Docker lanes against that package instead of packing the workflow checkout. When a profile selects multiple targeted `docker_lanes`, the reusable workflow prepares the package and shared images once, then fans those lanes out as parallel targeted Docker jobs with unique artifacts. +3. `package_telegram` optionally calls `NPM Telegram Beta E2E`. It runs when `telegram_mode` is not `none` and installs the same `package-under-test` artifact when Package Acceptance resolved one; standalone Telegram dispatch can still install a published npm spec. +4. `summary` fails the workflow if package resolution, Docker acceptance, or the optional Telegram lane failed. + +### Candidate sources + +- `source=npm` accepts only `openclaw@beta`, `openclaw@latest`, or an exact OpenClaw release version such as `openclaw@2026.4.27-beta.2`. 
Use this for published beta/stable acceptance. +- `source=ref` packs a trusted `package_ref` branch, tag, or full commit SHA. The resolver fetches OpenClaw branches/tags, verifies the selected commit is reachable from repository branch history or a release tag, installs deps in a detached worktree, and packs it with `scripts/package-openclaw-for-docker.mjs`. +- `source=url` downloads an HTTPS `.tgz`; `package_sha256` is required. +- `source=artifact` downloads one `.tgz` from `artifact_run_id` and `artifact_name`; `package_sha256` is optional but should be supplied for externally shared artifacts. + +Keep `workflow_ref` and `package_ref` separate. `workflow_ref` is the trusted workflow/harness code that runs the test. `package_ref` is the source commit that gets packed when `source=ref`. This lets the current test harness validate older trusted source commits without running old workflow logic. + +### Suite profiles + +- `smoke` — `npm-onboard-channel-agent`, `gateway-network`, `config-reload` +- `package` — `npm-onboard-channel-agent`, `doctor-switch`, `update-channel-switch`, `bundled-channel-deps-compat`, `plugins-offline`, `plugin-update` +- `product` — `package` plus `mcp-channels`, `cron-mcp-cleanup`, `openai-web-search-minimal`, `openwebui` +- `full` — full Docker release-path chunks with OpenWebUI +- `custom` — exact `docker_lanes`; required when `suite_profile=custom` + +The `package` profile uses offline plugin coverage so published-package validation is not gated on live ClawHub availability. The optional Telegram lane reuses the `package-under-test` artifact in `NPM Telegram Beta E2E`, with the published npm spec path kept for standalone dispatches. + +Release checks call Package Acceptance with `source=ref`, `package_ref=`, `workflow_ref=`, `suite_profile=custom`, `docker_lanes='bundled-channel-deps-compat plugins-offline'`, and `telegram_mode=mock-openai`. 
Release-path Docker chunks cover the overlapping package/update/plugin lanes; Package Acceptance keeps the artifact-native bundled-channel compat, offline plugin, and Telegram proof against the same resolved package tarball. Cross-OS release checks still cover OS-specific onboarding, installer, and platform behavior; package/update product validation should start with Package Acceptance. The Windows packaged and installer fresh lanes also verify that an installed package can import a browser-control override from a raw absolute Windows path. The OpenAI cross-OS agent-turn smoke defaults to `OPENCLAW_CROSS_OS_OPENAI_MODEL` when set, otherwise `openai/gpt-5.4-mini`, so the install and gateway proof stays fast and deterministic. + +### Legacy compatibility windows + +Package Acceptance has bounded legacy-compatibility windows for already-published packages. Packages through `2026.4.25`, including `2026.4.25-beta.*`, may use the compatibility path: + +- known private QA entries in `dist/postinstall-inventory.json` may point at tarball-omitted files; +- `doctor-switch` may skip the `gateway install --wrapper` persistence subcase when the package does not expose that flag; +- `update-channel-switch` may prune missing `pnpm.patchedDependencies` from the tarball-derived fake git fixture and may log missing persisted `update.channel`; +- plugin smokes may read legacy install-record locations or accept missing marketplace install-record persistence; +- `plugin-update` may allow config metadata migration while still requiring the install record and no-reinstall behavior to stay unchanged. + +The published `2026.4.26` package may also warn for local build metadata stamp files that were already shipped. Later packages must satisfy the modern contracts; the same conditions fail instead of warn or skip. + +### Examples ```bash # Validate the current beta package with product-level coverage. 
@@ -216,169 +231,152 @@ gh workflow run package-acceptance.yml \ -f docker_lanes='install-e2e plugin-update' ``` -When debugging a failed package acceptance run, start at the `resolve_package` -summary to confirm the package source, version, and SHA-256. Then inspect the -`docker_acceptance` child run and its Docker artifacts: -`.artifacts/docker-tests/**/summary.json`, `failures.json`, lane logs, phase -timings, and rerun commands. Prefer rerunning the failed package profile or -exact Docker lanes instead of rerunning full release validation. +When debugging a failed package acceptance run, start at the `resolve_package` summary to confirm the package source, version, and SHA-256. Then inspect the `docker_acceptance` child run and its Docker artifacts: `.artifacts/docker-tests/**/summary.json`, `failures.json`, lane logs, phase timings, and rerun commands. Prefer rerunning the failed package profile or exact Docker lanes instead of rerunning full release validation. -QA Lab has dedicated CI lanes outside the main smart-scoped workflow. The -`Parity gate` workflow runs on matching PR changes and manual dispatch; it -builds the private QA runtime and compares the mock GPT-5.5 and Opus 4.6 -agentic packs. The `QA-Lab - All Lanes` workflow runs nightly on `main` and on -manual dispatch; it fans out the mock parity gate, live Matrix lane, and live -Telegram and Discord lanes as parallel jobs. The live jobs use the -`qa-live-shared` environment, and Telegram/Discord use Convex leases. Release -checks run Matrix and Telegram live transport lanes with the deterministic mock -provider and mock-qualified models (`mock-openai/gpt-5.5` and -`mock-openai/gpt-5.5-alt`) so the channel contract is isolated from live model -latency and normal provider-plugin startup. The live transport gateway also -disables memory search because QA parity covers memory behavior separately; -provider connectivity is covered by the separate live model, native provider, -and Docker provider suites. 
Matrix uses `--profile fast` for scheduled and release gates, -adding `--fail-fast` only when the checked-out CLI supports it. The CLI default -and manual workflow input remain `all`; manual `matrix_profile=all` -dispatch always shards full Matrix coverage into `transport`, `media`, -`e2ee-smoke`, `e2ee-deep`, and `e2ee-cli` jobs. `OpenClaw Release Checks` also -runs the release-critical QA Lab lanes before release approval; its QA parity -gate runs the candidate and baseline packs as parallel lane jobs, then downloads -both artifacts into a small report job for the final parity comparison. -Do not put the PR landing path behind `Parity gate` unless the change actually -touches QA runtime, model-pack parity, or a surface the parity workflow owns. -For normal channel, config, docs, or unit-test fixes, treat it as an optional -signal and follow the scoped CI/check evidence instead. +## Install smoke -The `Duplicate PRs After Merge` workflow is a manual maintainer workflow for -post-land duplicate cleanup. It defaults to dry-run and only closes explicitly -listed PRs when `apply=true`. Before mutating GitHub, it verifies that the -landed PR is merged and that each duplicate has either a shared referenced issue -or overlapping changed hunks. +The separate `Install Smoke` workflow reuses the same scope script through its own `preflight` job. It splits smoke coverage into `run_fast_install_smoke` and `run_full_install_smoke`. -The `CodeQL` workflow is intentionally a narrow first-pass security scanner, -not the full repository sweep. Daily, manual, and non-draft pull request guard -runs scan Actions workflow code plus the highest-risk JavaScript/TypeScript -auth, secrets, sandbox, cron, and gateway surfaces with high-confidence security -queries filtered to high/critical `security-severity` under the -`/codeql-security-high/core-auth-secrets` category. 
The -channel-runtime-boundary job separately scans core channel implementation -contracts plus the channel plugin runtime, gateway, Plugin SDK, secrets, and -audit touchpoints under the `/codeql-security-high/channel-runtime-boundary` -category so channel security signal can scale without broadening the baseline -auth/secrets category. The network-ssrf-boundary job scans core SSRF, IP parsing, -network guard, web-fetch, and Plugin SDK SSRF policy surfaces under the -`/codeql-security-high/network-ssrf-boundary` category so network trust boundary -signal stays separate from the auth/secrets security baseline. -The mcp-process-tool-boundary job scans MCP servers, process execution helpers, -outbound delivery, and agent tool-execution gates under the -`/codeql-security-high/mcp-process-tool-boundary` category so command and tool -boundary signal stays separate from both the auth/secrets baseline and the -non-security MCP/process quality shard. The plugin-trust-boundary job scans -plugin install, loader, manifest, registry, runtime-dependency staging, -source-loading, public-surface, and Plugin SDK package contract trust surfaces -under the `/codeql-security-high/plugin-trust-boundary` category so plugin -supply-chain and runtime-loading signal stays separate from both bundled plugin -implementation code and the non-security plugin quality shard. -The pull request guard stays light: it only starts for changes under -`.github/actions`, `.github/codeql`, `.github/workflows`, `packages`, or `src`, -and it runs the same high-confidence security matrix as the scheduled workflow. -Android and macOS CodeQL stay out of PR defaults. +- **Fast path** runs for pull requests touching Docker/package surfaces, bundled plugin package/manifest changes, or core plugin/channel/gateway/Plugin SDK surfaces that the Docker smoke jobs exercise. Source-only bundled plugin changes, test-only edits, and docs-only edits do not reserve Docker workers. 
The fast path builds the root Dockerfile image once, checks the CLI, runs the agents delete shared-workspace CLI smoke, runs the container gateway-network e2e, verifies a bundled extension build arg, and runs the bounded bundled-plugin Docker profile under a 240-second aggregate command timeout (each scenario's Docker run capped separately). +- **Full path** keeps QR package install and installer Docker/update coverage for nightly scheduled runs, manual dispatches, workflow-call release checks, and pull requests that truly touch installer/package/Docker surfaces. In full mode, install-smoke prepares or reuses one target-SHA GHCR root Dockerfile smoke image, then runs QR package install, root Dockerfile/gateway smokes, installer/update smokes, and the fast bundled-plugin Docker E2E as separate jobs so installer work does not wait behind the root image smokes. -The `CodeQL Android Critical Security` workflow is the scheduled Android -security shard. It builds the Android app manually for CodeQL on the smallest -Blacksmith Linux runner label accepted by workflow sanity and uploads results -under the `/codeql-critical-security/android` category. +`main` pushes (including merge commits) do not force the full path; when changed-scope logic would request full coverage on a push, the workflow keeps the fast Docker smoke and leaves the full install smoke to nightly or release validation. -The `CodeQL macOS Critical Security` workflow is the weekly/manual macOS -security shard. It builds the macOS app manually for CodeQL on Blacksmith macOS, -filters dependency build results out of the uploaded SARIF, and uploads results -under the `/codeql-critical-security/macos` category. Keep it outside the daily -default workflow because the macOS build dominates runtime even when clean. +The slow Bun global install image-provider smoke is separately gated by `run_bun_global_install_smoke`. 
It runs on the nightly schedule and from the release checks workflow, and manual `Install Smoke` dispatches can opt into it, but pull requests and `main` pushes do not. QR and installer Docker tests keep their own install-focused Dockerfiles. -The `CodeQL Critical Quality` workflow is the matching non-security shard. It -runs only error-severity, non-security JavaScript/TypeScript quality queries -over narrow high-value surfaces on the smaller Blacksmith Linux runner. Its -pull request guard is intentionally smaller than the scheduled profile: non-draft -PRs only run the matching `channel-runtime-boundary`, -`gateway-runtime-boundary`, `provider-runtime-boundary`, `plugin-boundary`, and -`plugin-sdk-package-contract` shards for channel runtime, gateway -protocol/server-method, provider runtime/model catalog, plugin loader, Plugin -SDK, or package-contract changes. CodeQL config and quality workflow changes run -all five PR quality shards. Its manual dispatch accepts -`profile=all|channel-runtime-boundary|gateway-runtime-boundary|plugin-boundary|plugin-sdk-package-contract|plugin-sdk-reply-runtime|provider-runtime-boundary|session-diagnostics-boundary`; -the narrow profiles are teaching/iteration hooks for running one quality shard -in isolation without dispatching the rest of the workflow. -Its -core-auth-secrets job scans auth, secrets, sandbox, cron, and gateway security -boundary code under the separate `/codeql-critical-quality/core-auth-secrets` -category. The config-boundary -job scans config schema, migration, normalization, and IO contracts under the -separate `/codeql-critical-quality/config-boundary` category. The -gateway-runtime-boundary job scans gateway protocol schemas and server method -contracts under the separate -`/codeql-critical-quality/gateway-runtime-boundary` category. The -channel-runtime-boundary job scans core channel implementation contracts under -the separate `/codeql-critical-quality/channel-runtime-boundary` category. 
The -agent-runtime-boundary job scans command execution, model/provider dispatch, -auto-reply dispatch and queues, and ACP control-plane runtime contracts under -the separate `/codeql-critical-quality/agent-runtime-boundary` category. The -mcp-process-runtime-boundary job scans MCP servers and tool bridges, process -supervision helpers, and outbound delivery contracts under the separate -`/codeql-critical-quality/mcp-process-runtime-boundary` category. The -memory-runtime-boundary job scans the memory host SDK, memory runtime facades, -memory Plugin SDK aliases, memory runtime activation glue, and memory doctor -commands under the separate `/codeql-critical-quality/memory-runtime-boundary` -category. The session-diagnostics-boundary job scans reply queue internals, -session delivery queues, outbound session binding/delivery helpers, diagnostic -event/log bundle surfaces, and session doctor CLI contracts under the separate -`/codeql-critical-quality/session-diagnostics-boundary` category. The -plugin-sdk-reply-runtime job scans Plugin SDK inbound reply dispatch, reply -payload/chunking/runtime helpers, channel reply options, delivery queues, and -session/thread binding helpers under the separate -`/codeql-critical-quality/plugin-sdk-reply-runtime` category. The -provider-runtime-boundary job scans model catalog normalization, provider auth -and discovery, provider runtime registration, provider defaults/catalogs, and -web/search/fetch/embedding provider registries under the separate -`/codeql-critical-quality/provider-runtime-boundary` category. The -ui-control-plane job scans Control UI bootstrap, local persistence, gateway -control flows, and task control-plane runtime contracts under the separate -`/codeql-critical-quality/ui-control-plane` category. 
The -web-media-runtime-boundary job scans core web fetch/search, media IO, media -understanding, image-generation, and media-generation runtime contracts under -the separate `/codeql-critical-quality/web-media-runtime-boundary` category. The -plugin-boundary job scans loader, registry, public-surface, and Plugin SDK -entrypoint contracts under a separate `/codeql-critical-quality/plugin-boundary` -category. The plugin-sdk-package-contract job scans the published package-side -Plugin SDK source and plugin package contract helpers under the separate -`/codeql-critical-quality/plugin-sdk-package-contract` category. Keep the -workflow separate from security so quality findings can be -scheduled, measured, disabled, or expanded without obscuring security signal. -Swift, Python, and bundled-plugin CodeQL expansion should be added back as -scoped or sharded follow-up work only after the narrow profiles have stable -runtime and signal. +## Local Docker E2E -The `Docs Agent` workflow is an event-driven Codex maintenance lane for keeping -existing docs aligned with recently landed changes. It has no pure schedule: a -successful non-bot push CI run on `main` can trigger it, and manual dispatch can -run it directly. Workflow-run invocations skip when `main` has moved on or when -another non-skipped Docs Agent run was created in the last hour. When it runs, it -reviews the commit range from the previous non-skipped Docs Agent source SHA to -current `main`, so one hourly run can cover all main changes accumulated since -the last docs pass. +`pnpm test:docker:all` prebuilds one shared live-test image, packs OpenClaw once as an npm tarball, and builds two shared `scripts/e2e/Dockerfile` images: -The `Test Performance Agent` workflow is an event-driven Codex maintenance lane -for slow tests. It has no pure schedule: a successful non-bot push CI run on -`main` can trigger it, but it skips if another workflow-run invocation already -ran or is running that UTC day. 
Manual dispatch bypasses that daily activity -gate. The lane builds a full-suite grouped Vitest performance report, lets Codex -make only small coverage-preserving test performance fixes instead of broad -refactors, then reruns the full-suite report and rejects changes that reduce the -passing baseline test count. If the baseline has failing tests, Codex may fix -only obvious failures and the after-agent full-suite report must pass before -anything is committed. When `main` advances before the bot push lands, the lane -rebases the validated patch, reruns `pnpm check:changed`, and retries the push; -conflicting stale patches are skipped. It uses GitHub-hosted Ubuntu so the Codex -action can keep the same drop-sudo safety posture as the docs agent. +- a bare Node/Git runner for installer/update/plugin-dependency lanes; +- a functional image that installs the same tarball into `/app` for normal functionality lanes. + +Docker lane definitions live in `scripts/lib/docker-e2e-scenarios.mjs`, planner logic lives in `scripts/lib/docker-e2e-plan.mjs`, and the runner only executes the selected plan. The scheduler selects the image per lane with `OPENCLAW_DOCKER_E2E_BARE_IMAGE` and `OPENCLAW_DOCKER_E2E_FUNCTIONAL_IMAGE`, then runs lanes with `OPENCLAW_SKIP_DOCKER_BUILD=1`. + +### Tunables + +| Variable | Default | Purpose | +| -------------------------------------- | ------- | --------------------------------------------------------------------------------------------- | +| `OPENCLAW_DOCKER_ALL_PARALLELISM` | 10 | Main-pool slot count for normal lanes. | +| `OPENCLAW_DOCKER_ALL_TAIL_PARALLELISM` | 10 | Provider-sensitive tail-pool slot count. | +| `OPENCLAW_DOCKER_ALL_LIVE_LIMIT` | 9 | Concurrent live lane cap so providers do not throttle. | +| `OPENCLAW_DOCKER_ALL_NPM_LIMIT` | 10 | Concurrent npm install lane cap. | +| `OPENCLAW_DOCKER_ALL_SERVICE_LIMIT` | 7 | Concurrent multi-service lane cap. 
| +| `OPENCLAW_DOCKER_ALL_START_STAGGER_MS` | 2000 | Stagger between lane starts to avoid Docker daemon create storms; set `0` for no stagger. | +| `OPENCLAW_DOCKER_ALL_LANE_TIMEOUT_MS` | 7200000 | Per-lane fallback timeout (120 minutes); selected live/tail lanes use tighter caps. | +| `OPENCLAW_DOCKER_ALL_DRY_RUN` | unset | `1` prints the scheduler plan without running lanes. | +| `OPENCLAW_DOCKER_ALL_LANES` | unset | Comma-separated exact lane list; skips cleanup smoke so agents can reproduce one failed lane. | + +A lane heavier than its effective cap can still start from an empty pool, then runs alone until it releases capacity. The local aggregate preflights Docker, removes stale OpenClaw E2E containers, emits active-lane status, persists lane timings for longest-first ordering, and stops scheduling new pooled lanes after the first failure by default. + +### Reusable live/E2E workflow + +The reusable live/E2E workflow asks `scripts/test-docker-all.mjs --plan-json` which package, image kind, live image, lane, and credential coverage is required. `scripts/docker-e2e.mjs` then converts that plan into GitHub outputs and summaries. It either packs OpenClaw through `scripts/package-openclaw-for-docker.mjs`, downloads a current-run package artifact, or downloads a package artifact from `package_artifact_run_id`; validates the tarball inventory; builds and pushes package-digest-tagged bare/functional GHCR Docker E2E images through Blacksmith's Docker layer cache when the plan needs package-installed lanes; and reuses provided `docker_e2e_bare_image`/`docker_e2e_functional_image` inputs or existing package-digest images instead of rebuilding. Docker image pulls are retried with a bounded 180-second per-attempt timeout so a stuck registry/cache stream retries quickly instead of consuming most of the CI critical path. 
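+
+The tunables and plan inspection above can be combined locally. A sketch using only the variables, scripts, and lane names documented here; it assumes the shared bare/functional images from a prior `pnpm test:docker:all` run already exist when skipping the build:
+
+```bash
+# Print the scheduler plan without starting any containers.
+OPENCLAW_DOCKER_ALL_DRY_RUN=1 pnpm test:docker:all
+
+# Ask the planner which package, images, lanes, and credential coverage a run
+# would need, the same question the reusable workflow asks.
+node scripts/test-docker-all.mjs --plan-json
+
+# Reproduce one failed lane against prebuilt images without rebuilding Docker.
+OPENCLAW_SKIP_DOCKER_BUILD=1 OPENCLAW_DOCKER_ALL_LANES=gateway-network pnpm test:docker:all
+```
+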
+
+### Release-path chunks
+
+Release Docker coverage runs smaller chunked jobs with `OPENCLAW_SKIP_DOCKER_BUILD=1` so each chunk pulls only the image kind it needs and executes multiple lanes through the same weighted scheduler:
+
+- `OPENCLAW_DOCKER_ALL_PROFILE=release-path`
+- `OPENCLAW_DOCKER_ALL_CHUNK=core | package-update-openai | package-update-anthropic | package-update-core | plugins-runtime-plugins | plugins-runtime-services | plugins-runtime-install-a..h | bundled-channels`
+
+Current release Docker chunks are `core`, `package-update-openai`, `package-update-anthropic`, `package-update-core`, `plugins-runtime-plugins`, `plugins-runtime-services`, `plugins-runtime-install-a` through `plugins-runtime-install-h`, `bundled-channels-core`, `bundled-channels-update-a`, `bundled-channels-update-discord`, `bundled-channels-update-b`, and `bundled-channels-contracts`. The aggregate `bundled-channels` chunk remains available for manual one-shot reruns, and `plugins-runtime-core`, `plugins-runtime`, and `plugins-integrations` remain aggregate plugin/runtime aliases. The `install-e2e` alias remains the aggregate manual rerun entry point for both provider installer lanes. The `bundled-channels` chunk runs split `bundled-channel-*` and `bundled-channel-update-*` lanes rather than the serial all-in-one `bundled-channel-deps` lane.
+
+OpenWebUI is folded into `plugins-runtime-services` when full release-path coverage requests it; a standalone `openwebui` chunk remains only for OpenWebUI-only dispatches. Bundled-channel update lanes retry once for transient npm network failures.
+
+Each chunk uploads `.artifacts/docker-tests/` with lane logs, timings, `summary.json`, `failures.json`, phase timings, scheduler plan JSON, slow-lane tables, and per-lane rerun commands.
The workflow `docker_lanes` input runs selected lanes against the prepared images instead of the chunk jobs. That keeps failed-lane debugging bounded to one targeted Docker job, which prepares, downloads, or reuses the package artifact for that run; if a selected lane is a live Docker lane, the targeted job builds the live-test image locally for that rerun. Generated per-lane GitHub rerun commands include `package_artifact_run_id`, `package_artifact_name`, and prepared image inputs when those values exist, so a failed lane can reuse the exact package and images from the failed run.
+
+```bash
+pnpm test:docker:rerun   # download Docker artifacts and print combined/per-lane targeted rerun commands
+pnpm test:docker:timings # slow-lane and phase critical-path summaries
+```
+
+The scheduled live/E2E workflow runs the full release-path Docker suite daily.
+
+## Plugin Prerelease
+
+`Plugin Prerelease` is more expensive product/package coverage, so it is a separate workflow dispatched by `Full Release Validation` or by an explicit operator. Normal pull requests, `main` pushes, and standalone manual CI dispatches keep that suite off. It balances bundled plugin tests across eight extension workers; those extension shard jobs run up to two plugin config groups at a time with one Vitest worker per group and a larger Node heap so import-heavy plugin batches do not create extra CI jobs.
+
+## QA Lab
+
+QA Lab has dedicated CI lanes outside the main smart-scoped workflow.
+
+- The `Parity gate` workflow runs on matching PR changes and manual dispatch; it builds the private QA runtime and compares the mock GPT-5.5 and Opus 4.6 agentic packs.
+- The `QA-Lab - All Lanes` workflow runs nightly on `main` and on manual dispatch; it fans out the mock parity gate, live Matrix lane, and live Telegram and Discord lanes as parallel jobs. Live jobs use the `qa-live-shared` environment, and Telegram/Discord use Convex leases.
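+
+Both QA Lab workflows can be dispatched by workflow name with the GitHub CLI. A minimal sketch; inputs are omitted here because their defaults are workflow-defined:
+
+```bash
+# Manual mock parity comparison.
+gh workflow run "Parity gate"
+
+# Fan out the mock parity gate plus the live Matrix, Telegram, and Discord lanes.
+gh workflow run "QA-Lab - All Lanes"
+```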
+ +Release checks run Matrix and Telegram live transport lanes with the deterministic mock provider and mock-qualified models (`mock-openai/gpt-5.5` and `mock-openai/gpt-5.5-alt`) so the channel contract is isolated from live model latency and normal provider-plugin startup. The live transport gateway disables memory search because QA parity covers memory behavior separately; provider connectivity is covered by the separate live model, native provider, and Docker provider suites. + +Matrix uses `--profile fast` for scheduled and release gates, adding `--fail-fast` only when the checked-out CLI supports it. The CLI default and manual workflow input remain `all`; manual `matrix_profile=all` dispatch always shards full Matrix coverage into `transport`, `media`, `e2ee-smoke`, `e2ee-deep`, and `e2ee-cli` jobs. + +`OpenClaw Release Checks` also runs the release-critical QA Lab lanes before release approval; its QA parity gate runs the candidate and baseline packs as parallel lane jobs, then downloads both artifacts into a small report job for the final parity comparison. + +Do not put the PR landing path behind `Parity gate` unless the change actually touches QA runtime, model-pack parity, or a surface the parity workflow owns. For normal channel, config, docs, or unit-test fixes, treat it as an optional signal and follow the scoped CI/check evidence instead. + +## CodeQL + +The `CodeQL` workflow is intentionally a narrow first-pass security scanner, not the full repository sweep. Daily, manual, and non-draft pull request guard runs scan Actions workflow code plus the highest-risk JavaScript/TypeScript surfaces with high-confidence security queries filtered to high/critical `security-severity`. + +The pull request guard stays light: it only starts for changes under `.github/actions`, `.github/codeql`, `.github/workflows`, `packages`, or `src`, and it runs the same high-confidence security matrix as the scheduled workflow. Android and macOS CodeQL stay out of PR defaults. 
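+
+As a sketch of that guard condition, the path prefixes above can be checked against a changed-file list with standard tools (the example file paths are hypothetical):
+
+```bash
+# Keep only changed paths that start the CodeQL PR guard.
+printf '%s\n' 'src/gateway/auth.ts' 'docs/ci.md' '.github/workflows/codeql.yml' \
+  | grep -E '^(\.github/(actions|codeql|workflows)|packages|src)/'
+# → src/gateway/auth.ts
+# → .github/workflows/codeql.yml
+```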
+ +### Security categories + +| Category | Surface | +| ------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------- | +| `/codeql-security-high/core-auth-secrets` | Auth, secrets, sandbox, cron, and gateway baseline | +| `/codeql-security-high/channel-runtime-boundary` | Core channel implementation contracts plus the channel plugin runtime, gateway, Plugin SDK, secrets, audit touchpoints | +| `/codeql-security-high/network-ssrf-boundary` | Core SSRF, IP parsing, network guard, web-fetch, and Plugin SDK SSRF policy surfaces | +| `/codeql-security-high/mcp-process-tool-boundary` | MCP servers, process execution helpers, outbound delivery, and agent tool-execution gates | +| `/codeql-security-high/plugin-trust-boundary` | Plugin install, loader, manifest, registry, runtime-dependency staging, source-loading, and Plugin SDK package contract trust surfaces | + +### Platform-specific security shards + +- `CodeQL Android Critical Security` — scheduled Android security shard. Builds the Android app manually for CodeQL on the smallest Blacksmith Linux runner accepted by workflow sanity. Uploads under `/codeql-critical-security/android`. +- `CodeQL macOS Critical Security` — weekly/manual macOS security shard. Builds the macOS app manually for CodeQL on Blacksmith macOS, filters dependency build results out of uploaded SARIF, and uploads under `/codeql-critical-security/macos`. Kept outside daily defaults because macOS build dominates runtime even when clean. + +### Critical Quality categories + +`CodeQL Critical Quality` is the matching non-security shard. It runs only error-severity, non-security JavaScript/TypeScript quality queries over narrow high-value surfaces on the smaller Blacksmith Linux runner. 
Its pull request guard is intentionally smaller than the scheduled profile: non-draft PRs only run the matching `channel-runtime-boundary`, `gateway-runtime-boundary`, `provider-runtime-boundary`, `plugin-boundary`, and `plugin-sdk-package-contract` shards for channel runtime, gateway protocol/server-method, provider runtime/model catalog, plugin loader, Plugin SDK, or package-contract changes. CodeQL config and quality workflow changes run all five PR quality shards. + +Manual dispatch accepts: + +``` +profile=all|channel-runtime-boundary|gateway-runtime-boundary|plugin-boundary|plugin-sdk-package-contract|plugin-sdk-reply-runtime|provider-runtime-boundary|session-diagnostics-boundary +``` + +The narrow profiles are teaching/iteration hooks for running one quality shard in isolation. + +| Category | Surface | +| ------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `/codeql-critical-quality/core-auth-secrets` | Auth, secrets, sandbox, cron, and gateway security boundary code | +| `/codeql-critical-quality/config-boundary` | Config schema, migration, normalization, and IO contracts | +| `/codeql-critical-quality/gateway-runtime-boundary` | Gateway protocol schemas and server method contracts | +| `/codeql-critical-quality/channel-runtime-boundary` | Core channel implementation contracts | +| `/codeql-critical-quality/agent-runtime-boundary` | Command execution, model/provider dispatch, auto-reply dispatch and queues, and ACP control-plane runtime contracts | +| `/codeql-critical-quality/mcp-process-runtime-boundary` | MCP servers and tool bridges, process supervision helpers, and outbound delivery contracts | +| `/codeql-critical-quality/memory-runtime-boundary` | Memory host SDK, memory runtime facades, memory Plugin SDK aliases, memory runtime activation glue, and memory doctor commands | +| 
`/codeql-critical-quality/session-diagnostics-boundary` | Reply queue internals, session delivery queues, outbound session binding/delivery helpers, diagnostic event/log bundle surfaces, and session doctor CLI contracts | +| `/codeql-critical-quality/plugin-sdk-reply-runtime` | Plugin SDK inbound reply dispatch, reply payload/chunking/runtime helpers, channel reply options, delivery queues, and session/thread binding helpers | +| `/codeql-critical-quality/provider-runtime-boundary` | Model catalog normalization, provider auth and discovery, provider runtime registration, provider defaults/catalogs, and web/search/fetch/embedding registries | +| `/codeql-critical-quality/ui-control-plane` | Control UI bootstrap, local persistence, gateway control flows, and task control-plane runtime contracts | +| `/codeql-critical-quality/web-media-runtime-boundary` | Core web fetch/search, media IO, media understanding, image-generation, and media-generation runtime contracts | +| `/codeql-critical-quality/plugin-boundary` | Loader, registry, public-surface, and Plugin SDK entrypoint contracts | +| `/codeql-critical-quality/plugin-sdk-package-contract` | Published package-side Plugin SDK source and plugin package contract helpers | + +Quality stays separate from security so quality findings can be scheduled, measured, disabled, or expanded without obscuring security signal. Swift, Python, and bundled-plugin CodeQL expansion should be added back as scoped or sharded follow-up work only after the narrow profiles have stable runtime and signal. + +## Maintenance workflows + +### Docs Agent + +The `Docs Agent` workflow is an event-driven Codex maintenance lane for keeping existing docs aligned with recently landed changes. It has no pure schedule: a successful non-bot push CI run on `main` can trigger it, and manual dispatch can run it directly. Workflow-run invocations skip when `main` has moved on or when another non-skipped Docs Agent run was created in the last hour. 
When it runs, it reviews the commit range from the previous non-skipped Docs Agent source SHA to current `main`, so one hourly run can cover all main changes accumulated since the last docs pass. + +### Test Performance Agent + +The `Test Performance Agent` workflow is an event-driven Codex maintenance lane for slow tests. It has no pure schedule: a successful non-bot push CI run on `main` can trigger it, but it skips if another workflow-run invocation already ran or is running that UTC day. Manual dispatch bypasses that daily activity gate. The lane builds a full-suite grouped Vitest performance report, lets Codex make only small coverage-preserving test performance fixes instead of broad refactors, then reruns the full-suite report and rejects changes that reduce the passing baseline test count. If the baseline has failing tests, Codex may fix only obvious failures and the after-agent full-suite report must pass before anything is committed. When `main` advances before the bot push lands, the lane rebases the validated patch, reruns `pnpm check:changed`, and retries the push; conflicting stale patches are skipped. It uses GitHub-hosted Ubuntu so the Codex action can keep the same drop-sudo safety posture as the docs agent. + +### Duplicate PRs After Merge + +The `Duplicate PRs After Merge` workflow is a manual maintainer workflow for post-land duplicate cleanup. It defaults to dry-run and only closes explicitly listed PRs when `apply=true`. Before mutating GitHub, it verifies that the landed PR is merged and that each duplicate has either a shared referenced issue or overlapping changed hunks. 
```bash gh workflow run duplicate-after-merge.yml \ @@ -387,143 +385,29 @@ gh workflow run duplicate-after-merge.yml \ -f apply=true ``` -## Job overview +## Local check gates and changed routing -| Job | Purpose | When it runs | -| -------------------------------- | -------------------------------------------------------------------------------------------- | ---------------------------------- | -| `preflight` | Detect docs-only changes, changed scopes, changed extensions, and build the CI manifest | Always on non-draft pushes and PRs | -| `security-scm-fast` | Private key detection and workflow audit via `zizmor` | Always on non-draft pushes and PRs | -| `security-dependency-audit` | Dependency-free production lockfile audit against npm advisories | Always on non-draft pushes and PRs | -| `security-fast` | Required aggregate for the fast security jobs | Always on non-draft pushes and PRs | -| `build-artifacts` | Build `dist/`, Control UI, built-artifact checks, and reusable downstream artifacts | Node-relevant changes | -| `checks-fast-core` | Fast Linux correctness lanes such as bundled/plugin-contract/protocol checks | Node-relevant changes | -| `checks-fast-contracts-channels` | Sharded channel contract checks with a stable aggregate check result | Node-relevant changes | -| `checks-node-core-test` | Core Node test shards, excluding channel, bundled, contract, and extension lanes | Node-relevant changes | -| `check` | Sharded main local gate equivalent: prod types, lint, guards, test types, and strict smoke | Node-relevant changes | -| `check-additional` | Architecture, boundary, extension-surface guards, package-boundary, and gateway-watch shards | Node-relevant changes | -| `build-smoke` | Built-CLI smoke tests and startup-memory smoke | Node-relevant changes | -| `checks` | Verifier for built-artifact channel tests | Node-relevant changes | -| `checks-node-compat-node22` | Node 22 compatibility build and smoke lane | Manual CI dispatch for releases | -| 
`check-docs` | Docs formatting, lint, and broken-link checks | Docs changed | -| `skills-python` | Ruff + pytest for Python-backed skills | Python-skill-relevant changes | -| `checks-windows` | Windows-specific process/path tests plus shared runtime import specifier regressions | Windows-relevant changes | -| `macos-node` | macOS TypeScript test lane using the shared built artifacts | macOS-relevant changes | -| `macos-swift` | Swift lint, build, and tests for the macOS app | macOS-relevant changes | -| `android` | Android unit tests for both flavors plus one debug APK build | Android-relevant changes | -| `test-performance-agent` | Daily Codex slow-test optimization after trusted activity | Main CI success or manual dispatch | +Local changed-lane logic lives in `scripts/changed-lanes.mjs` and is executed by `scripts/check-changed.mjs`. That local check gate is stricter about architecture boundaries than the broad CI platform scope: -Manual CI dispatches run the same job graph as normal CI but force every -non-Android scoped lane on: Linux Node shards, bundled-plugin shards, channel -contracts, Node 22 compatibility, `check`, `check-additional`, build smoke, docs -checks, Python skills, Windows, macOS, and Control UI i18n. Standalone manual CI -dispatches run Android only with `include_android=true`; the full release -umbrella enables Android by passing `include_android=true`. Plugin prerelease -static checks, the release-only `agentic-plugins` shard, the full extension -batch sweep, and plugin prerelease Docker lanes are excluded from CI. The Docker -prerelease suite runs only when `Full Release Validation` dispatches the -separate `Plugin Prerelease` workflow with the release-validation gate enabled. -Manual runs use a -unique concurrency group so a release-candidate full suite is not cancelled by -another push or PR run on the same ref. 
The optional `target_ref` input lets a -trusted caller run that graph against a branch, tag, or full commit SHA while -using the workflow file from the selected dispatch ref. +- core production changes run core prod and core test typecheck plus core lint/guards; +- core test-only changes run only core test typecheck plus core lint; +- extension production changes run extension prod and extension test typecheck plus extension lint; +- extension test-only changes run extension test typecheck plus extension lint; +- public Plugin SDK or plugin-contract changes expand to extension typecheck because extensions depend on those core contracts (Vitest extension sweeps stay explicit test work); +- release metadata-only version bumps run targeted version/config/root-dependency checks; +- unknown root/config changes fail safe to all check lanes. -```bash -gh workflow run ci.yml --ref release/YYYY.M.D -gh workflow run ci.yml --ref main -f target_ref= -f include_android=true -gh workflow run full-release-validation.yml --ref main -f ref= -``` +Local changed-test routing lives in `scripts/test-projects.test-support.mjs` and is intentionally cheaper than `check:changed`: direct test edits run themselves, source edits prefer explicit mappings, then sibling tests and import-graph dependents. Shared group-room delivery config is one of the explicit mappings: changes to the group visible-reply config, source reply delivery mode, or the message-tool system prompt route through the core reply tests plus Discord and Slack delivery regressions so a shared default change fails before the first PR push. Use `OPENCLAW_TEST_CHANGED_BROAD=1 pnpm test:changed` only when the change is harness-wide enough that the cheap mapped set is not a trustworthy proxy. -## Fail-fast order +## Testbox validation -Jobs are ordered so cheap checks fail before expensive ones run: +Run Testbox from the repo root and prefer a fresh warmed box for broad proof. 
Before spending a slow gate on a box that was reused, expired, or just reported an unexpectedly large sync, run `pnpm testbox:sanity` inside the box first. -1. `preflight` decides which lanes exist at all. The `docs-scope` and `changed-scope` logic are steps inside this job, not standalone jobs. -2. `security-scm-fast`, `security-dependency-audit`, `security-fast`, `check`, `check-additional`, `check-docs`, and `skills-python` fail quickly without waiting on the heavier artifact and platform matrix jobs. -3. `build-artifacts` overlaps with the fast Linux lanes so downstream consumers can start as soon as the shared build is ready. -4. Heavier platform and runtime lanes fan out after that: `checks-fast-core`, `checks-fast-contracts-channels`, `checks-node-core-test`, `checks`, `checks-windows`, `macos-node`, `macos-swift`, and `android`. +The sanity check fails fast when required root files such as `pnpm-lock.yaml` disappeared or when `git status --short` shows at least 200 tracked deletions. That usually means the remote sync state is not a trustworthy copy of the PR; stop that box and warm a fresh one instead of debugging the product test failure. For intentional large-deletion PRs, set `OPENCLAW_TESTBOX_ALLOW_MASS_DELETIONS=1` for that sanity run. -Scope logic lives in `scripts/ci-changed-scope.mjs` and is covered by unit tests in `src/scripts/ci-changed-scope.test.ts`. -Manual dispatch skips changed-scope detection and makes the preflight manifest -act as if every scoped area changed. -CI workflow edits validate the Node CI graph plus workflow linting, but do not force Windows, Android, or macOS native builds by themselves; those platform lanes stay scoped to platform source changes. -CI routing-only edits, selected cheap core-test fixture edits, and narrow plugin contract helper/test-routing edits use a fast Node-only manifest path: preflight, security, and a single `checks-fast-core` task. 
That path avoids build artifacts, Node 22 compatibility, channel contracts, full core shards, bundled-plugin shards, and additional guard matrices when the changed files are limited to the routing or helper surfaces that the fast task exercises directly. -Windows Node checks are scoped to Windows-specific process/path wrappers, npm/pnpm/UI runner helpers, package manager config, and the CI workflow surfaces that execute that lane; unrelated source, plugin, install-smoke, and test-only changes stay on the Linux Node lanes so they do not reserve a 16-vCPU Windows worker for coverage that is already exercised by the normal test shards. -The separate `install-smoke` workflow reuses the same scope script through its own `preflight` job. It splits smoke coverage into `run_fast_install_smoke` and `run_full_install_smoke`. Pull requests run the fast path for Docker/package surfaces, bundled plugin package/manifest changes, and core plugin/channel/gateway/Plugin SDK surfaces that the Docker smoke jobs exercise. Source-only bundled plugin changes, test-only edits, and docs-only edits do not reserve Docker workers. The fast path builds the root Dockerfile image once, checks the CLI, runs the agents delete shared-workspace CLI smoke, runs the container gateway-network e2e, verifies a bundled extension build arg, and runs the bounded bundled-plugin Docker profile under a 240-second aggregate command timeout with each scenario's Docker run capped separately. The full path keeps QR package install and installer Docker/update coverage for nightly scheduled runs, manual dispatches, workflow-call release checks, and pull requests that truly touch installer/package/Docker surfaces. In full mode, install-smoke prepares or reuses one target-SHA GHCR root Dockerfile smoke image, then runs QR package install, root Dockerfile/gateway smokes, installer/update smokes, and the fast bundled-plugin Docker E2E as separate jobs so installer work does not wait behind the root image smokes. 
`main` pushes, including merge commits, do not force the full path; when changed-scope logic would request full coverage on a push, the workflow keeps the fast Docker smoke and leaves the full install smoke to nightly or release validation. The slow Bun global install image-provider smoke is separately gated by `run_bun_global_install_smoke`; it runs on the nightly schedule and from the release checks workflow, and manual `install-smoke` dispatches can opt into it, but pull requests and `main` pushes do not run it. QR and installer Docker tests keep their own install-focused Dockerfiles. Local `test:docker:all` prebuilds one shared live-test image, packs OpenClaw once as an npm tarball, and builds two shared `scripts/e2e/Dockerfile` images: a bare Node/Git runner for installer/update/plugin-dependency lanes and a functional image that installs the same tarball into `/app` for normal functionality lanes. Docker lane definitions live in `scripts/lib/docker-e2e-scenarios.mjs`, planner logic lives in `scripts/lib/docker-e2e-plan.mjs`, and the runner only executes the selected plan. The scheduler selects the image per lane with `OPENCLAW_DOCKER_E2E_BARE_IMAGE` and `OPENCLAW_DOCKER_E2E_FUNCTIONAL_IMAGE`, then runs lanes with `OPENCLAW_SKIP_DOCKER_BUILD=1`; tune the default main-pool slot count of 10 with `OPENCLAW_DOCKER_ALL_PARALLELISM` and the provider-sensitive tail-pool slot count of 10 with `OPENCLAW_DOCKER_ALL_TAIL_PARALLELISM`. Heavy lane caps default to `OPENCLAW_DOCKER_ALL_LIVE_LIMIT=9`, `OPENCLAW_DOCKER_ALL_NPM_LIMIT=10`, and `OPENCLAW_DOCKER_ALL_SERVICE_LIMIT=7` so npm install and multi-service lanes do not overcommit Docker while lighter lanes still fill available slots. A single lane heavier than the effective caps can still start from an empty pool, then runs alone until it releases capacity. 
Lane starts are staggered by 2 seconds by default to avoid local Docker daemon create storms; override with `OPENCLAW_DOCKER_ALL_START_STAGGER_MS=0` or another millisecond value. The local aggregate preflights Docker, removes stale OpenClaw E2E containers, emits active-lane status, persists lane timings for longest-first ordering, and supports `OPENCLAW_DOCKER_ALL_DRY_RUN=1` for scheduler inspection. It stops scheduling new pooled lanes after the first failure by default, and each lane has a 120-minute fallback timeout overrideable with `OPENCLAW_DOCKER_ALL_LANE_TIMEOUT_MS`; selected live/tail lanes use tighter per-lane caps. `OPENCLAW_DOCKER_ALL_LANES=` runs exact scheduler lanes, including release-only lanes such as `install-e2e` and split bundled update lanes such as `bundled-channel-update-acpx`, while skipping the cleanup smoke so agents can reproduce one failed lane. The reusable live/E2E workflow asks `scripts/test-docker-all.mjs --plan-json` which package, image kind, live image, lane, and credential coverage is required, then `scripts/docker-e2e.mjs` converts that plan into GitHub outputs and summaries. It either packs OpenClaw through `scripts/package-openclaw-for-docker.mjs`, downloads a current-run package artifact, or downloads a package artifact from `package_artifact_run_id`; validates the tarball inventory; builds and pushes package-digest-tagged bare/functional GHCR Docker E2E images through Blacksmith's Docker layer cache when the plan needs package-installed lanes; and reuses provided `docker_e2e_bare_image`/`docker_e2e_functional_image` inputs or existing package-digest images instead of rebuilding. Docker image pulls are retried with a bounded 180-second per-attempt timeout so a stuck registry/cache stream retries quickly instead of consuming most of the CI critical path. 
The `Package Acceptance` workflow is the high-level package gate: it resolves a candidate from npm, a trusted `package_ref`, an HTTPS tarball plus SHA-256, or a prior workflow artifact, then passes that single `package-under-test` artifact into the reusable Docker E2E workflow. It keeps `workflow_ref` separate from `package_ref` so current acceptance logic can validate older trusted commits without checking out old workflow code. Release checks run a custom Package Acceptance delta for the target ref: bundled-channel compat, offline plugin fixtures, and Telegram package QA against the resolved tarball. The release-path Docker suite runs smaller chunked jobs with `OPENCLAW_SKIP_DOCKER_BUILD=1` so each chunk pulls only the image kind it needs and executes multiple lanes through the same weighted scheduler (`OPENCLAW_DOCKER_ALL_PROFILE=release-path`, `OPENCLAW_DOCKER_ALL_CHUNK=core|package-update-openai|package-update-anthropic|package-update-core|plugins-runtime-plugins|plugins-runtime-services|plugins-runtime-install-a|plugins-runtime-install-b|plugins-runtime-install-c|plugins-runtime-install-d|plugins-runtime-install-e|plugins-runtime-install-f|plugins-runtime-install-g|plugins-runtime-install-h|bundled-channels`). OpenWebUI is folded into `plugins-runtime-services` when full release-path coverage requests it, and keeps a standalone `openwebui` chunk only for OpenWebUI-only dispatches. The legacy aggregate chunk names `package-update`, `plugins-runtime-core`, `plugins-runtime`, and `plugins-integrations` still work for manual reruns, but the release workflow uses the split chunks so installer E2E and bundled plugin install/uninstall sweeps do not dominate the critical path. The `install-e2e` lane alias remains the aggregate manual rerun alias for both provider installer lanes. The `bundled-channels` chunk runs split `bundled-channel-*` and `bundled-channel-update-*` lanes rather than the serial all-in-one `bundled-channel-deps` lane. 
Each chunk uploads `.artifacts/docker-tests/` with lane logs, timings, `summary.json`, `failures.json`, phase timings, scheduler plan JSON, slow-lane tables, and per-lane rerun commands. The workflow `docker_lanes` input runs selected lanes against the prepared images instead of the chunk jobs, which keeps failed-lane debugging bounded to one targeted Docker job and prepares, downloads, or reuses the package artifact for that run; if a selected lane is a live Docker lane, the targeted job builds the live-test image locally for that rerun. Generated per-lane GitHub rerun commands include `package_artifact_run_id`, `package_artifact_name`, and prepared image inputs when those values exist, so a failed lane can reuse the exact package and images from the failed run. Use `pnpm test:docker:rerun ` to download Docker artifacts from a GitHub run and print combined/per-lane targeted rerun commands; use `pnpm test:docker:timings ` for slow-lane and phase critical-path summaries. The scheduled live/E2E workflow runs the full release-path Docker suite daily. The bundled update matrix is split by update target so repeated npm update and doctor repair passes can shard with other bundled checks. - -Current release Docker chunks are `core`, `package-update-openai`, `package-update-anthropic`, `package-update-core`, `plugins-runtime-plugins`, `plugins-runtime-services`, `plugins-runtime-install-a`, `plugins-runtime-install-b`, `plugins-runtime-install-c`, `plugins-runtime-install-d`, `plugins-runtime-install-e`, `plugins-runtime-install-f`, `plugins-runtime-install-g`, `plugins-runtime-install-h`, `bundled-channels-core`, `bundled-channels-update-a`, `bundled-channels-update-discord`, `bundled-channels-update-b`, and `bundled-channels-contracts`. 
The aggregate `bundled-channels` chunk remains available for manual one-shot reruns, and `plugins-runtime-core`, `plugins-runtime`, and `plugins-integrations` remain aggregate plugin/runtime aliases, but the release workflow uses the split chunks so channel smokes, update targets, plugin runtime checks, and bundled plugin install/uninstall sweeps can run in parallel. Targeted `docker_lanes` dispatches also split multiple selected lanes into parallel jobs after one shared package/image preparation step, and bundled-channel update lanes retry once for transient npm network failures. - -Local changed-lane logic lives in `scripts/changed-lanes.mjs` and is executed by `scripts/check-changed.mjs`. That local check gate is stricter about architecture boundaries than the broad CI platform scope: core production changes run core prod and core test typecheck plus core lint/guards, core test-only changes run only core test typecheck plus core lint, extension production changes run extension prod and extension test typecheck plus extension lint, and extension test-only changes run extension test typecheck plus extension lint. Public Plugin SDK or plugin-contract changes expand to extension typecheck because extensions depend on those core contracts, but Vitest extension sweeps are explicit test work. Release metadata-only version bumps run targeted version/config/root-dependency checks. Unknown root/config changes fail safe to all check lanes. -Local changed-test routing lives in `scripts/test-projects.test-support.mjs` and -is intentionally cheaper than `check:changed`: direct test edits run themselves, -source edits prefer explicit mappings, then sibling tests and import-graph -dependents. 
Shared group-room delivery config is one of the explicit mappings: -changes to the group visible-reply config, source reply delivery mode, or the -message-tool system prompt route through the core reply tests plus Discord and -Slack delivery regressions so a shared default change fails before the first PR -push. Use `OPENCLAW_TEST_CHANGED_BROAD=1 pnpm test:changed` only when the change -is harness-wide enough that the cheap mapped set is not a trustworthy proxy. - -For Testbox validation, run from the repo root and prefer a fresh warmed box for -broad proof. Before spending a slow gate on a box that was reused, expired, or -just reported an unexpectedly large sync, run `pnpm testbox:sanity` inside the -box first. The sanity check fails fast when required root files such as -`pnpm-lock.yaml` disappeared or when `git status --short` shows at least 200 -tracked deletions. That usually means the remote sync state is not a trustworthy -copy of the PR. Stop that box and warm a fresh one instead of debugging the -product test failure. For intentional large deletion PRs, set -`OPENCLAW_TESTBOX_ALLOW_MASS_DELETIONS=1` for that sanity run. `pnpm -testbox:run` also terminates a local Blacksmith CLI invocation that stays in the -sync phase for more than five minutes without post-sync output. Set -`OPENCLAW_TESTBOX_SYNC_TIMEOUT_MS=0` to disable that guard, or use a larger -millisecond value for unusually large local diffs. - -Manual CI dispatches run `checks-node-compat-node22` as broad compatibility coverage. Android is opt-in for standalone manual CI through `include_android=true` and always enabled for `Full Release Validation`. `Plugin Prerelease` is more expensive product/package coverage, so it is a separate workflow dispatched by `Full Release Validation` or by an explicit operator. Normal pull requests, `main` pushes, and standalone manual CI dispatches keep that suite off. 
- -The slowest Node test families are split or balanced so each job stays small without over-reserving runners: channel contracts run as three weighted shards, small core unit lanes are paired, auto-reply runs as four balanced workers with the reply subtree split into agent-runner, dispatch, and commands/state-routing shards, and agentic gateway/plugin configs are spread across the existing source-only agentic Node jobs instead of waiting on built artifacts. Broad browser, QA, media, and miscellaneous plugin tests use their dedicated Vitest configs instead of the shared plugin catch-all. `Plugin Prerelease` balances bundled plugin tests across eight extension workers; those extension shard jobs run up to two plugin config groups at a time with one Vitest worker per group and a larger Node heap so import-heavy plugin batches do not create extra CI jobs. The broad agents lane uses the shared Vitest file-parallel scheduler because it is import/scheduling dominated rather than owned by a single slow test file. `runtime-config` runs with the infra core-runtime shard to keep the shared runtime shard from owning the tail. Include-pattern shards record timing entries using the CI shard name, so `.artifacts/vitest-shard-timings.json` can distinguish a whole config from a filtered shard. `check-additional` keeps package-boundary compile/canary work together and separates runtime topology architecture from gateway watch coverage; the boundary guard shard runs its small independent guards concurrently inside one job. Gateway watch, channel tests, and the core support-boundary shard run concurrently inside `build-artifacts` after `dist/` and `dist-runtime/` are already built, keeping their old check names as lightweight verifier jobs while avoiding two extra Blacksmith workers and a second artifact-consumer queue. -Android CI runs both `testPlayDebugUnitTest` and `testThirdPartyDebugUnitTest`, then builds the Play debug APK. 
The third-party flavor has no separate source set or manifest; its unit-test lane still compiles that flavor with the SMS/call-log BuildConfig flags, while avoiding a duplicate debug APK packaging job on every Android-relevant push. -GitHub may mark superseded jobs as `cancelled` when a newer push lands on the same PR or `main` ref. Treat that as CI noise unless the newest run for the same ref is also failing. Aggregate shard checks use `!cancelled() && always()` so they still report normal shard failures but do not queue after the whole workflow has already been superseded. -The automatic CI concurrency key is versioned (`CI-v7-*`) so a GitHub-side zombie in an old queue group cannot indefinitely block newer main runs. Manual full-suite runs use `CI-manual-v1-*` and do not cancel in-progress runs. - -## Runners - -| Runner | Jobs | -| -------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `ubuntu-24.04` | `preflight`, fast security jobs and aggregates (`security-scm-fast`, `security-dependency-audit`, `security-fast`), fast protocol/contract/bundled checks, sharded channel contract checks, `check` shards except lint, `check-additional` shards and aggregates, Node test aggregate verifiers, docs checks, Python skills, workflow-sanity, labeler, auto-response; install-smoke preflight also uses GitHub-hosted Ubuntu so the Blacksmith matrix can queue earlier | -| `blacksmith-4vcpu-ubuntu-2404` | `CodeQL Critical Quality`, lower-weight extension shards, `checks-fast-core`, `checks-node-compat-node22`, `check-prod-types`, and 
`check-test-types` | -| `blacksmith-8vcpu-ubuntu-2404` | `build-artifacts`, build-smoke, Linux Node test shards, bundled plugin test shards, `android` | -| `blacksmith-16vcpu-ubuntu-2404` | `check-lint`, which remains CPU-sensitive enough that 8 vCPU cost more than it saved; install-smoke Docker builds, where 32-vCPU queue time cost more than it saved | -| `blacksmith-16vcpu-windows-2025` | `checks-windows` | -| `blacksmith-6vcpu-macos-latest` | `macos-node` on `openclaw/openclaw`; forks fall back to `macos-latest` | -| `blacksmith-12vcpu-macos-latest` | `macos-swift` on `openclaw/openclaw`; forks fall back to `macos-latest` | - -## Local equivalents - -```bash -pnpm changed:lanes # inspect the local changed-lane classifier for origin/main...HEAD -pnpm check:changed # smart local check gate: changed typecheck/lint/guards by boundary lane -pnpm check # fast local gate: production tsgo + sharded lint + parallel fast guards -pnpm check:test-types -pnpm check:timed # same gate with per-stage timings -pnpm build:strict-smoke -pnpm check:architecture -pnpm test:gateway:watch-regression -pnpm test # vitest tests -pnpm test:changed # cheap smart changed Vitest targets -pnpm test:channels -pnpm test:contracts:channels -pnpm check:docs # docs format + lint + broken links -pnpm build # build dist when CI artifact/build-smoke lanes matter -pnpm ci:timings # summarize the latest origin/main push CI run -pnpm ci:timings:recent # compare recent successful main CI runs -node scripts/ci-run-timings.mjs # summarize wall time, queue time, and slowest jobs -node scripts/ci-run-timings.mjs --latest-main # ignore issue/comment noise and choose origin/main push CI -node scripts/ci-run-timings.mjs --recent 10 # compare recent successful main CI runs -pnpm test:perf:groups --full-suite --allow-failures --output .artifacts/test-perf/baseline-before.json -pnpm test:perf:groups:compare .artifacts/test-perf/baseline-before.json .artifacts/test-perf/after-agent.json -``` +`pnpm testbox:run` 
also terminates a local Blacksmith CLI invocation that stays in the sync phase for more than five minutes without post-sync output. Set `OPENCLAW_TESTBOX_SYNC_TIMEOUT_MS=0` to disable that guard, or use a larger millisecond value for unusually large local diffs. ## Related - [Install overview](/install) -- [Release channels](/install/development-channels) +- [Development channels](/install/development-channels)