mirror of
https://github.com/openclaw/openclaw.git
synced 2026-05-19 23:04:46 +00:00
* refactor: remove stale file-backed shims * fix: harden sqlite state ci boundaries * refactor: store matrix idb snapshots in sqlite * fix: satisfy rebased CI guardrails * refactor: store current conversation bindings in sqlite table * refactor: store tui last sessions in sqlite table * refactor: reset sqlite schema history * refactor: drop unshipped sqlite table migration * refactor: remove plugin index file rollback * refactor: drop unshipped sqlite sidecar migrations * refactor: remove runtime commitments kv migration * refactor: preserve kysely sync result types * refactor: drop unshipped sqlite schema migration table * test: keep session usage coverage sqlite-backed * refactor: keep sqlite migration doctor-only * refactor: isolate device legacy imports * refactor: isolate push voicewake legacy imports * refactor: isolate remaining runtime legacy imports * refactor: tighten sqlite migration guardrails * test: cover sqlite persisted enum parsing * refactor: isolate legacy update and tui imports * refactor: tighten sqlite state ownership * refactor: move legacy imports behind doctor * refactor: remove legacy session row lookup * refactor: canonicalize memory transcript locators * refactor: drop transcript path scope fallbacks * refactor: drop runtime legacy session delivery pruning * refactor: store tts prefs only in sqlite * refactor: remove cron store path runtime * refactor: use cron sqlite store keys * refactor: rename telegram message cache scope * refactor: read memory dreaming status from sqlite * refactor: rename cron status store key * refactor: stop remembering transcript file paths * test: use sqlite locators in agent fixtures * refactor: remove file-shaped commitments and cron store surfaces * refactor: keep compaction transcript handles out of session rows * refactor: derive transcript handles from session identity * refactor: derive runtime transcript handles * refactor: remove gateway session locator reads * refactor: remove transcript locator from session rows * refactor: store raw stream diagnostics in sqlite * refactor: remove file-shaped transcript rotation * refactor: hide legacy trajectory paths from runtime * refactor: remove runtime transcript file bridges * refactor: repair database-first rebase fallout * refactor: align tests with database-first state * refactor: remove transcript file handoffs * refactor: sync post-compaction memory by transcript scope * refactor: run codex app-server sessions by id * refactor: bind codex runtime state by session id * refactor: pass memory transcripts by sqlite scope * refactor: remove transcript locator cleanup leftovers * test: remove stale transcript file fixtures * refactor: remove transcript locator test helper * test: make cron sqlite keys explicit * test: remove cron runtime store paths * test: remove stale session file fixtures * test: use sqlite cron keys in diagnostics * refactor: remove runtime delivery queue backfill * test: drop fake export session file mocks * refactor: rename acp session read failure flag * refactor: rename acp row session key * refactor: remove session store test seams * refactor: move legacy session parser tests to doctor * refactor: reindex managed memory in place * refactor: drop stale session store wording * refactor: rename session row helpers * refactor: rename sqlite session entry modules * refactor: remove transcript locator leftovers * refactor: trim file-era audit wording * refactor: clean managed media through sqlite * fix: prefer explicit agent for exports * fix: use prepared agent for session resets * fix: canonicalize legacy codex binding import * test: rename state cleanup helper * docs: align backup docs with sqlite state * refactor: drop legacy Pi usage auth fallback * refactor: move legacy auth profile imports to doctor * refactor: keep Pi model discovery auth in memory * refactor: remove MSTeams legacy learning key fallback * refactor: store model catalog config in sqlite * refactor: use sqlite model catalog at runtime * refactor: remove model json compatibility aliases * refactor: store auth profiles in sqlite * refactor: seed copied auth profiles in sqlite * refactor: make auth profile runtime sqlite-addressed * refactor: migrate hermes secrets into sqlite auth store * refactor: move plugin install config migration to doctor * refactor: rename plugin index audit checks * test: drop auth file assumptions * test: remove legacy transcript file assertions * refactor: drop legacy cli session aliases * refactor: store skill uploads in sqlite * refactor: keep subagent attachments in sqlite vfs * refactor: drop subagent attachment cleanup state * refactor: move legacy session aliases to doctor * refactor: require node 24 for sqlite state runtime * refactor: move provider caches into sqlite state * fix: harden virtual agent filesystem * refactor: enforce database-first runtime state * refactor: rename compaction transcript rotation setting * test: clean sqlite refactor test types * refactor: consolidate sqlite runtime state * refactor: model session conversations in sqlite * refactor: stop deriving cron delivery from session keys * refactor: stop classifying sessions from key shape * refactor: hydrate announce targets from typed delivery * refactor: route heartbeat delivery from typed sqlite context * refactor: tighten typed sqlite session routing * refactor: remove session origin routing shadow * refactor: drop session origin shadow fixtures * perf: query sqlite vfs paths by prefix * refactor: use typed conversation metadata for sessions * refactor: prefer typed session routing metadata * refactor: require typed session routing metadata * refactor: resolve group tool policy from typed sessions * refactor: delete dead session thread info bridge * Show Codex subscription reset times in channel errors (#80456) * feat(plugin-sdk): consolidate session workflow APIs * fix(agents): allow read-only agent mount reads * [codex] refresh plugin regression fixtures * fix(agents): restore compaction gateway logs * test: tighten gateway startup assertions * Redact persisted secret-shaped payloads [AI] (#79006) * test: tighten device pair notify assertions * test: tighten hermes secret assertions * test: assert matrix client error shapes * test: assert config compat warnings * fix(heartbeat): remap cron-run exec events to session keys (#80214) * fix(codex): route btw through native side threads * fix(auth): accept friendly OpenAI order for Codex profiles * fix(codex): rotate auth profiles inside harness * fix: keep browser status page probe within timeout * test: assert agents add outputs * test: pin cron read status * fix(agents): avoid Pi resource discovery stalls Co-authored-by: dataCenter430 <titan032000@gmail.com> * fix: retire timed-out codex app-server clients * test: tighten qa lab runtime assertions * test: check security fix outputs * test: verify extension runtime messages * feat(wake): expose typed sessionKey on wake protocol + system event CLI * fix(gateway): await session_end during shutdown drain and track channel + compaction lifecycle paths (#57790) * test: guard talk consult call helper * fix(codex): scale context engine projection (#80761) * fix(codex): scale context engine projection * fix: document Codex context projection scaling * fix: document Codex context projection scaling * fix: document Codex context projection scaling * fix: document Codex context projection scaling * chore: align Codex projection changelog * chore: realign Codex projection changelog * fix: isolate Codex projection patch --------- Co-authored-by: Eva (agent) <eva+agent-78055@100yen.org> Co-authored-by: Josh Lehman <josh@martian.engineering> * refactor: move agent runtime state toward piless * refactor: remove cron session reaper * refactor: move session management to sqlite * refactor: finish database-first state migration * chore: refresh generated sqlite db types * refactor: remove stale file-backed shims * test: harden kysely type coverage # Conflicts: # .agents/skills/kysely-database-access/SKILL.md # src/infra/kysely-sync.types.test.ts # src/proxy-capture/store.sqlite.test.ts # src/state/openclaw-agent-db.test.ts # src/state/openclaw-state-db.test.ts * refactor: remove cron store path runtime * refactor: keep compaction transcript handles out of session rows * refactor: derive embedded transcripts from sqlite identity * refactor: remove embedded transcript locator handoff * refactor: remove runtime transcript file bridges * refactor: remove transcript file handoffs * refactor: remove MSTeams legacy learning key fallback * refactor: store model catalog config in sqlite * refactor: use sqlite model catalog at runtime # Conflicts: # docs/cli/secrets.md # docs/gateway/authentication.md # docs/gateway/secrets.md * fix: keep oauth sibling sync sqlite-local # Conflicts: # src/commands/onboard-auth.test.ts * refactor: remove task session store maintenance # Conflicts: # src/commands/tasks.ts * refactor: keep diagnostics in state sqlite * refactor: enforce database-first runtime state * refactor: consolidate sqlite runtime state * Show Codex subscription reset times in channel errors (#80456) * fix(codex): refresh subscription limit resets * fix(codex): format reset times for channels * Update CHANGELOG with latest changes and fixes Updated CHANGELOG with recent fixes and improvements. * fix(codex): keep command load failures on codex surface * fix(codex): format account rate limits as rows * fix(codex): summarize account limits as usage status * fix(codex): simplify account limit status * test: tighten subagent announce queue assertion * test: tighten session delete lifecycle assertions * test: tighten cron ops assertions * fix: track cron execution milestones * test: tighten hermes secret assertions * test: assert matrix sync store payloads * test: assert config compat warnings * fix(codex): align btw side thread semantics * fix(codex): honor codex fallback blocking * fix(agents): avoid Pi resource discovery stalls * test: tighten codex event assertions * test: tighten cron assertions * Fix Codex app-server OAuth harness auth * refactor: move agent runtime state toward piless * refactor: move device and push state to sqlite * refactor: move runtime json state imports to doctor * refactor: finish database-first state migration * chore: refresh generated sqlite db types * refactor: clarify cron sqlite store keys * refactor: remove stale file-backed shims * refactor: bind codex runtime state by session id * test: expect sqlite trajectory branch export * refactor: rename session row helpers * fix: keep legacy device identity import in doctor * refactor: enforce database-first runtime state * refactor: consolidate sqlite runtime state * build: align pi contract wrappers * chore: repair database-first rebase * refactor: remove session file test contracts * test: update gateway session expectations * refactor: stop routing from session compatibility shadows * refactor: stop persisting session route shadows * refactor: use typed delivery context in clients * refactor: stop echoing session route shadows * refactor: repair embedded runner rebase imports # Conflicts: # src/agents/pi-embedded-runner/run/attempt.tool-call-argument-repair.ts * refactor: align pi contract imports * refactor: satisfy kysely sync helper guard * refactor: remove file transcript bridge remnants * refactor: remove session locator compatibility * refactor: remove session file test contracts * refactor: keep rebase database-first clean * refactor: remove session file assumptions from e2e * docs: clarify database-first goal state * test: remove legacy store markers from sqlite runtime tests * refactor: remove legacy store assumptions from runtime seams * refactor: align sqlite runtime helper seams * test: update memory recall sqlite audit mock * refactor: align database-first runtime type seams * test: clarify doctor cron legacy store names * fix: preserve sqlite session route projections * test: fix copilot token cache test syntax * docs: update database-first proof status * test: align database-first test fixtures * docs: update database-first proof status * refactor: clean extension database-first drift * test: align agent session route proof * test: clarify doctor legacy path fixtures * chore: clean database-first changed checks * chore: repair database-first rebase markers * build: allow baileys git subdependency * chore: repair exp-vfs rebase drift * chore: finish exp-vfs rebase cleanup * chore: satisfy rebase lint drift * chore: fix qqbot rebase type seam * chore: fix rebase drift leftovers * fix: keep auth profile oauth secrets out of sqlite * fix: repair rebase drift tests * test: stabilize pairing request ordering * test: use source manifests in plugin contract checks * fix: restore gateway session metadata after rebase * fix: repair database-first rebase drift * fix: clean up database-first rebase fallout * test: stabilize line quick reply receipt time * fix: repair extension rebase drift * test: keep transcript redaction tests sqlite-backed * fix: carry injected transcript redaction through sqlite * chore: clean database branch rebase residue * fix: repair database branch CI drift * fix: repair database branch CI guard drift * fix: stabilize oauth tls preflight test * test: align database branch fast guards * test: repair build artifact boundary guards * chore: clean changelog rebase markers --------- Co-authored-by: pashpashpash <nik@vault77.ai> Co-authored-by: Eva <eva@100yen.org> Co-authored-by: stainlu <stainlu@newtype-ai.org> Co-authored-by: Jason Zhou <jason.zhou.design@gmail.com> Co-authored-by: Ruben Cuevas <hi@rubencu.com> Co-authored-by: Pavan Kumar Gondhi <pavangondhi@gmail.com> Co-authored-by: Shakker <shakkerdroid@gmail.com> Co-authored-by: Kaspre <36520309+Kaspre@users.noreply.github.com> Co-authored-by: dataCenter430 <titan032000@gmail.com> Co-authored-by: Kaspre <kaspre@gmail.com> Co-authored-by: pandadev66 <nova.full.stack@outlook.com> Co-authored-by: Eva <admin@100yen.org> Co-authored-by: Eva (agent) <eva+agent-78055@100yen.org> Co-authored-by: Josh Lehman <josh@martian.engineering> Co-authored-by: jeffjhunter <support@aipersonamethod.com>
151 lines
19 KiB
Markdown
151 lines
19 KiB
Markdown
---
|
|
summary: "How to run tests locally (vitest) and when to use force/coverage modes"
|
|
read_when:
|
|
- Running or fixing tests
|
|
title: "Tests"
|
|
---
|
|
|
|
- Full testing kit (suites, live, Docker): [Testing](/help/testing)
|
|
- Update and plugin package validation: [Testing updates and plugins](/help/testing-updates-plugins)
|
|
|
|
- `pnpm test:force`: Kills any lingering gateway process holding the default control port, then runs the full Vitest suite with an isolated gateway port so server tests don't collide with a running instance. Use this when a prior gateway run left port 18789 occupied.
|
|
- `pnpm test:coverage`: Runs the unit suite with V8 coverage (via `vitest.unit.config.ts`). This is a default-unit-lane coverage gate, not whole-repo all-file coverage. Thresholds are 70% lines/functions/statements and 55% branches. Because `coverage.all` is false and the default lane scopes coverage includes to non-fast unit tests with sibling source files, the gate measures source owned by this lane instead of every transitive import it happens to load.
|
|
- `pnpm test:coverage:changed`: Runs unit coverage only for files changed since `origin/main`.
|
|
- `pnpm test:changed`: cheap smart changed test run. It runs precise targets from direct test edits, sibling `*.test.ts` files, explicit source mappings, and the local import graph. Broad/config/package changes are skipped unless they map to precise tests.
|
|
- `OPENCLAW_TEST_CHANGED_BROAD=1 pnpm test:changed`: explicit broad changed test run. Use it when a test harness/config/package edit should fall back to Vitest's broader changed-test behavior.
|
|
- `pnpm changed:lanes`: shows the architectural lanes triggered by the diff against `origin/main`.
|
|
- `pnpm check:changed`: runs the smart changed check gate for the diff against `origin/main`. It runs typecheck, lint, and guard commands for the affected architectural lanes, but does not run Vitest tests. Use `pnpm test:changed` or explicit `pnpm test <target>` for test proof.
|
|
- `OPENCLAW_HEAVY_CHECK_LOCK_SCOPE=worktree <local-heavy-check command>`: keeps heavy-check serialization inside the current worktree instead of the Git common dir for commands such as `pnpm check:changed` and targeted `pnpm test ...`. Use it only on high-capacity local hosts when you intentionally run independent checks across linked worktrees.
|
|
- `pnpm test`: routes explicit file/directory targets through scoped Vitest lanes. Untargeted runs use fixed shard groups and expand to leaf configs for local parallel execution; the extension group always expands to the per-extension shard configs instead of one giant root-project process.
|
|
- Test wrapper runs end with a short `[test] passed|failed|skipped ... in ...` summary. Vitest's own duration line stays the per-shard detail.
|
|
- Shared OpenClaw test state: use `src/test-utils/openclaw-test-state.ts` from Vitest when a test needs an isolated `HOME`, `OPENCLAW_STATE_DIR`, `OPENCLAW_CONFIG_PATH`, config fixture, workspace, agent dir, or auth-profile store.
|
|
- Process E2E helpers: use `test/helpers/openclaw-test-instance.ts` when a Vitest process-level E2E test needs a running Gateway, CLI env, log capture, and cleanup in one place.
|
|
- Docker/Bash E2E helpers: lanes that source `scripts/lib/docker-e2e-image.sh` can pass `docker_e2e_test_state_shell_b64 <label> <scenario>` into the container and decode it with `scripts/lib/openclaw-e2e-instance.sh`; multi-home scripts can pass `docker_e2e_test_state_function_b64` and call `openclaw_test_state_create <label> <scenario>` in each flow. Lower-level callers can use `scripts/lib/openclaw-test-state.mjs shell --label <name> --scenario <name>` for an in-container shell snippet, or `node scripts/lib/openclaw-test-state.mjs -- create --label <name> --scenario <name> --env-file <path> --json` for a sourceable host env file. The `--` before `create` keeps newer Node runtimes from treating `--env-file` as a Node flag. Docker/Bash lanes that launch a Gateway can source `scripts/lib/openclaw-e2e-instance.sh` inside the container for entrypoint resolution, mock OpenAI startup, Gateway foreground/background launch, readiness probes, state env export, log dumps, and process cleanup.
|
|
- Full, extension, and include-pattern shard runs update local timing data in `.artifacts/vitest-shard-timings.json`; later whole-config runs use those timings to balance slow and fast shards. Include-pattern CI shards append the shard name to the timing key, which keeps filtered shard timings visible without replacing whole-config timing data. Set `OPENCLAW_TEST_PROJECTS_TIMINGS=0` to ignore the local timing artifact.
|
|
- Selected `plugin-sdk` and `commands` test files now route through dedicated light lanes that keep only `test/setup.ts`, leaving runtime-heavy cases on their existing lanes.
|
|
- Source files with sibling tests map to that sibling before falling back to wider directory globs. Helper edits under `src/channels/plugins/contracts/test-helpers`, `src/plugin-sdk/test-helpers`, and `src/plugins/contracts` use a local import graph to run importing tests instead of broad-running every shard when the dependency path is precise.
|
|
- `auto-reply` now also splits into three dedicated configs (`core`, `top-level`, `reply`) so the reply harness does not dominate the lighter top-level status/token/helper tests.
|
|
- Base Vitest config now defaults to `pool: "threads"` and `isolate: false`, with the shared non-isolated runner enabled across the repo configs.
|
|
- `pnpm test:channels` runs `vitest.channels.config.ts`.
|
|
- `pnpm test:extensions` and `pnpm test extensions` run all extension/plugin shards. Heavy channel plugins, the browser plugin, and OpenAI run as dedicated shards; other plugin groups stay batched. Use `pnpm test extensions/<id>` for one bundled plugin lane.
|
|
- `pnpm test:perf:imports`: enables Vitest import-duration + import-breakdown reporting, while still using scoped lane routing for explicit file/directory targets.
|
|
- `pnpm test:perf:imports:changed`: same import profiling, but only for files changed since `origin/main`.
|
|
- `pnpm test:perf:changed:bench -- --ref <git-ref>` benchmarks the routed changed-mode path against the native root-project run for the same committed git diff.
|
|
- `pnpm test:perf:changed:bench -- --worktree` benchmarks the current worktree change set without committing first.
|
|
- `pnpm test:perf:profile:main`: writes a CPU profile for the Vitest main thread (`.artifacts/vitest-main-profile`).
|
|
- `pnpm test:perf:profile:runner`: writes CPU + heap profiles for the unit runner (`.artifacts/vitest-runner-profile`).
|
|
- `pnpm test:perf:groups --full-suite --allow-failures --output .artifacts/test-perf/baseline-before.json`: runs every full-suite Vitest leaf config serially and writes grouped duration data plus per-config JSON/log artifacts. The Test Performance Agent uses this as its baseline before attempting slow-test fixes.
|
|
- `pnpm test:perf:groups:compare .artifacts/test-perf/baseline-before.json .artifacts/test-perf/after-agent.json`: compares grouped reports after a performance-focused change.
|
|
- Gateway integration: opt-in via `OPENCLAW_TEST_INCLUDE_GATEWAY=1 pnpm test` or `pnpm test:gateway`.
|
|
- `pnpm test:e2e`: Runs gateway end-to-end smoke tests (multi-instance WS/HTTP/node pairing). Defaults to `threads` + `isolate: false` with adaptive workers in `vitest.e2e.config.ts`; tune with `OPENCLAW_E2E_WORKERS=<n>` and set `OPENCLAW_E2E_VERBOSE=1` for verbose logs.
|
|
- `pnpm test:live`: Runs provider live tests (minimax/zai). Requires API keys and `LIVE=1` (or provider-specific `*_LIVE_TEST=1`) to unskip.
|
|
- `pnpm test:docker:all`: Builds the shared live-test image, packs OpenClaw once as an npm tarball, builds/reuses a bare Node/Git runner image plus a functional image that installs that tarball into `/app`, then runs Docker smoke lanes with `OPENCLAW_SKIP_DOCKER_BUILD=1` through a weighted scheduler. The bare image (`OPENCLAW_DOCKER_E2E_BARE_IMAGE`) is used for installer/update/plugin-dependency lanes; those lanes mount the prebuilt tarball instead of using copied repo sources. The functional image (`OPENCLAW_DOCKER_E2E_FUNCTIONAL_IMAGE`) is used for normal built-app functionality lanes. `scripts/package-openclaw-for-docker.mjs` is the single local/CI package packer and validates the tarball plus `dist/postinstall-inventory.json` before Docker consumes it. Docker lane definitions live in `scripts/lib/docker-e2e-scenarios.mjs`; planner logic lives in `scripts/lib/docker-e2e-plan.mjs`; `scripts/test-docker-all.mjs` executes the selected plan. `node scripts/test-docker-all.mjs --plan-json` emits the scheduler-owned CI plan for selected lanes, image kinds, package/live-image needs, state scenarios, and credential checks without building or running Docker. `OPENCLAW_DOCKER_ALL_PARALLELISM=<n>` controls process slots and defaults to 10; `OPENCLAW_DOCKER_ALL_TAIL_PARALLELISM=<n>` controls the provider-sensitive tail pool and defaults to 10. Heavy lane caps default to `OPENCLAW_DOCKER_ALL_LIVE_LIMIT=9`, `OPENCLAW_DOCKER_ALL_NPM_LIMIT=10`, and `OPENCLAW_DOCKER_ALL_SERVICE_LIMIT=7`; provider caps default to one heavy lane per provider via `OPENCLAW_DOCKER_ALL_LIVE_CLAUDE_LIMIT=4`, `OPENCLAW_DOCKER_ALL_LIVE_CODEX_LIMIT=4`, and `OPENCLAW_DOCKER_ALL_LIVE_GEMINI_LIMIT=4`. Use `OPENCLAW_DOCKER_ALL_WEIGHT_LIMIT` or `OPENCLAW_DOCKER_ALL_DOCKER_LIMIT` for larger hosts. If one lane exceeds the effective weight or resource cap on a low-parallelism host, it can still start from an empty pool and will run alone until it releases capacity. Lane starts are staggered by 2 seconds by default to avoid local Docker daemon create storms; override with `OPENCLAW_DOCKER_ALL_START_STAGGER_MS=<ms>`. The runner preflights Docker by default, cleans stale OpenClaw E2E containers, emits active-lane status every 30 seconds, shares provider CLI tool caches between compatible lanes, retries transient live-provider failures once by default (`OPENCLAW_DOCKER_ALL_LIVE_RETRIES=<n>`), and stores lane timings in `.artifacts/docker-tests/lane-timings.json` for longest-first ordering on later runs. Use `OPENCLAW_DOCKER_ALL_DRY_RUN=1` to print the lane manifest without running Docker, `OPENCLAW_DOCKER_ALL_STATUS_INTERVAL_MS=<ms>` to tune status output, or `OPENCLAW_DOCKER_ALL_TIMINGS=0` to disable timing reuse. Use `OPENCLAW_DOCKER_ALL_LIVE_MODE=skip` for deterministic/local lanes only or `OPENCLAW_DOCKER_ALL_LIVE_MODE=only` for live-provider lanes only; package aliases are `pnpm test:docker:local:all` and `pnpm test:docker:live:all`. Live-only mode merges main and tail live lanes into one longest-first pool so provider buckets can pack Claude, Codex, and Gemini work together. The runner stops scheduling new pooled lanes after the first failure unless `OPENCLAW_DOCKER_ALL_FAIL_FAST=0` is set, and each lane has a 120-minute fallback timeout overrideable with `OPENCLAW_DOCKER_ALL_LANE_TIMEOUT_MS`; selected live/tail lanes use tighter per-lane caps. CLI backend Docker setup commands have their own timeout via `OPENCLAW_LIVE_CLI_BACKEND_SETUP_TIMEOUT_SECONDS` (default 180). Per-lane logs, `summary.json`, `failures.json`, and phase timings are written under `.artifacts/docker-tests/<run-id>/`; use `pnpm test:docker:timings <summary.json>` to inspect slow lanes and `pnpm test:docker:rerun <run-id|summary.json|failures.json>` to print cheap targeted rerun commands.
|
|
- `pnpm test:docker:browser-cdp-snapshot`: Builds a Chromium-backed source E2E container, starts raw CDP plus an isolated Gateway, runs `browser doctor --deep`, and verifies CDP role snapshots include link URLs, cursor-promoted clickables, iframe refs, and frame metadata.
|
|
- `pnpm test:docker:skill-install`: Installs the packed OpenClaw tarball in a bare Docker runner, disables `skills.install.allowUploadedArchives`, resolves a current skill slug from live ClawHub search, installs it through `openclaw skills install`, and verifies `SKILL.md`, `.clawhub/origin.json`, `.clawhub/lock.json`, and `skills info --json`.
|
|
- CLI backend live Docker probes can be run as focused lanes, for example `pnpm test:docker:live-cli-backend:codex`, `pnpm test:docker:live-cli-backend:codex:resume`, or `pnpm test:docker:live-cli-backend:codex:mcp`. Claude and Gemini have matching `:resume` and `:mcp` aliases.
|
|
- `pnpm test:docker:openwebui`: Starts Dockerized OpenClaw + Open WebUI, signs in through Open WebUI, checks `/api/models`, then runs a real proxied chat through `/api/chat/completions`. Requires a usable live model key, pulls an external Open WebUI image, and is not expected to be CI-stable like the normal unit/e2e suites.
|
|
- `pnpm test:docker:mcp-channels`: Starts a seeded Gateway container and a second client container that spawns `openclaw mcp serve`, then verifies routed conversation discovery, transcript reads, attachment metadata, live event queue behavior, outbound send routing, and Claude-style channel + permission notifications over the real stdio bridge. The Claude notification assertion reads the raw stdio MCP frames directly so the smoke reflects what the bridge actually emits.
|
|
- `pnpm test:docker:upgrade-survivor`: Installs the packed OpenClaw tarball over a dirty old-user fixture, runs package update plus non-interactive doctor without live provider or channel keys, then starts a loopback Gateway and checks that agents, channel config, plugin allowlists, workspace/session state, stale legacy plugin dependency state, startup, and RPC status survive.
|
|
- `pnpm test:docker:published-upgrade-survivor`: Installs `openclaw@latest` by default, seeds realistic existing-user files without live provider or channel keys, configures that baseline with a baked `openclaw config set` command recipe, updates that published install to the packed OpenClaw tarball, runs non-interactive doctor, writes `.artifacts/upgrade-survivor/summary.json`, then starts a loopback Gateway and checks that configured intents, workspace/session state, stale plugin config and legacy dependency state, startup, `/healthz`, `/readyz`, and RPC status survive or repair cleanly. Override one baseline with `OPENCLAW_UPGRADE_SURVIVOR_BASELINE_SPEC`, expand an exact local matrix with `OPENCLAW_UPGRADE_SURVIVOR_BASELINE_SPECS` such as `openclaw@2026.5.2 openclaw@2026.4.23 openclaw@2026.4.15`, or add scenario fixtures with `OPENCLAW_UPGRADE_SURVIVOR_SCENARIOS=reported-issues`; the reported-issues set includes `configured-plugin-installs` to verify configured external OpenClaw plugins install automatically during upgrade and `stale-source-plugin-shadow` to keep source-only plugin shadows from breaking startup. Package Acceptance exposes those as `published_upgrade_survivor_baseline`, `published_upgrade_survivor_baselines`, and `published_upgrade_survivor_scenarios`, and resolves meta baseline tokens such as `last-stable-4` or `all-since-2026.4.23` before handing exact package specs to Docker lanes.
|
|
- `pnpm test:docker:update-migration`: Runs the published-upgrade survivor harness in the cleanup-heavy `plugin-deps-cleanup` scenario, starting at `openclaw@2026.4.23` by default. The separate `Update Migration` workflow expands this lane with `baselines=all-since-2026.4.23` so every stable published package from `.23` onward updates to the candidate and proves configured-plugin dependency cleanup outside Full Release CI.
|
|
- `pnpm test:docker:plugins`: Runs install/update smoke for local path, `file:`, npm registry packages with hoisted dependencies, git moving refs, ClawHub fixtures, marketplace updates, and Claude-bundle enable/inspect.
|
|
|
|
## Local PR gate
|
|
|
|
For local PR land/gate checks, run:
|
|
|
|
- `pnpm check:changed`
|
|
- `pnpm check`
|
|
- `pnpm check:test-types`
|
|
- `pnpm build`
|
|
- `pnpm test`
|
|
- `pnpm check:docs`
|
|
|
|
If `pnpm test` flakes on a loaded host, rerun once before treating it as a regression, then isolate with `pnpm test <path/to/test>`. For memory-constrained hosts, use:
|
|
|
|
- `OPENCLAW_VITEST_MAX_WORKERS=1 pnpm test`
|
|
- `OPENCLAW_VITEST_FS_MODULE_CACHE_PATH=/tmp/openclaw-vitest-cache pnpm test:changed`
|
|
|
|
## Model latency bench (local keys)
|
|
|
|
Script: [`scripts/bench-model.ts`](https://github.com/openclaw/openclaw/blob/main/scripts/bench-model.ts)
|
|
|
|
Usage:
|
|
|
|
- `pnpm tsx scripts/bench-model.ts --runs 10`
|
|
- Optional env: `MINIMAX_API_KEY`, `MINIMAX_BASE_URL`, `MINIMAX_MODEL`, `ANTHROPIC_API_KEY`
|
|
- Default prompt: "Reply with a single word: ok. No punctuation or extra text."
|
|
|
|
Last run (2025-12-31, 20 runs):
|
|
|
|
- minimax median 1279ms (min 1114, max 2431)
|
|
- opus median 2454ms (min 1224, max 3170)
|
|
|
|
## CLI startup bench
|
|
|
|
Script: [`scripts/bench-cli-startup.ts`](https://github.com/openclaw/openclaw/blob/main/scripts/bench-cli-startup.ts)
|
|
|
|
Usage:
|
|
|
|
- `pnpm test:startup:bench`
|
|
- `pnpm test:startup:bench:smoke`
|
|
- `pnpm test:startup:bench:save`
|
|
- `pnpm test:startup:bench:update`
|
|
- `pnpm test:startup:bench:check`
|
|
- `pnpm tsx scripts/bench-cli-startup.ts`
|
|
- `pnpm tsx scripts/bench-cli-startup.ts --runs 12`
|
|
- `pnpm tsx scripts/bench-cli-startup.ts --preset real`
|
|
- `pnpm tsx scripts/bench-cli-startup.ts --preset real --case status --case gatewayStatus --runs 3`
|
|
- `pnpm tsx scripts/bench-cli-startup.ts --preset real --case tasksJson --case tasksListJson --case tasksAuditJson --runs 3`
|
|
- `pnpm tsx scripts/bench-cli-startup.ts --entry openclaw.mjs --entry-secondary dist/entry.js --preset all`
|
|
- `pnpm tsx scripts/bench-cli-startup.ts --preset all --output .artifacts/cli-startup-bench-all.json`
|
|
- `pnpm tsx scripts/bench-cli-startup.ts --preset real --case gatewayStatusJson --output .artifacts/cli-startup-bench-smoke.json`
|
|
- `pnpm tsx scripts/bench-cli-startup.ts --preset real --cpu-prof-dir .artifacts/cli-cpu`
|
|
- `pnpm tsx scripts/bench-cli-startup.ts --json`
|
|
|
|
Presets:
|
|
|
|
- `startup`: `--version`, `--help`, `health`, `health --json`, `status --json`, `status`
|
|
- `real`: `health`, `status`, `status --json`, `sessions`, `sessions --json`, `tasks --json`, `tasks list --json`, `tasks audit --json`, `agents list --json`, `gateway status`, `gateway status --json`, `gateway health --json`, `config get gateway.port`
|
|
- `all`: both presets
|
|
|
|
Output includes `sampleCount`, avg, p50, p95, min/max, exit-code/signal distribution, and max RSS summaries for each command. Optional `--cpu-prof-dir` / `--heap-prof-dir` writes V8 profiles per run so timing and profile capture use the same harness.
|
|
|
|
Saved output conventions:
|
|
|
|
- `pnpm test:startup:bench:smoke` writes the targeted smoke artifact at `.artifacts/cli-startup-bench-smoke.json`
|
|
- `pnpm test:startup:bench:save` writes the full-suite artifact at `.artifacts/cli-startup-bench-all.json` using `runs=5` and `warmup=1`
|
|
- `pnpm test:startup:bench:update` refreshes the checked-in baseline fixture at `test/fixtures/cli-startup-bench.json` using `runs=5` and `warmup=1`
|
|
|
|
Checked-in fixture:
|
|
|
|
- `test/fixtures/cli-startup-bench.json`
|
|
- Refresh with `pnpm test:startup:bench:update`
|
|
- Compare current results against the fixture with `pnpm test:startup:bench:check`
|
|
|
|
## Onboarding E2E (Docker)
|
|
|
|
Docker is optional; this is only needed for containerized onboarding smoke tests.
|
|
|
|
Full cold-start flow in a clean Linux container:
|
|
|
|
```bash
|
|
scripts/e2e/onboard-docker.sh
|
|
```
|
|
|
|
This script drives the interactive wizard via a pseudo-tty, verifies config/workspace/session state, then starts the gateway and runs `openclaw health`.
|
|
|
|
## QR import smoke (Docker)
|
|
|
|
Ensures the maintained QR runtime helper loads under the supported Docker Node runtime:
|
|
|
|
```bash
|
|
pnpm test:docker:qr
|
|
```
|
|
|
|
## Related
|
|
|
|
- [Testing](/help/testing)
|
|
- [Testing live](/help/testing-live)
|
|
- [Testing updates and plugins](/help/testing-updates-plugins)
|