diff --git a/.agents/skills/openclaw-test-performance/SKILL.md b/.agents/skills/openclaw-test-performance/SKILL.md
new file mode 100644
index 00000000000..17ffef9cbcf
--- /dev/null
+++ b/.agents/skills/openclaw-test-performance/SKILL.md
@@ -0,0 +1,134 @@
---
name: openclaw-test-performance
description: Benchmark, diagnose, and optimize OpenClaw test performance without losing coverage. Use when Codex needs to reassess `pnpm test`, compare grouped Vitest reports, identify CPU/memory/import hotspots, fix slow tests or cold runtime paths, preserve behavior proofs, update the performance report, add AGENTS guardrails, and make scoped commits/pushes for OpenClaw test-speed work.
---

# OpenClaw Test Performance

Lead with evidence. The goal is a real `pnpm test` speed/RSS improvement with
coverage intact, not runner tuning by guesswork.

## Workflow

1. Read the relevant local `AGENTS.md` files before editing:
   - `src/agents/AGENTS.md` for agent/import hotspots.
   - `src/channels/AGENTS.md` and `src/plugins/AGENTS.md` for plugin/channel
     laziness.
   - `src/gateway/AGENTS.md` for server lifecycle tests.
   - `test/helpers/AGENTS.md` and `test/helpers/channels/AGENTS.md` for shared
     contract helpers.
   - `src/infra/outbound/AGENTS.md` for outbound/media/action tests.
2. Establish a baseline before changing code:
   - Prefer `pnpm test:perf:groups --full-suite --allow-failures --output <path>`
     for full-suite ranking.
   - For a scoped hotspot, use
     `/usr/bin/time -l pnpm test <file> --maxWorkers=1 --reporter=verbose`.
   - When you suspect import cost, add
     `OPENCLAW_VITEST_IMPORT_DURATIONS=1 OPENCLAW_VITEST_PRINT_IMPORT_BREAKDOWN=1`.
3. Separate wall-clock and runner noise from real file cost:
   - Compare Vitest duration, test body timing, import breakdown, wall time,
     and max RSS.
   - Re-run single files when grouped/full-suite numbers look stale or noisy.
   - If a full-suite grouped run reports a lane failure but the JSON report
     says the tests passed, record that as harness noise and verify the
     suspect file directly.
4. Pick the next attack by return and risk:
   - High return: one file/test dominates seconds or RSS and has a clear root
     cause.
   - Lower risk: static descriptors, target parsing, routing, auth bypass,
     setup hints, registry fixtures, or test server lifecycle.
   - Higher risk: real memory/runtime behavior, live providers, protocol
     contracts, or broad production refactors.
5. Fix the root cause, not the symptom:
   - Move static metadata/parsing into narrow helpers or lightweight artifacts
     reused by both the full runtime and the fast paths.
   - Prefer dependency injection, loaded-plugin-only lookup, explicit
     fixtures, and pure helpers over broad mocks (see the second sketch after
     this list).
   - Reuse suite-level servers/clients when a fresh handshake is irrelevant
     (see the first sketch after this list).
   - Keep schedulers/background loops off unless the test proves scheduling.
6. Preserve the coverage shape:
   - Do not delete a slow integration proof unless the exact production
     composition is extracted into a named helper and tested.
   - Keep one cheap integration smoke test when cross-component wiring
     matters.
   - State explicitly what incidental coverage was removed, if any.
7. Re-run the same benchmark command after the change and report the gain in
   seconds and as a percentage.
8. Update the running report when requested or when the thread is tracking
   one. Include before/after commands, artifacts, coverage notes,
   verification, and the next attack order.
9. Commit with `scripts/committer "<message>" <paths...>` and push when the
   user asked for commits/pushes. Stage only the files touched for this
   attack.
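
A minimal sketch of the suite-level reuse pattern from step 5, assuming
Vitest plus hypothetical `startTestGateway`/`resetGatewayState` helpers
(substitute whatever the suite actually uses):

```ts
import { afterAll, afterEach, beforeAll, describe, expect, it } from "vitest";
// Hypothetical helpers; stand-ins for the repo's real gateway test utilities.
import {
  startTestGateway,
  resetGatewayState,
  type TestGateway,
} from "../helpers/gateway";

describe("gateway routing", () => {
  let gateway: TestGateway;

  // Pay the expensive startup/handshake once per suite.
  beforeAll(async () => {
    gateway = await startTestGateway();
  });

  // Cheap per-test isolation: reset state, keep the server and client alive.
  afterEach(async () => {
    await resetGatewayState(gateway);
  });

  afterAll(async () => {
    await gateway.close();
  });

  it("routes a message to the loaded plugin", async () => {
    const result = await gateway.client.send({ channel: "test", text: "ping" });
    expect(result.ok).toBe(true);
  });
});
```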
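
And a sketch of the pure-helper direction: one static import of a narrow
parser instead of `vi.resetModules()` plus fresh dynamic imports per test.
The module path and `parseChannelTarget` helper are illustrative, not real
OpenClaw exports:

```ts
import { describe, expect, it } from "vitest";
// Before (slow): vi.resetModules() in beforeEach, then
// `await import("../../src/channels/target")` inside every test.
// After (fast): a single static import of a pure helper, reused by every case.
import { parseChannelTarget } from "../../src/channels/target";

describe("parseChannelTarget", () => {
  it.each([
    ["discord:123", { channel: "discord", id: "123" }],
    ["slack:C42", { channel: "slack", id: "C42" }],
  ])("parses %s", (raw, expected) => {
    expect(parseChannelTarget(raw)).toEqual(expected);
  });
});
```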

## Common Root Causes

- Full bundled channel/plugin runtime loaded just for static data.
- `getChannelPlugin()` fallback used when an already-loaded fixture or a pure
  parser would suffice.
- Broad `api.ts`, `runtime-api.ts`, `test-api.ts`, or plugin-sdk barrels
  pulled into hot tests.
- Partial-real mocks using `importActual()` around broad modules.
- `vi.resetModules()` plus fresh imports in per-test loops.
- Test plugin registry seeded in `beforeAll` while runtime state resets in
  `afterEach`.
- Per-test gateway/server/client startup when a state reset would suffice.
- Runtime/default model/auth selection cost paid by idle snapshots or
  fixtures.
- Plugin-owned media/action discovery triggered before checking whether the
  args contain plugin-owned fields.

## Benchmark Commands

Scoped file:

```bash
timeout 240 /usr/bin/time -l pnpm test <file> --maxWorkers=1 --reporter=verbose
```

Scoped file with import breakdown:

```bash
timeout 240 /usr/bin/time -l env \
  OPENCLAW_VITEST_IMPORT_DURATIONS=1 \
  OPENCLAW_VITEST_PRINT_IMPORT_BREAKDOWN=1 \
  pnpm test <file> --maxWorkers=1 --reporter=verbose
```

Grouped suite:

```bash
pnpm test:perf:groups --full-suite --allow-failures \
  --output .artifacts/test-perf/<name>.json
```

Reuse an existing Vitest JSON report:

```bash
pnpm test:perf:groups --report <report.json> \
  --output .artifacts/test-perf/<name>.json
```

## Verification

- Always run the targeted test surface that proves the change.
- Run `pnpm check` before committing unless the change is docs-only and the
  hook handles it.
- Run `pnpm build` when touching lazy-loading, bundled artifacts, package
  boundaries, dynamic imports, build output, or public surfaces.
- If dependencies are missing or stale, run `pnpm install` and retry the exact
  failed command once.
- Use this report format:

```markdown
| Metric | Before | After | Gain |
| -------------- | -----: | ----: | ------------: |
| File wall time | `Xs` | `Ys` | `-Zs` (`P%`) |
| Max RSS | `XMB` | `YMB` | `-ZMB` (`P%`) |
```

## Handoff

Keep the final summary concise:

- Root cause.
- Files changed.
- Before/after numbers.
- Coverage retained.
- Verification commands.
- Commit hash and push status.

diff --git a/.agents/skills/openclaw-test-performance/agents/openai.yaml b/.agents/skills/openclaw-test-performance/agents/openai.yaml
new file mode 100644
index 00000000000..ea780a1a545
--- /dev/null
+++ b/.agents/skills/openclaw-test-performance/agents/openai.yaml
@@ -0,0 +1,4 @@
interface:
  display_name: "OpenClaw Test Performance"
  short_description: "Benchmark and fix slow OpenClaw tests"
  default_prompt: "Use $openclaw-test-performance to reassess the OpenClaw test benchmark, identify the next real hotspot, fix it without losing coverage, update the report, and commit scoped changes."