Docs: add test performance guardrails

Gustavo Madeira Santana
2026-04-17 02:19:06 -04:00
parent e4c4f955b3
commit 89706d323c
7 changed files with 114 additions and 0 deletions

@@ -87,6 +87,8 @@
- `src/plugin-sdk/AGENTS.md` expands public SDK contract rules.
- `src/plugins/AGENTS.md` expands plugin loading, registry, and manifest rules.
- `src/gateway/protocol/AGENTS.md` expands typed Gateway protocol rules.
- `src/gateway/AGENTS.md` expands Gateway server hot-path and plugin artifact rules.
- `src/agents/AGENTS.md` expands agent test/import performance rules.
- `test/helpers/AGENTS.md` and `test/helpers/channels/AGENTS.md` expand shared test helper boundary rules.
- Plugin architecture direction:
- Keep a manifest-first control plane: discovery, validation, enablement, setup hints, and activation planning should stay metadata-driven by default.
@@ -215,6 +217,8 @@
- Test performance guardrail: when production code already accepts `deps`, callbacks, or runtime injection, use that seam in tests before adding module-level mocks.
- Test performance guardrail: prefer narrow public SDK subpaths such as `models-provider-runtime`, `skill-commands-runtime`, and `reply-dispatch-runtime` over older broad helper barrels when both expose the needed helper.
- Test performance guardrail: treat import-dominated test time as a boundary bug. Refactor the import surface before adding more cases to the slow file.
- Test performance guardrail: when replacing a slow integration test with helper-level coverage, extract the exact production composition into a named helper and test that helper. Do not trade coverage shape for speed without preserving the behavior proof somewhere cheaper.
- Test performance guardrail: for plugin-owned static descriptors used by core tests or cold paths, prefer lightweight public artifacts with full-runtime fallback over loading broad bundled plugin barrels.
- Agents MUST NOT modify baseline, inventory, ignore, snapshot, or expected-failure files to silence failing checks without explicit approval in this chat.
- For targeted/local debugging, use the native root-project entrypoint: `pnpm test <path-or-filter> [vitest args...]` (for example `pnpm test src/commands/onboard-search.test.ts -t "shows registered plugin providers"`); do not default to raw `pnpm vitest run ...` because it bypasses the repo's default config/profile/pool routing.
- Do not set test workers above 16; higher values have already been tried.

@@ -60,6 +60,10 @@ third-party plugins see.
- Do not rely on eager global registry seeding or import-time side effects to
make a plugin “available”. Plugin availability should come from manifest
ownership plus targeted activation.
- When core needs plugin-owned static data on a hot path, expose a lightweight
top-level artifact such as `gateway-auth-api.ts`, `message-tool-api.ts`, or a
similarly narrow `*-api.ts`. Reuse the same local helper from the artifact and
the full plugin so fast paths do not drift from runtime behavior.
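
A minimal sketch of the shared-helper pattern described above, collapsed into one file for brevity. Every name here (`resolveAuthBypassPaths`, `gatewayAuthApi`, `fullPlugin`, the `/hooks/...` paths) is a hypothetical illustration, not one of the repo's actual exports; in practice the helper, the `*-api.ts` artifact, and the full plugin would likely live in separate modules.

```typescript
// Hypothetical plugin-owned helper: the single source of truth for static data.
// (Assumed to live in a small local module imported by both surfaces below.)
function resolveAuthBypassPaths(): readonly string[] {
  return ["/hooks/example/callback", "/hooks/example/health"];
}

// Hypothetical lightweight artifact (for example, the body of a `gateway-auth-api.ts`).
// It exposes only static data and pulls in no runtime code.
const gatewayAuthApi = {
  authBypassPaths: resolveAuthBypassPaths,
};

// Hypothetical full plugin: heavy runtime work plus the same static data.
const fullPlugin = {
  gateway: { authBypassPaths: resolveAuthBypassPaths },
  start(): void {
    // Expensive runtime setup would happen here.
  },
};

// Both surfaces delegate to one helper, so the fast path cannot drift.
const fromArtifact = gatewayAuthApi.authBypassPaths();
const fromPlugin = fullPlugin.gateway.authBypassPaths();
console.log(JSON.stringify(fromArtifact) === JSON.stringify(fromPlugin));
```

The design choice is that neither surface owns the data: core reads the artifact on hot paths, the runtime reads the full plugin, and both resolve through the same function.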
## Expanding The Boundary

src/agents/AGENTS.md Normal file
@@ -0,0 +1,29 @@
# Agents Test Performance
Agent tests are often import-bound. Treat slow test files as architecture
signals, not just runner noise.
## Guardrails
- Benchmark before and after performance edits. Prefer existing grouped
artifacts when comparing suites, or use `/usr/bin/time -l pnpm test <file>`
for a scoped hotspot (`-l` is the BSD/macOS flag; GNU `time` uses `-v`).
- If a test only needs schema, capability, routing, or static discovery data,
do not cold-load full bundled plugin/channel/provider runtime. Add or reuse a
lightweight typed artifact and keep full runtime as a fallback.
- Keep expensive bootstrap, embedded runner, provider, plugin, and channel
runtime work behind dependency injection or narrow helpers so tests can cover
behavior without starting the whole runtime.
- If moving coverage out of a slow integration test, preserve the exact
production composition in a named helper and test that helper. Do not remove
the behavior proof just because the old proof was slow.
- Avoid broad `importOriginal()` partial mocks and module resets in hot agent
tests. Use explicit mock factories, one-time imports, and reset only the
state the test mutates.
## Verification
- For agent performance changes, record timing (in seconds) and RSS before and
after in the handoff or benchmark report.
- If the change touches lazy-loading, plugin runtime imports, or bundled
artifacts, run `pnpm build`.
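
The dependency-injection guardrail above can be sketched as follows. This is an illustration under stated assumptions: `summarizeSession`, `SummarizeDeps`, and the fake implementations are hypothetical names, not code from this repo.

```typescript
// Hypothetical dependency seam: production passes real implementations,
// tests pass cheap fakes. None of these names come from the repo.
type SummarizeDeps = {
  loadTranscript: (sessionKey: string) => string[];
  countTokens: (text: string) => number;
};

function summarizeSession(
  sessionKey: string,
  deps: SummarizeDeps,
): { messages: number; tokens: number } {
  const transcript = deps.loadTranscript(sessionKey);
  const tokens = transcript.reduce((sum, line) => sum + deps.countTokens(line), 0);
  return { messages: transcript.length, tokens };
}

// Test-side usage: explicit fakes, no module-level mocks or module resets.
const fakeDeps: SummarizeDeps = {
  loadTranscript: () => ["hello world", "ok"],
  countTokens: (text) => text.split(" ").length,
};

const summary = summarizeSession("session-1", fakeDeps);
console.log(summary);
```

Because the function receives its collaborators as arguments, the test never imports the real transcript store or tokenizer, which is where import-bound cost tends to come from.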

@@ -0,0 +1,26 @@
# Embedded Runner Test Performance
The embedded attempt runner is one of the most expensive agent test surfaces.
Use full-runner tests only when the behavior truly requires the runner.
## Guardrails
- Prefer focused helper tests for prompt assembly, runtime-context construction,
cache metadata, token accounting, and maintenance decision logic.
- Keep full `runEmbeddedAttempt` coverage for cross-component behavior that
cannot be proven through helpers, not for a single derived field.
- When extracting a helper from runner logic, make production code call that
helper directly, then test the helper. Avoid test-only copies of runner behavior.
- Preserve context-engine coverage for `sessionKey`, `sessionFile`, token
budget, current token count, prompt cache, and routing fields when slimming
tests.
- Treat a standalone full-runner test above a few seconds as suspect. First ask
whether the proof can move to a production helper plus one cheap integration
smoke.
## Verification
- For runner test slimming, run the touched helper test and the nearest
two-file runner/context-engine surface.
- Record Vitest duration, wall time, and RSS when the change is
performance-motivated.
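
As a rough sketch of the helper-extraction guardrail above: the decision logic below is invented for illustration (`shouldCompactBeforeAttempt` and `TokenBudgetInput` are hypothetical names, and the rule itself is an assumption, not the runner's real behavior). The point is only the shape: production calls the helper, and the test exercises the same helper without starting the runner.

```typescript
// Hypothetical helper extracted from runner logic. Production code is assumed
// to call this function directly, so testing it covers real behavior.
type TokenBudgetInput = {
  tokenBudget: number;
  currentTokenCount: number;
  reserveTokens: number;
};

function shouldCompactBeforeAttempt(input: TokenBudgetInput): boolean {
  const remaining = input.tokenBudget - input.currentTokenCount;
  return remaining < input.reserveTokens;
}

// Helper-level checks stand in for a full-runner test of this one derived field.
const cases: Array<[TokenBudgetInput, boolean]> = [
  [{ tokenBudget: 1000, currentTokenCount: 400, reserveTokens: 200 }, false],
  [{ tokenBudget: 1000, currentTokenCount: 900, reserveTokens: 200 }, true],
];
for (const [input, expected] of cases) {
  if (shouldCompactBeforeAttempt(input) !== expected) {
    throw new Error(`unexpected decision for ${JSON.stringify(input)}`);
  }
}
```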

@@ -0,0 +1,25 @@
# Agent Tools Performance
Tool tests should not load full channel or plugin runtimes for static tool
descriptions.
## Guardrails
- Message-tool discovery should flow through shared discovery helpers and
lightweight channel artifacts before falling back to a full channel plugin
load.
- Channel-specific tool schemas, action lists, and static capabilities belong
in plugin-owned helpers that are reused by both the full plugin and the
lightweight artifact.
- Do not add direct bundled-plugin imports to agent tool tests for schema or
capability assertions. If the production path needs the same data, promote a
small public artifact instead.
- If a single assertion starts paying multi-second import/setup cost, split the
static descriptor path from runtime execution instead of adding more mocks
around the broad import.
## Verification
- For `src/agents/tools/*.test.ts` performance work, compare the targeted file's
runtime before and after with `pnpm test <file>`.
- Run `pnpm build` when adding or changing bundled plugin artifacts.
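
The artifact-first discovery flow in the guardrails above might look like the sketch below. All identifiers (`ToolDescriptor`, `lightweightArtifacts`, `loadFullChannelPlugin`, `discoverMessageTool`, the channel names) are hypothetical stand-ins, and the full plugin load is simulated by a counter rather than a real import.

```typescript
// Hypothetical descriptor shape and loaders; none of these names come from the repo.
type ToolDescriptor = { channel: string; actions: string[] };

// Cheap, static lookup: stands in for lightweight channel artifacts.
const lightweightArtifacts = new Map<string, ToolDescriptor>([
  ["chat", { channel: "chat", actions: ["send", "react"] }],
]);

// Stands in for an expensive full channel plugin load; counts invocations.
let fullLoads = 0;
function loadFullChannelPlugin(channel: string): ToolDescriptor {
  fullLoads += 1;
  return { channel, actions: ["send"] };
}

// Shared discovery helper: artifact first, full plugin only as a fallback.
function discoverMessageTool(channel: string): ToolDescriptor {
  return lightweightArtifacts.get(channel) ?? loadFullChannelPlugin(channel);
}

const fast = discoverMessageTool("chat"); // served by the artifact
const slow = discoverMessageTool("legacy"); // falls back to the full plugin
console.log(fast.actions, slow.actions, fullLoads);
```

A test asserting on schema or capabilities would only ever hit the first branch, so the fallback counter doubles as a cheap check that no broad import slipped back in.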

@@ -29,6 +29,9 @@ import from this tree directly.
those files unless startup truly needs them.
- Prefer a small local seam such as `channel-api.ts`, `*.runtime.ts`, or
`*.runtime-api.ts` to keep heavy runtime code off the hot path.
- For core discovery paths used by Gateway or agent tools, prefer lightweight
bundled-plugin artifacts first and full channel plugin loading only as a
fallback.
- Do not mix static and dynamic imports for the same heavy module family across
a channel boundary change. If the path should stay lazy, keep it lazy end to
end.

src/gateway/AGENTS.md Normal file
@@ -0,0 +1,23 @@
# Gateway Hot Paths
Gateway server tests and startup paths should not materialize bundled plugin
runtime when they only need plugin-owned static descriptors.
## Guardrails
- For plugin-owned Gateway behavior such as auth-bypass paths, prefer a
lightweight public artifact resolver before falling back to the full channel
plugin.
- Keep the full plugin contract and the lightweight artifact backed by the same
plugin-owned helper so behavior does not diverge.
- Do not load broad bundled channel registries from Gateway HTTP/server code
just to answer static questions.
- If adding a new plugin-owned Gateway descriptor, add the core resolver,
plugin artifact, and mirrored full-plugin export in the same change.
## Verification
- Benchmark the affected Gateway test file before/after with
`pnpm test <file>`.
- Run `pnpm build` when changing Gateway lazy-loading or bundled plugin
artifacts.