diff --git a/docs/help/testing.md b/docs/help/testing.md
index 62cfda47a22..7932a1f244f 100644
--- a/docs/help/testing.md
+++ b/docs/help/testing.md
@@ -352,15 +352,15 @@ Run docs checks after doc edits: `pnpm docs:list`.
 
 These are “real pipeline” regressions without real providers:
 
-- Gateway tool calling (mock OpenAI, real gateway + agent loop): `src/gateway/gateway.tool-calling.mock-openai.test.ts`
-- Gateway wizard (WS `wizard.start`/`wizard.next`, writes config + auth enforced): `src/gateway/gateway.wizard.e2e.test.ts`
+- Gateway tool calling (mock OpenAI, real gateway + agent loop): `src/gateway/gateway.test.ts` (case: "runs a mock OpenAI tool call end-to-end via gateway agent loop")
+- Gateway wizard (WS `wizard.start`/`wizard.next`, writes config + auth enforced): `src/gateway/gateway.test.ts` (case: "runs wizard over ws and writes auth token config")
 
 ## Agent reliability evals (skills)
 
 We already have a few CI-safe tests that behave like “agent reliability evals”:
 
-- Mock tool-calling through the real gateway + agent loop (`src/gateway/gateway.tool-calling.mock-openai.test.ts`).
-- End-to-end wizard flows that validate session wiring and config effects (`src/gateway/gateway.wizard.e2e.test.ts`).
+- Mock tool-calling through the real gateway + agent loop (`src/gateway/gateway.test.ts`).
+- End-to-end wizard flows that validate session wiring and config effects (`src/gateway/gateway.test.ts`).
 
 What’s still missing for skills (see [Skills](/tools/skills)):
 