QA: organize scenarios by theme

Gustavo Madeira Santana
2026-04-17 11:02:43 -04:00
parent a45ebf3281
commit 82fe6f50ef
57 changed files with 209 additions and 32 deletions

@@ -120,7 +120,7 @@ can write back through the mounted workspace.
Seed assets live in `qa/`:
- `qa/scenarios/index.md`
-- `qa/scenarios/*.md`
+- `qa/scenarios/<theme>/*.md`
These are intentionally in git so the QA plan is visible to both humans and the
agent.
@@ -129,6 +129,7 @@ agent.
the source of truth for one test run and should define:
- scenario metadata
+- optional category, capability, lane, and risk metadata
- docs and code refs
- optional plugin requirements
- optional gateway config patch
@@ -139,6 +140,10 @@ and cross-cutting. For example, markdown scenarios can combine transport-side
helpers with browser-side helpers that drive the embedded Control UI through the
Gateway `browser.request` seam without adding a special-case runner.
+Scenario files should be grouped by product capability rather than source tree
+folder. Keep scenario IDs stable when files move; use `docsRefs` and `codeRefs`
+for implementation traceability.
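
Reviewer sketch of the stable-ID rule in the added paragraph: a scenario's identity comes from its metadata, so moving the file into a theme directory changes only `sourcePath`. This is illustrative only; `ScenarioRef`, `themeOf`, and `assertUniqueIds` are hypothetical names and are not part of `qa-lab`.

```typescript
// Hypothetical shapes for illustration; these are not the qa-lab types.
interface ScenarioRef {
  id: string; // stable identity, independent of file location
  sourcePath: string; // e.g. qa/scenarios/<theme>/<file>.md
}

// Derive the theme directory from a repo-relative scenario path.
// Returns null for paths that are not exactly one theme level deep.
function themeOf(sourcePath: string): string | null {
  const match = /^qa\/scenarios\/([^/]+)\/[^/]+\.md$/.exec(sourcePath);
  return match?.[1] ?? null;
}

// Reject a pack in which two files claim the same scenario id.
function assertUniqueIds(scenarios: ScenarioRef[]): void {
  const seen = new Map<string, string>();
  for (const scenario of scenarios) {
    const previous = seen.get(scenario.id);
    if (previous !== undefined) {
      throw new Error(
        `duplicate qa scenario id: ${scenario.id} (${previous}, ${scenario.sourcePath})`,
      );
    }
    seen.set(scenario.id, scenario.sourcePath);
  }
}

// The same scenario before and after the move: only the path differs.
const before: ScenarioRef = {
  id: "image-generation-roundtrip",
  sourcePath: "qa/scenarios/image-generation-roundtrip.md",
};
const after: ScenarioRef = {
  ...before,
  sourcePath: "qa/scenarios/media/image-generation-roundtrip.md",
};
```

Keying on `id` rather than on the path is what lets files move between theme directories without breaking references. It also means a stale copy left behind at the old location would surface as a duplicate instead of a silent second scenario.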
The baseline list should stay broad enough to cover:
- DM and channel chat

@@ -213,7 +213,7 @@ The minimum adoption bar for a new channel is:
4. Mount the runner as `openclaw qa <runner>` instead of registering a competing root command.
Runner plugins should declare `qaRunners` in `openclaw.plugin.json` and export a matching `qaRunnerCliRegistrations` array from `runtime-api.ts`.
Keep `runtime-api.ts` light; lazy CLI and runner execution should stay behind separate entrypoints.
-5. Author or adapt markdown scenarios under `qa/scenarios/`.
+5. Author or adapt markdown scenarios under the themed `qa/scenarios/` directories.
6. Use the generic scenario helpers for new scenarios.
7. Keep existing compatibility aliases working unless the repo is doing an intentional migration.

@@ -18,7 +18,7 @@ The desired end state is a generic QA harness that loads powerful scenario defin
## Current State
Primary source of truth now lives in `qa/scenarios/index.md` plus one file per
-scenario under `qa/scenarios/*.md`.
+scenario under `qa/scenarios/<theme>/*.md`.
Implemented:
@@ -26,7 +26,7 @@ Implemented:
- canonical QA pack metadata
- operator identity
- kickoff mission
-- `qa/scenarios/*.md`
+- `qa/scenarios/<theme>/*.md`
- one markdown file per scenario
- scenario metadata
- handler bindings
@@ -107,8 +107,8 @@ These categories matter because they drive DSL requirements. A flat list of prom
### Single source of truth
-Use `qa/scenarios/index.md` plus `qa/scenarios/*.md` as the authored source of
-truth.
+Use `qa/scenarios/index.md` plus `qa/scenarios/<theme>/*.md` as the authored
+source of truth.
The pack should stay:
@@ -363,7 +363,7 @@ Generated compatibility:
Done.
- added `qa/scenarios/index.md`
-- split scenarios into `qa/scenarios/*.md`
+- split scenarios into `qa/scenarios/<theme>/*.md`
- added parser for named markdown YAML pack content
- validated with zod
- switched consumers to the parsed pack

@@ -10,6 +10,12 @@ const QA_ALWAYS_STAGE_RUNTIME_PLUGIN_IDS = Object.freeze([
]);
const QA_OPENAI_PLUGIN_ID = "openai";
const QA_BUNDLED_PLUGIN_ID_PATTERN = /^[A-Za-z0-9][A-Za-z0-9._-]*$/;
+const QA_CLI_METADATA_ENTRY_BASENAMES = Object.freeze([
+  "cli-metadata.ts",
+  "cli-metadata.js",
+  "cli-metadata.mjs",
+  "cli-metadata.cjs",
+]);
function assertSafeQaBundledPluginId(pluginId: string) {
if (!QA_BUNDLED_PLUGIN_ID_PATTERN.test(pluginId)) {
@@ -69,12 +75,17 @@ export function resolveQaBundledPluginSourceDir(params: { repoRoot: string; plug
path.join(params.repoRoot, "dist-runtime", "extensions", params.pluginId),
path.join(params.repoRoot, "extensions", params.pluginId),
];
-  for (const candidate of candidates) {
-    if (existsSync(candidate)) {
-      return candidate;
-    }
+  const existingCandidates = candidates.filter((candidate) => existsSync(candidate));
+  if (existingCandidates.length === 0) {
+    return null;
  }
-  return null;
+  const cliMetadataCandidate = existingCandidates.find((candidate) =>
+    QA_CLI_METADATA_ENTRY_BASENAMES.some((basename) => existsSync(path.join(candidate, basename))),
+  );
+  if (cliMetadataCandidate) {
+    return cliMetadataCandidate;
+  }
+  return existingCandidates[0] ?? null;
}
function resolveQaBundledPluginScanRoots(repoRoot: string) {
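
Reviewer sketch of the selection rule this hunk introduces: among the candidate directories that exist, prefer the first one containing a CLI metadata entry, otherwise fall back to the first existing one. To keep the sketch free of disk access, the existence check is injected as a predicate. `pickPluginDir`, the `exists` parameter, and the candidate order shown are assumptions for illustration, not the code in the diff.

```typescript
import * as path from "node:path";

// Same basenames as QA_CLI_METADATA_ENTRY_BASENAMES in the hunk above.
const CLI_METADATA_BASENAMES = [
  "cli-metadata.ts",
  "cli-metadata.js",
  "cli-metadata.mjs",
  "cli-metadata.cjs",
];

// Hypothetical helper: `exists` stands in for existsSync so the rule can be
// exercised without touching the filesystem.
function pickPluginDir(
  candidates: string[],
  exists: (target: string) => boolean,
): string | null {
  const existing = candidates.filter((candidate) => exists(candidate));
  const withMetadata = existing.find((candidate) =>
    CLI_METADATA_BASENAMES.some((basename) => exists(path.join(candidate, basename))),
  );
  // Prefer a directory with CLI metadata; otherwise keep the old first-match behavior.
  return withMetadata ?? existing[0] ?? null;
}

// Assumed candidate order, for illustration only.
const candidates = [
  path.join("repo", "dist", "extensions", "memory-core"),
  path.join("repo", "dist-runtime", "extensions", "memory-core"),
  path.join("repo", "extensions", "memory-core"),
];

// Built copy exists but lacks CLI metadata; the source copy has it.
const present = new Set([
  path.join("repo", "dist", "extensions", "memory-core"),
  path.join("repo", "extensions", "memory-core"),
  path.join("repo", "extensions", "memory-core", "cli-metadata.ts"),
]);

const chosen = pickPluginDir(candidates, (target) => present.has(target));
console.log(chosen);
```

With this fixture the source directory is chosen even though the built copy comes earlier in the list, which is the case the new test in this commit exercises.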

@@ -714,6 +714,43 @@ describe("qa bundled plugin dir", () => {
).toBe(path.join(repoRoot, "extensions", "qa-channel"));
});
+  it("uses a source bundled plugin when the built copy is missing CLI metadata", async () => {
+    const repoRoot = await mkdtemp(path.join(os.tmpdir(), "qa-bundled-cli-metadata-root-"));
+    cleanups.push(async () => {
+      await rm(repoRoot, { recursive: true, force: true });
+    });
+    await mkdir(path.join(repoRoot, "dist", "extensions", "memory-core"), { recursive: true });
+    await writeFile(
+      path.join(repoRoot, "dist", "extensions", "memory-core", "package.json"),
+      "{}",
+      "utf8",
+    );
+    await writeFile(
+      path.join(repoRoot, "dist", "extensions", "memory-core", "openclaw.plugin.json"),
+      JSON.stringify({ id: "memory-core", kind: "memory" }),
+      "utf8",
+    );
+    await mkdir(path.join(repoRoot, "extensions", "memory-core"), { recursive: true });
+    await writeFile(path.join(repoRoot, "extensions", "memory-core", "package.json"), "{}", "utf8");
+    await writeFile(
+      path.join(repoRoot, "extensions", "memory-core", "openclaw.plugin.json"),
+      JSON.stringify({ id: "memory-core", kind: "memory" }),
+      "utf8",
+    );
+    await writeFile(
+      path.join(repoRoot, "extensions", "memory-core", "cli-metadata.ts"),
+      "export default { id: 'memory-core' };\n",
+      "utf8",
+    );
+    expect(
+      __testing.resolveQaBundledPluginSourceDir({
+        repoRoot,
+        pluginId: "memory-core",
+      }),
+    ).toBe(path.join(repoRoot, "extensions", "memory-core"));
+  });
it("creates a scoped bundled plugin tree for allowed plugins plus always-allowed runtime facades", async () => {
const repoRoot = await mkdtemp(path.join(os.tmpdir(), "qa-bundled-scope-"));
cleanups.push(async () => {

@@ -17,6 +17,9 @@ describe("qa scenario catalog", () => {
expect(pack.agent.identityMarkdown).toContain("Dev C-3PO");
expect(pack.kickoffTask).toContain("Lobster Invaders");
expect(listQaScenarioMarkdownPaths().length).toBe(pack.scenarios.length);
+    expect(listQaScenarioMarkdownPaths()).toContain(
+      "qa/scenarios/media/image-generation-roundtrip.md",
+    );
expect(pack.scenarios.some((scenario) => scenario.id === "image-generation-roundtrip")).toBe(
true,
);
@@ -112,7 +115,7 @@ describe("qa scenario catalog", () => {
(candidate) => candidate.id === "codex-harness-no-meta-leak",
);
-    expect(scenario?.sourcePath).toBe("qa/scenarios/codex-harness-no-meta-leak.md");
+    expect(scenario?.sourcePath).toBe("qa/scenarios/models/codex-harness-no-meta-leak.md");
expect(scenario?.execution.flow?.steps.map((step) => step.name)).toContain(
"keeps codex coordination chatter out of the visible reply",
);
@@ -135,7 +138,7 @@ describe("qa scenario catalog", () => {
}
| undefined;
-    expect(scenario.sourcePath).toBe(`qa/scenarios/${scenarioId}.md`);
+    expect(scenario.sourcePath).toBe(`qa/scenarios/runtime/${scenarioId}.md`);
expect(config?.requiredProvider).toBe("mock-openai");
expect(config?.prompt).toContain("check");
expect(scenario.execution.flow?.steps.length).toBeGreaterThan(0);

@@ -137,6 +137,10 @@ const qaSeedScenarioSchema = z.object({
id: z.string().trim().min(1),
title: z.string().trim().min(1),
surface: z.string().trim().min(1),
+  category: z.string().trim().min(1).optional(),
+  capabilities: z.array(z.string().trim().min(1)).optional(),
+  lane: z.record(z.string(), z.union([z.boolean(), z.string()])).optional(),
+  riskLevel: z.string().trim().min(1).optional(),
objective: z.string().trim().min(1),
successCriteria: z.array(z.string().trim().min(1)).min(1),
plugins: z.array(z.string().trim().min(1)).optional(),
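
Reviewer sketch of what the four new optional fields accept, written without zod so it runs on its own. The interface mirrors the schema lines added in this hunk; the checker function and the example values (`"media"`, `"image-generation"`, the `lane` keys, `"low"`) are made up for illustration and do not come from the repo.

```typescript
// Hand-written mirror of the four optional fields added to the schema above.
interface ScenarioThemeMetadata {
  category?: string;
  capabilities?: string[];
  lane?: Record<string, boolean | string>;
  riskLevel?: string;
}

// Equivalent of z.string().trim().min(1): a string that is non-empty after trimming.
const isNonEmpty = (value: unknown): boolean =>
  typeof value === "string" && value.trim().length > 0;

// Hypothetical checker; returns true when every present field has the expected shape.
function looksLikeThemeMetadata(value: Record<string, unknown>): boolean {
  const { category, capabilities, lane, riskLevel } = value;
  if (category !== undefined && !isNonEmpty(category)) return false;
  if (riskLevel !== undefined && !isNonEmpty(riskLevel)) return false;
  if (capabilities !== undefined) {
    if (!Array.isArray(capabilities) || !capabilities.every(isNonEmpty)) return false;
  }
  if (lane !== undefined) {
    if (typeof lane !== "object" || lane === null || Array.isArray(lane)) return false;
    for (const flag of Object.values(lane)) {
      // Lane values may be booleans or strings, nothing else.
      if (typeof flag !== "boolean" && typeof flag !== "string") return false;
    }
  }
  return true;
}

// Example values are invented for this sketch.
const example: ScenarioThemeMetadata = {
  category: "media",
  capabilities: ["image-generation"],
  lane: { live: false, provider: "mock-openai" },
  riskLevel: "low",
};
console.log(looksLikeThemeMetadata({ ...example }));
```

Because all four fields are optional, scenario files that predate this commit still pass the check unchanged.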
@@ -225,14 +229,6 @@ function readTextFile(relativePath: string): string {
return fs.readFileSync(resolved, "utf8");
}
-function readDirEntries(relativePath: string): string[] {
-  const resolved = resolveRepoPath(relativePath, "directory");
-  if (!resolved) {
-    return [];
-  }
-  return fs.readdirSync(resolved);
-}
function extractQaPackYaml(content: string) {
const match = content.match(QA_PACK_FENCE_RE);
if (!match?.[1]) {
@@ -324,6 +320,13 @@ export function readQaScenarioPack(): QaScenarioPack {
} satisfies QaSeedScenarioWithSource;
})(),
);
+  const seenScenarioIds = new Set<string>();
+  for (const scenario of scenarios) {
+    if (seenScenarioIds.has(scenario.id)) {
+      throw new Error(`duplicate qa scenario id: ${scenario.id}`);
+    }
+    seenScenarioIds.add(scenario.id);
+  }
return {
...parsedPack,
scenarios,
@@ -331,10 +334,37 @@ export function readQaScenarioPack(): QaScenarioPack {
}
export function listQaScenarioMarkdownPaths(): string[] {
-  return readDirEntries(QA_SCENARIO_DIR_PATH)
-    .filter((entry) => entry.endsWith(".md") && entry !== "index.md")
-    .map((entry) => `${QA_SCENARIO_DIR_PATH}/${entry}`)
-    .toSorted();
+  const resolved = resolveRepoPath(QA_SCENARIO_DIR_PATH, "directory");
+  if (!resolved) {
+    return [];
+  }
+  return listQaScenarioMarkdownPathsInDirectory(resolved, QA_SCENARIO_DIR_PATH).toSorted();
}
+function listQaScenarioMarkdownPathsInDirectory(
+  absoluteDir: string,
+  relativeDir: string,
+): string[] {
+  const paths: string[] = [];
+  const entries = fs
+    .readdirSync(absoluteDir, { withFileTypes: true })
+    .toSorted((left, right) => left.name.localeCompare(right.name));
+  for (const entry of entries) {
+    if (entry.name.startsWith(".")) {
+      continue;
+    }
+    const relativePath = `${relativeDir}/${entry.name}`;
+    if (entry.isDirectory()) {
+      paths.push(
+        ...listQaScenarioMarkdownPathsInDirectory(path.join(absoluteDir, entry.name), relativePath),
+      );
+      continue;
+    }
+    if (entry.isFile() && entry.name.endsWith(".md") && entry.name !== "index.md") {
+      paths.push(relativePath);
+    }
+  }
+  return paths;
+}
export function readQaScenarioOverviewMarkdown(): string {
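
Reviewer sketch of the recursive listing added above, run against a throwaway directory so the filtering rules are visible. It assumes only Node's `fs`, `os`, and `path` modules. `listScenarioMarkdown` is a hypothetical stand-in for the function in the diff, the fixture file names are made up, and `sort` replaces `toSorted` only so the sketch also runs on older Node versions.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical stand-in for the recursive helper in the hunk above.
function listScenarioMarkdown(absoluteDir: string, relativeDir: string): string[] {
  const paths: string[] = [];
  const entries = fs
    .readdirSync(absoluteDir, { withFileTypes: true })
    .sort((left, right) => left.name.localeCompare(right.name));
  for (const entry of entries) {
    // Dotfiles and dot-directories are skipped entirely.
    if (entry.name.startsWith(".")) continue;
    const relativePath = `${relativeDir}/${entry.name}`;
    if (entry.isDirectory()) {
      paths.push(...listScenarioMarkdown(path.join(absoluteDir, entry.name), relativePath));
    } else if (entry.isFile() && entry.name.endsWith(".md") && entry.name !== "index.md") {
      // index.md is excluded at every depth, not only at the pack root.
      paths.push(relativePath);
    }
  }
  return paths;
}

// Build a small fixture tree in a temp directory.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "qa-scenarios-sketch-"));
fs.mkdirSync(path.join(root, "media"));
fs.mkdirSync(path.join(root, "models"));
fs.mkdirSync(path.join(root, ".cache"));
fs.writeFileSync(path.join(root, "index.md"), "# pack\n");
fs.writeFileSync(path.join(root, "media", "image-generation-roundtrip.md"), "");
fs.writeFileSync(path.join(root, "models", "codex-harness-no-meta-leak.md"), "");
fs.writeFileSync(path.join(root, "models", "notes.txt"), "");
fs.writeFileSync(path.join(root, ".cache", "stale.md"), "");

const listed = listScenarioMarkdown(root, "qa/scenarios").sort();
fs.rmSync(root, { recursive: true, force: true });
console.log(listed);
```

Two details may be worth confirming in review. The helper recurses to any depth, while the README describes one-level theme directories. And because `index.md` is filtered by basename, a nested `index.md` inside a theme directory would also be skipped.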

@@ -0,0 +1,74 @@
+import { mkdir, mkdtemp, rm, writeFile } from "node:fs/promises";
+import os from "node:os";
+import path from "node:path";
+import { afterEach, describe, expect, it } from "vitest";
+import { runQaCli } from "./suite-runtime-agent-process.js";
+const cleanups: Array<() => Promise<void>> = [];
+afterEach(async () => {
+  while (cleanups.length > 0) {
+    await cleanups.pop()?.();
+  }
+});
+describe("qa suite runtime CLI integration", () => {
+  it("runs the plugin-owned memory status command with staged CLI metadata", async () => {
+    const repoRoot = await mkdtemp(path.join(os.tmpdir(), "qa-cli-memory-repo-"));
+    const tempRoot = await mkdtemp(path.join(os.tmpdir(), "qa-cli-memory-runtime-"));
+    cleanups.push(async () => {
+      await rm(repoRoot, { recursive: true, force: true });
+      await rm(tempRoot, { recursive: true, force: true });
+    });
+    const distDir = path.join(repoRoot, "dist");
+    const bundledPluginsDir = path.join(tempRoot, "dist", "extensions");
+    await mkdir(path.join(distDir), { recursive: true });
+    await mkdir(path.join(bundledPluginsDir, "memory-core"), { recursive: true });
+    await writeFile(
+      path.join(bundledPluginsDir, "memory-core", "cli-metadata.js"),
+      "export default { id: 'memory-core' };\n",
+      "utf8",
+    );
+    await writeFile(
+      path.join(distDir, "index.js"),
+      [
+        "import fs from 'node:fs';",
+        "import path from 'node:path';",
+        "const [command, subcommand] = process.argv.slice(2);",
+        "const metadataPath = path.join(process.env.OPENCLAW_BUNDLED_PLUGINS_DIR ?? '', 'memory-core', 'cli-metadata.js');",
+        "if (command === 'memory' && subcommand === 'status' && fs.existsSync(metadataPath)) {",
+        " console.log(JSON.stringify({ command, subcommand, status: 'ok' }));",
+        " process.exit(0);",
+        "}",
+        "console.error(\"error: unknown command 'memory'\");",
+        "process.exit(1);",
+        "",
+      ].join("\n"),
+      "utf8",
+    );
+    await expect(
+      runQaCli(
+        {
+          repoRoot,
+          gateway: {
+            tempRoot,
+            runtimeEnv: {
+              ...process.env,
+              OPENCLAW_BUNDLED_PLUGINS_DIR: bundledPluginsDir,
+            },
+          },
+          primaryModel: "openai/gpt-5.4",
+          alternateModel: "openai/gpt-5.4",
+          providerMode: "mock-openai",
+        } as never,
+        ["memory", "status", "--json"],
+        { json: true },
+      ),
+    ).resolves.toEqual({
+      command: "memory",
+      subcommand: "status",
+      status: "ok",
+    });
+  });
+});

@@ -4,7 +4,8 @@ Seed QA assets for the private `qa-lab` extension.
Files:
-- `scenarios.md` - canonical QA scenario pack, kickoff mission, and operator identity.
+- `scenarios/index.md` - canonical QA scenario pack, kickoff mission, and operator identity.
+- `scenarios/<theme>/*.md` - one runnable scenario per markdown file.
- `frontier-harness-plan.md` - big-model bakeoff and tuning loop for harness work.
- `convex-credential-broker/` - standalone Convex v1 lease broker for pooled live credentials.

@@ -3,6 +3,6 @@
Canonical scenario source now lives in:
- `qa/scenarios/index.md`
-- `qa/scenarios/*.md`
+- `qa/scenarios/<theme>/*.md`
Each QA scenario has its own markdown file.

@@ -23,7 +23,7 @@ execution:
prompt: |-
Subagent fanout synthesis check: delegate exactly two bounded subagents sequentially.
Subagent 1: verify that `HEARTBEAT.md` exists and report `ok` if it does.
-Subagent 2: verify that `repo/qa/scenarios/subagent-fanout-synthesis.md` exists and report `ok` if it does.
+Subagent 2: verify that `repo/qa/scenarios/agents/subagent-fanout-synthesis.md` exists and report `ok` if it does.
Wait for both subagents to finish.
Then reply with exactly these two lines and nothing else:
subagent-1: ok

@@ -4,12 +4,28 @@ Single source of truth for repo-backed QA suite bootstrap data.
`qa-lab` should treat this directory as a generic markdown scenario pack:
- `index.md` defines pack-level bootstrap data
-- each `*.md` scenario defines one runnable test via `qa-scenario` + `qa-flow`
-- scenario markdown may also define required plugins and gateway config patching
+- each nested `*.md` scenario defines one runnable test via `qa-scenario` + `qa-flow`
+- scenario markdown may also define category metadata, required plugins, lane filters,
+  and gateway config patching
- kickoff mission
- QA operator identity
-- scenario files under `./`
+- scenario files under one-level theme directories
+Theme directories:
+- `agents/` - agent behavior, instructions, and subagent flows
+- `channels/` - DM, shared channel, thread, and message-action behavior
+- `character/` - persona and style eval scenarios
+- `config/` - config patch, apply, and restart behavior
+- `media/` - image understanding and generation
+- `memory/` - recall, ranking, active memory, and thread isolation
+- `models/` - provider capabilities and model switching
+- `plugins/` - plugin, skill, and MCP tool integration
+- `runtime/` - turn recovery, compaction, approval, and inventory behavior
+- `scheduling/` - cron and recurring work
+- `ui/` - Control UI plus qa-channel flows
+- `workspace/` - repo-reading and workspace artifact tasks
```yaml qa-pack
version: 1