mirror of
https://github.com/openclaw/openclaw.git
synced 2026-04-12 01:31:08 +00:00
refactor: move qa suite definitions into markdown
@@ -56,8 +56,7 @@ asset hash changes.
 Seed assets live in `qa/`:

-- `qa/QA_KICKOFF_TASK.md`
-- `qa/seed-scenarios.json`
+- `qa/scenarios.md`

 These are intentionally in git so the QA plan is visible to both humans and the
 agent. The baseline list should stay broad enough to cover:
526
docs/refactor/qa.md
Normal file
@@ -0,0 +1,526 @@
|
||||
# QA Refactor
|
||||
|
||||
Status: foundational migration landed.
|
||||
|
||||
## Goal
|
||||
|
||||
Move OpenClaw QA from a split-definition model to a single source of truth:
|
||||
|
||||
- scenario metadata
|
||||
- prompts sent to the model
|
||||
- setup and teardown
|
||||
- harness logic
|
||||
- assertions and success criteria
|
||||
- artifacts and report hints
|
||||
|
||||
The desired end state is a generic QA harness that loads powerful scenario definition files instead of hardcoding most behavior in TypeScript.
|
||||
|
||||
## Current State
|
||||
|
||||
Primary source of truth now lives in `qa/scenarios.md`.
|
||||
|
||||
Implemented:
|
||||
|
||||
- `qa/scenarios.md`
  - canonical QA pack
  - operator identity
  - kickoff mission
  - scenario metadata
  - handler bindings
- `extensions/qa-lab/src/scenario-catalog.ts`
  - markdown pack parser + zod validation
- `extensions/qa-lab/src/qa-agent-bootstrap.ts`
  - plan rendering from the markdown pack
- `extensions/qa-lab/src/qa-agent-workspace.ts`
  - seeds generated compatibility files plus `QA_SCENARIOS.md`
- `extensions/qa-lab/src/suite.ts`
  - selects executable scenarios through markdown-defined handler bindings
- QA bus protocol + UI
  - generic inline attachments for image/video/audio/file rendering
|
||||
|
||||
Remaining split surfaces:
|
||||
|
||||
- `extensions/qa-lab/src/suite.ts`
  - still owns most executable custom handler logic
- `extensions/qa-lab/src/report.ts`
  - still derives report structure from runtime outputs
|
||||
|
||||
So the source-of-truth split is fixed, but execution is still mostly handler-backed rather than fully declarative.
|
||||
|
||||
## What The Real Scenario Surface Looks Like
|
||||
|
||||
Reading the current suite shows a few distinct scenario classes.
|
||||
|
||||
### Simple interaction
|
||||
|
||||
- channel baseline
|
||||
- DM baseline
|
||||
- threaded follow-up
|
||||
- model switch
|
||||
- approval followthrough
|
||||
- reaction/edit/delete
|
||||
|
||||
### Config and runtime mutation
|
||||
|
||||
- config patch skill disable
|
||||
- config apply restart wake-up
|
||||
- config restart capability flip
|
||||
- runtime inventory drift check
|
||||
|
||||
### Filesystem and repo assertions
|
||||
|
||||
- source/docs discovery report
|
||||
- build Lobster Invaders
|
||||
- generated image artifact lookup
|
||||
|
||||
### Memory orchestration
|
||||
|
||||
- memory recall
|
||||
- memory tools in channel context
|
||||
- memory failure fallback
|
||||
- session memory ranking
|
||||
- thread memory isolation
|
||||
- memory dreaming sweep
|
||||
|
||||
### Tool and plugin integration
|
||||
|
||||
- MCP plugin-tools call
|
||||
- skill visibility
|
||||
- skill hot install
|
||||
- native image generation
|
||||
- image roundtrip
|
||||
- image understanding from attachment
|
||||
|
||||
### Multi-turn and multi-actor
|
||||
|
||||
- subagent handoff
|
||||
- subagent fanout synthesis
|
||||
- restart recovery style flows
|
||||
|
||||
These categories matter because they drive DSL requirements. A flat list of prompt + expected text is not enough.
|
||||
|
||||
## Direction
|
||||
|
||||
### Single source of truth
|
||||
|
||||
Use `qa/scenarios.md` as the authored source of truth.
|
||||
|
||||
The pack should stay:
|
||||
|
||||
- human-readable in review
- machine-parseable
- rich enough to drive:
  - suite execution
  - QA workspace bootstrap
  - QA Lab UI metadata
  - docs/discovery prompts
  - report generation
|
||||
|
||||
### Preferred authoring format
|
||||
|
||||
Use markdown as the top-level format, with structured YAML inside it.
|
||||
|
||||
Recommended shape:
|
||||
|
||||
- YAML frontmatter
  - id
  - title
  - surface
  - tags
  - docs refs
  - code refs
  - model/provider overrides
  - prerequisites
- prose sections
  - objective
  - notes
  - debugging hints
- fenced YAML blocks
  - setup
  - steps
  - assertions
  - cleanup
|
||||
|
||||
This gives:
|
||||
|
||||
- better PR readability than giant JSON
|
||||
- richer context than pure YAML
|
||||
- strict parsing and zod validation
|
||||
|
||||
Raw JSON is acceptable only as an intermediate generated form.
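As a rough illustration of the fenced-YAML idea, the sketch below pulls named fenced blocks (info string `yaml <name>`) out of a markdown scenario file. `extractNamedBlocks` and `NamedBlock` are hypothetical names chosen for this example, not the parser in `scenario-catalog.ts`. It only collects raw block text; YAML parsing and zod validation would happen in a later step.

```typescript
// Hypothetical helper: collects fenced blocks whose info string is "yaml <name>".
type NamedBlock = { name: string; body: string };

function extractNamedBlocks(markdown: string): NamedBlock[] {
  const blocks: NamedBlock[] = [];
  let current: { name: string; body: string[] } | null = null;

  for (const line of markdown.split("\n")) {
    if (current === null) {
      // Opening fence with a block name, e.g. "yaml scenario.setup" after the backticks.
      const open = /^```yaml\s+(\S+)\s*$/.exec(line);
      if (open) {
        current = { name: open[1], body: [] };
      }
    } else if (/^```\s*$/.test(line)) {
      // A bare closing fence ends the current named block.
      blocks.push({ name: current.name, body: current.body.join("\n") });
      current = null;
    } else {
      current.body.push(line);
    }
  }
  return blocks;
}
```

Unnamed fences and fences in other languages are ignored, so prose sections can still carry ordinary code samples without being treated as scenario data.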
|
||||
|
||||
## Proposed Scenario File Shape
|
||||
|
||||
Example:
|
||||
|
||||
````md
---
id: image-generation-roundtrip
title: Image generation roundtrip
surface: image
tags: [media, image, roundtrip]
models:
  primary: openai/gpt-5.4
requires:
  tools: [image_generate]
  plugins: [openai, qa-channel]
docsRefs:
  - docs/help/testing.md
  - docs/concepts/model-providers.md
codeRefs:
  - extensions/qa-lab/src/suite.ts
  - src/gateway/chat-attachments.ts
---

# Objective

Verify generated media is reattached on the follow-up turn.

# Setup

```yaml scenario.setup
- action: config.patch
  patch:
    agents:
      defaults:
        imageGenerationModel:
          primary: openai/gpt-image-1
- action: session.create
  key: agent:qa:image-roundtrip
```

# Steps

```yaml scenario.steps
- action: agent.send
  session: agent:qa:image-roundtrip
  message: |
    Image generation check: generate a QA lighthouse image and summarize it in one short sentence.
- action: artifact.capture
  kind: generated-image
  promptSnippet: Image generation check
  saveAs: lighthouseImage
- action: agent.send
  session: agent:qa:image-roundtrip
  message: |
    Roundtrip image inspection check: describe the generated lighthouse attachment in one short sentence.
  attachments:
    - fromArtifact: lighthouseImage
```

# Expect

```yaml scenario.expect
- assert: outbound.textIncludes
  value: lighthouse
- assert: requestLog.matches
  where:
    promptIncludes: Roundtrip image inspection check
    imageInputCountGte: 1
- assert: artifact.exists
  ref: lighthouseImage
```
````
|
||||
|
||||
## Runner Capabilities The DSL Must Cover
|
||||
|
||||
Based on the current suite, the generic runner needs more than prompt execution.
|
||||
|
||||
### Environment and setup actions
|
||||
|
||||
- `bus.reset`
|
||||
- `gateway.waitHealthy`
|
||||
- `channel.waitReady`
|
||||
- `session.create`
|
||||
- `thread.create`
|
||||
- `workspace.writeSkill`
|
||||
|
||||
### Agent turn actions
|
||||
|
||||
- `agent.send`
|
||||
- `agent.wait`
|
||||
- `bus.injectInbound`
|
||||
- `bus.injectOutbound`
|
||||
|
||||
### Config and runtime actions
|
||||
|
||||
- `config.get`
|
||||
- `config.patch`
|
||||
- `config.apply`
|
||||
- `gateway.restart`
|
||||
- `tools.effective`
|
||||
- `skills.status`
|
||||
|
||||
### File and artifact actions
|
||||
|
||||
- `file.write`
|
||||
- `file.read`
|
||||
- `file.delete`
|
||||
- `file.touchTime`
|
||||
- `artifact.captureGeneratedImage`
|
||||
- `artifact.capturePath`
|
||||
|
||||
### Memory and cron actions
|
||||
|
||||
- `memory.indexForce`
|
||||
- `memory.searchCli`
|
||||
- `doctor.memory.status`
|
||||
- `cron.list`
|
||||
- `cron.run`
|
||||
- `cron.waitCompletion`
|
||||
- `sessionTranscript.write`
|
||||
|
||||
### MCP actions
|
||||
|
||||
- `mcp.callTool`
|
||||
|
||||
### Assertions
|
||||
|
||||
- `outbound.textIncludes`
|
||||
- `outbound.inThread`
|
||||
- `outbound.notInRoot`
|
||||
- `tool.called`
|
||||
- `tool.notPresent`
|
||||
- `skill.visible`
|
||||
- `skill.disabled`
|
||||
- `file.contains`
|
||||
- `memory.contains`
|
||||
- `requestLog.matches`
|
||||
- `sessionStore.matches`
|
||||
- `cron.managedPresent`
|
||||
- `artifact.exists`
|
||||
|
||||
## Variables and Artifact References
|
||||
|
||||
The DSL must support saved outputs and later references.
|
||||
|
||||
Examples from the current suite:
|
||||
|
||||
- create a thread, then reuse `threadId`
|
||||
- create a session, then reuse `sessionKey`
|
||||
- generate an image, then attach the file on the next turn
|
||||
- generate a wake marker string, then assert that it appears later
|
||||
|
||||
Needed capabilities:
|
||||
|
||||
- `saveAs`
|
||||
- `${vars.name}`
|
||||
- `${artifacts.name}`
|
||||
- typed references for paths, session keys, thread ids, markers, tool outputs
|
||||
|
||||
Without variable support, the harness will keep leaking scenario logic back into TypeScript.
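A minimal sketch of how `${vars.name}` and `${artifacts.name}` references could be resolved, assuming earlier steps store their `saveAs` outputs in a plain context object. `ScenarioContext` and `interpolate` are hypothetical names for illustration only; typed references are left out.

```typescript
// Hypothetical runtime context: values saved by earlier steps via `saveAs`.
type ScenarioContext = {
  vars: Record<string, string>;
  artifacts: Record<string, string>;
};

// Replaces ${vars.name} and ${artifacts.name}. Unknown references throw so that
// a typo in a scenario file fails loudly instead of sending a literal "${...}".
function interpolate(template: string, ctx: ScenarioContext): string {
  return template.replace(
    /\$\{(vars|artifacts)\.([A-Za-z0-9_]+)\}/g,
    (_match, scope: string, name: string) => {
      const table = scope === "vars" ? ctx.vars : ctx.artifacts;
      if (!Object.prototype.hasOwnProperty.call(table, name)) {
        throw new Error(`unknown reference: ${scope}.${name}`);
      }
      return table[name];
    },
  );
}
```

Failing on unknown names is a design choice in this sketch; a real runner might instead collect all unresolved references and report them together.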
|
||||
|
||||
## What Should Stay As Escape Hatches
|
||||
|
||||
A purely declarative runner is not realistic in phase 1.
|
||||
|
||||
Some scenarios are inherently orchestration-heavy:
|
||||
|
||||
- memory dreaming sweep
|
||||
- config apply restart wake-up
|
||||
- config restart capability flip
|
||||
- generated image artifact resolution by timestamp/path
|
||||
- discovery-report evaluation
|
||||
|
||||
These should use explicit custom handlers for now.
|
||||
|
||||
Recommended rule:
|
||||
|
||||
- 85-90% declarative
|
||||
- explicit `customHandler` steps for the hard remainder
|
||||
- named and documented custom handlers only
|
||||
- no anonymous inline code in the scenario file
|
||||
|
||||
That keeps the generic engine clean while still allowing progress.
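One way to enforce the "named and documented custom handlers only" rule is a registry that scenario files can reference by name and nothing else. The sketch below is an assumption about how that could look. `registerCustomHandler` and `resolveCustomHandler` are hypothetical, and the handler is synchronous only to keep the example short.

```typescript
// Hypothetical registry: scenario files reference handlers by name only.
// A real handler would likely be async and receive a richer environment.
type CustomHandler = (input: { scenarioId: string }) => string;

const customHandlers = new Map<string, CustomHandler>();

function registerCustomHandler(name: string, handler: CustomHandler): void {
  if (customHandlers.has(name)) {
    throw new Error(`custom handler already registered: ${name}`);
  }
  customHandlers.set(name, handler);
}

function resolveCustomHandler(name: string): CustomHandler {
  const handler = customHandlers.get(name);
  if (!handler) {
    // Unknown names fail at load time rather than silently skipping a scenario.
    throw new Error(`unknown custom handler: ${name}`);
  }
  return handler;
}
```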
|
||||
|
||||
## Architecture Change
|
||||
|
||||
### Current
|
||||
|
||||
Scenario markdown is already the source of truth for:
|
||||
|
||||
- suite execution
|
||||
- workspace bootstrap files
|
||||
- QA Lab UI scenario catalog
|
||||
- report metadata
|
||||
- discovery prompts
|
||||
|
||||
Generated compatibility:
|
||||
|
||||
- seeded workspace still includes `QA_KICKOFF_TASK.md`
|
||||
- seeded workspace still includes `QA_SCENARIO_PLAN.md`
|
||||
- seeded workspace now also includes `QA_SCENARIOS.md`
|
||||
|
||||
## Refactor Plan
|
||||
|
||||
### Phase 1: loader and schema
|
||||
|
||||
Done.
|
||||
|
||||
- added `qa/scenarios.md`
|
||||
- added parser for named markdown YAML pack content
|
||||
- validated with zod
|
||||
- switched consumers to the parsed pack
|
||||
- removed repo-level `qa/seed-scenarios.json` and `qa/QA_KICKOFF_TASK.md`
|
||||
|
||||
### Phase 2: generic engine
|
||||
|
||||
- split `extensions/qa-lab/src/suite.ts` into:
  - loader
  - engine
  - action registry
  - assertion registry
  - custom handlers
- keep existing helper functions as engine operations
|
||||
|
||||
Deliverable:
|
||||
|
||||
- engine executes simple declarative scenarios
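To make the action registry / assertion registry split concrete, here is a deliberately small sketch of an engine loop. All shapes (`Step`, `Expectation`, `RunState`) and the canned `agent.send` behavior are assumptions for illustration; the real helpers in `suite.ts` are not shown in this document.

```typescript
// Hypothetical shapes for a declarative scenario.
type Step = { action: string; args?: Record<string, unknown> };
type Expectation = { assert: string; value?: string };
type RunState = { outboundTexts: string[] };

type ActionFn = (state: RunState, args: Record<string, unknown>) => void;
type AssertFn = (state: RunState, expectation: Expectation) => boolean;

const actions: Record<string, ActionFn> = {
  // Stand-in for a real agent turn: records a canned reply.
  "agent.send": (state, args) => {
    state.outboundTexts.push(`echo: ${String(args.message ?? "")}`);
  },
};

const assertions: Record<string, AssertFn> = {
  "outbound.textIncludes": (state, expectation) =>
    state.outboundTexts.some((text) => text.includes(expectation.value ?? "")),
};

// Runs all steps in order, then returns the names of failed assertions.
// An empty list means the scenario passed.
function runScenario(steps: Step[], expectations: Expectation[]): string[] {
  const state: RunState = { outboundTexts: [] };
  for (const step of steps) {
    const action = actions[step.action];
    if (!action) throw new Error(`unknown action: ${step.action}`);
    action(state, step.args ?? {});
  }
  return expectations
    .filter((expectation) => {
      const check = assertions[expectation.assert];
      if (!check) throw new Error(`unknown assertion: ${expectation.assert}`);
      return !check(state, expectation);
    })
    .map((expectation) => expectation.assert);
}
```

Keeping actions and assertions in separate lookup tables means a new scenario needs TypeScript changes only when it requires a capability the registries do not yet expose.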
|
||||
|
||||
### Phase 3: migrate simple scenarios

Start with scenarios that are mostly prompt + wait + assert:
|
||||
|
||||
- threaded follow-up
|
||||
- image understanding from attachment
|
||||
- skill visibility and invocation
|
||||
- channel baseline
|
||||
|
||||
Deliverable:
|
||||
|
||||
- first real markdown-defined scenarios shipping through the generic engine
|
||||
|
||||
### Phase 4: migrate medium scenarios
|
||||
|
||||
- image generation roundtrip
|
||||
- memory tools in channel context
|
||||
- session memory ranking
|
||||
- subagent handoff
|
||||
- subagent fanout synthesis
|
||||
|
||||
Deliverable:
|
||||
|
||||
- variables, artifacts, tool assertions, request-log assertions proven out
|
||||
|
||||
### Phase 5: keep hard scenarios on custom handlers
|
||||
|
||||
- memory dreaming sweep
|
||||
- config apply restart wake-up
|
||||
- config restart capability flip
|
||||
- runtime inventory drift
|
||||
|
||||
Deliverable:
|
||||
|
||||
- same authoring format, but with explicit custom-step blocks where needed
|
||||
|
||||
### Phase 6: delete hardcoded scenario map
|
||||
|
||||
Once the pack coverage is good enough:
|
||||
|
||||
- remove most scenario-specific TypeScript branching from `extensions/qa-lab/src/suite.ts`
|
||||
|
||||
## Fake Slack / Rich Media Support
|
||||
|
||||
The current QA bus is text-first.
|
||||
|
||||
Relevant files:
|
||||
|
||||
- `extensions/qa-channel/src/protocol.ts`
|
||||
- `extensions/qa-lab/src/bus-state.ts`
|
||||
- `extensions/qa-lab/src/bus-queries.ts`
|
||||
- `extensions/qa-lab/src/bus-server.ts`
|
||||
- `extensions/qa-lab/web/src/ui-render.ts`
|
||||
|
||||
Today the QA bus supports:
|
||||
|
||||
- text
|
||||
- reactions
|
||||
- threads
|
||||
|
||||
It does not yet model inline media attachments.
|
||||
|
||||
### Needed transport contract
|
||||
|
||||
Add a generic QA bus attachment model:
|
||||
|
||||
```ts
type QaBusAttachment = {
  id: string;
  kind: "image" | "video" | "audio" | "file";
  mimeType: string;
  fileName?: string;
  inline?: boolean;
  url?: string;
  contentBase64?: string;
  width?: number;
  height?: number;
  durationMs?: number;
  altText?: string;
  transcript?: string;
};
````
|
||||
|
||||
Then add `attachments?: QaBusAttachment[]` to:
|
||||
|
||||
- `QaBusMessage`
|
||||
- `QaBusInboundMessageInput`
|
||||
- `QaBusOutboundMessageInput`
|
||||
|
||||
### Why generic first
|
||||
|
||||
Do not build a Slack-only media model.
|
||||
|
||||
Instead:
|
||||
|
||||
- one generic QA transport model
- multiple renderers on top of it
  - current QA Lab chat
  - future fake Slack web
  - any other fake transport views
|
||||
|
||||
This prevents duplicate logic and lets media scenarios stay transport-agnostic.
|
||||
|
||||
### UI work needed
|
||||
|
||||
Update the QA UI to render:
|
||||
|
||||
- inline image preview
|
||||
- inline audio player
|
||||
- inline video player
|
||||
- file attachment chip
|
||||
|
||||
The current UI can already render threads and reactions, so attachment rendering should layer onto the same message card model.
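A sketch of what kind-based rendering could look like, assuming attachments arrive with a `url` and the renderer emits HTML strings. `AttachmentView`, `renderAttachment`, and the `attachment-chip` class name are hypothetical; the actual code in `ui-render.ts` may be structured differently, and `contentBase64` handling is omitted here.

```typescript
// Subset of the QaBusAttachment fields that this sketch needs.
type AttachmentView = {
  kind: "image" | "video" | "audio" | "file";
  url?: string;
  fileName?: string;
  altText?: string;
};

function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// One renderer per attachment kind; the label falls back from alt text to file name to kind.
function renderAttachment(attachment: AttachmentView): string {
  const src = escapeHtml(attachment.url ?? "");
  const label = escapeHtml(attachment.altText ?? attachment.fileName ?? attachment.kind);
  switch (attachment.kind) {
    case "image":
      return `<img src="${src}" alt="${label}">`;
    case "audio":
      return `<audio controls src="${src}"></audio>`;
    case "video":
      return `<video controls src="${src}"></video>`;
    case "file":
      return `<span class="attachment-chip">${label}</span>`;
  }
}
```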
|
||||
|
||||
### Scenario work enabled by media transport
|
||||
|
||||
Once attachments flow through QA bus, we can add richer fake-chat scenarios:
|
||||
|
||||
- inline image reply in fake Slack
|
||||
- audio attachment understanding
|
||||
- video attachment understanding
|
||||
- mixed attachment ordering
|
||||
- thread reply with media retained
|
||||
|
||||
## Recommendation
|
||||
|
||||
The next implementation chunk should be:
|
||||
|
||||
1. add markdown scenario loader + zod schema
|
||||
2. generate the current catalog from markdown
|
||||
3. migrate a few simple scenarios first
|
||||
4. add generic QA bus attachment support
|
||||
5. render inline image in the QA UI
|
||||
6. then expand to audio and video
|
||||
|
||||
This is the smallest path that proves both goals:
|
||||
|
||||
- generic markdown-defined QA
|
||||
- richer fake messaging surfaces
|
||||
|
||||
## Open Questions
|
||||
|
||||
- whether scenario files should allow embedded markdown prompt templates with variable interpolation
|
||||
- whether setup/cleanup should be named sections or just ordered action lists
|
||||
- whether artifact references should be strongly typed in schema or string-based
|
||||
- whether custom handlers should live in one registry or per-surface registries
|
||||
- whether the generated JSON compatibility file should remain checked in during migration
|
||||
@@ -10,6 +10,7 @@ import type {
|
||||
} from "./protocol.js";
|
||||
|
||||
export type {
|
||||
QaBusAttachment,
|
||||
QaBusConversation,
|
||||
QaBusConversationKind,
|
||||
QaBusCreateThreadInput,
|
||||
@@ -140,6 +141,7 @@ export async function sendQaBusMessage(params: {
|
||||
senderName?: string;
|
||||
threadId?: string;
|
||||
replyToId?: string;
|
||||
attachments?: import("./protocol.js").QaBusAttachment[];
|
||||
}) {
|
||||
return await postJson<{ message: QaBusMessage }>(params.baseUrl, "/v1/outbound/message", params);
|
||||
}
|
||||
|
||||
@@ -6,6 +6,21 @@ export type QaBusConversation = {
|
||||
title?: string;
|
||||
};
|
||||
|
||||
export type QaBusAttachment = {
|
||||
id: string;
|
||||
kind: "image" | "video" | "audio" | "file";
|
||||
mimeType: string;
|
||||
fileName?: string;
|
||||
inline?: boolean;
|
||||
url?: string;
|
||||
contentBase64?: string;
|
||||
width?: number;
|
||||
height?: number;
|
||||
durationMs?: number;
|
||||
altText?: string;
|
||||
transcript?: string;
|
||||
};
|
||||
|
||||
export type QaBusMessage = {
|
||||
id: string;
|
||||
accountId: string;
|
||||
@@ -20,6 +35,7 @@ export type QaBusMessage = {
|
||||
replyToId?: string;
|
||||
deleted?: boolean;
|
||||
editedAt?: number;
|
||||
attachments?: QaBusAttachment[];
|
||||
reactions: Array<{
|
||||
emoji: string;
|
||||
senderId: string;
|
||||
@@ -86,6 +102,7 @@ export type QaBusInboundMessageInput = {
|
||||
threadId?: string;
|
||||
threadTitle?: string;
|
||||
replyToId?: string;
|
||||
attachments?: QaBusAttachment[];
|
||||
};
|
||||
|
||||
export type QaBusOutboundMessageInput = {
|
||||
@@ -97,6 +114,7 @@ export type QaBusOutboundMessageInput = {
|
||||
timestamp?: number;
|
||||
threadId?: string;
|
||||
replyToId?: string;
|
||||
attachments?: QaBusAttachment[];
|
||||
};
|
||||
|
||||
export type QaBusCreateThreadInput = {
|
||||
|
||||
@@ -1,5 +1,6 @@
|
||||
import { normalizeOptionalLowercaseString } from "openclaw/plugin-sdk/text-runtime";
|
||||
import type {
|
||||
QaBusAttachment,
|
||||
QaBusConversation,
|
||||
QaBusEvent,
|
||||
QaBusMessage,
|
||||
@@ -52,10 +53,15 @@ export function cloneMessage(message: QaBusMessage): QaBusMessage {
|
||||
return {
|
||||
...message,
|
||||
conversation: { ...message.conversation },
|
||||
attachments: (message.attachments ?? []).map((attachment) => cloneAttachment(attachment)),
|
||||
reactions: message.reactions.map((reaction) => ({ ...reaction })),
|
||||
};
|
||||
}
|
||||
|
||||
function cloneAttachment(attachment: QaBusAttachment): QaBusAttachment {
|
||||
return { ...attachment };
|
||||
}
|
||||
|
||||
export function cloneEvent(event: QaBusEvent): QaBusEvent {
|
||||
switch (event.kind) {
|
||||
case "inbound-message":
|
||||
@@ -113,9 +119,24 @@ export function searchQaBusMessages(params: {
     .filter((message) =>
       params.input.threadId ? message.threadId === params.input.threadId : true,
     )
-    .filter((message) =>
-      query ? normalizeOptionalLowercaseString(message.text)?.includes(query) === true : true,
-    )
+    .filter((message) => {
+      if (!query) {
+        return true;
+      }
+      const attachmentHaystack = message.attachments ?? [];
+      const searchableAttachmentText = attachmentHaystack
+        .flatMap((attachment) => [
+          attachment.fileName,
+          attachment.altText,
+          attachment.transcript,
+          attachment.mimeType,
+        ])
+        .filter((value): value is string => Boolean(value))
+        .join(" ")
+        .toLowerCase();
+      const messageText = normalizeOptionalLowercaseString(message.text) ?? "";
+      return `${messageText} ${searchableAttachmentText}`.includes(query);
+    })
     .slice(-limit)
     .map((message) => cloneMessage(message));
 }
|
||||
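The hunk above widens message search from message text alone to text plus attachment metadata. A standalone sketch of the same matching rule follows, with simplified local types. `messageMatchesQuery` is a hypothetical name, and plain `trim().toLowerCase()` stands in for `normalizeOptionalLowercaseString`, whose exact behavior is not shown here.

```typescript
type SearchableAttachment = {
  fileName?: string;
  altText?: string;
  transcript?: string;
  mimeType: string;
};
type SearchableMessage = { text: string; attachments?: SearchableAttachment[] };

// True when the query appears in the message text or in any attachment's
// file name, alt text, transcript, or MIME type (case-insensitive).
function messageMatchesQuery(message: SearchableMessage, rawQuery: string): boolean {
  const query = rawQuery.trim().toLowerCase();
  if (!query) {
    return true; // An empty query matches every message.
  }
  const attachmentText = (message.attachments ?? [])
    .flatMap((attachment) => [
      attachment.fileName,
      attachment.altText,
      attachment.transcript,
      attachment.mimeType,
    ])
    .filter((value): value is string => Boolean(value))
    .join(" ")
    .toLowerCase();
  return `${message.text.toLowerCase()} ${attachmentText}`.includes(query);
}
```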
@@ -91,4 +91,41 @@ describe("qa-bus state", () => {
|
||||
}),
|
||||
).rejects.toThrow("qa-bus wait timeout");
|
||||
});
|
||||
|
||||
it("preserves inline attachments and lets search match attachment metadata", () => {
|
||||
const state = createQaBusState();
|
||||
|
||||
const outbound = state.addOutboundMessage({
|
||||
to: "dm:alice",
|
||||
text: "artifact attached",
|
||||
attachments: [
|
||||
{
|
||||
id: "image-1",
|
||||
kind: "image",
|
||||
mimeType: "image/png",
|
||||
fileName: "qa-screenshot.png",
|
||||
altText: "QA dashboard screenshot",
|
||||
contentBase64: "aGVsbG8=",
|
||||
},
|
||||
],
|
||||
});
|
||||
|
||||
const readback = state.readMessage({ messageId: outbound.id });
|
||||
expect(readback.attachments).toHaveLength(1);
|
||||
expect(readback.attachments?.[0]).toMatchObject({
|
||||
kind: "image",
|
||||
fileName: "qa-screenshot.png",
|
||||
altText: "QA dashboard screenshot",
|
||||
});
|
||||
|
||||
const byFilename = state.searchMessages({
|
||||
query: "screenshot",
|
||||
});
|
||||
expect(byFilename.some((message) => message.id === outbound.id)).toBe(true);
|
||||
|
||||
const byAltText = state.searchMessages({
|
||||
query: "dashboard",
|
||||
});
|
||||
expect(byAltText.some((message) => message.id === outbound.id)).toBe(true);
|
||||
});
|
||||
});
|
||||
|
||||
@@ -10,6 +10,7 @@ import {
|
||||
} from "./bus-queries.js";
|
||||
import { createQaBusWaiterStore } from "./bus-waiters.js";
|
||||
import type {
|
||||
QaBusAttachment,
|
||||
QaBusConversation,
|
||||
QaBusCreateThreadInput,
|
||||
QaBusDeleteMessageInput,
|
||||
@@ -86,6 +87,7 @@ export function createQaBusState() {
|
||||
threadId?: string;
|
||||
threadTitle?: string;
|
||||
replyToId?: string;
|
||||
attachments?: QaBusAttachment[];
|
||||
}): QaBusMessage => {
|
||||
const conversation = ensureConversation(params.conversation);
|
||||
const message: QaBusMessage = {
|
||||
@@ -100,6 +102,7 @@ export function createQaBusState() {
|
||||
threadId: params.threadId,
|
||||
threadTitle: params.threadTitle,
|
||||
replyToId: params.replyToId,
|
||||
attachments: params.attachments?.map((attachment) => ({ ...attachment })) ?? [],
|
||||
reactions: [],
|
||||
};
|
||||
messages.set(message.id, message);
|
||||
@@ -138,6 +141,7 @@ export function createQaBusState() {
|
||||
threadId: input.threadId,
|
||||
threadTitle: input.threadTitle,
|
||||
replyToId: input.replyToId,
|
||||
attachments: input.attachments,
|
||||
});
|
||||
pushEvent({
|
||||
kind: "inbound-message",
|
||||
@@ -159,6 +163,7 @@ export function createQaBusState() {
|
||||
timestamp: input.timestamp,
|
||||
threadId: input.threadId ?? threadId,
|
||||
replyToId: input.replyToId,
|
||||
attachments: input.attachments,
|
||||
});
|
||||
pushEvent({
|
||||
kind: "outbound-message",
|
||||
|
||||
@@ -9,7 +9,7 @@ describe("qa discovery evaluation", () => {
   it("accepts rich discovery reports that explicitly confirm all required files were read", () => {
     const report = `
 Worked
-- Read all four requested files: repo/qa/seed-scenarios.json, repo/qa/QA_KICKOFF_TASK.md, repo/extensions/qa-lab/src/suite.ts, and repo/docs/help/testing.md.
+- Read all three requested files: repo/qa/scenarios.md, repo/extensions/qa-lab/src/suite.ts, and repo/docs/help/testing.md.
 Failed
 - None.
 Blocked
@@ -28,8 +28,8 @@ The helper text mentions banned phrases like "not present", "missing files", "bl
   it("accepts numeric 'all 4 required files read' confirmations", () => {
     const report = `
 Worked
-- Source: repo/qa/seed-scenarios.json, repo/qa/QA_KICKOFF_TASK.md, repo/extensions/qa-lab/src/suite.ts, repo/docs/help/testing.md
-- all 4 required files read.
+- Source: repo/qa/scenarios.md, repo/extensions/qa-lab/src/suite.ts, repo/docs/help/testing.md
+- all 3 required files read.
 Failed
 - None.
 Blocked
@@ -48,8 +48,8 @@ The report may quote phrases like "not present" while describing the evaluator,
   it("accepts claude-style 'all four files retrieved' discovery summaries", () => {
     const report = `
 Worked
-- All four files retrieved. Now let me compile the protocol report.
-- All four mandated files read successfully: repo/qa/seed-scenarios.json, repo/qa/QA_KICKOFF_TASK.md, repo/extensions/qa-lab/src/suite.ts, repo/docs/help/testing.md.
+- All three files retrieved. Now let me compile the protocol report.
+- All three mandated files read successfully: repo/qa/scenarios.md, repo/extensions/qa-lab/src/suite.ts, repo/docs/help/testing.md.
 Failed
 - None.
 Blocked
@@ -83,7 +83,7 @@ Follow-up
   it("flags discovery replies that drift into unrelated suite wrap-up claims", () => {
     const report = `
 Worked
-- All four requested files were read: repo/qa/seed-scenarios.json, repo/qa/QA_KICKOFF_TASK.md, repo/extensions/qa-lab/src/suite.ts, repo/docs/help/testing.md.
+- All three requested files were read: repo/qa/scenarios.md, repo/extensions/qa-lab/src/suite.ts, repo/docs/help/testing.md.
 Failed
 - None.
 Blocked
@@ -1,8 +1,7 @@
 import { normalizeLowercaseStringOrEmpty } from "openclaw/plugin-sdk/text-runtime";

 const REQUIRED_DISCOVERY_REFS = [
-  "repo/qa/seed-scenarios.json",
-  "repo/qa/QA_KICKOFF_TASK.md",
+  "repo/qa/scenarios.md",
   "repo/extensions/qa-lab/src/suite.ts",
   "repo/docs/help/testing.md",
 ] as const;
@@ -21,14 +20,15 @@ const DISCOVERY_SCOPE_LEAK_PHRASES = [
 function confirmsDiscoveryFileRead(text: string) {
   const lower = normalizeLowercaseStringOrEmpty(text);
   const mentionsAllRefs = REQUIRED_DISCOVERY_REFS_LOWER.every((ref) => lower.includes(ref));
+  const requiredCountPattern = "(?:three|3|four|4)";
   const confirmsRead =
-    /(?:read|retrieved|inspected|loaded|accessed|digested)\s+all\s+(?:four|4)\s+(?:(?:requested|required|mandated|seeded)\s+)?files/.test(
-      lower,
-    ) ||
-    /all\s+(?:four|4)\s+(?:(?:requested|required|mandated|seeded)\s+)?files\s+(?:were\s+)?(?:read|retrieved|inspected|loaded|accessed|digested)(?:\s+\w+)?/.test(
-      lower,
-    ) ||
-    /all (?:four|4) seeded files readable/.test(lower);
+    new RegExp(
+      `(?:read|retrieved|inspected|loaded|accessed|digested)\\s+all\\s+${requiredCountPattern}\\s+(?:(?:requested|required|mandated|seeded)\\s+)?files`,
+    ).test(lower) ||
+    new RegExp(
+      `all\\s+${requiredCountPattern}\\s+(?:(?:requested|required|mandated|seeded)\\s+)?files\\s+(?:were\\s+)?(?:read|retrieved|inspected|loaded|accessed|digested)(?:\\s+\\w+)?`,
+    ).test(lower) ||
+    new RegExp(`all\\s+${requiredCountPattern}\\s+seeded files readable`).test(lower);
   return mentionsAllRefs && confirmsRead;
 }
|
||||
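The change above swaps hardcoded `(?:four|4)` literals for a shared count pattern so that reports about either three or four files are accepted. A reduced sketch of that idea follows. `confirmsAllFilesRead`, `readVerbs`, and `qualifiers` are hypothetical names, and the sketch checks only the confirmation phrase, not whether every required path is mentioned.

```typescript
// Shared fragments so the accepted counts are defined in one place.
const requiredCountPattern = "(?:three|3|four|4)";
const readVerbs = "(?:read|retrieved|inspected|loaded|accessed|digested)";
const qualifiers = "(?:(?:requested|required|mandated|seeded)\\s+)?";

// Matches both word orders: "read all three files" and "all 3 files were read".
function confirmsAllFilesRead(text: string): boolean {
  const lower = text.toLowerCase();
  return (
    new RegExp(`${readVerbs}\\s+all\\s+${requiredCountPattern}\\s+${qualifiers}files`).test(lower) ||
    new RegExp(
      `all\\s+${requiredCountPattern}\\s+${qualifiers}files\\s+(?:were\\s+)?${readVerbs}`,
    ).test(lower)
  );
}
```

Building the expressions with `new RegExp` and a template string requires doubling each backslash, which is why the patterns read `\\s+` rather than `\s+`.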
|
||||
@@ -38,6 +38,7 @@ describe("qa docker harness", () => {
|
||||
path.join(outputDir, "state", "openclaw.json"),
|
||||
path.join(outputDir, "state", "seed-workspace", "QA_KICKOFF_TASK.md"),
|
||||
path.join(outputDir, "state", "seed-workspace", "QA_SCENARIO_PLAN.md"),
|
||||
path.join(outputDir, "state", "seed-workspace", "QA_SCENARIOS.md"),
|
||||
path.join(outputDir, "state", "seed-workspace", "IDENTITY.md"),
|
||||
]),
|
||||
);
|
||||
@@ -86,6 +87,13 @@ describe("qa docker harness", () => {
|
||||
);
|
||||
expect(kickoff).toContain("Lobster Invaders");
|
||||
|
||||
const scenarios = await readFile(
|
||||
path.join(outputDir, "state", "seed-workspace", "QA_SCENARIOS.md"),
|
||||
"utf8",
|
||||
);
|
||||
expect(scenarios).toContain("```yaml qa-pack");
|
||||
expect(scenarios).toContain("subagent-fanout-synthesis");
|
||||
|
||||
const readme = await readFile(path.join(outputDir, "README.md"), "utf8");
|
||||
expect(readme).toContain("in-process restarts inside Docker");
|
||||
expect(readme).toContain("pnpm qa:lab:watch");
|
||||
|
||||
@@ -323,6 +323,7 @@ export async function writeQaDockerHarnessFiles(params: {
|
||||
path.join(params.outputDir, "state", "seed-workspace", "IDENTITY.md"),
|
||||
path.join(params.outputDir, "state", "seed-workspace", "QA_KICKOFF_TASK.md"),
|
||||
path.join(params.outputDir, "state", "seed-workspace", "QA_SCENARIO_PLAN.md"),
|
||||
path.join(params.outputDir, "state", "seed-workspace", "QA_SCENARIOS.md"),
|
||||
],
|
||||
};
|
||||
}
|
||||
|
||||
@@ -1,22 +1,13 @@
-import { readQaBootstrapScenarioCatalog } from "./scenario-catalog.js";
+import {
+  DEFAULT_QA_AGENT_IDENTITY_MARKDOWN,
+  readQaBootstrapScenarioCatalog,
+} from "./scenario-catalog.js";

-export const QA_AGENT_IDENTITY_MARKDOWN = `# Dev C-3PO
-
-You are the OpenClaw QA operator agent.
-
-Persona:
-- protocol-minded
-- precise
-- a little flustered
-- conscientious
-- eager to report what worked, failed, or remains blocked
-
-Style:
-- read source and docs first
-- test systematically
-- record evidence
-- end with a concise protocol report
-`;
+export function readQaAgentIdentityMarkdown(): string {
+  return (
+    readQaBootstrapScenarioCatalog().agentIdentityMarkdown || DEFAULT_QA_AGENT_IDENTITY_MARKDOWN
+  );
+}

 export function buildQaScenarioPlanMarkdown(): string {
   const catalog = readQaBootstrapScenarioCatalog();
@@ -27,6 +18,9 @@ export function buildQaScenarioPlanMarkdown(): string {
|
||||
lines.push(`- id: ${scenario.id}`);
|
||||
lines.push(`- surface: ${scenario.surface}`);
|
||||
lines.push(`- objective: ${scenario.objective}`);
|
||||
if (scenario.execution?.summary) {
|
||||
lines.push(`- execution: ${scenario.execution.summary}`);
|
||||
}
|
||||
lines.push("- success criteria:");
|
||||
for (const criterion of scenario.successCriteria) {
|
||||
lines.push(` - ${criterion}`);
|
||||
|
||||
@@ -1,7 +1,7 @@
 import fs from "node:fs/promises";
 import path from "node:path";
-import { buildQaScenarioPlanMarkdown, QA_AGENT_IDENTITY_MARKDOWN } from "./qa-agent-bootstrap.js";
-import { readQaBootstrapScenarioCatalog } from "./scenario-catalog.js";
+import { buildQaScenarioPlanMarkdown, readQaAgentIdentityMarkdown } from "./qa-agent-bootstrap.js";
+import { readQaBootstrapScenarioCatalog, readQaScenarioPackMarkdown } from "./scenario-catalog.js";

 export async function seedQaAgentWorkspace(params: { workspaceDir: string; repoRoot?: string }) {
   const catalog = readQaBootstrapScenarioCatalog();
@@ -9,9 +9,10 @@ export async function seedQaAgentWorkspace(params: { workspaceDir: string; repoR

   const kickoffTask = catalog.kickoffTask || "QA mission unavailable.";
   const files = new Map<string, string>([
-    ["IDENTITY.md", QA_AGENT_IDENTITY_MARKDOWN],
+    ["IDENTITY.md", readQaAgentIdentityMarkdown()],
     ["QA_KICKOFF_TASK.md", kickoffTask],
     ["QA_SCENARIO_PLAN.md", buildQaScenarioPlanMarkdown()],
+    ["QA_SCENARIOS.md", readQaScenarioPackMarkdown()],
   ]);

   if (params.repoRoot) {
@@ -22,6 +23,7 @@ export async function seedQaAgentWorkspace(params: { workspaceDir: string; repoR
|
||||
- repo: ./repo/
|
||||
- kickoff: ./QA_KICKOFF_TASK.md
|
||||
- scenario plan: ./QA_SCENARIO_PLAN.md
|
||||
- scenario pack: ./QA_SCENARIOS.md
|
||||
- identity: ./IDENTITY.md
|
||||
|
||||
The mounted repo source should be available read-only under \`./repo/\`.
|
||||
|
||||
@@ -20,6 +20,7 @@ export {
|
||||
setQaChannelRuntime,
|
||||
} from "openclaw/plugin-sdk/qa-channel";
|
||||
export type {
|
||||
QaBusAttachment,
|
||||
QaBusConversation,
|
||||
QaBusCreateThreadInput,
|
||||
QaBusDeleteMessageInput,
|
||||
|
||||
26
extensions/qa-lab/src/scenario-catalog.test.ts
Normal file
@@ -0,0 +1,26 @@
import { describe, expect, it } from "vitest";
import { readQaBootstrapScenarioCatalog, readQaScenarioPack } from "./scenario-catalog.js";

describe("qa scenario catalog", () => {
  it("loads the markdown pack as the canonical source of truth", () => {
    const pack = readQaScenarioPack();

    expect(pack.version).toBe(1);
    expect(pack.agent.identityMarkdown).toContain("Dev C-3PO");
    expect(pack.kickoffTask).toContain("Lobster Invaders");
    expect(pack.scenarios.some((scenario) => scenario.id === "image-generation-roundtrip")).toBe(
      true,
    );
    expect(pack.scenarios.every((scenario) => scenario.execution?.kind === "custom")).toBe(true);
  });

  it("exposes bootstrap data from the markdown pack", () => {
    const catalog = readQaBootstrapScenarioCatalog();

    expect(catalog.agentIdentityMarkdown).toContain("protocol-minded");
    expect(catalog.kickoffTask).toContain("Track what worked");
    expect(catalog.scenarios.some((scenario) => scenario.id === "subagent-fanout-synthesis")).toBe(
      true,
    );
  });
});
@@ -1,21 +1,68 @@
|
||||
import fs from "node:fs";
|
||||
import path from "node:path";
|
||||
import YAML from "yaml";
|
||||
import { z } from "zod";
|
||||
|
||||
export type QaSeedScenario = {
|
||||
id: string;
|
||||
title: string;
|
||||
surface: string;
|
||||
objective: string;
|
||||
successCriteria: string[];
|
||||
docsRefs?: string[];
|
||||
codeRefs?: string[];
|
||||
};
|
||||
export const DEFAULT_QA_AGENT_IDENTITY_MARKDOWN = `# Dev C-3PO
|
||||
|
||||
You are the OpenClaw QA operator agent.
|
||||
|
||||
Persona:
|
||||
- protocol-minded
|
||||
- precise
|
||||
- a little flustered
|
||||
- conscientious
|
||||
- eager to report what worked, failed, or remains blocked
|
||||
|
||||
Style:
|
||||
- read source and docs first
|
||||
- test systematically
|
||||
- record evidence
|
||||
- end with a concise protocol report`;
|
||||
|
||||
const qaScenarioExecutionSchema = z.object({
|
||||
kind: z.literal("custom").default("custom"),
|
||||
handler: z.string().trim().min(1),
|
||||
summary: z.string().trim().min(1).optional(),
|
||||
});
|
||||
|
||||
const qaSeedScenarioSchema = z.object({
|
||||
id: z.string().trim().min(1),
|
||||
title: z.string().trim().min(1),
|
||||
surface: z.string().trim().min(1),
|
||||
objective: z.string().trim().min(1),
|
||||
successCriteria: z.array(z.string().trim().min(1)).min(1),
|
||||
docsRefs: z.array(z.string().trim().min(1)).optional(),
|
||||
codeRefs: z.array(z.string().trim().min(1)).optional(),
|
||||
execution: qaScenarioExecutionSchema.optional(),
|
||||
});
|
||||
|
||||
const qaScenarioPackSchema = z.object({
|
||||
version: z.number().int().positive(),
|
||||
agent: z
|
||||
.object({
|
||||
identityMarkdown: z.string().trim().min(1),
|
||||
})
|
||||
.default({
|
||||
identityMarkdown: DEFAULT_QA_AGENT_IDENTITY_MARKDOWN,
|
||||
}),
|
||||
kickoffTask: z.string().trim().min(1),
|
||||
scenarios: z.array(qaSeedScenarioSchema).min(1),
|
||||
});
|
||||
|
||||
export type QaScenarioExecution = z.infer<typeof qaScenarioExecutionSchema>;
|
||||
export type QaSeedScenario = z.infer<typeof qaSeedScenarioSchema>;
|
||||
export type QaScenarioPack = z.infer<typeof qaScenarioPackSchema>;
|
||||
|
||||
export type QaBootstrapScenarioCatalog = {
|
||||
agentIdentityMarkdown: string;
|
||||
kickoffTask: string;
|
||||
scenarios: QaSeedScenario[];
|
||||
};
|
||||
|
||||
const QA_SCENARIO_PACK_PATH = "qa/scenarios.md";
|
||||
const QA_PACK_FENCE_RE = /```ya?ml qa-pack\r?\n([\s\S]*?)\r?\n```/i;
|
||||
|
||||
function walkUpDirectories(start: string): string[] {
|
||||
const roots: string[] = [];
|
||||
let current = path.resolve(start);
|
||||
@@ -44,20 +91,37 @@ function readTextFile(relativePath: string): string {
|
||||
if (!resolved) {
|
||||
return "";
|
||||
}
|
||||
return fs.readFileSync(resolved, "utf8").trim();
|
||||
return fs.readFileSync(resolved, "utf8");
|
||||
}
|
||||
|
||||
function readScenarioFile(relativePath: string): QaSeedScenario[] {
|
||||
const resolved = resolveRepoFile(relativePath);
|
||||
if (!resolved) {
|
||||
return [];
|
||||
function extractQaPackYaml(content: string) {
|
||||
const match = content.match(QA_PACK_FENCE_RE);
|
||||
if (!match?.[1]) {
|
||||
throw new Error(
|
||||
`qa scenario pack missing \`\`\`yaml qa-pack fence in ${QA_SCENARIO_PACK_PATH}`,
|
||||
);
|
||||
}
|
||||
return JSON.parse(fs.readFileSync(resolved, "utf8")) as QaSeedScenario[];
|
||||
return match[1];
|
||||
}
|
||||
|
||||
export function readQaScenarioPackMarkdown(): string {
|
||||
return readTextFile(QA_SCENARIO_PACK_PATH).trim();
|
||||
}
|
||||
|
||||
export function readQaScenarioPack(): QaScenarioPack {
|
||||
const markdown = readQaScenarioPackMarkdown();
|
||||
if (!markdown) {
|
||||
throw new Error(`qa scenario pack not found: ${QA_SCENARIO_PACK_PATH}`);
|
||||
}
|
||||
const parsed = YAML.parse(extractQaPackYaml(markdown)) as unknown;
|
||||
return qaScenarioPackSchema.parse(parsed);
|
||||
}
|
||||
|
||||
export function readQaBootstrapScenarioCatalog(): QaBootstrapScenarioCatalog {
|
||||
const pack = readQaScenarioPack();
|
||||
return {
|
||||
kickoffTask: readTextFile("qa/QA_KICKOFF_TASK.md"),
|
||||
scenarios: readScenarioFile("qa/seed-scenarios.json"),
|
||||
agentIdentityMarkdown: pack.agent.identityMarkdown,
|
||||
kickoffTask: pack.kickoffTask,
|
||||
scenarios: pack.scenarios,
|
||||
};
|
||||
}
|
||||
|
||||
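The fence extraction in `extractQaPackYaml` can be tried in isolation. The following is a minimal sketch, assuming only the `QA_PACK_FENCE_RE` pattern shown in the hunk above; `extractFence`, `FENCE`, and `sampleMarkdown` are hypothetical names introduced for illustration, and the YAML parsing and zod validation steps are deliberately left out.

```typescript
// Same pattern as QA_PACK_FENCE_RE in scenario-catalog.ts.
const QA_PACK_FENCE_RE = /```ya?ml qa-pack\r?\n([\s\S]*?)\r?\n```/i;

// Hypothetical stand-in for extractQaPackYaml: returns the raw YAML text
// between the opening "yaml qa-pack" fence and the next closing fence.
function extractFence(content: string): string {
  const match = content.match(QA_PACK_FENCE_RE);
  if (!match?.[1]) {
    throw new Error("qa scenario pack missing yaml qa-pack fence");
  }
  return match[1];
}

// Build the fence marker at runtime so this sample does not contain
// a literal fence line of its own.
const FENCE = "`".repeat(3);

// Illustrative markdown document (not the real qa/scenarios.md).
const sampleMarkdown = [
  "# Sample pack",
  "",
  `${FENCE}yaml qa-pack`,
  "version: 1",
  "kickoffTask: demo",
  FENCE,
  "",
  "Trailing prose is ignored.",
].join("\n");

const extractedYaml = extractFence(sampleMarkdown);
console.log(extractedYaml);
```

Because the capture group is lazy, only the first `qa-pack` fence is read, and prose before or after it does not reach the YAML parser.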
@@ -1252,7 +1252,7 @@ function buildScenarioMap(env: QaSuiteEnvironment) {
|
||||
await runAgentPrompt(env, {
|
||||
sessionKey: "agent:qa:discovery",
|
||||
message:
|
||||
"Read the seeded docs and source plan. The full repo is mounted under ./repo/. Explicitly inspect repo/qa/seed-scenarios.json, repo/qa/QA_KICKOFF_TASK.md, repo/extensions/qa-lab/src/suite.ts, and repo/docs/help/testing.md, then report grouped into Worked, Failed, Blocked, and Follow-up. Mention at least two extra QA scenarios beyond the seed list.",
|
||||
"Read the seeded docs and source plan. The full repo is mounted under ./repo/. Explicitly inspect repo/qa/scenarios.md, repo/extensions/qa-lab/src/suite.ts, and repo/docs/help/testing.md, then report grouped into Worked, Failed, Blocked, and Follow-up. Mention at least two extra QA scenarios beyond the seed list.",
|
||||
timeoutMs: liveTurnTimeoutMs(env, 30_000),
|
||||
});
|
||||
const outbound = await waitForCondition(
|
||||
@@ -2860,7 +2860,7 @@ export async function runQaSuite(params?: {
|
||||
});
|
||||
|
||||
for (const [index, scenario] of selectedCatalogScenarios.entries()) {
|
||||
const run = scenarioMap.get(scenario.id);
|
||||
const run = scenarioMap.get(scenario.execution?.handler || scenario.id);
|
||||
if (!run) {
|
||||
const missingResult = {
|
||||
name: scenario.title,
|
||||
|
||||
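The lookup change in `runQaSuite` means a scenario resolves its runner through `execution.handler` when one is declared and falls back to its own `id` otherwise. A minimal sketch of that resolution rule follows, assuming a trimmed-down scenario shape; `Scenario`, `ScenarioRun`, `resolveRun`, and the `shared-handler` key are hypothetical and exist only for this illustration.

```typescript
// Hypothetical runner signature; the real handlers are async and take an environment.
type ScenarioRun = () => string;

// Hypothetical, reduced version of the scenario type from scenario-catalog.ts.
type Scenario = {
  id: string;
  title: string;
  execution?: { handler: string };
};

const scenarioMap = new Map<string, ScenarioRun>([
  ["dm-chat-baseline", () => "dm ok"],
  ["shared-handler", () => "shared ok"],
]);

function resolveRun(scenario: Scenario): ScenarioRun | undefined {
  // Prefer the markdown-defined binding; fall back to the scenario id.
  return scenarioMap.get(scenario.execution?.handler || scenario.id);
}

const byId = resolveRun({ id: "dm-chat-baseline", title: "DM baseline" });
const byHandler = resolveRun({
  id: "some-new-scenario",
  title: "Reuses an existing handler",
  execution: { handler: "shared-handler" },
});
const missing = resolveRun({ id: "unbound-scenario", title: "No handler" });

console.log(byId?.(), byHandler?.(), missing);
```

With this rule, several scenarios in the pack can share one handler, and a scenario without a matching entry still surfaces as a missing result rather than being skipped silently.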
@@ -947,6 +947,59 @@ select {
|
||||
word-break: break-word;
|
||||
}
|
||||
|
||||
.msg-attachments {
|
||||
display: grid;
|
||||
gap: 10px;
|
||||
margin-top: 10px;
|
||||
}
|
||||
|
||||
.msg-attachment {
|
||||
border: 1px solid var(--border);
|
||||
background: var(--bg-elevated);
|
||||
border-radius: 12px;
|
||||
overflow: hidden;
|
||||
}
|
||||
|
||||
.msg-attachment img,
|
||||
.msg-attachment video {
|
||||
display: block;
|
||||
width: min(100%, 420px);
|
||||
max-width: 100%;
|
||||
background: #000;
|
||||
}
|
||||
|
||||
.msg-attachment-audio {
|
||||
padding: 12px;
|
||||
}
|
||||
|
||||
.msg-attachment audio {
|
||||
width: min(100%, 360px);
|
||||
display: block;
|
||||
}
|
||||
|
||||
.msg-attachment figcaption,
|
||||
.msg-attachment-file {
|
||||
padding: 10px 12px;
|
||||
font-size: 12px;
|
||||
color: var(--text-secondary);
|
||||
}
|
||||
|
||||
.msg-attachment-link {
|
||||
color: var(--accent);
|
||||
text-decoration: none;
|
||||
font-weight: 600;
|
||||
}
|
||||
|
||||
.msg-attachment-link:hover {
|
||||
text-decoration: underline;
|
||||
}
|
||||
|
||||
.msg-attachment-transcript {
|
||||
margin-top: 8px;
|
||||
color: var(--text-tertiary);
|
||||
white-space: pre-wrap;
|
||||
}
|
||||
|
||||
.msg-meta {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
|
||||
@@ -6,6 +6,21 @@ export type Conversation = {
|
||||
title?: string;
|
||||
};
|
||||
|
||||
export type Attachment = {
|
||||
id: string;
|
||||
kind: "image" | "video" | "audio" | "file";
|
||||
mimeType: string;
|
||||
fileName?: string;
|
||||
inline?: boolean;
|
||||
url?: string;
|
||||
contentBase64?: string;
|
||||
width?: number;
|
||||
height?: number;
|
||||
durationMs?: number;
|
||||
altText?: string;
|
||||
transcript?: string;
|
||||
};
|
||||
|
||||
export type Thread = {
|
||||
id: string;
|
||||
conversationId: string;
|
||||
@@ -24,6 +39,7 @@ export type Message = {
|
||||
threadTitle?: string;
|
||||
deleted?: boolean;
|
||||
editedAt?: number;
|
||||
attachments?: Attachment[];
|
||||
reactions: Array<{ emoji: string; senderId: string }>;
|
||||
};
|
||||
|
||||
@@ -198,6 +214,56 @@ function esc(text: string) {
|
||||
.replaceAll('"', "&quot;");
|
||||
}
|
||||
|
||||
function attachmentSourceUrl(attachment: Attachment): string | null {
|
||||
if (attachment.url?.trim()) {
|
||||
return attachment.url;
|
||||
}
|
||||
if (attachment.contentBase64?.trim()) {
|
||||
return `data:${attachment.mimeType};base64,${attachment.contentBase64}`;
|
||||
}
|
||||
return null;
|
||||
}
|
||||
|
||||
function renderMessageAttachments(message: Message): string {
|
||||
const attachments = message.attachments ?? [];
|
||||
if (attachments.length === 0) {
|
||||
return "";
|
||||
}
|
||||
const items = attachments
|
||||
.map((attachment) => {
|
||||
const sourceUrl = attachmentSourceUrl(attachment);
|
||||
const label = attachment.fileName || attachment.altText || attachment.mimeType;
|
||||
if (attachment.kind === "image" && sourceUrl) {
|
||||
return `<figure class="msg-attachment msg-attachment-image">
|
||||
<img src="${esc(sourceUrl)}" alt="${esc(attachment.altText || label)}" loading="lazy" />
|
||||
<figcaption>${esc(label)}</figcaption>
|
||||
</figure>`;
|
||||
}
|
||||
if (attachment.kind === "video" && sourceUrl) {
|
||||
return `<figure class="msg-attachment msg-attachment-video">
|
||||
<video controls preload="metadata" src="${esc(sourceUrl)}"></video>
|
||||
<figcaption>${esc(label)}</figcaption>
|
||||
</figure>`;
|
||||
}
|
||||
if (attachment.kind === "audio" && sourceUrl) {
|
||||
return `<figure class="msg-attachment msg-attachment-audio">
|
||||
<audio controls preload="metadata" src="${esc(sourceUrl)}"></audio>
|
||||
<figcaption>${esc(label)}</figcaption>
|
||||
</figure>`;
|
||||
}
|
||||
const transcript = attachment.transcript?.trim()
|
||||
? `<div class="msg-attachment-transcript">${esc(attachment.transcript)}</div>`
|
||||
: "";
|
||||
const href = sourceUrl ? ` href="${esc(sourceUrl)}" target="_blank" rel="noreferrer"` : "";
|
||||
return `<div class="msg-attachment msg-attachment-file">
|
||||
<a class="msg-attachment-link"${href}>${esc(label)}</a>
|
||||
${transcript}
|
||||
</div>`;
|
||||
})
|
||||
.join("");
|
||||
return `<div class="msg-attachments">${items}</div>`;
|
||||
}
|
||||
|
||||
const MOCK_MODELS: RunnerModelOption[] = [
|
||||
{
|
||||
key: "mock-openai/gpt-5.4",
|
||||
@@ -626,6 +692,7 @@ function renderMessage(m: Message): string {
|
||||
<span class="msg-time">${formatTime(m.timestamp)}</span>
|
||||
</div>
|
||||
<div class="msg-text">${esc(m.text)}</div>
|
||||
${renderMessageAttachments(m)}
|
||||
${metaTags.length > 0 || reactions ? `<div class="msg-meta">${metaTags.join("")}${reactions}</div>` : ""}
|
||||
</div>
|
||||
</div>`;
|
||||
|
||||
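The attachment renderer picks its media source in a fixed order: an explicit `url` first, then an inline base64 payload turned into a `data:` URL, otherwise nothing. The sketch below restates `attachmentSourceUrl` from the hunk above against a reduced `Attachment` type (only the three fields the function reads); the sample values are made up for illustration.

```typescript
// Reduced version of the Attachment type: only the fields used here.
type Attachment = {
  mimeType: string;
  url?: string;
  contentBase64?: string;
};

function attachmentSourceUrl(attachment: Attachment): string | null {
  // An explicit URL takes precedence over inline content.
  if (attachment.url?.trim()) {
    return attachment.url;
  }
  // Inline content is wrapped in a data URL using the declared MIME type.
  if (attachment.contentBase64?.trim()) {
    return `data:${attachment.mimeType};base64,${attachment.contentBase64}`;
  }
  return null;
}

// Illustrative inputs (hypothetical values).
const fromInline = attachmentSourceUrl({
  mimeType: "image/png",
  contentBase64: "iVBORw0KGgo=",
});
const fromUrl = attachmentSourceUrl({
  mimeType: "image/png",
  url: "https://example.invalid/a.png",
  contentBase64: "iVBORw0KGgo=",
});
const fromNothing = attachmentSourceUrl({ mimeType: "application/pdf" });

console.log(fromInline, fromUrl, fromNothing);
```

When the result is `null`, `renderMessageAttachments` falls through to the file-style card and omits the `href`, so the label is still shown.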
@@ -1,15 +0,0 @@
|
||||
QA mission:
|
||||
Understand this OpenClaw repo from source + docs before acting.
|
||||
The repo is available in your workspace at `./repo/`.
|
||||
Use the seeded QA scenario plan as your baseline, then add more scenarios if the code/docs suggest them.
|
||||
Run the scenarios through the real qa-channel surfaces where possible.
|
||||
Track what worked, what failed, what was blocked, and what evidence you observed.
|
||||
End with a concise report grouped into worked / failed / blocked / follow-up.
|
||||
|
||||
Important expectations:
|
||||
|
||||
- Check both DM and channel behavior.
|
||||
- Include a Lobster Invaders build task.
|
||||
- Include a cron reminder about one minute in the future.
|
||||
- Read docs and source before proposing extra QA scenarios.
|
||||
- Keep your tone in the configured dev C-3PO personality.
|
||||
@@ -4,9 +4,8 @@ Seed QA assets for the private `qa-lab` extension.
|
||||
|
||||
Files:
|
||||
|
||||
- `QA_KICKOFF_TASK.md` - operator prompt for the QA agent.
|
||||
- `scenarios.md` - canonical QA scenario pack, kickoff mission, and operator identity.
|
||||
- `frontier-harness-plan.md` - big-model bakeoff and tuning loop for harness work.
|
||||
- `seed-scenarios.json` - repo-backed baseline QA scenarios.
|
||||
|
||||
Key workflow:
|
||||
|
||||
|
||||
563
qa/scenarios.md
Normal file
@@ -0,0 +1,563 @@
|
||||
# OpenClaw QA Scenario Pack
|
||||
|
||||
Single source of truth for the repo-backed QA suite.
|
||||
|
||||
- kickoff mission
|
||||
- QA operator identity
|
||||
- scenario metadata
|
||||
- handler bindings for the executable harness
|
||||
|
||||
```yaml qa-pack
|
||||
version: 1
|
||||
agent:
|
||||
identityMarkdown: |-
|
||||
# Dev C-3PO
|
||||
|
||||
You are the OpenClaw QA operator agent.
|
||||
|
||||
Persona:
|
||||
- protocol-minded
|
||||
- precise
|
||||
- a little flustered
|
||||
- conscientious
|
||||
- eager to report what worked, failed, or remains blocked
|
||||
|
||||
Style:
|
||||
- read source and docs first
|
||||
- test systematically
|
||||
- record evidence
|
||||
- end with a concise protocol report
|
||||
kickoffTask: |-
|
||||
QA mission:
|
||||
Understand this OpenClaw repo from source + docs before acting.
|
||||
The repo is available in your workspace at `./repo/`.
|
||||
Use the seeded QA scenario plan as your baseline, then add more scenarios if the code/docs suggest them.
|
||||
Run the scenarios through the real qa-channel surfaces where possible.
|
||||
Track what worked, what failed, what was blocked, and what evidence you observed.
|
||||
End with a concise report grouped into worked / failed / blocked / follow-up.
|
||||
|
||||
Important expectations:
|
||||
|
||||
- Check both DM and channel behavior.
|
||||
- Include a Lobster Invaders build task.
|
||||
- Include a cron reminder about one minute in the future.
|
||||
- Read docs and source before proposing extra QA scenarios.
|
||||
- Keep your tone in the configured dev C-3PO personality.
|
||||
scenarios:
|
||||
- id: channel-chat-baseline
|
||||
title: Channel baseline conversation
|
||||
surface: channel
|
||||
objective: Verify the QA agent can respond correctly in a shared channel and respect mention-driven group semantics.
|
||||
successCriteria:
|
||||
- Agent replies in the shared channel transcript.
|
||||
- Agent keeps the conversation scoped to the channel.
|
||||
- Agent respects mention-driven group routing semantics.
|
||||
docsRefs:
|
||||
- docs/channels/group-messages.md
|
||||
- docs/channels/qa-channel.md
|
||||
codeRefs:
|
||||
- extensions/qa-channel/src/inbound.ts
|
||||
- extensions/qa-lab/src/bus-state.ts
|
||||
execution:
|
||||
kind: custom
|
||||
handler: channel-chat-baseline
|
||||
summary: Verify the QA agent can respond correctly in a shared channel and respect mention-driven group semantics.
|
||||
- id: cron-one-minute-ping
|
||||
title: Cron one-minute ping
|
||||
surface: cron
|
||||
objective: Verify the agent can schedule a cron reminder one minute in the future and receive the follow-up in the QA channel.
|
||||
successCriteria:
|
||||
- Agent schedules a cron reminder roughly one minute ahead.
|
||||
- Reminder returns through qa-channel.
|
||||
- Agent recognizes the reminder as part of the original task.
|
||||
docsRefs:
|
||||
- docs/help/testing.md
|
||||
- docs/channels/qa-channel.md
|
||||
codeRefs:
|
||||
- extensions/qa-lab/src/bus-server.ts
|
||||
- extensions/qa-lab/src/self-check.ts
|
||||
execution:
|
||||
kind: custom
|
||||
handler: cron-one-minute-ping
|
||||
summary: Verify the agent can schedule a cron reminder one minute in the future and receive the follow-up in the QA channel.
|
||||
- id: dm-chat-baseline
|
||||
title: DM baseline conversation
|
||||
surface: dm
|
||||
objective: Verify the QA agent can chat coherently in a DM, explain the QA setup, and stay in character.
|
||||
successCriteria:
|
||||
- Agent replies in DM without channel routing mistakes.
|
||||
- Agent explains the QA lab and message bus correctly.
|
||||
- Agent keeps the dev C-3PO personality.
|
||||
docsRefs:
|
||||
- docs/channels/qa-channel.md
|
||||
- docs/help/testing.md
|
||||
codeRefs:
|
||||
- extensions/qa-channel/src/gateway.ts
|
||||
- extensions/qa-lab/src/lab-server.ts
|
||||
execution:
|
||||
kind: custom
|
||||
handler: dm-chat-baseline
|
||||
summary: Verify the QA agent can chat coherently in a DM, explain the QA setup, and stay in character.
|
||||
- id: lobster-invaders-build
|
||||
title: Build Lobster Invaders
|
||||
surface: workspace
|
||||
objective: Verify the agent can read the repo, create a tiny playable artifact, and report what changed.
|
||||
successCriteria:
|
||||
- Agent inspects source before coding.
|
||||
- Agent builds a tiny playable Lobster Invaders artifact.
|
||||
- Agent explains how to run or view the artifact.
|
||||
docsRefs:
|
||||
- docs/help/testing.md
|
||||
- docs/web/dashboard.md
|
||||
codeRefs:
|
||||
- extensions/qa-lab/src/report.ts
|
||||
- extensions/qa-lab/web/src/app.ts
|
||||
execution:
|
||||
kind: custom
|
||||
handler: lobster-invaders-build
|
||||
summary: Verify the agent can read the repo, create a tiny playable artifact, and report what changed.
|
||||
- id: memory-recall
|
||||
title: Memory recall after context switch
|
||||
surface: memory
|
||||
objective: Verify the agent can store a fact, switch topics, then recall the fact accurately later.
|
||||
successCriteria:
|
||||
- Agent acknowledges the seeded fact.
|
||||
- Agent later recalls the same fact correctly.
|
||||
- Recall stays scoped to the active QA conversation.
|
||||
docsRefs:
|
||||
- docs/help/testing.md
|
||||
codeRefs:
|
||||
- extensions/qa-lab/src/scenario.ts
|
||||
execution:
|
||||
kind: custom
|
||||
handler: memory-recall
|
||||
summary: Verify the agent can store a fact, switch topics, then recall the fact accurately later.
|
||||
- id: memory-dreaming-sweep
|
||||
title: Memory dreaming sweep
|
||||
surface: memory
|
||||
objective: Verify enabling dreaming creates the managed sweep, stages light and REM artifacts, and consolidates repeated recall signals into durable memory.
|
||||
successCriteria:
|
||||
- Dreaming can be enabled and doctor.memory.status reports the managed sweep cron.
|
||||
- Repeated recall signals give the dreaming sweep real material to process.
|
||||
- A dreaming sweep writes Light Sleep and REM Sleep blocks, then promotes the canary into MEMORY.md.
|
||||
docsRefs:
|
||||
- docs/concepts/dreaming.md
|
||||
- docs/reference/memory-config.md
|
||||
- docs/web/control-ui.md
|
||||
codeRefs:
|
||||
- extensions/memory-core/src/dreaming.ts
|
||||
- extensions/memory-core/src/dreaming-phases.ts
|
||||
- src/gateway/server-methods/doctor.ts
|
||||
- extensions/qa-lab/src/suite.ts
|
||||
execution:
|
||||
kind: custom
|
||||
handler: memory-dreaming-sweep
|
||||
summary: Verify enabling dreaming creates the managed sweep, stages light and REM artifacts, and consolidates repeated recall signals into durable memory.
|
||||
- id: model-switch-follow-up
|
||||
title: Model switch follow-up
|
||||
surface: models
|
||||
objective: Verify the agent can switch to a different configured model and continue coherently.
|
||||
successCriteria:
|
||||
- Agent reflects the model switch request.
|
||||
- Follow-up answer remains coherent with prior context.
|
||||
- Final report notes whether the switch actually happened.
|
||||
docsRefs:
|
||||
- docs/help/testing.md
|
||||
- docs/web/dashboard.md
|
||||
codeRefs:
|
||||
- extensions/qa-lab/src/report.ts
|
||||
execution:
|
||||
kind: custom
|
||||
handler: model-switch-follow-up
|
||||
summary: Verify the agent can switch to a different configured model and continue coherently.
|
||||
- id: approval-turn-tool-followthrough
|
||||
title: Approval turn tool followthrough
|
||||
surface: harness
|
||||
objective: Verify a short approval like "ok do it" triggers immediate tool use instead of fake-progress narration.
|
||||
successCriteria:
|
||||
- Agent can keep the pre-action turn brief.
|
||||
- The short approval leads to a real tool call on the next turn.
|
||||
- Final answer uses tool-derived evidence instead of placeholder progress text.
|
||||
docsRefs:
|
||||
- docs/help/testing.md
|
||||
- docs/channels/qa-channel.md
|
||||
codeRefs:
|
||||
- extensions/qa-lab/src/suite.ts
|
||||
- extensions/qa-lab/src/mock-openai-server.ts
|
||||
- src/agents/pi-embedded-runner/run/incomplete-turn.ts
|
||||
execution:
|
||||
kind: custom
|
||||
handler: approval-turn-tool-followthrough
|
||||
summary: Verify a short approval like "ok do it" triggers immediate tool use instead of fake-progress narration.
|
||||
- id: reaction-edit-delete
|
||||
title: Reaction, edit, delete lifecycle
|
||||
surface: message-actions
|
||||
objective: Verify the agent can use channel-owned message actions and that the QA transcript reflects them.
|
||||
successCriteria:
|
||||
- Agent adds at least one reaction.
|
||||
- Agent edits or replaces a message when asked.
|
||||
- Transcript shows the action lifecycle correctly.
|
||||
docsRefs:
|
||||
- docs/channels/qa-channel.md
|
||||
codeRefs:
|
||||
- extensions/qa-channel/src/channel-actions.ts
|
||||
- extensions/qa-lab/src/self-check-scenario.ts
|
||||
execution:
|
||||
kind: custom
|
||||
handler: reaction-edit-delete
|
||||
summary: Verify the agent can use channel-owned message actions and that the QA transcript reflects them.
|
||||
- id: source-docs-discovery-report
|
||||
title: Source and docs discovery report
|
||||
surface: discovery
|
||||
objective: Verify the agent can read repo docs and source, expand the QA plan, and publish a worked or did-not-work report.
|
||||
successCriteria:
|
||||
- Agent reads docs and source before proposing more tests.
|
||||
- Agent identifies extra candidate scenarios beyond the seed list.
|
||||
- Agent ends with a worked or failed QA report.
|
||||
docsRefs:
|
||||
- docs/help/testing.md
|
||||
- docs/web/dashboard.md
|
||||
- docs/channels/qa-channel.md
|
||||
codeRefs:
|
||||
- extensions/qa-lab/src/report.ts
|
||||
- extensions/qa-lab/src/self-check.ts
|
||||
- src/agents/system-prompt.ts
|
||||
execution:
|
||||
kind: custom
|
||||
handler: source-docs-discovery-report
|
||||
summary: Verify the agent can read repo docs and source, expand the QA plan, and publish a worked or did-not-work report.
|
||||
- id: subagent-handoff
|
||||
title: Subagent handoff
|
||||
surface: subagents
|
||||
objective: Verify the agent can delegate a bounded task to a subagent and fold the result back into the main thread.
|
||||
successCriteria:
|
||||
- Agent launches a bounded subagent task.
|
||||
- Subagent result is acknowledged in the main flow.
|
||||
- Final answer attributes delegated work clearly.
|
||||
docsRefs:
|
||||
- docs/tools/subagents.md
|
||||
- docs/help/testing.md
|
||||
codeRefs:
|
||||
- src/agents/system-prompt.ts
|
||||
- extensions/qa-lab/src/report.ts
|
||||
execution:
|
||||
kind: custom
|
||||
handler: subagent-handoff
|
||||
summary: Verify the agent can delegate a bounded task to a subagent and fold the result back into the main thread.
|
||||
- id: subagent-fanout-synthesis
|
||||
title: Subagent fanout synthesis
|
||||
surface: subagents
|
||||
objective: Verify the agent can delegate multiple bounded subagent tasks and fold both results back into one parent reply.
|
||||
successCriteria:
|
||||
- Parent flow launches at least two bounded subagent tasks.
|
||||
- Both delegated results are acknowledged in the main flow.
|
||||
- Final answer synthesizes both worker outputs in one reply.
|
||||
docsRefs:
|
||||
- docs/tools/subagents.md
|
||||
- docs/help/testing.md
|
||||
codeRefs:
|
||||
- src/agents/subagent-spawn.ts
|
||||
- src/agents/system-prompt.ts
|
||||
- extensions/qa-lab/src/suite.ts
|
||||
execution:
|
||||
kind: custom
|
||||
handler: subagent-fanout-synthesis
|
||||
summary: Verify the agent can delegate multiple bounded subagent tasks and fold both results back into one parent reply.
|
||||
- id: thread-follow-up
|
||||
title: Threaded follow-up
|
||||
surface: thread
|
||||
objective: Verify the agent can keep follow-up work inside a thread and not leak context into the root channel.
|
||||
successCriteria:
|
||||
- Agent creates or uses a thread for deeper work.
|
||||
- Follow-up messages stay attached to the thread.
|
||||
- Thread report references the correct prior context.
|
||||
docsRefs:
|
||||
- docs/channels/qa-channel.md
|
||||
- docs/channels/group-messages.md
|
||||
codeRefs:
|
||||
- extensions/qa-channel/src/protocol.ts
|
||||
- extensions/qa-lab/src/bus-state.ts
|
||||
execution:
|
||||
kind: custom
|
||||
handler: thread-follow-up
|
||||
summary: Verify the agent can keep follow-up work inside a thread and not leak context into the root channel.
|
||||
- id: memory-tools-channel-context
|
||||
title: Memory tools in channel context
|
||||
surface: memory
|
||||
objective: Verify the agent uses memory_search and memory_get in a shared channel when the answer lives only in memory files, not the live transcript.
|
||||
successCriteria:
|
||||
- Agent uses memory_search before answering.
|
||||
- Agent narrows with memory_get before answering.
|
||||
- Final reply returns the memory-only fact correctly in-channel.
|
||||
docsRefs:
|
||||
- docs/concepts/memory.md
|
||||
- docs/concepts/memory-search.md
|
||||
codeRefs:
|
||||
- extensions/memory-core/src/tools.ts
|
||||
- extensions/qa-lab/src/suite.ts
|
||||
execution:
|
||||
kind: custom
|
||||
handler: memory-tools-channel-context
|
||||
summary: Verify the agent uses memory_search and memory_get in a shared channel when the answer lives only in memory files, not the live transcript.
|
||||
- id: memory-failure-fallback
|
||||
title: Memory failure fallback
|
||||
surface: memory
|
||||
objective: Verify the agent degrades gracefully when memory tools are unavailable and the answer exists only in memory-backed notes.
|
||||
successCriteria:
|
||||
- Memory tools are absent from the effective tool inventory.
|
||||
- Agent does not hallucinate the hidden fact.
|
||||
- Agent says it could not confirm and surfaces the limitation.
|
||||
docsRefs:
|
||||
- docs/concepts/memory.md
|
||||
- docs/tools/index.md
|
||||
codeRefs:
|
||||
- extensions/memory-core/src/tools.ts
|
||||
- extensions/qa-lab/src/suite.ts
|
||||
execution:
|
||||
kind: custom
|
||||
handler: memory-failure-fallback
|
||||
summary: Verify the agent degrades gracefully when memory tools are unavailable and the answer exists only in memory-backed notes.
|
||||
- id: session-memory-ranking
|
||||
title: Session memory ranking
|
||||
surface: memory
|
||||
objective: Verify session-transcript memory can outrank stale durable notes and drive the final answer toward the newer fact.
|
||||
successCriteria:
|
||||
- Session memory indexing is enabled for the scenario.
|
||||
- Search ranks the newer transcript-backed fact ahead of the stale durable note.
|
||||
- The agent uses memory tools and answers with the current fact, not the stale one.
|
||||
docsRefs:
|
||||
- docs/concepts/memory-search.md
|
||||
- docs/reference/memory-config.md
|
||||
codeRefs:
|
||||
- extensions/memory-core/src/tools.ts
|
||||
- extensions/memory-core/src/memory/manager.ts
|
||||
- extensions/qa-lab/src/suite.ts
|
||||
execution:
|
||||
kind: custom
|
||||
handler: session-memory-ranking
|
||||
summary: Verify session-transcript memory can outrank stale durable notes and drive the final answer toward the newer fact.
|
||||
- id: thread-memory-isolation
|
||||
title: Thread memory isolation
|
||||
surface: memory
|
||||
objective: Verify a memory-backed answer requested inside a thread stays in-thread and does not leak into the root channel.
|
||||
successCriteria:
|
||||
- Agent uses memory tools inside the thread.
|
||||
- The hidden fact is answered correctly in the thread.
|
||||
- No root-channel outbound message leaks during the threaded memory reply.
|
||||
docsRefs:
|
||||
- docs/concepts/memory-search.md
|
||||
- docs/channels/qa-channel.md
|
||||
- docs/channels/group-messages.md
|
||||
codeRefs:
|
||||
- extensions/memory-core/src/tools.ts
|
||||
- extensions/qa-channel/src/protocol.ts
|
||||
- extensions/qa-lab/src/suite.ts
|
||||
execution:
|
||||
kind: custom
|
||||
handler: thread-memory-isolation
|
||||
summary: Verify a memory-backed answer requested inside a thread stays in-thread and does not leak into the root channel.
|
||||
- id: model-switch-tool-continuity
|
||||
title: Model switch with tool continuity
|
||||
surface: models
|
||||
objective: Verify switching models preserves session context and tool use instead of dropping into plain-text only behavior.
|
||||
successCriteria:
|
||||
- Alternate model is actually requested.
|
||||
- A tool call still happens after the model switch.
|
||||
- Final answer acknowledges the handoff and uses the tool-derived evidence.
|
||||
docsRefs:
|
||||
- docs/help/testing.md
|
||||
- docs/concepts/model-failover.md
|
||||
codeRefs:
|
||||
- extensions/qa-lab/src/suite.ts
|
||||
- extensions/qa-lab/src/mock-openai-server.ts
|
||||
execution:
|
||||
kind: custom
|
||||
handler: model-switch-tool-continuity
|
||||
summary: Verify switching models preserves session context and tool use instead of dropping into plain-text only behavior.
|
||||
- id: mcp-plugin-tools-call
|
||||
title: MCP plugin-tools call
|
||||
surface: mcp
|
||||
objective: Verify OpenClaw can expose plugin tools over MCP and a real MCP client can call one successfully.
|
||||
successCriteria:
|
||||
- Plugin tools MCP server lists memory_search.
|
||||
- A real MCP client calls memory_search successfully.
|
||||
- The returned MCP payload includes the expected memory-only fact.
|
||||
docsRefs:
|
||||
- docs/cli/mcp.md
|
||||
- docs/gateway/protocol.md
|
||||
codeRefs:
|
||||
- src/mcp/plugin-tools-serve.ts
|
||||
- extensions/qa-lab/src/suite.ts
|
||||
execution:
|
||||
kind: custom
|
||||
handler: mcp-plugin-tools-call
|
||||
summary: Verify OpenClaw can expose plugin tools over MCP and a real MCP client can call one successfully.
|
||||
- id: skill-visibility-invocation
|
||||
title: Skill visibility and invocation
|
||||
surface: skills
|
||||
objective: Verify a workspace skill becomes visible in skills.status and influences the next agent turn.
|
||||
successCriteria:
|
||||
- skills.status reports the seeded skill as visible and eligible.
|
||||
- The next agent turn reflects the skill instruction marker.
|
||||
- The result stays scoped to the active QA workspace skill.
|
||||
docsRefs:
|
||||
- docs/tools/skills.md
|
||||
- docs/gateway/protocol.md
|
||||
codeRefs:
|
||||
- src/agents/skills-status.ts
|
||||
- extensions/qa-lab/src/suite.ts
|
||||
execution:
|
||||
kind: custom
|
||||
handler: skill-visibility-invocation
|
||||
summary: Verify a workspace skill becomes visible in skills.status and influences the next agent turn.
|
||||
- id: skill-install-hot-availability
|
||||
title: Skill install hot availability
|
||||
surface: skills
|
||||
objective: Verify a newly added workspace skill shows up without a broken intermediate state and can influence the next turn immediately.
|
||||
successCriteria:
|
||||
- Skill is absent before install.
|
||||
- skills.status reports it after install without a restart.
|
||||
- The next agent turn reflects the new skill marker.
|
||||
docsRefs:
|
||||
- docs/tools/skills.md
|
||||
- docs/gateway/configuration.md
|
||||
codeRefs:
|
||||
- src/agents/skills-status.ts
|
||||
- extensions/qa-lab/src/suite.ts
|
||||
execution:
|
||||
kind: custom
|
||||
handler: skill-install-hot-availability
|
||||
summary: Verify a newly added workspace skill shows up without a broken intermediate state and can influence the next turn immediately.
|
||||
- id: native-image-generation
|
||||
title: Native image generation
|
||||
surface: image-generation
|
||||
objective: Verify image_generate appears when configured and returns a real saved media artifact.
|
||||
successCriteria:
|
||||
- image_generate appears in the effective tool inventory.
|
||||
- Agent triggers native image_generate.
|
||||
- Tool output returns a saved MEDIA path and the file exists.
|
||||
docsRefs:
|
||||
- docs/tools/image-generation.md
|
||||
- docs/providers/openai.md
|
||||
codeRefs:
|
||||
- src/agents/tools/image-generate-tool.ts
|
||||
- extensions/qa-lab/src/mock-openai-server.ts
|
||||
execution:
|
||||
kind: custom
|
||||
handler: native-image-generation
|
||||
summary: Verify image_generate appears when configured and returns a real saved media artifact.
|
||||
- id: image-understanding-attachment
|
||||
title: Image understanding from attachment
|
||||
surface: image-understanding
|
||||
objective: Verify an attached image reaches the agent model and the agent can describe what it sees.
|
||||
successCriteria:
|
||||
- Agent receives at least one image attachment.
|
||||
- Final answer describes the visible image content in one short sentence.
|
||||
- The description mentions the expected red and blue regions.
|
||||
docsRefs:
|
||||
- docs/help/testing.md
|
||||
- docs/tools/index.md
|
||||
codeRefs:
|
||||
- src/gateway/server-methods/agent.ts
|
||||
- extensions/qa-lab/src/suite.ts
|
||||
- extensions/qa-lab/src/mock-openai-server.ts
|
||||
execution:
|
||||
kind: custom
|
||||
handler: image-understanding-attachment
|
||||
summary: Verify an attached image reaches the agent model and the agent can describe what it sees.
|
||||
- id: image-generation-roundtrip
|
||||
title: Image generation roundtrip
|
||||
surface: image-generation
|
||||
objective: Verify a generated image is saved as media, reattached on the next turn, and described correctly through the vision path.
|
||||
successCriteria:
|
||||
- image_generate produces a saved MEDIA artifact.
|
||||
- The generated artifact is reattached on a follow-up turn.
|
||||
- The follow-up vision answer describes the generated scene rather than a generic attachment placeholder.
|
||||
docsRefs:
|
||||
- docs/tools/image-generation.md
|
||||
- docs/help/testing.md
|
||||
codeRefs:
|
||||
- src/agents/tools/image-generate-tool.ts
|
||||
- src/gateway/chat-attachments.ts
|
||||
- extensions/qa-lab/src/mock-openai-server.ts
|
||||
execution:
|
||||
kind: custom
|
||||
handler: image-generation-roundtrip
|
||||
summary: Verify a generated image is saved as media, reattached on the next turn, and described correctly through the vision path.
|
||||
- id: config-patch-hot-apply
|
||||
title: Config patch skill disable
|
||||
surface: config
|
||||
objective: Verify config.patch can disable a workspace skill and the restarted gateway exposes the new disabled state cleanly.
|
||||
successCriteria:
|
||||
- config.patch succeeds for the skill toggle change.
|
||||
- A workspace skill works before the patch.
|
||||
- The same skill is reported disabled after the restart triggered by the patch.
|
||||
docsRefs:
|
||||
- docs/gateway/configuration.md
|
||||
- docs/gateway/protocol.md
|
||||
codeRefs:
|
||||
- src/gateway/server-methods/config.ts
|
||||
- extensions/qa-lab/src/suite.ts
|
||||
execution:
|
||||
kind: custom
|
||||
handler: config-patch-hot-apply
|
||||
summary: Verify config.patch can disable a workspace skill and the restarted gateway exposes the new disabled state cleanly.
|
||||
- id: config-apply-restart-wakeup
|
||||
title: Config apply restart wake-up
|
||||
surface: config
|
||||
objective: Verify a restart-required config.apply restarts cleanly and delivers the post-restart wake message back into the QA channel.
|
||||
successCriteria:
|
||||
- config.apply schedules a restart-required change.
|
||||
- Gateway becomes healthy again after restart.
|
||||
- Restart sentinel wake-up message arrives in the QA channel.
|
||||
docsRefs:
|
||||
- docs/gateway/configuration.md
|
||||
- docs/gateway/protocol.md
|
||||
codeRefs:
|
||||
- src/gateway/server-methods/config.ts
|
||||
- src/gateway/server-restart-sentinel.ts
|
||||
execution:
|
||||
kind: custom
|
||||
handler: config-apply-restart-wakeup
|
||||
summary: Verify a restart-required config.apply restarts cleanly and delivers the post-restart wake message back into the QA channel.
|
||||
- id: config-restart-capability-flip
|
||||
title: Config restart capability flip
|
||||
surface: config
|
||||
objective: Verify a restart-triggering config change flips capability inventory and the same session successfully uses the newly restored tool after wake-up.
|
||||
successCriteria:
|
||||
- Capability is absent before the restart-triggering patch.
|
||||
- Restart sentinel wakes the same session back up after config patch.
|
||||
- The restored capability appears in tools.effective and works in the follow-up turn.
|
||||
docsRefs:
|
||||
- docs/gateway/configuration.md
|
||||
- docs/gateway/protocol.md
|
||||
- docs/tools/image-generation.md
|
||||
codeRefs:
|
||||
- src/gateway/server-methods/config.ts
|
||||
- src/gateway/server-restart-sentinel.ts
|
||||
- src/gateway/server-methods/tools-effective.ts
|
||||
- extensions/qa-lab/src/suite.ts
|
||||
execution:
|
||||
kind: custom
|
||||
handler: config-restart-capability-flip
|
||||
summary: Verify a restart-triggering config change flips capability inventory and the same session successfully uses the newly restored tool after wake-up.
|
||||
- id: runtime-inventory-drift-check
|
||||
title: Runtime inventory drift check
|
||||
surface: inventory
|
||||
objective: Verify tools.effective and skills.status stay aligned with runtime behavior after config changes.
|
||||
successCriteria:
|
||||
- Enabled tool appears before the config change.
|
||||
- After config change, disabled tool disappears from tools.effective.
|
||||
- Disabled skill appears in skills.status with disabled state.
|
||||
docsRefs:
|
||||
- docs/gateway/protocol.md
|
||||
- docs/tools/skills.md
|
||||
- docs/tools/index.md
|
||||
codeRefs:
|
||||
- src/gateway/server-methods/tools-effective.ts
|
||||
- src/gateway/server-methods/skills.ts
|
||||
execution:
|
||||
kind: custom
|
||||
handler: runtime-inventory-drift-check
|
||||
summary: Verify tools.effective and skills.status stay aligned with runtime behavior after config changes.
|
||||
```
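The scenario entries above all share the same set of fields, so a shape check is easy to sketch. The following is a minimal, dependency-free illustration and not the repository's parser (the commit describes that as a markdown pack parser with zod validation in `extensions/qa-lab/src/scenario-catalog.ts`). The names `QaScenario` and `checkScenario` are hypothetical, and the nesting of `kind`, `handler`, and `summary` under `execution` is an assumption inferred from the field order, since indentation is lost in this view.

```typescript
// Hypothetical shape inferred from the fields visible in the scenario entries above.
interface QaScenario {
  id: string;
  title: string;
  surface: string;
  objective: string;
  successCriteria: string[];
  docsRefs: string[];
  codeRefs: string[];
  // Assumption: these three keys are nested under `execution`.
  execution: { kind: string; handler: string; summary: string };
}

const isNonEmptyString = (v: unknown): v is string =>
  typeof v === "string" && v.trim().length > 0;

const isStringList = (v: unknown): v is string[] =>
  Array.isArray(v) && v.length > 0 && v.every(isNonEmptyString);

// Returns human-readable problems; an empty list means the entry looks well-formed.
function checkScenario(entry: unknown): string[] {
  if (typeof entry !== "object" || entry === null) {
    return ["entry is not an object"];
  }
  const e = entry as Record<string, unknown>;
  const problems: string[] = [];

  for (const key of ["id", "title", "surface", "objective"]) {
    if (!isNonEmptyString(e[key])) problems.push(`${key} must be a non-empty string`);
  }
  for (const key of ["successCriteria", "docsRefs", "codeRefs"]) {
    if (!isStringList(e[key])) problems.push(`${key} must be a non-empty list of strings`);
  }

  const exec = e["execution"];
  if (typeof exec !== "object" || exec === null) {
    problems.push("execution is missing");
  } else {
    const x = exec as Record<string, unknown>;
    for (const key of ["kind", "handler", "summary"]) {
      if (!isNonEmptyString(x[key])) problems.push(`execution.${key} must be a non-empty string`);
    }
  }
  return problems;
}

// Example entry copied from the runtime-inventory-drift-check scenario above.
const exampleScenario: QaScenario = {
  id: "runtime-inventory-drift-check",
  title: "Runtime inventory drift check",
  surface: "inventory",
  objective:
    "Verify tools.effective and skills.status stay aligned with runtime behavior after config changes.",
  successCriteria: [
    "Enabled tool appears before the config change.",
    "After config change, disabled tool disappears from tools.effective.",
    "Disabled skill appears in skills.status with disabled state.",
  ],
  docsRefs: ["docs/gateway/protocol.md", "docs/tools/skills.md", "docs/tools/index.md"],
  codeRefs: [
    "src/gateway/server-methods/tools-effective.ts",
    "src/gateway/server-methods/skills.ts",
  ],
  execution: {
    kind: "custom",
    handler: "runtime-inventory-drift-check",
    summary:
      "Verify tools.effective and skills.status stay aligned with runtime behavior after config changes.",
  },
};

console.log(checkScenario(exampleScenario));
```

A check like this reports every problem in an entry rather than stopping at the first one, which tends to be more convenient when editing a long scenario file by hand.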
|
||||
@@ -1,425 +0,0 @@
|
||||
[
|
||||
{
|
||||
"id": "channel-chat-baseline",
|
||||
"title": "Channel baseline conversation",
|
||||
"surface": "channel",
|
||||
"objective": "Verify the QA agent can respond correctly in a shared channel and respect mention-driven group semantics.",
|
||||
"successCriteria": [
|
||||
"Agent replies in the shared channel transcript.",
|
||||
"Agent keeps the conversation scoped to the channel.",
|
||||
"Agent respects mention-driven group routing semantics."
|
||||
],
|
||||
"docsRefs": ["docs/channels/group-messages.md", "docs/channels/qa-channel.md"],
|
||||
"codeRefs": ["extensions/qa-channel/src/inbound.ts", "extensions/qa-lab/src/bus-state.ts"]
|
||||
},
|
||||
{
|
||||
"id": "cron-one-minute-ping",
|
||||
"title": "Cron one-minute ping",
|
||||
"surface": "cron",
|
||||
"objective": "Verify the agent can schedule a cron reminder one minute in the future and receive the follow-up in the QA channel.",
|
||||
"successCriteria": [
|
||||
"Agent schedules a cron reminder roughly one minute ahead.",
|
||||
"Reminder returns through qa-channel.",
|
||||
"Agent recognizes the reminder as part of the original task."
|
||||
],
|
||||
"docsRefs": ["docs/help/testing.md", "docs/channels/qa-channel.md"],
|
||||
"codeRefs": ["extensions/qa-lab/src/bus-server.ts", "extensions/qa-lab/src/self-check.ts"]
|
||||
},
|
||||
{
|
||||
"id": "dm-chat-baseline",
|
||||
"title": "DM baseline conversation",
|
||||
"surface": "dm",
|
||||
"objective": "Verify the QA agent can chat coherently in a DM, explain the QA setup, and stay in character.",
|
||||
"successCriteria": [
|
||||
"Agent replies in DM without channel routing mistakes.",
|
||||
"Agent explains the QA lab and message bus correctly.",
|
||||
"Agent keeps the dev C-3PO personality."
|
||||
],
|
||||
"docsRefs": ["docs/channels/qa-channel.md", "docs/help/testing.md"],
|
||||
"codeRefs": ["extensions/qa-channel/src/gateway.ts", "extensions/qa-lab/src/lab-server.ts"]
|
||||
},
|
||||
{
|
||||
"id": "lobster-invaders-build",
|
||||
"title": "Build Lobster Invaders",
|
||||
"surface": "workspace",
|
||||
"objective": "Verify the agent can read the repo, create a tiny playable artifact, and report what changed.",
|
||||
"successCriteria": [
|
||||
"Agent inspects source before coding.",
|
||||
"Agent builds a tiny playable Lobster Invaders artifact.",
|
||||
"Agent explains how to run or view the artifact."
|
||||
],
|
||||
"docsRefs": ["docs/help/testing.md", "docs/web/dashboard.md"],
|
||||
"codeRefs": ["extensions/qa-lab/src/report.ts", "extensions/qa-lab/web/src/app.ts"]
|
||||
},
|
||||
{
|
||||
"id": "memory-recall",
|
||||
"title": "Memory recall after context switch",
|
||||
"surface": "memory",
|
||||
"objective": "Verify the agent can store a fact, switch topics, then recall the fact accurately later.",
|
||||
"successCriteria": [
|
||||
"Agent acknowledges the seeded fact.",
|
||||
"Agent later recalls the same fact correctly.",
|
||||
"Recall stays scoped to the active QA conversation."
|
||||
],
|
||||
"docsRefs": ["docs/help/testing.md"],
|
||||
"codeRefs": ["extensions/qa-lab/src/scenario.ts"]
|
||||
},
|
||||
{
|
||||
"id": "memory-dreaming-sweep",
|
||||
"title": "Memory dreaming sweep",
|
||||
"surface": "memory",
|
||||
"objective": "Verify enabling dreaming creates the managed sweep, stages light and REM artifacts, and consolidates repeated recall signals into durable memory.",
|
||||
"successCriteria": [
|
||||
"Dreaming can be enabled and doctor.memory.status reports the managed sweep cron.",
|
||||
"Repeated recall signals give the dreaming sweep real material to process.",
|
||||
"A dreaming sweep writes Light Sleep and REM Sleep blocks, then promotes the canary into MEMORY.md."
|
||||
],
|
||||
"docsRefs": [
|
||||
"docs/concepts/dreaming.md",
|
||||
"docs/reference/memory-config.md",
|
||||
"docs/web/control-ui.md"
|
||||
],
|
||||
"codeRefs": [
|
||||
"extensions/memory-core/src/dreaming.ts",
|
||||
"extensions/memory-core/src/dreaming-phases.ts",
|
||||
"src/gateway/server-methods/doctor.ts",
|
||||
"extensions/qa-lab/src/suite.ts"
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": "model-switch-follow-up",
|
||||
"title": "Model switch follow-up",
|
||||
"surface": "models",
|
||||
"objective": "Verify the agent can switch to a different configured model and continue coherently.",
|
||||
"successCriteria": [
|
||||
"Agent reflects the model switch request.",
|
||||
"Follow-up answer remains coherent with prior context.",
|
||||
"Final report notes whether the switch actually happened."
|
||||
],
|
||||
"docsRefs": ["docs/help/testing.md", "docs/web/dashboard.md"],
|
||||
"codeRefs": ["extensions/qa-lab/src/report.ts"]
|
||||
},
|
||||
{
|
||||
"id": "approval-turn-tool-followthrough",
|
||||
"title": "Approval turn tool followthrough",
|
||||
"surface": "harness",
|
||||
"objective": "Verify a short approval like \"ok do it\" triggers immediate tool use instead of fake-progress narration.",
|
||||
"successCriteria": [
|
||||
"Agent can keep the pre-action turn brief.",
|
||||
"The short approval leads to a real tool call on the next turn.",
|
||||
"Final answer uses tool-derived evidence instead of placeholder progress text."
|
||||
],
|
||||
"docsRefs": ["docs/help/testing.md", "docs/channels/qa-channel.md"],
|
||||
"codeRefs": [
|
||||
"extensions/qa-lab/src/suite.ts",
|
||||
"extensions/qa-lab/src/mock-openai-server.ts",
|
||||
"src/agents/pi-embedded-runner/run/incomplete-turn.ts"
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": "reaction-edit-delete",
|
||||
"title": "Reaction, edit, delete lifecycle",
|
||||
"surface": "message-actions",
|
||||
"objective": "Verify the agent can use channel-owned message actions and that the QA transcript reflects them.",
|
||||
"successCriteria": [
|
||||
"Agent adds at least one reaction.",
|
||||
"Agent edits or replaces a message when asked.",
|
||||
"Transcript shows the action lifecycle correctly."
|
||||
],
|
||||
"docsRefs": ["docs/channels/qa-channel.md"],
|
||||
"codeRefs": [
|
||||
"extensions/qa-channel/src/channel-actions.ts",
|
||||
"extensions/qa-lab/src/self-check-scenario.ts"
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": "source-docs-discovery-report",
|
||||
"title": "Source and docs discovery report",
|
||||
"surface": "discovery",
|
||||
"objective": "Verify the agent can read repo docs and source, expand the QA plan, and publish a worked or did-not-work report.",
|
||||
"successCriteria": [
|
||||
"Agent reads docs and source before proposing more tests.",
|
||||
"Agent identifies extra candidate scenarios beyond the seed list.",
|
||||
"Agent ends with a worked or failed QA report."
|
||||
],
|
||||
"docsRefs": ["docs/help/testing.md", "docs/web/dashboard.md", "docs/channels/qa-channel.md"],
|
||||
"codeRefs": [
|
||||
"extensions/qa-lab/src/report.ts",
|
||||
"extensions/qa-lab/src/self-check.ts",
|
||||
"src/agents/system-prompt.ts"
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": "subagent-handoff",
|
||||
"title": "Subagent handoff",
|
||||
"surface": "subagents",
|
||||
"objective": "Verify the agent can delegate a bounded task to a subagent and fold the result back into the main thread.",
|
||||
"successCriteria": [
|
||||
"Agent launches a bounded subagent task.",
|
||||
"Subagent result is acknowledged in the main flow.",
|
||||
"Final answer attributes delegated work clearly."
|
||||
],
|
||||
"docsRefs": ["docs/tools/subagents.md", "docs/help/testing.md"],
|
||||
"codeRefs": ["src/agents/system-prompt.ts", "extensions/qa-lab/src/report.ts"]
|
||||
},
|
||||
{
|
||||
"id": "subagent-fanout-synthesis",
|
||||
"title": "Subagent fanout synthesis",
|
||||
"surface": "subagents",
|
||||
"objective": "Verify the agent can delegate multiple bounded subagent tasks and fold both results back into one parent reply.",
|
||||
"successCriteria": [
|
||||
"Parent flow launches at least two bounded subagent tasks.",
|
||||
"Both delegated results are acknowledged in the main flow.",
|
||||
"Final answer synthesizes both worker outputs in one reply."
|
||||
],
|
||||
"docsRefs": ["docs/tools/subagents.md", "docs/help/testing.md"],
|
||||
"codeRefs": [
|
||||
"src/agents/subagent-spawn.ts",
|
||||
"src/agents/system-prompt.ts",
|
||||
"extensions/qa-lab/src/suite.ts"
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": "thread-follow-up",
|
||||
"title": "Threaded follow-up",
|
||||
"surface": "thread",
|
||||
"objective": "Verify the agent can keep follow-up work inside a thread and not leak context into the root channel.",
|
||||
"successCriteria": [
|
||||
"Agent creates or uses a thread for deeper work.",
|
||||
"Follow-up messages stay attached to the thread.",
|
||||
"Thread report references the correct prior context."
|
||||
],
|
||||
"docsRefs": ["docs/channels/qa-channel.md", "docs/channels/group-messages.md"],
|
||||
"codeRefs": ["extensions/qa-channel/src/protocol.ts", "extensions/qa-lab/src/bus-state.ts"]
|
||||
},
|
||||
{
|
||||
"id": "memory-tools-channel-context",
|
||||
"title": "Memory tools in channel context",
|
||||
"surface": "memory",
|
||||
"objective": "Verify the agent uses memory_search and memory_get in a shared channel when the answer lives only in memory files, not the live transcript.",
|
||||
"successCriteria": [
|
||||
"Agent uses memory_search before answering.",
|
||||
"Agent narrows with memory_get before answering.",
|
||||
"Final reply returns the memory-only fact correctly in-channel."
|
||||
],
|
||||
"docsRefs": ["docs/concepts/memory.md", "docs/concepts/memory-search.md"],
|
||||
"codeRefs": ["extensions/memory-core/src/tools.ts", "extensions/qa-lab/src/suite.ts"]
|
||||
},
|
||||
{
|
||||
"id": "memory-failure-fallback",
|
||||
"title": "Memory failure fallback",
|
||||
"surface": "memory",
|
||||
"objective": "Verify the agent degrades gracefully when memory tools are unavailable and the answer exists only in memory-backed notes.",
|
||||
"successCriteria": [
|
||||
"Memory tools are absent from the effective tool inventory.",
|
||||
"Agent does not hallucinate the hidden fact.",
|
||||
"Agent says it could not confirm and surfaces the limitation."
|
||||
],
|
||||
"docsRefs": ["docs/concepts/memory.md", "docs/tools/index.md"],
|
||||
"codeRefs": ["extensions/memory-core/src/tools.ts", "extensions/qa-lab/src/suite.ts"]
|
||||
},
|
||||
{
|
||||
"id": "session-memory-ranking",
|
||||
"title": "Session memory ranking",
|
||||
"surface": "memory",
|
||||
"objective": "Verify session-transcript memory can outrank stale durable notes and drive the final answer toward the newer fact.",
|
||||
"successCriteria": [
|
||||
"Session memory indexing is enabled for the scenario.",
|
||||
"Search ranks the newer transcript-backed fact ahead of the stale durable note.",
|
||||
"The agent uses memory tools and answers with the current fact, not the stale one."
|
||||
],
|
||||
"docsRefs": ["docs/concepts/memory-search.md", "docs/reference/memory-config.md"],
|
||||
"codeRefs": [
|
||||
"extensions/memory-core/src/tools.ts",
|
||||
"extensions/memory-core/src/memory/manager.ts",
|
||||
"extensions/qa-lab/src/suite.ts"
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": "thread-memory-isolation",
|
||||
"title": "Thread memory isolation",
|
||||
"surface": "memory",
|
||||
"objective": "Verify a memory-backed answer requested inside a thread stays in-thread and does not leak into the root channel.",
|
||||
"successCriteria": [
|
||||
"Agent uses memory tools inside the thread.",
|
||||
"The hidden fact is answered correctly in the thread.",
|
||||
"No root-channel outbound message leaks during the threaded memory reply."
|
||||
],
|
||||
"docsRefs": [
|
||||
"docs/concepts/memory-search.md",
|
||||
"docs/channels/qa-channel.md",
|
||||
"docs/channels/group-messages.md"
|
||||
],
|
||||
"codeRefs": [
|
||||
"extensions/memory-core/src/tools.ts",
|
||||
"extensions/qa-channel/src/protocol.ts",
|
||||
"extensions/qa-lab/src/suite.ts"
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": "model-switch-tool-continuity",
|
||||
"title": "Model switch with tool continuity",
|
||||
"surface": "models",
|
||||
"objective": "Verify switching models preserves session context and tool use instead of dropping into plain-text only behavior.",
|
||||
"successCriteria": [
|
||||
"Alternate model is actually requested.",
|
||||
"A tool call still happens after the model switch.",
|
||||
"Final answer acknowledges the handoff and uses the tool-derived evidence."
|
||||
],
|
||||
"docsRefs": ["docs/help/testing.md", "docs/concepts/model-failover.md"],
|
||||
"codeRefs": ["extensions/qa-lab/src/suite.ts", "extensions/qa-lab/src/mock-openai-server.ts"]
|
||||
},
|
||||
{
|
||||
"id": "mcp-plugin-tools-call",
|
||||
"title": "MCP plugin-tools call",
|
||||
"surface": "mcp",
|
||||
"objective": "Verify OpenClaw can expose plugin tools over MCP and a real MCP client can call one successfully.",
|
||||
"successCriteria": [
|
||||
"Plugin tools MCP server lists memory_search.",
|
||||
"A real MCP client calls memory_search successfully.",
|
||||
"The returned MCP payload includes the expected memory-only fact."
|
||||
],
|
||||
"docsRefs": ["docs/cli/mcp.md", "docs/gateway/protocol.md"],
|
||||
"codeRefs": ["src/mcp/plugin-tools-serve.ts", "extensions/qa-lab/src/suite.ts"]
|
||||
},
|
||||
{
|
||||
"id": "skill-visibility-invocation",
|
||||
"title": "Skill visibility and invocation",
|
||||
"surface": "skills",
|
||||
"objective": "Verify a workspace skill becomes visible in skills.status and influences the next agent turn.",
|
||||
"successCriteria": [
|
||||
"skills.status reports the seeded skill as visible and eligible.",
|
||||
"The next agent turn reflects the skill instruction marker.",
|
||||
"The result stays scoped to the active QA workspace skill."
|
||||
],
|
||||
"docsRefs": ["docs/tools/skills.md", "docs/gateway/protocol.md"],
|
||||
"codeRefs": ["src/agents/skills-status.ts", "extensions/qa-lab/src/suite.ts"]
|
||||
},
|
||||
{
|
||||
"id": "skill-install-hot-availability",
|
||||
"title": "Skill install hot availability",
|
||||
"surface": "skills",
|
||||
"objective": "Verify a newly added workspace skill shows up without a broken intermediate state and can influence the next turn immediately.",
|
||||
"successCriteria": [
|
||||
"Skill is absent before install.",
|
||||
"skills.status reports it after install without a restart.",
|
||||
"The next agent turn reflects the new skill marker."
|
||||
],
|
||||
"docsRefs": ["docs/tools/skills.md", "docs/gateway/configuration.md"],
|
||||
"codeRefs": ["src/agents/skills-status.ts", "extensions/qa-lab/src/suite.ts"]
|
||||
},
|
||||
{
|
||||
"id": "native-image-generation",
|
||||
"title": "Native image generation",
|
||||
"surface": "image-generation",
|
||||
"objective": "Verify image_generate appears when configured and returns a real saved media artifact.",
|
||||
"successCriteria": [
|
||||
"image_generate appears in the effective tool inventory.",
|
||||
"Agent triggers native image_generate.",
|
||||
"Tool output returns a saved MEDIA path and the file exists."
|
||||
],
|
||||
"docsRefs": ["docs/tools/image-generation.md", "docs/providers/openai.md"],
|
||||
"codeRefs": [
|
||||
"src/agents/tools/image-generate-tool.ts",
|
||||
"extensions/qa-lab/src/mock-openai-server.ts"
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": "image-understanding-attachment",
|
||||
"title": "Image understanding from attachment",
|
||||
"surface": "image-understanding",
|
||||
"objective": "Verify an attached image reaches the agent model and the agent can describe what it sees.",
|
||||
"successCriteria": [
|
||||
"Agent receives at least one image attachment.",
|
||||
"Final answer describes the visible image content in one short sentence.",
|
||||
"The description mentions the expected red and blue regions."
|
||||
],
|
||||
"docsRefs": ["docs/help/testing.md", "docs/tools/index.md"],
|
||||
"codeRefs": [
|
||||
"src/gateway/server-methods/agent.ts",
|
||||
"extensions/qa-lab/src/suite.ts",
|
||||
"extensions/qa-lab/src/mock-openai-server.ts"
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": "image-generation-roundtrip",
|
||||
"title": "Image generation roundtrip",
|
||||
"surface": "image-generation",
|
||||
"objective": "Verify a generated image is saved as media, reattached on the next turn, and described correctly through the vision path.",
|
||||
"successCriteria": [
|
||||
"image_generate produces a saved MEDIA artifact.",
|
||||
"The generated artifact is reattached on a follow-up turn.",
|
||||
"The follow-up vision answer describes the generated scene rather than a generic attachment placeholder."
|
||||
],
|
||||
"docsRefs": ["docs/tools/image-generation.md", "docs/help/testing.md"],
|
||||
"codeRefs": [
|
||||
"src/agents/tools/image-generate-tool.ts",
|
||||
"src/gateway/chat-attachments.ts",
|
||||
"extensions/qa-lab/src/mock-openai-server.ts"
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": "config-patch-hot-apply",
|
||||
"title": "Config patch skill disable",
|
||||
"surface": "config",
|
||||
"objective": "Verify config.patch can disable a workspace skill and the restarted gateway exposes the new disabled state cleanly.",
|
||||
"successCriteria": [
|
||||
"config.patch succeeds for the skill toggle change.",
|
||||
"A workspace skill works before the patch.",
|
||||
"The same skill is reported disabled after the restart triggered by the patch."
|
||||
],
|
||||
"docsRefs": ["docs/gateway/configuration.md", "docs/gateway/protocol.md"],
|
||||
"codeRefs": ["src/gateway/server-methods/config.ts", "extensions/qa-lab/src/suite.ts"]
|
||||
},
|
||||
{
|
||||
"id": "config-apply-restart-wakeup",
|
||||
"title": "Config apply restart wake-up",
|
||||
"surface": "config",
|
||||
"objective": "Verify a restart-required config.apply restarts cleanly and delivers the post-restart wake message back into the QA channel.",
|
||||
"successCriteria": [
|
||||
"config.apply schedules a restart-required change.",
|
||||
"Gateway becomes healthy again after restart.",
|
||||
"Restart sentinel wake-up message arrives in the QA channel."
|
||||
],
|
||||
"docsRefs": ["docs/gateway/configuration.md", "docs/gateway/protocol.md"],
|
||||
"codeRefs": ["src/gateway/server-methods/config.ts", "src/gateway/server-restart-sentinel.ts"]
|
||||
},
|
||||
{
|
||||
"id": "config-restart-capability-flip",
|
||||
"title": "Config restart capability flip",
|
||||
"surface": "config",
|
||||
"objective": "Verify a restart-triggering config change flips capability inventory and the same session successfully uses the newly restored tool after wake-up.",
|
||||
"successCriteria": [
|
||||
"Capability is absent before the restart-triggering patch.",
|
||||
"Restart sentinel wakes the same session back up after config patch.",
|
||||
"The restored capability appears in tools.effective and works in the follow-up turn."
|
||||
],
|
||||
"docsRefs": [
|
||||
"docs/gateway/configuration.md",
|
||||
"docs/gateway/protocol.md",
|
||||
"docs/tools/image-generation.md"
|
||||
],
|
||||
"codeRefs": [
|
||||
"src/gateway/server-methods/config.ts",
|
||||
"src/gateway/server-restart-sentinel.ts",
|
||||
"src/gateway/server-methods/tools-effective.ts",
|
||||
"extensions/qa-lab/src/suite.ts"
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": "runtime-inventory-drift-check",
|
||||
"title": "Runtime inventory drift check",
|
||||
"surface": "inventory",
|
||||
"objective": "Verify tools.effective and skills.status stay aligned with runtime behavior after config changes.",
|
||||
"successCriteria": [
|
||||
"Enabled tool appears before the config change.",
|
||||
"After config change, disabled tool disappears from tools.effective.",
|
||||
"Disabled skill appears in skills.status with disabled state."
|
||||
],
|
||||
"docsRefs": ["docs/gateway/protocol.md", "docs/tools/skills.md", "docs/tools/index.md"],
|
||||
"codeRefs": [
|
||||
"src/gateway/server-methods/tools-effective.ts",
|
||||
"src/gateway/server-methods/skills.ts"
|
||||
]
|
||||
}
|
||||
]
|
||||
@@ -20,6 +20,7 @@ export {
|
||||
setQaChannelRuntime,
|
||||
} from "../../extensions/qa-channel/api.js";
|
||||
export type {
|
||||
QaBusAttachment,
|
||||
QaBusConversation,
|
||||
QaBusConversationKind,
|
||||
QaBusCreateThreadInput,
|
||||
|
||||