mirror of
https://github.com/openclaw/openclaw.git
synced 2026-04-14 02:31:24 +00:00
24 KiB
24 KiB
OpenClaw QA Scenario Pack
Single source of truth for the repo-backed QA suite.
- kickoff mission
- QA operator identity
- scenario metadata
- handler bindings for the executable harness
version: 1
agent:
identityMarkdown: |-
# Dev C-3PO
You are the OpenClaw QA operator agent.
Persona:
- protocol-minded
- precise
- a little flustered
- conscientious
- eager to report what worked, failed, or remains blocked
Style:
- read source and docs first
- test systematically
- record evidence
- end with a concise protocol report
kickoffTask: |-
QA mission:
Understand this OpenClaw repo from source + docs before acting.
The repo is available in your workspace at `./repo/`.
Use the seeded QA scenario plan as your baseline, then add more scenarios if the code/docs suggest them.
Run the scenarios through the real qa-channel surfaces where possible.
Track what worked, what failed, what was blocked, and what evidence you observed.
End with a concise report grouped into worked / failed / blocked / follow-up.
Important expectations:
- Check both DM and channel behavior.
- Include a Lobster Invaders build task.
- Include a cron reminder about one minute in the future.
- Read docs and source before proposing extra QA scenarios.
- Keep your tone in the configured dev C-3PO personality.
scenarios:
- id: channel-chat-baseline
title: Channel baseline conversation
surface: channel
objective: Verify the QA agent can respond correctly in a shared channel and respect mention-driven group semantics.
successCriteria:
- Agent replies in the shared channel transcript.
- Agent keeps the conversation scoped to the channel.
- Agent respects mention-driven group routing semantics.
docsRefs:
- docs/channels/group-messages.md
- docs/channels/qa-channel.md
codeRefs:
- extensions/qa-channel/src/inbound.ts
- extensions/qa-lab/src/bus-state.ts
execution:
kind: custom
handler: channel-chat-baseline
summary: Verify the QA agent can respond correctly in a shared channel and respect mention-driven group semantics.
- id: cron-one-minute-ping
title: Cron one-minute ping
surface: cron
objective: Verify the agent can schedule a cron reminder one minute in the future and receive the follow-up in the QA channel.
successCriteria:
- Agent schedules a cron reminder roughly one minute ahead.
- Reminder returns through qa-channel.
- Agent recognizes the reminder as part of the original task.
docsRefs:
- docs/help/testing.md
- docs/channels/qa-channel.md
codeRefs:
- extensions/qa-lab/src/bus-server.ts
- extensions/qa-lab/src/self-check.ts
execution:
kind: custom
handler: cron-one-minute-ping
summary: Verify the agent can schedule a cron reminder one minute in the future and receive the follow-up in the QA channel.
- id: dm-chat-baseline
title: DM baseline conversation
surface: dm
objective: Verify the QA agent can chat coherently in a DM, explain the QA setup, and stay in character.
successCriteria:
- Agent replies in DM without channel routing mistakes.
- Agent explains the QA lab and message bus correctly.
- Agent keeps the dev C-3PO personality.
docsRefs:
- docs/channels/qa-channel.md
- docs/help/testing.md
codeRefs:
- extensions/qa-channel/src/gateway.ts
- extensions/qa-lab/src/lab-server.ts
execution:
kind: custom
handler: dm-chat-baseline
summary: Verify the QA agent can chat coherently in a DM, explain the QA setup, and stay in character.
- id: lobster-invaders-build
title: Build Lobster Invaders
surface: workspace
objective: Verify the agent can read the repo, create a tiny playable artifact, and report what changed.
successCriteria:
- Agent inspects source before coding.
- Agent builds a tiny playable Lobster Invaders artifact.
- Agent explains how to run or view the artifact.
docsRefs:
- docs/help/testing.md
- docs/web/dashboard.md
codeRefs:
- extensions/qa-lab/src/report.ts
- extensions/qa-lab/web/src/app.ts
execution:
kind: custom
handler: lobster-invaders-build
summary: Verify the agent can read the repo, create a tiny playable artifact, and report what changed.
- id: memory-recall
title: Memory recall after context switch
surface: memory
objective: Verify the agent can store a fact, switch topics, then recall the fact accurately later.
successCriteria:
- Agent acknowledges the seeded fact.
- Agent later recalls the same fact correctly.
- Recall stays scoped to the active QA conversation.
docsRefs:
- docs/help/testing.md
codeRefs:
- extensions/qa-lab/src/scenario.ts
execution:
kind: custom
handler: memory-recall
summary: Verify the agent can store a fact, switch topics, then recall the fact accurately later.
- id: memory-dreaming-sweep
title: Memory dreaming sweep
surface: memory
objective: Verify enabling dreaming creates the managed sweep, stages light and REM artifacts, and consolidates repeated recall signals into durable memory.
successCriteria:
- Dreaming can be enabled and doctor.memory.status reports the managed sweep cron.
- Repeated recall signals give the dreaming sweep real material to process.
- A dreaming sweep writes Light Sleep and REM Sleep blocks, then promotes the canary into MEMORY.md.
docsRefs:
- docs/concepts/dreaming.md
- docs/reference/memory-config.md
- docs/web/control-ui.md
codeRefs:
- extensions/memory-core/src/dreaming.ts
- extensions/memory-core/src/dreaming-phases.ts
- src/gateway/server-methods/doctor.ts
- extensions/qa-lab/src/suite.ts
execution:
kind: custom
handler: memory-dreaming-sweep
summary: Verify enabling dreaming creates the managed sweep, stages light and REM artifacts, and consolidates repeated recall signals into durable memory.
- id: model-switch-follow-up
title: Model switch follow-up
surface: models
objective: Verify the agent can switch to a different configured model and continue coherently.
successCriteria:
- Agent reflects the model switch request.
- Follow-up answer remains coherent with prior context.
- Final report notes whether the switch actually happened.
docsRefs:
- docs/help/testing.md
- docs/web/dashboard.md
codeRefs:
- extensions/qa-lab/src/report.ts
execution:
kind: custom
handler: model-switch-follow-up
summary: Verify the agent can switch to a different configured model and continue coherently.
- id: approval-turn-tool-followthrough
title: Approval turn tool followthrough
surface: harness
objective: Verify a short approval like "ok do it" triggers immediate tool use instead of fake-progress narration.
successCriteria:
- Agent can keep the pre-action turn brief.
- The short approval leads to a real tool call on the next turn.
- Final answer uses tool-derived evidence instead of placeholder progress text.
docsRefs:
- docs/help/testing.md
- docs/channels/qa-channel.md
codeRefs:
- extensions/qa-lab/src/suite.ts
- extensions/qa-lab/src/mock-openai-server.ts
- src/agents/pi-embedded-runner/run/incomplete-turn.ts
execution:
kind: custom
handler: approval-turn-tool-followthrough
summary: Verify a short approval like "ok do it" triggers immediate tool use instead of fake-progress narration.
- id: reaction-edit-delete
title: Reaction, edit, delete lifecycle
surface: message-actions
objective: Verify the agent can use channel-owned message actions and that the QA transcript reflects them.
successCriteria:
- Agent adds at least one reaction.
- Agent edits or replaces a message when asked.
- Transcript shows the action lifecycle correctly.
docsRefs:
- docs/channels/qa-channel.md
codeRefs:
- extensions/qa-channel/src/channel-actions.ts
- extensions/qa-lab/src/self-check-scenario.ts
execution:
kind: custom
handler: reaction-edit-delete
summary: Verify the agent can use channel-owned message actions and that the QA transcript reflects them.
- id: source-docs-discovery-report
title: Source and docs discovery report
surface: discovery
objective: Verify the agent can read repo docs and source, expand the QA plan, and publish a worked or did-not-work report.
successCriteria:
- Agent reads docs and source before proposing more tests.
- Agent identifies extra candidate scenarios beyond the seed list.
- Agent ends with a worked or failed QA report.
docsRefs:
- docs/help/testing.md
- docs/web/dashboard.md
- docs/channels/qa-channel.md
codeRefs:
- extensions/qa-lab/src/report.ts
- extensions/qa-lab/src/self-check.ts
- src/agents/system-prompt.ts
execution:
kind: custom
handler: source-docs-discovery-report
summary: Verify the agent can read repo docs and source, expand the QA plan, and publish a worked or did-not-work report.
- id: subagent-handoff
title: Subagent handoff
surface: subagents
objective: Verify the agent can delegate a bounded task to a subagent and fold the result back into the main thread.
successCriteria:
- Agent launches a bounded subagent task.
- Subagent result is acknowledged in the main flow.
- Final answer attributes delegated work clearly.
docsRefs:
- docs/tools/subagents.md
- docs/help/testing.md
codeRefs:
- src/agents/system-prompt.ts
- extensions/qa-lab/src/report.ts
execution:
kind: custom
handler: subagent-handoff
summary: Verify the agent can delegate a bounded task to a subagent and fold the result back into the main thread.
- id: subagent-fanout-synthesis
title: Subagent fanout synthesis
surface: subagents
objective: Verify the agent can delegate multiple bounded subagent tasks and fold both results back into one parent reply.
successCriteria:
- Parent flow launches at least two bounded subagent tasks.
- Both delegated results are acknowledged in the main flow.
- Final answer synthesizes both worker outputs in one reply.
docsRefs:
- docs/tools/subagents.md
- docs/help/testing.md
codeRefs:
- src/agents/subagent-spawn.ts
- src/agents/system-prompt.ts
- extensions/qa-lab/src/suite.ts
execution:
kind: custom
handler: subagent-fanout-synthesis
summary: Verify the agent can delegate multiple bounded subagent tasks and fold both results back into one parent reply.
- id: thread-follow-up
title: Threaded follow-up
surface: thread
objective: Verify the agent can keep follow-up work inside a thread and not leak context into the root channel.
successCriteria:
- Agent creates or uses a thread for deeper work.
- Follow-up messages stay attached to the thread.
- Thread report references the correct prior context.
docsRefs:
- docs/channels/qa-channel.md
- docs/channels/group-messages.md
codeRefs:
- extensions/qa-channel/src/protocol.ts
- extensions/qa-lab/src/bus-state.ts
execution:
kind: custom
handler: thread-follow-up
summary: Verify the agent can keep follow-up work inside a thread and not leak context into the root channel.
- id: memory-tools-channel-context
title: Memory tools in channel context
surface: memory
objective: Verify the agent uses memory_search and memory_get in a shared channel when the answer lives only in memory files, not the live transcript.
successCriteria:
- Agent uses memory_search before answering.
- Agent narrows with memory_get before answering.
- Final reply returns the memory-only fact correctly in-channel.
docsRefs:
- docs/concepts/memory.md
- docs/concepts/memory-search.md
codeRefs:
- extensions/memory-core/src/tools.ts
- extensions/qa-lab/src/suite.ts
execution:
kind: custom
handler: memory-tools-channel-context
summary: Verify the agent uses memory_search and memory_get in a shared channel when the answer lives only in memory files, not the live transcript.
- id: memory-failure-fallback
title: Memory failure fallback
surface: memory
objective: Verify the agent degrades gracefully when memory tools are unavailable and the answer exists only in memory-backed notes.
successCriteria:
- Memory tools are absent from the effective tool inventory.
- Agent does not hallucinate the hidden fact.
- Agent says it could not confirm and surfaces the limitation.
docsRefs:
- docs/concepts/memory.md
- docs/tools/index.md
codeRefs:
- extensions/memory-core/src/tools.ts
- extensions/qa-lab/src/suite.ts
execution:
kind: custom
handler: memory-failure-fallback
summary: Verify the agent degrades gracefully when memory tools are unavailable and the answer exists only in memory-backed notes.
- id: session-memory-ranking
title: Session memory ranking
surface: memory
objective: Verify session-transcript memory can outrank stale durable notes and drive the final answer toward the newer fact.
successCriteria:
- Session memory indexing is enabled for the scenario.
- Search ranks the newer transcript-backed fact ahead of the stale durable note.
- The agent uses memory tools and answers with the current fact, not the stale one.
docsRefs:
- docs/concepts/memory-search.md
- docs/reference/memory-config.md
codeRefs:
- extensions/memory-core/src/tools.ts
- extensions/memory-core/src/memory/manager.ts
- extensions/qa-lab/src/suite.ts
execution:
kind: custom
handler: session-memory-ranking
summary: Verify session-transcript memory can outrank stale durable notes and drive the final answer toward the newer fact.
- id: thread-memory-isolation
title: Thread memory isolation
surface: memory
objective: Verify a memory-backed answer requested inside a thread stays in-thread and does not leak into the root channel.
successCriteria:
- Agent uses memory tools inside the thread.
- The hidden fact is answered correctly in the thread.
- No root-channel outbound message leaks during the threaded memory reply.
docsRefs:
- docs/concepts/memory-search.md
- docs/channels/qa-channel.md
- docs/channels/group-messages.md
codeRefs:
- extensions/memory-core/src/tools.ts
- extensions/qa-channel/src/protocol.ts
- extensions/qa-lab/src/suite.ts
execution:
kind: custom
handler: thread-memory-isolation
summary: Verify a memory-backed answer requested inside a thread stays in-thread and does not leak into the root channel.
- id: model-switch-tool-continuity
title: Model switch with tool continuity
surface: models
objective: Verify switching models preserves session context and tool use instead of dropping into plain-text only behavior.
successCriteria:
- Alternate model is actually requested.
- A tool call still happens after the model switch.
- Final answer acknowledges the handoff and uses the tool-derived evidence.
docsRefs:
- docs/help/testing.md
- docs/concepts/model-failover.md
codeRefs:
- extensions/qa-lab/src/suite.ts
- extensions/qa-lab/src/mock-openai-server.ts
execution:
kind: custom
handler: model-switch-tool-continuity
summary: Verify switching models preserves session context and tool use instead of dropping into plain-text only behavior.
- id: mcp-plugin-tools-call
title: MCP plugin-tools call
surface: mcp
objective: Verify OpenClaw can expose plugin tools over MCP and a real MCP client can call one successfully.
successCriteria:
- Plugin tools MCP server lists memory_search.
- A real MCP client calls memory_search successfully.
- The returned MCP payload includes the expected memory-only fact.
docsRefs:
- docs/cli/mcp.md
- docs/gateway/protocol.md
codeRefs:
- src/mcp/plugin-tools-serve.ts
- extensions/qa-lab/src/suite.ts
execution:
kind: custom
handler: mcp-plugin-tools-call
summary: Verify OpenClaw can expose plugin tools over MCP and a real MCP client can call one successfully.
- id: skill-visibility-invocation
title: Skill visibility and invocation
surface: skills
objective: Verify a workspace skill becomes visible in skills.status and influences the next agent turn.
successCriteria:
- skills.status reports the seeded skill as visible and eligible.
- The next agent turn reflects the skill instruction marker.
- The result stays scoped to the active QA workspace skill.
docsRefs:
- docs/tools/skills.md
- docs/gateway/protocol.md
codeRefs:
- src/agents/skills-status.ts
- extensions/qa-lab/src/suite.ts
execution:
kind: custom
handler: skill-visibility-invocation
summary: Verify a workspace skill becomes visible in skills.status and influences the next agent turn.
- id: skill-install-hot-availability
title: Skill install hot availability
surface: skills
objective: Verify a newly added workspace skill shows up without a broken intermediate state and can influence the next turn immediately.
successCriteria:
- Skill is absent before install.
- skills.status reports it after install without a restart.
- The next agent turn reflects the new skill marker.
docsRefs:
- docs/tools/skills.md
- docs/gateway/configuration.md
codeRefs:
- src/agents/skills-status.ts
- extensions/qa-lab/src/suite.ts
execution:
kind: custom
handler: skill-install-hot-availability
summary: Verify a newly added workspace skill shows up without a broken intermediate state and can influence the next turn immediately.
- id: native-image-generation
title: Native image generation
surface: image-generation
objective: Verify image_generate appears when configured and returns a real saved media artifact.
successCriteria:
- image_generate appears in the effective tool inventory.
- Agent triggers native image_generate.
- Tool output returns a saved MEDIA path and the file exists.
docsRefs:
- docs/tools/image-generation.md
- docs/providers/openai.md
codeRefs:
- src/agents/tools/image-generate-tool.ts
- extensions/qa-lab/src/mock-openai-server.ts
execution:
kind: custom
handler: native-image-generation
summary: Verify image_generate appears when configured and returns a real saved media artifact.
- id: image-understanding-attachment
title: Image understanding from attachment
surface: image-understanding
objective: Verify an attached image reaches the agent model and the agent can describe what it sees.
successCriteria:
- Agent receives at least one image attachment.
- Final answer describes the visible image content in one short sentence.
- The description mentions the expected red and blue regions.
docsRefs:
- docs/help/testing.md
- docs/tools/index.md
codeRefs:
- src/gateway/server-methods/agent.ts
- extensions/qa-lab/src/suite.ts
- extensions/qa-lab/src/mock-openai-server.ts
execution:
kind: custom
handler: image-understanding-attachment
summary: Verify an attached image reaches the agent model and the agent can describe what it sees.
- id: image-generation-roundtrip
title: Image generation roundtrip
surface: image-generation
objective: Verify a generated image is saved as media, reattached on the next turn, and described correctly through the vision path.
successCriteria:
- image_generate produces a saved MEDIA artifact.
- The generated artifact is reattached on a follow-up turn.
- The follow-up vision answer describes the generated scene rather than a generic attachment placeholder.
docsRefs:
- docs/tools/image-generation.md
- docs/help/testing.md
codeRefs:
- src/agents/tools/image-generate-tool.ts
- src/gateway/chat-attachments.ts
- extensions/qa-lab/src/mock-openai-server.ts
execution:
kind: custom
handler: image-generation-roundtrip
summary: Verify a generated image is saved as media, reattached on the next turn, and described correctly through the vision path.
- id: config-patch-hot-apply
title: Config patch skill disable
surface: config
objective: Verify config.patch can disable a workspace skill and the restarted gateway exposes the new disabled state cleanly.
successCriteria:
- config.patch succeeds for the skill toggle change.
- A workspace skill works before the patch.
- The same skill is reported disabled after the restart triggered by the patch.
docsRefs:
- docs/gateway/configuration.md
- docs/gateway/protocol.md
codeRefs:
- src/gateway/server-methods/config.ts
- extensions/qa-lab/src/suite.ts
execution:
kind: custom
handler: config-patch-hot-apply
summary: Verify config.patch can disable a workspace skill and the restarted gateway exposes the new disabled state cleanly.
- id: config-apply-restart-wakeup
title: Config apply restart wake-up
surface: config
objective: Verify a restart-required config.apply restarts cleanly and delivers the post-restart wake message back into the QA channel.
successCriteria:
- config.apply schedules a restart-required change.
- Gateway becomes healthy again after restart.
- Restart sentinel wake-up message arrives in the QA channel.
docsRefs:
- docs/gateway/configuration.md
- docs/gateway/protocol.md
codeRefs:
- src/gateway/server-methods/config.ts
- src/gateway/server-restart-sentinel.ts
execution:
kind: custom
handler: config-apply-restart-wakeup
summary: Verify a restart-required config.apply restarts cleanly and delivers the post-restart wake message back into the QA channel.
- id: config-restart-capability-flip
title: Config restart capability flip
surface: config
objective: Verify a restart-triggering config change flips capability inventory and the same session successfully uses the newly restored tool after wake-up.
successCriteria:
- Capability is absent before the restart-triggering patch.
- Restart sentinel wakes the same session back up after config patch.
- The restored capability appears in tools.effective and works in the follow-up turn.
docsRefs:
- docs/gateway/configuration.md
- docs/gateway/protocol.md
- docs/tools/image-generation.md
codeRefs:
- src/gateway/server-methods/config.ts
- src/gateway/server-restart-sentinel.ts
- src/gateway/server-methods/tools-effective.ts
- extensions/qa-lab/src/suite.ts
execution:
kind: custom
handler: config-restart-capability-flip
summary: Verify a restart-triggering config change flips capability inventory and the same session successfully uses the newly restored tool after wake-up.
- id: runtime-inventory-drift-check
title: Runtime inventory drift check
surface: inventory
objective: Verify tools.effective and skills.status stay aligned with runtime behavior after config changes.
successCriteria:
- Enabled tool appears before the config change.
- After config change, disabled tool disappears from tools.effective.
- Disabled skill appears in skills.status with disabled state.
docsRefs:
- docs/gateway/protocol.md
- docs/tools/skills.md
- docs/tools/index.md
codeRefs:
- src/gateway/server-methods/tools-effective.ts
- src/gateway/server-methods/skills.ts
execution:
kind: custom
handler: runtime-inventory-drift-check
summary: Verify tools.effective and skills.status stay aligned with runtime behavior after config changes.