mirror of
https://github.com/openclaw/openclaw.git
synced 2026-05-29 16:45:14 +00:00
75 lines
3.1 KiB
Markdown
75 lines
3.1 KiB
Markdown
---
|
|
summary: "Local qa-channel scenarios for privacy-preserving personal assistant workflow checks."
|
|
read_when:
|
|
- Running local personal agent reliability checks
|
|
- Extending the repo-backed QA scenario catalog
|
|
- Verifying reminder, reply, memory, redaction, safe tool followthrough, task status, share-safe diagnostics, proof-backed completion claims, and failure recovery
|
|
title: "Personal agent benchmark pack"
|
|
---
|
|
|
|
The Personal Agent Benchmark Pack is a small repo-backed QA scenario pack for
|
|
local personal assistant workflows. It is not a generic model benchmark and it
|
|
does not require a new runner. The pack reuses the private QA stack described in
|
|
[QA overview](/concepts/qa-e2e-automation), the synthetic
|
|
[QA channel](/channels/qa-channel), and the existing `qa/scenarios` markdown
|
|
catalog.
|
|
|
|
The first pack is intentionally narrow:
|
|
|
|
- fake personal reminders through local cron delivery
|
|
- fake DM and thread reply routing through `qa-channel`
|
|
- fake preference recall from the temporary QA workspace memory files
|
|
- fake secret no-echo checks
|
|
- safe read-backed tool followthrough after a short approval-style turn
|
|
- approval denial stop behavior for a sensitive local read request
|
|
- proof-backed task status reporting that keeps pending, blocked, and done separate
|
|
- share-safe diagnostics artifacts that keep useful status while omitting raw personal content
|
|
- proof-backed completion claims that avoid fake progress before local evidence exists
|
|
- failure recovery that reports partial status and keeps retry boundaries clear
|
|
|
|
## Scenarios
|
|
|
|
The machine-readable pack metadata lives in
|
|
`extensions/qa-lab/src/scenario-packs.ts`. Run the pack with
|
|
`--pack personal-agent`:
|
|
|
|
```bash
|
|
OPENCLAW_ENABLE_PRIVATE_QA_CLI=1 pnpm openclaw qa suite \
|
|
--provider-mode mock-openai \
|
|
--pack personal-agent \
|
|
--concurrency 1
|
|
```
|
|
|
|
`--pack` is additive with repeated `--scenario` flags. Explicit scenarios run
|
|
first, then the pack scenarios run in `QA_PERSONAL_AGENT_SCENARIO_IDS` order with
|
|
duplicates removed.
|
|
|
|
The pack is designed for `qa-channel` with `mock-openai` or another local QA
|
|
provider lane. It should not be pointed at live chat services or real personal
|
|
accounts.
|
|
|
|
## Privacy Model
|
|
|
|
The scenarios use only fake users, fake preferences, fake secrets, and the
|
|
temporary QA gateway workspace created by the suite. They must not read or write
|
|
real OpenClaw user memory, sessions, credentials, launch agents, global configs,
|
|
or live gateway state.
|
|
|
|
Artifacts stay under the existing QA suite artifact directory and should be
|
|
treated like test output. Redaction checks use fake markers so failures are safe
|
|
to inspect and file in issues.
|
|
|
|
## Extending The Pack
|
|
|
|
Add new cases under `qa/scenarios/personal/`, then add the scenario id to
|
|
`QA_PERSONAL_AGENT_SCENARIO_IDS`. Keep each case small, local, deterministic in
|
|
`mock-openai`, and focused on one personal assistant behavior.
|
|
|
|
Good follow-up candidates:
|
|
|
|
- redacted trajectory export checks
|
|
- local-only plugin workflow checks
|
|
|
|
Avoid adding a new runner, plugin, dependency, live transport, or model judge
|
|
until the scenario catalog has enough stable cases to justify that surface.
|