Merge branch 'main' into meow/fix-76957-control-ui-new-hooks

Val Alexander
2026-05-05 21:51:20 -05:00
committed by GitHub
19 changed files with 323 additions and 384 deletions


@@ -393,7 +393,8 @@ Docs: https://docs.openclaw.ai
- Agents/sessions: after embedded Pi runs, append assistant-visible reply text to session JSONL only when Pi did not already persist an equivalent tail assistant entry, without re-mirroring the user prompt Pi owns. Fixes #77823. (#77839) Thanks @neeravmakwana.
- Plugins/CLI: load the install-records ledger when listing channel-catalog entries, so npm-installed third-party channel plugins resolve through `openclaw channels login`/`channels add` instead of failing with `Unsupported channel`. (#77269) Thanks @pumpkinxing1.
- Memory wiki/Security: enforce session visibility on shared-memory `wiki_search` and `wiki_get` so sandboxed subagents cannot read transcript content from sibling or parent sessions. Fixes GHSA-72fw-cqh5-f324. Thanks @zsxsoft.
- Exec approvals: enforce allowlist `argPattern` argument restrictions on Linux and macOS as well as Windows, so an entry like `{ pattern: "python3", argPattern: "^safe\.py$" }` no longer silently relaxes to a path-only match on non-Windows hosts. (#75143) Thanks @eleqtrizit.
- Agents/compaction: disable Pi auto-compaction whenever OpenClaw effectively owns safeguard compaction, including provider-backed safeguard mode, so Pi and OpenClaw no longer fight over long-session compaction. Fixes #73003. (#73839) Thanks @bradhallett.
## 2026.5.3-1


@@ -14,7 +14,7 @@ Looking for scheduling? See [Automation and tasks](/automation) for choosing the
Background tasks track work that runs **outside your main conversation session**: ACP runs, subagent spawns, isolated cron job executions, and CLI-initiated operations.
Tasks do **not** replace sessions, cron jobs, or heartbeats - they are the **activity ledger** that records what detached work happened, when, and whether it succeeded.
<Note>
Not every agent run creates a task. Heartbeat turns and normal interactive chat do not. All cron executions, ACP spawns, subagent spawns, and CLI agent commands do.
@@ -22,7 +22,7 @@ Not every agent run creates a task. Heartbeat turns and normal interactive chat
## TL;DR
- Tasks are **records**, not schedulers - cron and heartbeat decide _when_ work runs, tasks track _what happened_.
- ACP, subagents, all cron jobs, and CLI operations create tasks. Heartbeat turns do not.
- Each task moves through `queued → running → terminal` (succeeded, failed, timed_out, cancelled, or lost).
- Cron tasks stay live while the cron runtime still owns the job; if the
@@ -100,7 +100,7 @@ Not every agent run creates a task. Heartbeat turns and normal interactive chat
<AccordionGroup>
<Accordion title="Notify defaults for cron and media">
Main-session cron tasks use `silent` notify policy by default - they create records for tracking but do not generate notifications. Isolated cron tasks also default to `silent` but are more visible because they run in their own session.
Session-backed `music_generate` and `video_generate` runs also use `silent` notify policy. They still create task records, but completion is handed back to the original agent session as an internal wake so the agent can write the follow-up message and attach the finished media itself. Group/channel completions follow the normal visible-reply policy, so the agent uses the message tool when source delivery requires it. If the completion agent fails to produce message-tool delivery evidence in a tool-only route, OpenClaw sends the completion fallback directly to the original channel instead of leaving the media private.
@@ -109,7 +109,7 @@ Not every agent run creates a task. Heartbeat turns and normal interactive chat
While a session-backed `video_generate` task is still active, the tool also acts as a guardrail: repeated `video_generate` calls in that same session return the active task status instead of starting a second concurrent generation. Use `action: "status"` when you want an explicit progress/status lookup from the agent side.
</Accordion>
<Accordion title="What does not create tasks">
- Heartbeat turns - main-session; see [Heartbeat](/gateway/heartbeat)
- Normal interactive chat turns
- Direct `/command` responses
@@ -140,7 +140,7 @@ stateDiagram-v2
| `cancelled` | Stopped by the operator via `openclaw tasks cancel` |
| `lost` | The runtime lost authoritative backing state after a 5-minute grace period |
Transitions happen automatically - when the associated agent run ends, the task status updates to match.
Agent run completion is authoritative for active task records. A successful detached run finalizes as `succeeded`, ordinary run errors finalize as `failed`, and timeout or abort outcomes finalize as `timed_out`. If an operator already cancelled the task, or the runtime already recorded a stronger terminal state such as `failed`, `timed_out`, or `lost`, a later success signal does not downgrade that terminal status.
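The precedence rule above can be sketched as a small state-merge function. The type and function names here are illustrative assumptions for the sketch, not OpenClaw's actual internals:

```typescript
// Hypothetical sketch of the "no downgrade" finalization rule described above.
type TaskStatus =
  | "queued"
  | "running"
  | "succeeded"
  | "failed"
  | "timed_out"
  | "cancelled"
  | "lost";

const TERMINAL = new Set<TaskStatus>([
  "succeeded",
  "failed",
  "timed_out",
  "cancelled",
  "lost",
]);

// A completion signal only applies while the task is still active; an
// already-recorded terminal state (cancelled, failed, lost, ...) wins, so a
// late success signal never downgrades it.
function finalize(current: TaskStatus, signal: TaskStatus): TaskStatus {
  return TERMINAL.has(current) ? current : signal;
}
```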
@@ -161,12 +161,12 @@ Agent run completion is authoritative for active task records. A successful deta
When a task reaches a terminal state, OpenClaw notifies you. There are two delivery paths:
**Direct delivery** - if the task has a channel target (the `requesterOrigin`), the completion message goes straight to that channel (Telegram, Discord, Slack, etc.). For subagent completions, OpenClaw also preserves bound thread/topic routing when available and can fill a missing `to` / account from the requester session's stored route (`lastChannel` / `lastTo` / `lastAccountId`) before giving up on direct delivery.
**Session-queued delivery** - if direct delivery fails or no origin is set, the update is queued as a system event in the requester's session and surfaces on the next heartbeat.
<Tip>
Task completion triggers an immediate heartbeat wake so you see the result quickly - you do not have to wait for the next scheduled heartbeat tick.
</Tip>
That means the usual workflow is push-based: start detached work once, then let the runtime wake or notify you on completion. Poll task state only when you need debugging, intervention, or an explicit audit.
@@ -177,7 +177,7 @@ Control how much you hear about each task:
| Policy | What is delivered |
| --------------------- | ----------------------------------------------------------------------- |
| `done_only` (default) | Only terminal state (succeeded, failed, etc.) - **this is the default** |
| `state_changes` | Every state transition and progress update |
| `silent` | Nothing at all |
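The policy table above amounts to a small decision function. This is an illustrative sketch with assumed names, not OpenClaw's actual API:

```typescript
// Which task updates get delivered under each notify policy (sketch).
type NotifyPolicy = "done_only" | "state_changes" | "silent";
type TaskUpdate = "progress" | "state_change" | "terminal";

function shouldNotify(policy: NotifyPolicy, update: TaskUpdate): boolean {
  switch (policy) {
    case "silent":
      return false; // nothing at all
    case "done_only":
      return update === "terminal"; // only succeeded/failed/etc.
    case "state_changes":
      return true; // every transition and progress update
  }
}
```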
@@ -290,9 +290,9 @@ Tasks: 3 queued · 2 running · 1 issues
The summary reports:
- **active** - count of `queued` + `running`
- **failures** - count of `failed` + `timed_out` + `lost`
- **byRuntime** - breakdown by `acp`, `subagent`, `cron`, `cli`
Both `/status` and the `session_status` tool use a cleanup-aware task snapshot: active tasks are preferred, stale completed rows are hidden, and recent failures only surface when no active work remains. This keeps the status card focused on what matters right now.
@@ -343,13 +343,13 @@ A sweeper runs every **60 seconds** and handles four things:
</Accordion>
<Accordion title="Tasks and cron">
A cron job **definition** lives in `~/.openclaw/cron/jobs.json`; runtime execution state lives beside it in `~/.openclaw/cron/jobs-state.json`. **Every** cron execution creates a task record - both main-session and isolated. Main-session cron tasks default to `silent` notify policy so they track without generating notifications.
See [Cron Jobs](/automation/cron-jobs).
</Accordion>
<Accordion title="Tasks and heartbeat">
Heartbeat runs are main-session turns - they do not create task records. When a task completes, it can trigger a heartbeat wake so you see the result promptly.
See [Heartbeat](/gateway/heartbeat).
@@ -358,14 +358,14 @@ A sweeper runs every **60 seconds** and handles four things:
A task may reference a `childSessionKey` (where work runs) and a `requesterSessionKey` (who started it). Sessions are conversation context; tasks are activity tracking on top of that.
</Accordion>
<Accordion title="Tasks and agent runs">
A task's `runId` links to the agent run doing the work. Agent lifecycle events (start, end, error) automatically update the task status - you do not need to manage the lifecycle manually.
</Accordion>
</AccordionGroup>
## Related
- [Automation & Tasks](/automation) - all automation mechanisms at a glance
- [CLI: Tasks](/cli/tasks) - CLI command reference
- [Heartbeat](/gateway/heartbeat) - periodic main-session turns
- [Scheduled Tasks](/automation/cron-jobs) - scheduling background work
- [Task Flow](/automation/taskflow) - flow orchestration above tasks


@@ -6,8 +6,6 @@ read_when:
title: Yuanbao
---
# Yuanbao
Tencent Yuanbao is Tencent's AI assistant platform. The OpenClaw channel plugin
connects Yuanbao bots to OpenClaw over WebSocket so they can interact with users
through direct messages and group chats.
@@ -53,10 +51,10 @@ Follow the prompts to enter your App ID and App Secret.
Configure `dmPolicy` to control who can DM the bot:
- `"pairing"` - unknown users receive a pairing code; approve via CLI
- `"allowlist"` - only users listed in `allowFrom` can chat
- `"open"` - allow all users (default)
- `"disabled"` - disable all DMs
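For example, assuming the key layout shown in the configuration table later on this page, an allowlist DM setup might look like (user ids are placeholders):

```json
{
  "channels": {
    "yuanbao": {
      "dm": {
        "policy": "allowlist",
        "allowFrom": ["user-id-1", "user-id-2"]
      }
    }
  }
}
```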
**Approve a pairing request:**
@@ -69,8 +67,8 @@ openclaw pairing approve yuanbao <CODE>
**Mention requirement** (`channels.yuanbao.requireMention`):
- `true` - require @mention (default)
- `false` - respond without @mention
Replying to the bot's message in a group chat is treated as an implicit mention.
@@ -228,9 +226,9 @@ Replying to the bot's message in a group chat is treated as an implicit mention.
### Message limits
- `maxChars` - single message max character count (default: `3000` chars)
- `mediaMaxMb` - media upload/download limit (default: `20` MB)
- `overflowPolicy` - behavior when message exceeds limit: `"split"` (default) or `"stop"`
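As a sketch, assuming `maxChars` and `mediaMaxMb` sit beside `overflowPolicy` under `channels.yuanbao` (only `overflowPolicy` appears in the configuration table below, so the other two key paths are assumptions), the defaults would be written as:

```json
{
  "channels": {
    "yuanbao": {
      "maxChars": 3000,
      "mediaMaxMb": 20,
      "overflowPolicy": "split"
    }
  }
}
```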
### Streaming
@@ -358,13 +356,13 @@ Full configuration: [Gateway configuration](/gateway/configuration)
| ------------------------------------------ | ------------------------------------------------- | -------------------------------------- |
| `channels.yuanbao.enabled` | Enable/disable the channel | `true` |
| `channels.yuanbao.defaultAccount` | Default account for outbound routing | `default` |
| `channels.yuanbao.accounts.<id>.appKey` | App Key (used for signing and ticket generation) | - |
| `channels.yuanbao.accounts.<id>.appSecret` | App Secret (used for signing) | - |
| `channels.yuanbao.accounts.<id>.token` | Pre-signed token (skips automatic ticket signing) | - |
| `channels.yuanbao.accounts.<id>.name` | Account display name | - |
| `channels.yuanbao.accounts.<id>.enabled` | Enable/disable a specific account | `true` |
| `channels.yuanbao.dm.policy` | DM policy | `open` |
| `channels.yuanbao.dm.allowFrom` | DM allowlist (user ID list) | - |
| `channels.yuanbao.requireMention` | Require @mention in groups | `true` |
| `channels.yuanbao.overflowPolicy` | Long message handling (`split` or `stop`) | `split` |
| `channels.yuanbao.replyToMode` | Group reply-to strategy (`off`, `first`, `all`) | `first` |
@@ -411,8 +409,8 @@ Full configuration: [Gateway configuration](/gateway/configuration)
## Related
- [Channels Overview](/channels) - all supported channels
- [Pairing](/channels/pairing) - DM authentication and pairing flow
- [Groups](/channels/groups) - group chat behavior and mention gating
- [Channel Routing](/channels/channel-routing) - session routing for messages
- [Security](/gateway/security) - access model and hardening


@@ -240,7 +240,7 @@ can write back through the mounted workspace.
## Telegram, Discord, and Slack QA reference
Matrix has a [dedicated page](/concepts/qa-matrix) because of its scenario count and Docker-backed homeserver provisioning. Telegram, Discord, and Slack are smaller - a handful of scenarios each, no profile system, against pre-existing real channels - so their reference lives here.
### Shared CLI flags
@@ -248,7 +248,7 @@ These lanes register through `extensions/qa-lab/src/live-transports/shared/live-
| Flag | Default | Description |
| ------------------------------------- | --------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------- |
| `--scenario <id>` | - | Run only this scenario. Repeatable. |
| `--output-dir <path>` | `<repo>/.artifacts/qa-e2e/{telegram,discord,slack}-<timestamp>` | Where reports/summary/observed messages and the output log are written. Relative paths resolve against `--repo-root`. |
| `--repo-root <path>` | `process.cwd()` | Repository root when invoking from a neutral cwd. |
| `--sut-account <id>` | `sut` | Temporary account id inside the QA gateway config. |
@@ -270,7 +270,7 @@ Targets one real private Telegram group with two distinct bots (driver + SUT). T
Required env when `--credential-source env`:
- `OPENCLAW_QA_TELEGRAM_GROUP_ID` - numeric chat id (string).
- `OPENCLAW_QA_TELEGRAM_DRIVER_BOT_TOKEN`
- `OPENCLAW_QA_TELEGRAM_SUT_BOT_TOKEN`
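Assuming the lane is invoked like its Slack and Matrix siblings, an env-credential setup might look like this (all ids and tokens below are placeholders):

```shell
# Placeholders only - substitute the real group id and bot tokens.
export OPENCLAW_QA_TELEGRAM_GROUP_ID="-1001234567890"
export OPENCLAW_QA_TELEGRAM_DRIVER_BOT_TOKEN="1111111111:driver-token"
export OPENCLAW_QA_TELEGRAM_SUT_BOT_TOKEN="2222222222:sut-token"
# then: pnpm openclaw qa telegram --credential-source env
```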
@@ -294,8 +294,8 @@ Scenarios (`extensions/qa-lab/src/live-transports/telegram/telegram-live.runtime
Output artifacts:
- `telegram-qa-report.md`
- `telegram-qa-summary.json` - includes per-reply RTT (driver send → observed SUT reply) starting with the canary.
- `telegram-qa-observed-messages.json` - bodies redacted unless `OPENCLAW_QA_TELEGRAM_CAPTURE_CONTENT=1`.
### Discord QA
@@ -311,7 +311,7 @@ Required env when `--credential-source env`:
- `OPENCLAW_QA_DISCORD_CHANNEL_ID`
- `OPENCLAW_QA_DISCORD_DRIVER_BOT_TOKEN`
- `OPENCLAW_QA_DISCORD_SUT_BOT_TOKEN`
- `OPENCLAW_QA_DISCORD_SUT_APPLICATION_ID` - must match the SUT bot user id returned by Discord (the lane fails fast otherwise).
Optional:
@@ -322,7 +322,7 @@ Scenarios (`extensions/qa-lab/src/live-transports/discord/discord-live.runtime.t
- `discord-canary`
- `discord-mention-gating`
- `discord-native-help-command-registration`
- `discord-status-reactions-tool-only` - opt-in Mantis scenario. Runs by itself because it switches the SUT to always-on, tool-only guild replies with `messages.statusReactions.enabled=true`, then captures a REST reaction timeline plus HTML/PNG visual artifacts. Mantis before/after reports also preserve scenario-provided MP4 artifacts as `baseline.mp4` and `candidate.mp4`.
Run the Mantis status-reaction scenario explicitly:
@@ -339,7 +339,7 @@ Output artifacts:
- `discord-qa-report.md`
- `discord-qa-summary.json`
- `discord-qa-observed-messages.json` - bodies redacted unless `OPENCLAW_QA_DISCORD_CAPTURE_CONTENT=1`.
- `discord-qa-reaction-timelines.json` and `discord-status-reactions-tool-only-timeline.png` when the status-reaction scenario runs.
### Slack QA
@@ -375,16 +375,16 @@ Output artifacts:
- `slack-qa-report.md`
- `slack-qa-summary.json`
- `slack-qa-observed-messages.json` - bodies redacted unless `OPENCLAW_QA_SLACK_CAPTURE_CONTENT=1`.
#### Setting up the Slack workspace
The lane needs two distinct Slack apps in one workspace, plus a channel both bots are members of:
- `channelId` - the `Cxxxxxxxxxx` id of a channel both bots have been invited to. Use a dedicated channel; the lane posts on every run.
- `driverBotToken` - bot token (`xoxb-...`) of the **Driver** app.
- `sutBotToken` - bot token (`xoxb-...`) of the **SUT** app, which must be a separate Slack app from the driver so its bot user id is distinct.
- `sutAppToken` - app-level token (`xapp-...`) of the SUT app with `connections:write`, used by Socket Mode so the SUT app can receive events.
Prefer a Slack workspace dedicated to QA over reusing a production workspace.
@@ -417,7 +417,7 @@ Go to [api.slack.com/apps](https://api.slack.com/apps) → _Create New App_ →
}
```
Copy the _Bot User OAuth Token_ (`xoxb-...`) - that becomes `driverBotToken`. The driver only needs to post messages and identify itself; no events, no Socket Mode.
**2. Create the SUT app**
@@ -504,7 +504,7 @@ In the QA workspace, create a channel (e.g. `#openclaw-qa`) and invite both bots
/invite @OpenClaw QA SUT
```
Copy the `Cxxxxxxxxxx` id from _channel info → About → Channel ID_ - that becomes `channelId`. A public channel works; if you use a private channel both apps already have `groups:history` so the harness's history reads will still succeed.
**4. Register the credentials**
@@ -545,7 +545,7 @@ pnpm openclaw qa slack \
--output-dir .artifacts/qa-e2e/slack-local
```
A green run completes in well under 30 seconds and `slack-qa-report.md` shows both `slack-canary` and `slack-mention-gating` at status `pass`. If the lane hangs for ~90 seconds and exits with `Convex credential pool exhausted for kind "slack"`, either the pool is empty or every row is leased - `qa credentials list --kind slack --status all --json` will tell you which.
### Convex credential pool
@@ -553,9 +553,9 @@ Telegram, Discord, and Slack lanes can lease credentials from a shared Convex po
Payload shapes the broker validates on `admin/add`:
- Telegram (`kind: "telegram"`): `{ groupId: string, driverToken: string, sutToken: string }` - `groupId` must be a numeric chat-id string.
- Discord (`kind: "discord"`): `{ guildId: string, channelId: string, driverBotToken: string, sutBotToken: string, sutApplicationId: string }`.
- Slack (`kind: "slack"`): `{ channelId: string, driverBotToken: string, sutBotToken: string, sutAppToken: string }` - `channelId` must match `^[A-Z][A-Z0-9]+$` (a Slack id like `Cxxxxxxxxxx`). See [Setting up the Slack workspace](#setting-up-the-slack-workspace) for app and scope provisioning.
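The payload contracts above can be mirrored as quick shape checks. These helpers are illustrative, not the broker's actual validation code:

```typescript
// Sketch of the admin/add payload validation described above.
const SLACK_CHANNEL_ID = /^[A-Z][A-Z0-9]+$/;

function isValidSlackPayload(p: Record<string, unknown>): boolean {
  const keys = ["channelId", "driverBotToken", "sutBotToken", "sutAppToken"];
  return (
    keys.every((k) => typeof p[k] === "string") &&
    SLACK_CHANNEL_ID.test(p.channelId as string) // e.g. Cxxxxxxxxxx
  );
}

function isValidTelegramPayload(p: Record<string, unknown>): boolean {
  const keys = ["groupId", "driverToken", "sutToken"];
  return (
    keys.every((k) => typeof p[k] === "string") &&
    /^-?\d+$/.test(p.groupId as string) // numeric chat-id string
  );
}
```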
Operational env vars and the Convex broker endpoint contract live in [Testing → Shared Telegram credentials via Convex](/help/testing#shared-telegram-credentials-via-convex-v1) (the section name predates Discord support; the broker semantics are identical for both kinds).
@@ -690,7 +690,7 @@ Preferred generic helpers for new scenarios:
- `formatTransportTranscript`
- `resetTransport`
Compatibility aliases remain available for existing scenarios - `waitForQaChannelReady`, `waitForOutboundMessage`, `waitForNoOutbound`, `formatConversationTranscript`, `resetBus` - but new scenario authoring should use the generic names. The aliases exist to avoid a flag-day migration, not as the model going forward.
## Reporting
@@ -702,7 +702,7 @@ The report should answer:
- What stayed blocked
- What follow-up scenarios are worth adding
For the inventory of available scenarios - useful when sizing follow-up work or wiring a new transport - run `pnpm openclaw qa coverage` (add `--json` for machine-readable output).
For character and style checks, run the same scenario across multiple live model
refs and write a judged Markdown report:


@@ -9,7 +9,7 @@ title: "Matrix QA"
The Matrix QA lane runs the bundled `@openclaw/matrix` plugin against a disposable Tuwunel homeserver in Docker, with temporary driver, SUT, and observer accounts plus seeded rooms. It is the live transport-real coverage for Matrix.
This is maintainer-only tooling. Packaged OpenClaw releases intentionally omit `qa-lab`, so `openclaw qa` is only available from a source checkout. Source checkouts load the bundled runner directly - no plugin install step is needed.
For broader QA framework context, see [QA overview](/concepts/qa-e2e-automation).
@@ -24,7 +24,7 @@ Plain `pnpm openclaw qa matrix` runs `--profile all` and does not stop on first
## What the lane does
1. Provisions a disposable Tuwunel homeserver in Docker (default image `ghcr.io/matrix-construct/tuwunel:v1.5.1`, server name `matrix-qa.test`, port `28008`).
2. Registers three temporary users - `driver` (sends inbound traffic), `sut` (the OpenClaw Matrix account under test), `observer` (third-party traffic capture).
3. Seeds rooms required by the selected scenarios (main, threading, media, restart, secondary, allowlist, E2EE, verification DM, etc.).
4. Starts a child OpenClaw gateway with the real Matrix plugin scoped to the SUT account; `qa-channel` is not loaded in the child.
5. Runs scenarios in sequence, observing events through the driver/observer Matrix clients.
@@ -42,7 +42,7 @@ pnpm openclaw qa matrix [options]
| --------------------- | --------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------- |
| `--profile <profile>` | `all` | Scenario profile. See [Profiles](#profiles). |
| `--fail-fast` | off | Stop after the first failed check or scenario. |
| `--scenario <id>` | - | Run only this scenario. Repeatable. See [Scenarios](#scenarios). |
| `--output-dir <path>` | `<repo>/.artifacts/qa-e2e/matrix-<timestamp>` | Where reports, summary, observed events, and the output log are written. Relative paths resolve against `--repo-root`. |
| `--repo-root <path>` | `process.cwd()` | Repository root when invoking from a neutral working directory. |
| `--sut-account <id>` | `sut` | Matrix account id inside the QA gateway config. |
@@ -70,7 +70,7 @@ The selected profile decides which scenarios run.
| `fast` | Release-gate subset that exercises the live transport contract: canary, mention gating, allowlist block, reply shape, restart resume, thread follow-up, thread isolation, reaction observation, and exec approval metadata delivery. |
| `transport` | Transport-level threading, DM, room, autojoin, mention/allowlist, approval, and reaction scenarios. |
| `media` | Image, audio, video, PDF, EPUB attachment coverage. |
| `e2ee-smoke` | Minimum E2EE coverage - basic encrypted reply, thread follow-up, bootstrap success. |
| `e2ee-deep` | Exhaustive E2EE state-loss, backup, key, and recovery scenarios. |
| `e2ee-cli` | `openclaw matrix encryption setup` and `verify *` CLI scenarios driven through the QA harness. |
@@ -80,17 +80,17 @@ The exact mapping lives in `extensions/qa-matrix/src/runners/contract/scenario-c
The full scenario id list is the `MatrixQaScenarioId` union in `extensions/qa-matrix/src/runners/contract/scenario-catalog.ts:15`. Categories include:
- threading - `matrix-thread-*`, `matrix-subagent-thread-spawn`
- top-level / DM / room - `matrix-top-level-reply-shape`, `matrix-room-*`, `matrix-dm-*`
- streaming and tool progress - `matrix-room-partial-streaming-preview`, `matrix-room-quiet-streaming-preview`, `matrix-room-tool-progress-*`, `matrix-room-block-streaming`
- media - `matrix-media-type-coverage`, `matrix-room-image-understanding-attachment`, `matrix-attachment-only-ignored`, `matrix-unsupported-media-safe`
- routing - `matrix-room-autojoin-invite`, `matrix-secondary-room-*`
- reactions - `matrix-reaction-*`
- approvals - `matrix-approval-*` (exec/plugin metadata, chunked fallback, deny reactions, threads, and `target: "both"` routing)
- restart and replay - `matrix-restart-*`, `matrix-stale-sync-replay-dedupe`, `matrix-room-membership-loss`, `matrix-homeserver-restart-resume`, `matrix-initial-catchup-then-incremental`
- mention gating, bot-to-bot, and allowlists - `matrix-mention-*`, `matrix-allowbots-*`, `matrix-allowlist-*`, `matrix-multi-actor-ordering`, `matrix-inbound-edit-*`, `matrix-mxid-prefixed-command-block`, `matrix-observer-allowlist-override`
- E2EE - `matrix-e2ee-*` (basic reply, thread follow-up, bootstrap, recovery key lifecycle, state-loss variants, server backup behavior, device hygiene, SAS / QR / DM verification, restart, artifact redaction)
- E2EE CLI - `matrix-e2ee-cli-*` (encryption setup, idempotent setup, bootstrap failure, recovery-key lifecycle, multi-account, gateway-reply round-trip, self-verification)
Pass `--scenario <id>` (repeatable) to run a hand-picked set; combine with `--profile all` to ignore profile gating.
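The selection behavior described above — repeatable `--scenario` flags, with `--profile all` ignoring profile gating — can be sketched as a filter. This is an illustrative TypeScript sketch under assumed names; the catalog shape and function are not the runner's actual code:

```typescript
// Hypothetical scenario filter: explicit --scenario ids narrow the set,
// and profile "all" skips profile gating entirely.
function selectScenarios(
  catalog: { id: string; profiles: string[] }[],
  opts: { scenario?: string[]; profile?: string },
): string[] {
  let picked = catalog;
  if (opts.scenario?.length) {
    // Repeatable --scenario flags become a hand-picked allowlist.
    picked = picked.filter((s) => opts.scenario!.includes(s.id));
  }
  if (opts.profile && opts.profile !== "all") {
    // Normal runs keep only scenarios gated into the chosen profile.
    picked = picked.filter((s) => s.profiles.includes(opts.profile!));
  }
  return picked.map((s) => s.id);
}
```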
@@ -112,10 +112,10 @@ Pass `--scenario <id>` (repeatable) to run a hand-picked set; combine with `--pr
Written to `--output-dir`:
- `matrix-qa-report.md` Markdown protocol report (what passed, failed, was skipped, and why).
- `matrix-qa-summary.json` Structured summary suitable for CI parsing and dashboards.
- `matrix-qa-observed-events.json` Observed Matrix events from the driver and observer clients. Bodies are redacted unless `OPENCLAW_QA_MATRIX_CAPTURE_CONTENT=1`; approval metadata is summarized with selected safe fields and truncated command preview.
- `matrix-qa-output.log` Combined stdout/stderr from the run. If `OPENCLAW_RUN_NODE_OUTPUT_LOG` is set, the outer launcher's log is reused instead.
- `matrix-qa-report.md` - Markdown protocol report (what passed, failed, was skipped, and why).
- `matrix-qa-summary.json` - Structured summary suitable for CI parsing and dashboards.
- `matrix-qa-observed-events.json` - Observed Matrix events from the driver and observer clients. Bodies are redacted unless `OPENCLAW_QA_MATRIX_CAPTURE_CONTENT=1`; approval metadata is summarized with selected safe fields and truncated command preview.
- `matrix-qa-output.log` - Combined stdout/stderr from the run. If `OPENCLAW_RUN_NODE_OUTPUT_LOG` is set, the outer launcher's log is reused instead.
The default output dir is `<repo>/.artifacts/qa-e2e/matrix-<timestamp>` so successive runs do not overwrite each other.
@@ -133,7 +133,7 @@ Matrix is one of three live transport lanes (Matrix, Telegram, Discord) that sha
## Related
- [QA overview](/concepts/qa-e2e-automation) overall QA stack and live transport contract
- [QA Channel](/channels/qa-channel) synthetic channel adapter for repo-backed scenarios
- [Testing](/help/testing) running tests and adding QA coverage
- [Matrix](/channels/matrix) the channel plugin under test
- [QA overview](/concepts/qa-e2e-automation) - overall QA stack and live transport contract
- [QA Channel](/channels/qa-channel) - synthetic channel adapter for repo-backed scenarios
- [Testing](/help/testing) - running tests and adding QA coverage
- [Matrix](/channels/matrix) - the channel plugin under test

View File

@@ -7,8 +7,6 @@ read_when:
- Reviewing the strict-agentic, tool-schema, elevation, and replay fixes
---
# GPT-5.5 / Codex Agentic Parity in OpenClaw
OpenClaw already worked well with tool-using frontier models, but GPT-5.5 and Codex-style models were still underperforming in a few practical ways:
- they could stop after planning instead of doing the work
@@ -25,11 +23,11 @@ This parity program fixes those gaps in four reviewable slices.
This slice adds an opt-in `strict-agentic` execution contract for embedded Pi GPT-5 runs.
When enabled, OpenClaw stops accepting plan-only turns as “good enough” completion. If the model only says what it intends to do and does not actually use tools or make progress, OpenClaw retries with an act-now steer and then fails closed with an explicit blocked state instead of silently ending the task.
When enabled, OpenClaw stops accepting plan-only turns as "good enough" completion. If the model only says what it intends to do and does not actually use tools or make progress, OpenClaw retries with an act-now steer and then fails closed with an explicit blocked state instead of silently ending the task.
This improves the GPT-5.5 experience most on:
- short “ok do it” follow-ups
- short "ok do it" follow-ups
- code tasks where the first step is obvious
- flows where `update_plan` should be progress tracking rather than filler text
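The retry-then-fail-closed behavior described above can be sketched as a small loop. This is an illustrative TypeScript sketch only; the real contract lives inside the embedded Pi runner, and `runTurn`, `usedTools`, and the steer text are assumptions, not OpenClaw's actual API:

```typescript
// Hedged sketch of the strict-agentic contract: a plan-only turn gets one
// act-now retry, then fails closed as "blocked" instead of silently ending.
interface TurnResult {
  usedTools: boolean; // did the model actually take a tool step?
  text: string;
}

function runStrictAgenticTurn(
  runTurn: (steer?: string) => TurnResult,
  maxRetries = 1,
): { outcome: "acted" | "blocked"; result: TurnResult } {
  let result = runTurn();
  // Plan-only turn: retry with an act-now steer instead of accepting it.
  for (let i = 0; i < maxRetries && !result.usedTools; i++) {
    result = runTurn("Act now: take the next concrete tool step.");
  }
  // Still no tool use: surface an explicit blocked state.
  return { outcome: result.usedTools ? "acted" : "blocked", result };
}
```

The key design point is that the default contract accepts the first `result` unconditionally; `strict-agentic` only changes what counts as completion.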
@@ -86,21 +84,21 @@ The goal is not to make GPT-5.5 imitate Opus. The goal is to give GPT-5.5 a runt
That changes the user experience from:
- “the model had a good plan but stopped”
- "the model had a good plan but stopped"
to:
- “the model either acted, or OpenClaw surfaced the exact reason it could not”
- "the model either acted, or OpenClaw surfaced the exact reason it could not"
## Before vs after for GPT-5.5 users
| Before this program | After PR A-D |
| ---------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------- |
| GPT-5.5 could stop after a reasonable plan without taking the next tool step | PR A turns “plan only” into “act now or surface a blocked state” |
| GPT-5.5 could stop after a reasonable plan without taking the next tool step | PR A turns "plan only" into "act now or surface a blocked state" |
| Strict tool schemas could reject parameter-free or OpenAI/Codex-shaped tools in confusing ways | PR C makes provider-owned tool registration and invocation more predictable |
| `/elevated full` guidance could be vague or wrong in blocked runtimes | PR B gives GPT-5.5 and the user truthful runtime and permission hints |
| Replay or compaction failures could feel like the task silently disappeared | PR C surfaces paused, blocked, abandoned, and replay-invalid outcomes explicitly |
| “GPT-5.5 feels worse than Opus” was mostly anecdotal | PR D turns that into the same scenario pack, the same metrics, and a hard pass/fail gate |
| "GPT-5.5 feels worse than Opus" was mostly anecdotal | PR D turns that into the same scenario pack, the same metrics, and a hard pass/fail gate |
## Architecture
@@ -142,7 +140,7 @@ The first-wave parity pack currently covers five scenarios:
### `approval-turn-tool-followthrough`
Checks that the model does not stop at “I’ll do that” after a short approval. It should take the first concrete action in the same turn.
Checks that the model does not stop at "I'll do that" after a short approval. It should take the first concrete action in the same turn.
### `model-switch-tool-continuity`
@@ -210,8 +208,8 @@ Use the verdict in `qa-agentic-parity-summary.json` as the final machine-readabl
- `pass` means GPT-5.5 covered the same scenarios as Opus 4.6 and did not regress on the agreed aggregate metrics.
- `fail` means at least one hard gate tripped: weaker completion, worse unintended stops, weaker valid tool use, any fake-success case, or mismatched scenario coverage.
- “shared/base CI issue” is not itself a parity result. If CI noise outside PR D blocks a run, the verdict should wait for a clean merged-runtime execution instead of being inferred from branch-era logs.
- Auth, proxy, DNS, and `/elevated full` truthfulness still come from PR B’s deterministic suites, so the final release claim needs both: a passing PR D parity verdict and green PR B truthfulness coverage.
- "shared/base CI issue" is not itself a parity result. If CI noise outside PR D blocks a run, the verdict should wait for a clean merged-runtime execution instead of being inferred from branch-era logs.
- Auth, proxy, DNS, and `/elevated full` truthfulness still come from PR B's deterministic suites, so the final release claim needs both: a passing PR D parity verdict and green PR B truthfulness coverage.
## Who should enable `strict-agentic`
@@ -219,7 +217,7 @@ Use `strict-agentic` when:
- the agent is expected to act immediately when a next step is obvious
- GPT-5.5 or Codex-family models are the primary runtime
- you prefer explicit blocked states over “helpful” recap-only replies
- you prefer explicit blocked states over "helpful" recap-only replies
Keep the default contract when:

View File

@@ -18,9 +18,9 @@ of Docker runners. This doc is a "how we test" guide:
<Note>
**QA stack (qa-lab, qa-channel, live transport lanes)** is documented separately:
- [QA overview](/concepts/qa-e2e-automation) architecture, command surface, scenario authoring.
- [Matrix QA](/concepts/qa-matrix) reference for `pnpm openclaw qa matrix`.
- [QA channel](/channels/qa-channel) the synthetic transport plugin used by repo-backed scenarios.
- [QA overview](/concepts/qa-e2e-automation) - architecture, command surface, scenario authoring.
- [Matrix QA](/concepts/qa-matrix) - reference for `pnpm openclaw qa matrix`.
- [QA channel](/channels/qa-channel) - the synthetic transport plugin used by repo-backed scenarios.
This page covers running the regular test suites and Docker/Parallels runners. The QA-specific runners section below ([QA-specific runners](#qa-specific-runners)) lists the concrete `qa` invocations and points back at the references above.
</Note>
@@ -301,7 +301,7 @@ gh workflow run package-acceptance.yml --ref main \
- Starts only the local AIMock provider server for direct protocol smoke
testing.
- `pnpm openclaw qa matrix`
- Runs the Matrix live QA lane against a disposable Docker-backed Tuwunel homeserver. Source-checkout only packaged installs do not ship `qa-lab`.
- Runs the Matrix live QA lane against a disposable Docker-backed Tuwunel homeserver. Source-checkout only - packaged installs do not ship `qa-lab`.
- Full CLI, profile/scenario catalog, env vars, and artifact layout: [Matrix QA](/concepts/qa-matrix).
- `pnpm openclaw qa telegram`
- Runs the Telegram live QA lane against a real private group using the driver and SUT bot tokens from env.
@@ -399,7 +399,7 @@ The architecture and scenario-helper names for new channel adapters live in [QA
## Test suites (what runs where)
Think of the suites as “increasing realism” (and increasing flakiness/cost):
Think of the suites as "increasing realism" (and increasing flakiness/cost):
### Unit / integration (default)
@@ -578,12 +578,12 @@ Think of the suites as “increasing realism” (and increasing flakiness/cost):
- Files: `src/**/*.live.test.ts`, `test/**/*.live.test.ts`, and bundled-plugin live tests under `extensions/`
- Default: **enabled** by `pnpm test:live` (sets `OPENCLAW_LIVE_TEST=1`)
- Scope:
- “Does this provider/model actually work _today_ with real creds?”
- "Does this provider/model actually work _today_ with real creds?"
- Catch provider format changes, tool-calling quirks, auth issues, and rate limit behavior
- Expectations:
- Not CI-stable by design (real networks, real provider policies, quotas, outages)
- Costs money / uses rate limits
- Prefer running narrowed subsets instead of “everything”
- Prefer running narrowed subsets instead of "everything"
- Live runs source `~/.profile` to pick up missing API keys.
- By default, live runs still isolate `HOME` and copy config/auth material into a temp test home so unit fixtures cannot mutate your real `~/.openclaw`.
- Set `OPENCLAW_LIVE_USE_REAL_HOME=1` only when you intentionally need live tests to use your real home directory.
@@ -601,13 +601,13 @@ Use this decision table:
- Editing logic/tests: run `pnpm test` (and `pnpm test:coverage` if you changed a lot)
- Touching gateway networking / WS protocol / pairing: add `pnpm test:e2e`
- Debugging “my bot is down” / provider-specific failures / tool calling: run a narrowed `pnpm test:live`
- Debugging "my bot is down" / provider-specific failures / tool calling: run a narrowed `pnpm test:live`
## Live (network-touching) tests
For the live model matrix, CLI backend smokes, ACP smokes, Codex app-server
harness, and all media-provider live tests (Deepgram, BytePlus, ComfyUI, image,
music, video, media harness) plus credential handling for live runs see
music, video, media harness) - plus credential handling for live runs - see
[Testing live suites](/help/testing-live). For the dedicated update and
plugin validation checklist, see
[Testing updates and plugins](/help/testing-updates-plugins).
@@ -744,19 +744,19 @@ Run full Mintlify anchor validation when you need in-page heading checks too: `p
## Offline regression (CI-safe)
These are “real pipeline” regressions without real providers:
These are "real pipeline" regressions without real providers:
- Gateway tool calling (mock OpenAI, real gateway + agent loop): `src/gateway/gateway.test.ts` (case: "runs a mock OpenAI tool call end-to-end via gateway agent loop")
- Gateway wizard (WS `wizard.start`/`wizard.next`, writes config + auth enforced): `src/gateway/gateway.test.ts` (case: "runs wizard over ws and writes auth token config")
## Agent reliability evals (skills)
We already have a few CI-safe tests that behave like “agent reliability evals”:
We already have a few CI-safe tests that behave like "agent reliability evals":
- Mock tool-calling through the real gateway + agent loop (`src/gateway/gateway.test.ts`).
- End-to-end wizard flows that validate session wiring and config effects (`src/gateway/gateway.test.ts`).
What’s still missing for skills (see [Skills](/tools/skills)):
What's still missing for skills (see [Skills](/tools/skills)):
- **Decisioning:** when skills are listed in the prompt, does the agent pick the right skill (or avoid irrelevant ones)?
- **Compliance:** does the agent read `SKILL.md` before use and follow required steps/args?
@@ -829,7 +829,7 @@ Contract tests run in CI and do not require real API keys.
When you fix a provider/model issue discovered in live:
- Add a CI-safe regression if possible (mock/stub provider, or capture the exact request-shape transformation)
- If it’s inherently live-only (rate limits, auth policies), keep the live test narrow and opt-in via env vars
- If it's inherently live-only (rate limits, auth policies), keep the live test narrow and opt-in via env vars
- Prefer targeting the smallest layer that catches the bug:
- provider request conversion/replay bug → direct models test
- gateway session/history/tool pipeline bug → gateway live smoke or CI-safe gateway mock test

View File

@@ -5,22 +5,20 @@ read_when:
title: "Menu bar"
---
# Menu Bar Status Logic
## What is shown
- We surface the current agent work state in the menu bar icon and in the first status row of the menu.
- Health status is hidden while work is active; it returns when all sessions are idle.
- A root “Context” submenu contains recent sessions instead of expanding them directly in the root menu.
- The “Nodes” block in the root menu lists **devices** only (paired nodes via `node.list`), not client/presence entries.
- A root “Usage” section appears below Context when provider usage snapshots are available, followed by usage-cost details when available.
- A root "Context" submenu contains recent sessions instead of expanding them directly in the root menu.
- The "Nodes" block in the root menu lists **devices** only (paired nodes via `node.list`), not client/presence entries.
- A root "Usage" section appears below Context when provider usage snapshots are available, followed by usage-cost details when available.
## State model
- Sessions: events arrive with `runId` (per-run) plus `sessionKey` in the payload. The “main” session is the key `main`; if absent, we fall back to the most recently updated session.
- Priority: main always wins. If main is active, its state is shown immediately. If main is idle, the most recently active non‑main session is shown. We do not flip‑flop mid‑activity; we only switch when the current session goes idle or main becomes active.
- Sessions: events arrive with `runId` (per-run) plus `sessionKey` in the payload. The "main" session is the key `main`; if absent, we fall back to the most recently updated session.
- Priority: main always wins. If main is active, its state is shown immediately. If main is idle, the most recently active non-main session is shown. We do not flip-flop mid-activity; we only switch when the current session goes idle or main becomes active.
- Activity kinds:
- `job`: high‑level command execution (`state: started|streaming|done|error`).
- `job`: high-level command execution (`state: started|streaming|done|error`).
- `tool`: `phase: start|result` with `toolName` and `meta/args`.
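The priority rule above amounts to a small selection function. The app itself is Swift, so the following TypeScript sketch is illustrative only, with assumed names and shapes:

```typescript
// Sketch of the "main always wins, no flip-flop mid-activity" rule.
interface SessionActivity {
  sessionKey: string;
  active: boolean;
  lastActiveAt: number; // epoch ms
}

function pickDisplayedSession(
  sessions: SessionActivity[],
  current: string | null,
): string | null {
  // Main wins immediately whenever it is active.
  const main = sessions.find((s) => s.sessionKey === "main");
  if (main?.active) return "main";
  // Keep showing the current session until it goes idle (no flip-flop).
  const currentSession = sessions.find((s) => s.sessionKey === current);
  if (currentSession?.active) return current;
  // Otherwise fall back to the most recently active non-main session.
  const active = sessions
    .filter((s) => s.active && s.sessionKey !== "main")
    .sort((a, b) => b.lastActiveAt - a.lastActiveAt);
  return active[0]?.sessionKey ?? null;
}
```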
## IconState enum (Swift)
@@ -42,13 +40,13 @@ title: "Menu bar"
### Visual mapping
- `idle`: normal critter.
- `workingMain`: badge with glyph, full tint, leg “working” animation.
- `workingMain`: badge with glyph, full tint, leg "working" animation.
- `workingOther`: badge with glyph, muted tint, no scurry.
- `overridden`: uses the chosen glyph/tint regardless of activity.
## Context submenu
- The root menu shows one “Context” row with a session count/status and opens a submenu.
- The root menu shows one "Context" row with a session count/status and opens a submenu.
- The Context submenu header shows the active session count for the last 24 hours.
- Each session row keeps its token bar, age, preview, thinking/verbose, reset, compact, and delete actions.
- Loading, disconnected, and session-load error messages appear inside the Context submenu.
@@ -62,7 +60,7 @@ title: "Menu bar"
## Event ingestion
- Source: control‑channel `agent` events (`ControlChannel.handleAgentEvent`).
- Source: control-channel `agent` events (`ControlChannel.handleAgentEvent`).
- Parsed fields:
- `stream: "job"` with `data.state` for start/stop.
- `stream: "tool"` with `data.phase`, `name`, optional `meta`/`args`.
@@ -74,7 +72,7 @@ title: "Menu bar"
## Debug override
- Settings ▸ Debug ▸ “Icon override” picker:
- Settings ▸ Debug ▸ "Icon override" picker:
- `System (auto)` (default)
- `Working: main` (per tool kind)
- `Working: other` (per tool kind)
@@ -84,7 +82,7 @@ title: "Menu bar"
## Testing checklist
- Trigger main session job: verify icon switches immediately and status row shows main label.
- Trigger non‑main session job while main idle: icon/status shows non‑main; stays stable until it finishes.
- Trigger non-main session job while main idle: icon/status shows non-main; stays stable until it finishes.
- Start main while other active: icon flips to main instantly.
- Rapid tool bursts: ensure badge does not flicker (TTL grace on tool results).
- Health row reappears once all sessions idle.

View File

@@ -23,13 +23,13 @@ pairing, reply threading, and outbound messaging.
Channel plugins do not need their own send/edit/react tools. OpenClaw keeps one
shared `message` tool in core. Your plugin owns:
- **Config** account resolution and setup wizard
- **Security** DM policy and allowlists
- **Pairing** DM approval flow
- **Session grammar** how provider-specific conversation ids map to base chats, thread ids, and parent fallbacks
- **Outbound** sending text, media, and polls to the platform
- **Threading** how replies are threaded
- **Heartbeat typing** optional typing/busy signals for heartbeat delivery targets
- **Config** - account resolution and setup wizard
- **Security** - DM policy and allowlists
- **Pairing** - DM approval flow
- **Session grammar** - how provider-specific conversation ids map to base chats, thread ids, and parent fallbacks
- **Outbound** - sending text, media, and polls to the platform
- **Threading** - how replies are threaded
- **Heartbeat typing** - optional typing/busy signals for heartbeat delivery targets
Core owns the shared message tool, prompt wiring, the outer session-key shape,
generic `:thread:` bookkeeping, and dispatch.
@@ -145,11 +145,11 @@ Most channel plugins do not need approval-specific code.
- If a channel needs native approval delivery, keep channel code focused on target normalization plus transport/presentation facts. Use `createChannelExecApprovalProfile`, `createChannelNativeOriginTargetResolver`, `createChannelApproverDmTargetResolver`, and `createApproverRestrictedNativeApprovalCapability` from `openclaw/plugin-sdk/approval-runtime`. Put the channel-specific facts behind `approvalCapability.nativeRuntime`, ideally via `createChannelApprovalNativeRuntimeAdapter(...)` or `createLazyChannelApprovalNativeRuntimeAdapter(...)`, so core can assemble the handler and own request filtering, routing, dedupe, expiry, gateway subscription, and routed-elsewhere notices. `nativeRuntime` is split into a few smaller seams:
- `createChannelNativeOriginTargetResolver` uses the shared channel-route matcher by default for `{ to, accountId, threadId }` targets. Pass `targetsMatch` only when a channel has provider-specific equivalence rules, such as Slack timestamp prefix matching.
- Pass `normalizeTargetForMatch` to `createChannelNativeOriginTargetResolver` when the channel needs to canonicalize provider ids before the default route matcher or a custom `targetsMatch` callback runs, while preserving the original target for delivery. Use `normalizeTarget` only when the resolved delivery target itself should be canonicalized.
- `availability` whether the account is configured and whether a request should be handled
- `presentation` map the shared approval view model into pending/resolved/expired native payloads or final actions
- `transport` prepare targets plus send/update/delete native approval messages
- `interactions` optional bind/unbind/clear-action hooks for native buttons or reactions
- `observe` optional delivery diagnostics hooks
- `availability` - whether the account is configured and whether a request should be handled
- `presentation` - map the shared approval view model into pending/resolved/expired native payloads or final actions
- `transport` - prepare targets plus send/update/delete native approval messages
- `interactions` - optional bind/unbind/clear-action hooks for native buttons or reactions
- `observe` - optional delivery diagnostics hooks
- If the channel needs runtime-owned objects such as a client, token, Bolt app, or webhook receiver, register them through `openclaw/plugin-sdk/channel-runtime-context`. The generic runtime-context registry lets core bootstrap capability-driven handlers from channel startup state without adding approval-specific wrapper glue.
- Reach for the lower-level `createChannelApprovalHandler` or `createChannelNativeApprovalRuntime` only when the capability-driven seam is not expressive enough yet.
- Native approval channels must route both `accountId` and `approvalKind` through those helpers. `accountId` keeps multi-account approval policy scoped to the right bot account, and `approvalKind` keeps exec vs plugin approval behavior available to the channel without hardcoded branches in core.
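The `normalizeTargetForMatch` / `targetsMatch` seam described above can be sketched as a comparison that canonicalizes only for matching while preserving the original target for delivery. This is a hedged TypeScript sketch; the real helpers live in `openclaw/plugin-sdk/approval-runtime`, and the target shape here is an assumption:

```typescript
// Assumed target shape for illustration.
interface ApprovalTarget {
  to: string;
  accountId?: string;
  threadId?: string;
}

// Default route matcher: structural equality on { to, accountId, threadId }.
function defaultTargetsMatch(a: ApprovalTarget, b: ApprovalTarget): boolean {
  return (
    a.to === b.to && a.accountId === b.accountId && a.threadId === b.threadId
  );
}

// normalizeTargetForMatch canonicalizes ids for comparison only; the
// caller keeps the original target object for actual delivery.
function matchesOrigin(
  origin: ApprovalTarget,
  candidate: ApprovalTarget,
  normalizeTargetForMatch: (t: ApprovalTarget) => ApprovalTarget = (t) => t,
): boolean {
  return defaultTargetsMatch(
    normalizeTargetForMatch(origin),
    normalizeTargetForMatch(candidate),
  );
}
```

A channel with provider-specific equivalence rules (the Slack timestamp-prefix case mentioned above) would swap in a custom comparison instead of the structural default.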
@@ -424,7 +424,7 @@ should use `resolveInboundMentionDecision({ facts, policy })`.
<Step title="Build the channel plugin object">
The `ChannelPlugin` interface has many optional adapter surfaces. Start with
the minimum `id` and `setup` and add adapters as you need them.
the minimum - `id` and `setup` - and add adapters as you need them.
Create `src/channel.ts`:
@@ -631,7 +631,7 @@ should use `resolveInboundMentionDecision({ facts, policy })`.
const event = parseWebhookPayload(req);
// Your inbound handler dispatches the message to OpenClaw.
// The exact wiring depends on your platform SDK
// The exact wiring depends on your platform SDK -
// see a real example in the bundled Microsoft Teams or Google Chat plugin package.
await handleAcmeChatInbound(api, event);
@@ -742,10 +742,10 @@ surface unless you are maintaining that bundled plugin family directly.
## Next steps
- [Provider Plugins](/plugins/sdk-provider-plugins) if your plugin also provides models
- [SDK Overview](/plugins/sdk-overview) full subpath import reference
- [SDK Testing](/plugins/sdk-testing) test utilities and contract tests
- [Plugin Manifest](/plugins/manifest) full manifest schema
- [Provider Plugins](/plugins/sdk-provider-plugins) - if your plugin also provides models
- [SDK Overview](/plugins/sdk-overview) - full subpath import reference
- [SDK Testing](/plugins/sdk-testing) - test utilities and contract tests
- [Plugin Manifest](/plugins/manifest) - full manifest schema
## Related

View File

@@ -19,18 +19,18 @@ the new architecture, this guide helps you migrate.
The old plugin system provided two wide-open surfaces that let plugins import
anything they needed from a single entry point:
- **`openclaw/plugin-sdk/compat`** a single import that re-exported dozens of
- **`openclaw/plugin-sdk/compat`** - a single import that re-exported dozens of
helpers. It was introduced to keep older hook-based plugins working while the
new plugin architecture was being built.
- **`openclaw/plugin-sdk/infra-runtime`** a broad runtime helper barrel that
- **`openclaw/plugin-sdk/infra-runtime`** - a broad runtime helper barrel that
mixed system events, heartbeat state, delivery queues, fetch/proxy helpers,
file helpers, approval types, and unrelated utilities.
- **`openclaw/plugin-sdk/config-runtime`** a broad config compatibility barrel
- **`openclaw/plugin-sdk/config-runtime`** - a broad config compatibility barrel
that still carries deprecated direct load/write helpers during the migration
window.
- **`openclaw/extension-api`** a bridge that gave plugins direct access to
- **`openclaw/extension-api`** - a bridge that gave plugins direct access to
host-side helpers like the embedded agent runner.
- **`api.registerEmbeddedExtensionFactory(...)`** a removed Pi-only bundled
- **`api.registerEmbeddedExtensionFactory(...)`** - a removed Pi-only bundled
extension hook that could observe embedded-runner events such as
`tool_result`.
@@ -55,9 +55,9 @@ registration behavior.
The old approach caused problems:
- **Slow startup** importing one helper loaded dozens of unrelated modules
- **Circular dependencies** broad re-exports made it easy to create import cycles
- **Unclear API surface** no way to tell which exports were stable vs internal
- **Slow startup** - importing one helper loaded dozens of unrelated modules
- **Circular dependencies** - broad re-exports made it easy to create import cycles
- **Unclear API surface** - no way to tell which exports were stable vs internal
The modern plugin SDK fixes this: each import path (`openclaw/plugin-sdk/\<subpath\>`)
is a small, self-contained module with a clear purpose and documented contract.
@@ -679,7 +679,7 @@ canonical replacement.
`buildCommandsMessagePaginated`, `buildHelpMessage`.
**New (`openclaw/plugin-sdk/command-status`)**: same signatures, same
exports just imported from the narrower subpath. `command-auth`
exports - just imported from the narrower subpath. `command-auth`
re-exports them as compat stubs.
```typescript
@@ -698,7 +698,7 @@ canonical replacement.
`openclaw/plugin-sdk/channel-inbound` or
`openclaw/plugin-sdk/channel-mention-gating`.
**New**: `resolveInboundMentionDecision({ facts, policy })` returns a
**New**: `resolveInboundMentionDecision({ facts, policy })` - returns a
single decision object instead of two split calls.
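As a rough sketch of the merged call shape: the field names beyond `facts` and `policy` below are assumptions for illustration, not the real SDK types from `openclaw/plugin-sdk/channel-mention-gating`:

```typescript
// Hypothetical facts/policy/decision shapes for illustration only.
interface MentionFacts {
  isDm: boolean;
  botMentioned: boolean;
}
interface MentionPolicy {
  requireMentionInRooms: boolean;
}
interface MentionDecision {
  shouldHandle: boolean;
  reason: string;
}

// One call returning a single decision object, replacing two split calls.
function resolveInboundMentionDecision({
  facts,
  policy,
}: {
  facts: MentionFacts;
  policy: MentionPolicy;
}): MentionDecision {
  if (facts.isDm) return { shouldHandle: true, reason: "dm" };
  if (!policy.requireMentionInRooms) {
    return { shouldHandle: true, reason: "mention-not-required" };
  }
  return facts.botMentioned
    ? { shouldHandle: true, reason: "mentioned" }
    : { shouldHandle: false, reason: "not-mentioned" };
}
```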
Downstream channel plugins (Slack, Discord, Matrix, MS Teams) have already
@@ -714,7 +714,7 @@ canonical replacement.
`channelActions*` helpers in `openclaw/plugin-sdk/channel-actions` are
deprecated alongside raw "actions" channel exports. Expose capabilities
through the semantic `presentation` surface instead channel plugins
through the semantic `presentation` surface instead - channel plugins
declare what they render (cards, buttons, selects) rather than which raw
action names they accept.
@@ -756,7 +756,7 @@ canonical replacement.
| `ProviderDiscoveryResult` | `ProviderCatalogResult` |
| `ProviderPluginDiscovery` | `ProviderPluginCatalog` |
Plus the legacy `ProviderCapabilities` static bag provider plugins
Plus the legacy `ProviderCapabilities` static bag - provider plugins
should use explicit provider hooks such as `buildReplayPolicy`,
`normalizeToolSchemas`, and `wrapStreamFn` rather than a static object.
@@ -809,12 +809,12 @@ canonical replacement.
</Accordion>
<Accordion title="Memory plugin registration → registerMemoryCapability">
**Old**: three separate calls
**Old**: three separate calls -
`api.registerMemoryPromptSection(...)`,
`api.registerMemoryFlushPlan(...)`,
`api.registerMemoryRuntime(...)`.
**New**: one call on the memory-state API
**New**: one call on the memory-state API -
`registerMemoryCapability(pluginId, { promptBuilder, flushPlanResolver, runtime })`.
Same slots, single registration call. Additive memory helpers
@@ -906,9 +906,9 @@ This is a temporary escape hatch, not a permanent solution.
## Related
- [Getting Started](/plugins/building-plugins) build your first plugin
- [SDK Overview](/plugins/sdk-overview) full subpath import reference
- [Channel Plugins](/plugins/sdk-channel-plugins) building channel plugins
- [Provider Plugins](/plugins/sdk-provider-plugins) building provider plugins
- [Plugin Internals](/plugins/architecture) architecture deep dive
- [Plugin Manifest](/plugins/manifest) manifest schema reference
- [Getting Started](/plugins/building-plugins) - build your first plugin
- [SDK Overview](/plugins/sdk-overview) - full subpath import reference
- [Channel Plugins](/plugins/sdk-channel-plugins) - building channel plugins
- [Provider Plugins](/plugins/sdk-provider-plugins) - building provider plugins
- [Plugin Internals](/plugins/architecture) - architecture deep dive
- [Plugin Manifest](/plugins/manifest) - manifest schema reference

View File

@@ -77,7 +77,7 @@ an unavailable backend.
- The target id is allowed by `acp.allowedAgents` when that allowlist is set.
- The harness command can start on the Gateway host.
- Provider auth is present for that harness (`claude`, `codex`, `gemini`, `opencode`, `droid`, etc.).
- The selected model exists for that harness model ids are not portable across harnesses.
- The selected model exists for that harness - model ids are not portable across harnesses.
- The requested `cwd` exists and is accessible, or omit `cwd` and let the backend use its default.
- Permission mode matches the work. Non-interactive sessions cannot click native permission prompts, so write/exec-heavy coding runs usually need an ACPX permission profile that can proceed headlessly.
@@ -86,7 +86,7 @@ an unavailable backend.
OpenClaw plugin tools and built-in OpenClaw tools are **not** exposed to
ACP harnesses by default. Enable the explicit MCP bridges in
[ACP agents setup](/tools/acp-agents-setup) only when the harness
[ACP agents - setup](/tools/acp-agents-setup) only when the harness
should call those tools directly.
## Supported harness targets
@@ -182,10 +182,10 @@ Quick `/acp` flow from chat:
</Accordion>
<Accordion title="Model / provider / runtime selection cheat sheet">
- `openai-codex/*` PI Codex OAuth/subscription route.
- `openai/*` plus `agentRuntime.id: "codex"` native Codex app-server embedded runtime.
- `/codex ...` native Codex conversation control.
- `/acp ...` or `runtime: "acp"` explicit ACP/acpx control.
- `openai-codex/*` - PI Codex OAuth/subscription route.
- `openai/*` plus `agentRuntime.id: "codex"` - native Codex app-server embedded runtime.
- `/codex ...` - native Codex conversation control.
- `/acp ...` or `runtime: "acp"` - explicit ACP/acpx control.
</Accordion>
<Accordion title="ACP-routing natural-language triggers">
@@ -244,7 +244,7 @@ For Claude Code through ACP, the stack is:
ACP Claude is a **harness session** with ACP controls, session resume,
background-task tracking, and optional conversation/thread binding.
CLI backends are separate text-only local fallback runtimes see
CLI backends are separate text-only local fallback runtimes - see
[CLI Backends](/gateway/cli-backends).
For operators, the practical rule is:
@@ -256,15 +256,15 @@ For operators, the practical rule is:
### Mental model
- **Chat surface** — where people keep talking (Discord channel, Telegram topic, iMessage chat).
- **ACP session** — the durable Codex/Claude/Gemini runtime state OpenClaw routes to.
- **Child thread/topic** — an optional extra messaging surface created only by `--thread ...`.
- **Runtime workspace** — the filesystem location (`cwd`, repo checkout, backend workspace) where the harness runs. Independent of the chat surface.
- **Chat surface** - where people keep talking (Discord channel, Telegram topic, iMessage chat).
- **ACP session** - the durable Codex/Claude/Gemini runtime state OpenClaw routes to.
- **Child thread/topic** - an optional extra messaging surface created only by `--thread ...`.
- **Runtime workspace** - the filesystem location (`cwd`, repo checkout, backend workspace) where the harness runs. Independent of the chat surface.
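The four concepts above can be modeled as a small type sketch (hypothetical names and shapes, not OpenClaw's actual types):

```typescript
// Hypothetical shapes for the mental model above; not OpenClaw's real types.
type ChatSurface = { channel: "discord" | "telegram" | "imessage"; id: string };
type AcpSession = { harness: "codex" | "claude" | "gemini"; sessionId: string };
type Binding = {
  surface: ChatSurface;   // where people keep talking
  session: AcpSession;    // durable runtime state OpenClaw routes to
  childThreadId?: string; // present only when spawned with --thread ...
  runtimeCwd?: string;    // workspace; independent of the chat surface
};

// A --bind here binding: same chat surface, no child thread.
const bound: Binding = {
  surface: { channel: "discord", id: "chan-1" },
  session: { harness: "codex", sessionId: "sess-42" },
};
```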
### Current-conversation binds
`/acp spawn <harness> --bind here` pins the current conversation to the
spawned ACP session — no child thread, same chat surface. OpenClaw keeps
spawned ACP session - no child thread, same chat surface. OpenClaw keeps
owning transport, auth, safety, and delivery. Follow-up messages in that
conversation route to the same session; `/new` and `/reset` reset the
session in place; `/acp close` removes the binding.
@@ -284,9 +284,9 @@ Examples:
<Accordion title="Binding rules and exclusivity">
- `--bind here` and `--thread ...` are mutually exclusive.
- `--bind here` only works on channels that advertise current-conversation binding; OpenClaw returns a clear unsupported message otherwise. Bindings persist across gateway restarts.
- On Discord, `spawnSessions` gates child thread creation for `--thread auto|here` — not `--bind here`.
- On Discord, `spawnSessions` gates child thread creation for `--thread auto|here` - not `--bind here`.
- If you spawn to a different ACP agent without `--cwd`, OpenClaw inherits the **target agent's** workspace by default. Missing inherited paths (`ENOENT`/`ENOTDIR`) fall back to the backend default; other access errors (e.g. `EACCES`) surface as spawn errors.
- Gateway management commands stay local in bound conversations — `/acp ...` commands are handled by OpenClaw even when normal follow-up text routes to the bound ACP session; `/status` and `/unfocus` also stay local whenever command handling is enabled for that surface.
- Gateway management commands stay local in bound conversations - `/acp ...` commands are handled by OpenClaw even when normal follow-up text routes to the bound ACP session; `/status` and `/unfocus` also stay local whenever command handling is enabled for that surface.
</Accordion>
<Accordion title="Thread-bound sessions">
@@ -676,7 +676,7 @@ background work. The delivery path depends on that shape.
```json
{
"task": "Continue where we left off — fix the remaining test failures",
"task": "Continue where we left off - fix the remaining test failures",
"runtime": "acp",
"agentId": "codex",
"resumeSessionId": "<previous-session-id>"
@@ -685,7 +685,7 @@ background work. The delivery path depends on that shape.
Common use cases:
- Hand off a Codex session from your laptop to your phone — tell your agent to pick up where you left off.
- Hand off a Codex session from your laptop to your phone - tell your agent to pick up where you left off.
- Continue a coding session you started interactively in the CLI, now headlessly through your agent.
- Pick up work that was interrupted by a gateway restart or idle timeout.
@@ -696,7 +696,7 @@ background work. The delivery path depends on that shape.
- `resumeSessionId` is a host-local ACP/harness resume id, not an OpenClaw channel session key; OpenClaw still checks ACP spawn policy and target agent policy before dispatch, while the ACP backend or harness owns authorization for loading that upstream id.
- `resumeSessionId` restores the upstream ACP conversation history; `thread` and `mode` still apply normally to the new OpenClaw session you are creating, so `mode: "session"` still requires `thread: true`.
- The target agent must support `session/load` (Codex and Claude Code do).
- If the session id is not found, the spawn fails with a clear error — no silent fallback to a new session.
- If the session id is not found, the spawn fails with a clear error - no silent fallback to a new session.
</Accordion>
<Accordion title="Post-deploy smoke test">
@@ -709,7 +709,7 @@ background work. The delivery path depends on that shape.
4. Verify `accepted=yes`, a real `childSessionKey`, and no validator error.
5. Clean up the temporary bridge session.
Keep the gate on `mode: "run"` and skip `streamTo: "parent"` —
Keep the gate on `mode: "run"` and skip `streamTo: "parent"` -
thread-bound `mode: "session"` and stream-relay paths are separate
richer integration passes.
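The step-4 check can be expressed as a tiny predicate; a sketch assuming a response shape with `accepted`, `childSessionKey`, and an optional validator `error` field (hypothetical spellings, not the exact tool output schema):

```typescript
// Hypothetical smoke-test response shape; field names assume step 4 above.
type SmokeResult = { accepted: boolean; childSessionKey?: string; error?: string };

function passesSmokeTest(r: SmokeResult): boolean {
  // accepted=yes, a real childSessionKey, and no validator error
  return (
    r.accepted &&
    typeof r.childSessionKey === "string" &&
    r.childSessionKey.length > 0 &&
    !r.error
  );
}
```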
@@ -793,18 +793,18 @@ operations:
| ---------------------------- | ------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `/acp model <id>` | runtime config key `model` | For Codex ACP, OpenClaw normalizes `openai-codex/<model>` to the adapter model id and maps slash reasoning suffixes such as `openai-codex/gpt-5.4/high` to `reasoning_effort`. |
| `/acp set thinking <level>` | runtime config key `thinking` | For Codex ACP, OpenClaw sends the corresponding `reasoning_effort` where the adapter supports one. |
| `/acp permissions <profile>` | runtime config key `approval_policy` | — |
| `/acp timeout <seconds>` | runtime config key `timeout` | — |
| `/acp permissions <profile>` | runtime config key `approval_policy` | - |
| `/acp timeout <seconds>` | runtime config key `timeout` | - |
| `/acp cwd <path>` | runtime cwd override | Direct update. |
| `/acp set <key> <value>` | generic | `key=cwd` uses the cwd override path. |
| `/acp reset-options` | clears all runtime overrides | — |
| `/acp reset-options` | clears all runtime overrides | - |
## acpx harness, plugin setup, and permissions
For acpx harness configuration (Claude Code / Codex / Gemini CLI
aliases), the plugin-tools and OpenClaw-tools MCP bridges, and ACP
permission modes, see
[ACP agents — setup](/tools/acp-agents-setup).
[ACP agents - setup](/tools/acp-agents-setup).
## Troubleshooting
@@ -835,7 +835,7 @@ permission modes, see
## Related
- [ACP agents — setup](/tools/acp-agents-setup)
- [ACP agents - setup](/tools/acp-agents-setup)
- [Agent send](/tools/agent-send)
- [CLI Backends](/gateway/cli-backends)
- [Codex harness](/plugins/codex-harness)

View File

@@ -1,6 +1,7 @@
import type { AgentMessage } from "@mariozechner/pi-agent-core";
import { SessionManager } from "@mariozechner/pi-coding-agent";
import type { SessionEntry } from "../../config/sessions/types.js";
import type { AgentCompactionMode } from "../../config/types.agent-defaults.js";
import type { OpenClawConfig } from "../../config/types.openclaw.js";
import { resolveContextEngine as resolveContextEngineImpl } from "../../context-engine/registry.js";
import type { ContextEngine } from "../../context-engine/types.js";
@@ -10,7 +11,10 @@ import { runContextEngineMaintenance as runContextEngineMaintenanceImpl } from "
import { shouldPreemptivelyCompactBeforePrompt as shouldPreemptivelyCompactBeforePromptImpl } from "../pi-embedded-runner/run/preemptive-compaction.js";
import { resolveLiveToolResultMaxChars as resolveLiveToolResultMaxCharsImpl } from "../pi-embedded-runner/tool-result-truncation.js";
import { createPreparedEmbeddedPiSettingsManager as createPreparedEmbeddedPiSettingsManagerImpl } from "../pi-project-settings.js";
import { applyPiAutoCompactionGuard as applyPiAutoCompactionGuardImpl } from "../pi-settings.js";
import {
applyPiAutoCompactionGuard as applyPiAutoCompactionGuardImpl,
resolveEffectiveCompactionMode,
} from "../pi-settings.js";
import type { SkillSnapshot } from "../skills.js";
import { recordCliCompactionInStore as recordCliCompactionInStoreImpl } from "./session-store.js";
@@ -38,6 +42,7 @@ type CliCompactionDeps = {
applyPiAutoCompactionGuard: (params: {
settingsManager: SettingsManagerLike;
contextEngineInfo?: ContextEngine["info"];
compactionMode?: AgentCompactionMode;
}) => unknown;
shouldPreemptivelyCompactBeforePrompt: typeof shouldPreemptivelyCompactBeforePromptImpl;
resolveLiveToolResultMaxChars: typeof resolveLiveToolResultMaxCharsImpl;
@@ -207,6 +212,7 @@ export async function runCliTurnCompactionLifecycle(params: {
await cliCompactionDeps.applyPiAutoCompactionGuard({
settingsManager,
contextEngineInfo: contextEngine.info,
compactionMode: resolveEffectiveCompactionMode(params.cfg),
});
const preemptiveCompaction = cliCompactionDeps.shouldPreemptivelyCompactBeforePrompt({

View File

@@ -12,7 +12,7 @@ import contextPruningExtension from "../pi-hooks/context-pruning.js";
import { setContextPruningRuntime } from "../pi-hooks/context-pruning/runtime.js";
import { computeEffectiveSettings } from "../pi-hooks/context-pruning/settings.js";
import { makeToolPrunablePredicate } from "../pi-hooks/context-pruning/tools.js";
import { ensurePiCompactionReserveTokens } from "../pi-settings.js";
import { ensurePiCompactionReserveTokens, resolveEffectiveCompactionMode } from "../pi-settings.js";
import { resolveTranscriptPolicy } from "../transcript-policy.js";
import { isCacheTtlEligibleProvider, readLastCacheTtlTimestamp } from "./cache-ttl.js";
@@ -123,15 +123,6 @@ function buildContextPruningFactory(params: {
return contextPruningExtension;
}
function resolveCompactionMode(cfg?: OpenClawConfig): "default" | "safeguard" {
const compaction = cfg?.agents?.defaults?.compaction;
// A registered compaction provider requires the safeguard extension path
if (compaction?.provider) {
return "safeguard";
}
return compaction?.mode === "safeguard" ? "safeguard" : "default";
}
export function buildEmbeddedExtensionFactories(params: {
cfg: OpenClawConfig | undefined;
sessionManager: SessionManager;
@@ -140,7 +131,7 @@ export function buildEmbeddedExtensionFactories(params: {
model: ProviderRuntimeModel | undefined;
}): ExtensionFactory[] {
const factories: ExtensionFactory[] = [];
if (resolveCompactionMode(params.cfg) === "safeguard") {
if (resolveEffectiveCompactionMode(params.cfg) === "safeguard") {
const compactionCfg = params.cfg?.agents?.defaults?.compaction;
const qualityGuardCfg = compactionCfg?.qualityGuard;
const contextWindowInfo = resolveContextWindowInfo({

View File

@@ -350,6 +350,7 @@ vi.mock("../../pi-settings.js", () => ({
},
}),
isSilentOverflowProneModel: () => false,
resolveEffectiveCompactionMode: () => "default",
}));
vi.mock("../extensions.js", () => ({

View File

@@ -107,6 +107,7 @@ import {
applyPiAutoCompactionGuard,
applyPiCompactionSettingsFromConfig,
isSilentOverflowProneModel,
resolveEffectiveCompactionMode,
} from "../../pi-settings.js";
import {
createClientToolNameConflictError,
@@ -1453,6 +1454,7 @@ export async function runEmbeddedAttempt(
const piAutoCompactionGuardArgs = {
settingsManager,
contextEngineInfo: activeContextEngine?.info,
compactionMode: resolveEffectiveCompactionMode(params.config),
silentOverflowProneProvider: isSilentOverflowProneModel({
provider: params.provider,
modelId: params.modelId,

View File

@@ -5,7 +5,9 @@ import {
applyPiCompactionSettingsFromConfig,
DEFAULT_PI_COMPACTION_RESERVE_TOKENS_FLOOR,
isSilentOverflowProneModel,
resolveEffectiveCompactionMode,
resolveCompactionReserveTokensFloor,
shouldDisablePiAutoCompaction,
} from "./pi-settings.js";
describe("applyPiCompactionSettingsFromConfig", () => {
@@ -347,6 +349,40 @@ describe("resolveCompactionReserveTokensFloor", () => {
).toBe(0);
});
});
describe("resolveEffectiveCompactionMode", () => {
it("defaults to default compaction mode", () => {
expect(resolveEffectiveCompactionMode()).toBe("default");
expect(resolveEffectiveCompactionMode({ agents: { defaults: { compaction: {} } } })).toBe(
"default",
);
expect(
resolveEffectiveCompactionMode({
agents: { defaults: { compaction: { mode: "default" } } },
}),
).toBe("default");
});
it("returns safeguard for explicit safeguard mode", () => {
expect(
resolveEffectiveCompactionMode({
agents: { defaults: { compaction: { mode: "safeguard" } } },
}),
).toBe("safeguard");
});
it("returns safeguard when a compaction provider is configured", () => {
expect(
resolveEffectiveCompactionMode({
agents: { defaults: { compaction: { provider: "deepseek" } } },
}),
).toBe("safeguard");
expect(
resolveEffectiveCompactionMode({
agents: { defaults: { compaction: { mode: "default", provider: "deepseek" } } },
}),
).toBe("safeguard");
});
});
describe("isSilentOverflowProneModel", () => {
// Reporter's repro shape: openrouter routing to z-ai/glm. Both the bare
@@ -432,6 +468,36 @@ describe("isSilentOverflowProneModel", () => {
});
});
describe("shouldDisablePiAutoCompaction", () => {
it("returns false with no owner, default mode, and ordinary provider behavior", () => {
expect(shouldDisablePiAutoCompaction({})).toBe(false);
expect(shouldDisablePiAutoCompaction({ compactionMode: "default" })).toBe(false);
expect(
shouldDisablePiAutoCompaction({
contextEngineInfo: { id: "legacy", name: "Legacy", ownsCompaction: false },
compactionMode: "default",
silentOverflowProneProvider: false,
}),
).toBe(false);
});
it("returns true when a context engine owns compaction", () => {
expect(
shouldDisablePiAutoCompaction({
contextEngineInfo: { id: "third-party", name: "Third-party", ownsCompaction: true },
}),
).toBe(true);
});
it("returns true when effective compaction mode is safeguard", () => {
expect(shouldDisablePiAutoCompaction({ compactionMode: "safeguard" })).toBe(true);
});
it("returns true for silent-overflow-prone providers", () => {
expect(shouldDisablePiAutoCompaction({ silentOverflowProneProvider: true })).toBe(true);
});
});
describe("applyPiAutoCompactionGuard", () => {
// Direct repro of openclaw#75799: pi-ai's silent-overflow detection misfires
// on a successful turn against z.ai-style providers, triggering Pi's
@@ -481,6 +547,26 @@ describe("applyPiAutoCompactionGuard", () => {
expect(setCompactionEnabled).toHaveBeenCalledWith(false);
});
it("disables Pi auto-compaction when provider config forces safeguard mode", () => {
const setCompactionEnabled = vi.fn();
const settingsManager = {
getCompactionReserveTokens: () => 20_000,
getCompactionKeepRecentTokens: () => 4_000,
applyOverrides: () => {},
setCompactionEnabled,
};
const result = applyPiAutoCompactionGuard({
settingsManager,
compactionMode: resolveEffectiveCompactionMode({
agents: { defaults: { compaction: { provider: "deepseek" } } },
}),
});
expect(result).toEqual({ supported: true, disabled: true });
expect(setCompactionEnabled).toHaveBeenCalledWith(false);
});
// Default-mode runs against ordinary providers must keep Pi's auto-compaction
// enabled. Disabling it across the board would silently remove Pi's
// overflow-recovery path inside Session.prompt() for users who are not

View File

@@ -1,3 +1,4 @@
import type { AgentCompactionMode } from "../config/types.agent-defaults.js";
import type { OpenClawConfig } from "../config/types.openclaw.js";
import type { ContextEngineInfo } from "../context-engine/types.js";
import { MIN_PROMPT_BUDGET_RATIO, MIN_PROMPT_BUDGET_TOKENS } from "./pi-compaction-constants.js";
@@ -124,6 +125,15 @@ export function applyPiCompactionSettingsFromConfig(params: {
};
}
/** Resolve the compaction mode after provider-backed safeguard promotion. */
export function resolveEffectiveCompactionMode(cfg?: OpenClawConfig): AgentCompactionMode {
const compaction = cfg?.agents?.defaults?.compaction;
if (compaction?.provider) {
return "safeguard";
}
return compaction?.mode === "safeguard" ? "safeguard" : "default";
}
/**
* Detect providers whose pi-ai `isContextOverflow` Case 2 (silent overflow)
* fires on a successful turn and triggers Pi's `_runAutoCompaction` from
@@ -171,16 +181,20 @@ export function isSilentOverflowProneModel(model: {
* Disable Pi's `_checkCompaction → _runAutoCompaction` (which would otherwise
* fire from inside `Session.prompt()` and reassign `agent.state.messages`
* before the provider call) when OpenClaw or a plugin owns compaction:
* `contextEngineInfo.ownsCompaction === true`, or the active model is
* silent-overflow-prone (openclaw#75799). Default-mode runs against ordinary
* providers keep Pi's auto-compaction as the existing baseline.
* `contextEngineInfo.ownsCompaction === true`, effective safeguard compaction,
* or an active model that is silent-overflow-prone (openclaw#75799).
* Default-mode runs against ordinary providers keep Pi's auto-compaction as
* the existing baseline.
*/
function shouldDisablePiAutoCompaction(params: {
export function shouldDisablePiAutoCompaction(params: {
contextEngineInfo?: ContextEngineInfo;
compactionMode?: AgentCompactionMode;
silentOverflowProneProvider?: boolean;
}): boolean {
return (
params.contextEngineInfo?.ownsCompaction === true || params.silentOverflowProneProvider === true
params.contextEngineInfo?.ownsCompaction === true ||
params.compactionMode === "safeguard" ||
params.silentOverflowProneProvider === true
);
}
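Taken together, the two helpers implement a small decision table; a self-contained sketch that mirrors the diffed logic, with a simplified config type and `contextEngineInfo` reduced to a bare `ownsCompaction` flag:

```typescript
// Self-contained mirror of the diffed helpers, simplified for illustration.
type AgentCompactionMode = "default" | "safeguard";
type Cfg = {
  agents?: { defaults?: { compaction?: { mode?: AgentCompactionMode; provider?: string } } };
};

// A configured compaction provider promotes the mode to "safeguard".
function resolveEffectiveCompactionMode(cfg?: Cfg): AgentCompactionMode {
  const compaction = cfg?.agents?.defaults?.compaction;
  if (compaction?.provider) {
    return "safeguard";
  }
  return compaction?.mode === "safeguard" ? "safeguard" : "default";
}

// Pi auto-compaction is disabled when anything else owns compaction:
// a context engine, effective safeguard mode, or a silent-overflow-prone model.
function shouldDisablePiAutoCompaction(params: {
  ownsCompaction?: boolean;
  compactionMode?: AgentCompactionMode;
  silentOverflowProneProvider?: boolean;
}): boolean {
  return (
    params.ownsCompaction === true ||
    params.compactionMode === "safeguard" ||
    params.silentOverflowProneProvider === true
  );
}
```

Note how a `provider` entry alone is enough to flip the guard, even with `mode: "default"` set explicitly.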
@@ -194,10 +208,12 @@ function shouldDisablePiAutoCompaction(params: {
export function applyPiAutoCompactionGuard(params: {
settingsManager: PiSettingsManagerLike;
contextEngineInfo?: ContextEngineInfo;
compactionMode?: AgentCompactionMode;
silentOverflowProneProvider?: boolean;
}): { supported: boolean; disabled: boolean } {
const disable = shouldDisablePiAutoCompaction({
contextEngineInfo: params.contextEngineInfo,
compactionMode: params.compactionMode,
silentOverflowProneProvider: params.silentOverflowProneProvider,
});
const hasMethod = typeof params.settingsManager.setCompactionEnabled === "function";

View File

@@ -1,5 +1,4 @@
import path from "node:path";
import { beforeEach, describe, expect, it, vi } from "vitest";
import { describe, expect, it, vi } from "vitest";
import {
HEARTBEAT_SKIP_CRON_IN_PROGRESS,
HEARTBEAT_SKIP_REQUESTS_IN_FLIGHT,
@@ -7,184 +6,17 @@ import {
} from "../infra/heartbeat-wake.js";
import type { CronEvent, CronServiceDeps } from "./service.js";
import { CronService } from "./service.js";
import { createDeferred, createNoopLogger, installCronTestHooks } from "./service.test-harness.js";
import {
createCronStoreHarness,
createDeferred,
createNoopLogger,
installCronTestHooks,
} from "./service.test-harness.js";
const noopLogger = createNoopLogger();
installCronTestHooks({ logger: noopLogger });
type FakeFsEntry =
| { kind: "file"; content: string; mtimeMs: number }
| { kind: "dir"; mtimeMs: number };
const fsState = vi.hoisted(() => ({
entries: new Map<string, FakeFsEntry>(),
nowMs: 0,
fixtureCount: 0,
}));
const abs = (p: string) => path.resolve(p);
const fixturesRoot = abs(path.join("__openclaw_vitest__", "cron", "runs-one-shot"));
const isFixturePath = (p: string) => {
const resolved = abs(p);
const rootPrefix = `${fixturesRoot}${path.sep}`;
return resolved === fixturesRoot || resolved.startsWith(rootPrefix);
};
function bumpMtimeMs() {
fsState.nowMs += 1;
return fsState.nowMs;
}
function ensureDir(dirPath: string) {
let current = abs(dirPath);
while (true) {
if (!fsState.entries.has(current)) {
fsState.entries.set(current, { kind: "dir", mtimeMs: bumpMtimeMs() });
}
const parent = path.dirname(current);
if (parent === current) {
break;
}
current = parent;
}
}
function setFile(filePath: string, content: string) {
const resolved = abs(filePath);
ensureDir(path.dirname(resolved));
fsState.entries.set(resolved, { kind: "file", content, mtimeMs: bumpMtimeMs() });
}
async function makeStorePath() {
const dir = path.join(fixturesRoot, `case-${fsState.fixtureCount++}`);
ensureDir(dir);
const storePath = path.join(dir, "cron", "jobs.json");
ensureDir(path.dirname(storePath));
return { storePath, cleanup: async () => {} };
}
vi.mock("node:fs", async () => {
const actual = await vi.importActual<typeof import("node:fs")>("node:fs");
const pathMod = await import("node:path");
const absInMock = (p: string) => pathMod.resolve(p);
const isFixtureInMock = (p: string) => {
const resolved = absInMock(p);
const rootPrefix = `${absInMock(fixturesRoot)}${pathMod.sep}`;
return resolved === absInMock(fixturesRoot) || resolved.startsWith(rootPrefix);
};
const mkErr = (code: string, message: string) => Object.assign(new Error(message), { code });
const promises = {
...actual.promises,
mkdir: async (p: string) => {
if (!isFixtureInMock(p)) {
return await actual.promises.mkdir(p, { recursive: true });
}
ensureDir(p);
return undefined;
},
readFile: async (p: string) => {
if (!isFixtureInMock(p)) {
return await actual.promises.readFile(p, "utf-8");
}
const entry = fsState.entries.get(absInMock(p));
if (!entry || entry.kind !== "file") {
throw mkErr("ENOENT", `ENOENT: no such file or directory, open '${p}'`);
}
return entry.content;
},
writeFile: async (p: string, data: string | Uint8Array) => {
if (!isFixtureInMock(p)) {
return await actual.promises.writeFile(p, data, "utf-8");
}
const content = typeof data === "string" ? data : Buffer.from(data).toString("utf-8");
setFile(p, content);
},
rename: async (from: string, to: string) => {
if (!isFixtureInMock(from) || !isFixtureInMock(to)) {
return await actual.promises.rename(from, to);
}
const fromAbs = absInMock(from);
const toAbs = absInMock(to);
const entry = fsState.entries.get(fromAbs);
if (!entry || entry.kind !== "file") {
throw mkErr("ENOENT", `ENOENT: no such file or directory, rename '${from}' -> '${to}'`);
}
ensureDir(pathMod.dirname(toAbs));
fsState.entries.delete(fromAbs);
fsState.entries.set(toAbs, { ...entry, mtimeMs: bumpMtimeMs() });
},
copyFile: async (from: string, to: string) => {
if (!isFixtureInMock(from) || !isFixtureInMock(to)) {
return await actual.promises.copyFile(from, to);
}
const entry = fsState.entries.get(absInMock(from));
if (!entry || entry.kind !== "file") {
throw mkErr("ENOENT", `ENOENT: no such file or directory, copyfile '${from}' -> '${to}'`);
}
setFile(to, entry.content);
},
stat: async (p: string) => {
if (!isFixtureInMock(p)) {
return await actual.promises.stat(p);
}
const entry = fsState.entries.get(absInMock(p));
if (!entry) {
throw mkErr("ENOENT", `ENOENT: no such file or directory, stat '${p}'`);
}
return {
mtimeMs: entry.mtimeMs,
isDirectory: () => entry.kind === "dir",
isFile: () => entry.kind === "file",
};
},
access: async (p: string) => {
if (!isFixtureInMock(p)) {
return await actual.promises.access(p);
}
const entry = fsState.entries.get(absInMock(p));
if (!entry) {
throw mkErr("ENOENT", `ENOENT: no such file or directory, access '${p}'`);
}
},
unlink: async (p: string) => {
if (!isFixtureInMock(p)) {
return await actual.promises.unlink(p);
}
fsState.entries.delete(absInMock(p));
},
} as unknown as typeof actual.promises;
const wrapped = { ...actual, promises };
return { ...wrapped, default: wrapped };
});
vi.mock("node:fs/promises", async () => {
const actual = await vi.importActual<typeof import("node:fs/promises")>("node:fs/promises");
const wrapped = {
...actual,
mkdir: async (p: string, _opts?: unknown) => {
if (!isFixturePath(p)) {
return await actual.mkdir(p, { recursive: true });
}
ensureDir(p);
return undefined;
},
writeFile: async (p: string, data: string, _enc?: unknown) => {
if (!isFixturePath(p)) {
return await actual.writeFile(p, data, "utf-8");
}
setFile(p, data);
},
};
return { ...wrapped, default: wrapped };
});
beforeEach(() => {
fsState.entries.clear();
fsState.nowMs = 0;
ensureDir(fixturesRoot);
const { makeStorePath } = createCronStoreHarness({
prefix: "openclaw-cron-runs-one-shot-",
});
function createCronEventHarness() {
@@ -229,7 +61,6 @@ type CronHarnessOptions = {
};
async function createCronHarness(options: CronHarnessOptions = {}) {
ensureDir(fixturesRoot);
const store = await makeStorePath();
const enqueueSystemEvent = vi.fn();
const requestHeartbeat = vi.fn();
@@ -377,6 +208,7 @@ function expectMainSystemEventPosted(enqueueSystemEvent: unknown, text: string)
}
async function stopCronAndCleanup(cron: CronService, store: { cleanup: () => Promise<void> }) {
await cron.status();
cron.stop();
await store.cleanup();
}
@@ -678,7 +510,6 @@ describe("CronService", () => {
});
it("rejects unsupported session/payload combinations", async () => {
ensureDir(fixturesRoot);
const store = await makeStorePath();
const cron = createStartedCronService(
@@ -712,7 +543,6 @@ describe("CronService", () => {
}),
).rejects.toThrow(/isolated.*cron jobs require/);
cron.stop();
await store.cleanup();
await stopCronAndCleanup(cron, store);
});
});

View File

@@ -211,8 +211,20 @@ describe("device pairing tokens", () => {
},
baseDir,
);
const originalTs = first.request.ts;
await new Promise((resolve) => setTimeout(resolve, 20));
const originalTs = first.request.ts - 1_000;
const paths = resolvePairingPaths(baseDir, "devices");
const pendingById = JSON.parse(await readFile(paths.pendingPath, "utf8")) as Record<
string,
{ ts: number }
>;
const pending = pendingById[first.request.requestId];
expect(pending).toBeDefined();
if (!pending) {
throw new Error("expected pending pairing request");
}
pending.ts = originalTs;
await writeFile(paths.pendingPath, JSON.stringify(pendingById, null, 2));
const second = await requestDevicePairing(
{
deviceId: "device-1",