From 2b8d91d9ee176dc89a2494d89fb9670168b45407 Mon Sep 17 00:00:00 2001
From: Vincent Koc
Date: Tue, 5 May 2026 19:34:52 -0700
Subject: [PATCH 1/5] docs: typography hygiene + 2 in-body H1 removals across 5 pages

Replaced 112 typography characters (curly quotes, apostrophes, em/en dashes, non-breaking hyphens) with ASCII equivalents per docs/CLAUDE.md heading and content hygiene rules.

- docs/help/gpt55-codex-agentic-parity.md: 22 chars; removed the duplicate '# GPT-5.5 / Codex Agentic Parity in OpenClaw' H1 (Mintlify renders the title from frontmatter; the in-body H1 with the slash produced a brittle anchor).
- docs/platforms/mac/menu-bar.md: 21 chars; removed the duplicate '# Menu Bar Status Logic' H1.
- docs/tools/acp-agents.md: 23 chars
- docs/concepts/qa-matrix.md: 23 chars
- docs/concepts/qa-e2e-automation.md: 23 chars
---
 docs/concepts/qa-e2e-automation.md      | 40 ++++++++++---------
 docs/concepts/qa-matrix.md              | 46 ++++++++++++-------------
 docs/help/gpt55-codex-agentic-parity.md | 22 ++++++------
 docs/platforms/mac/menu-bar.md          | 24 ++++++-------
 docs/tools/acp-agents.md                | 46 ++++++++++++-------------
 5 files changed, 87 insertions(+), 91 deletions(-)

diff --git a/docs/concepts/qa-e2e-automation.md b/docs/concepts/qa-e2e-automation.md
index 82c90328321..4ddb7586f9c 100644
--- a/docs/concepts/qa-e2e-automation.md
+++ b/docs/concepts/qa-e2e-automation.md
@@ -240,7 +240,7 @@ can write back through the mounted workspace.
 
 ## Telegram, Discord, and Slack QA reference
 
-Matrix has a [dedicated page](/concepts/qa-matrix) because of its scenario count and Docker-backed homeserver provisioning. Telegram, Discord, and Slack are smaller — a handful of scenarios each, no profile system, against pre-existing real channels — so their reference lives here.
+Matrix has a [dedicated page](/concepts/qa-matrix) because of its scenario count and Docker-backed homeserver provisioning.
Telegram, Discord, and Slack are smaller - a handful of scenarios each, no profile system, against pre-existing real channels - so their reference lives here. ### Shared CLI flags @@ -248,7 +248,7 @@ These lanes register through `extensions/qa-lab/src/live-transports/shared/live- | Flag | Default | Description | | ------------------------------------- | --------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------- | -| `--scenario ` | — | Run only this scenario. Repeatable. | +| `--scenario ` | - | Run only this scenario. Repeatable. | | `--output-dir ` | `/.artifacts/qa-e2e/{telegram,discord,slack}-` | Where reports/summary/observed messages and the output log are written. Relative paths resolve against `--repo-root`. | | `--repo-root ` | `process.cwd()` | Repository root when invoking from a neutral cwd. | | `--sut-account ` | `sut` | Temporary account id inside the QA gateway config. | @@ -270,7 +270,7 @@ Targets one real private Telegram group with two distinct bots (driver + SUT). T Required env when `--credential-source env`: -- `OPENCLAW_QA_TELEGRAM_GROUP_ID` — numeric chat id (string). +- `OPENCLAW_QA_TELEGRAM_GROUP_ID` - numeric chat id (string). - `OPENCLAW_QA_TELEGRAM_DRIVER_BOT_TOKEN` - `OPENCLAW_QA_TELEGRAM_SUT_BOT_TOKEN` @@ -294,8 +294,8 @@ Scenarios (`extensions/qa-lab/src/live-transports/telegram/telegram-live.runtime Output artifacts: - `telegram-qa-report.md` -- `telegram-qa-summary.json` — includes per-reply RTT (driver send → observed SUT reply) starting with the canary. -- `telegram-qa-observed-messages.json` — bodies redacted unless `OPENCLAW_QA_TELEGRAM_CAPTURE_CONTENT=1`. +- `telegram-qa-summary.json` - includes per-reply RTT (driver send → observed SUT reply) starting with the canary. +- `telegram-qa-observed-messages.json` - bodies redacted unless `OPENCLAW_QA_TELEGRAM_CAPTURE_CONTENT=1`. 
### Discord QA @@ -311,7 +311,7 @@ Required env when `--credential-source env`: - `OPENCLAW_QA_DISCORD_CHANNEL_ID` - `OPENCLAW_QA_DISCORD_DRIVER_BOT_TOKEN` - `OPENCLAW_QA_DISCORD_SUT_BOT_TOKEN` -- `OPENCLAW_QA_DISCORD_SUT_APPLICATION_ID` — must match the SUT bot user id returned by Discord (the lane fails fast otherwise). +- `OPENCLAW_QA_DISCORD_SUT_APPLICATION_ID` - must match the SUT bot user id returned by Discord (the lane fails fast otherwise). Optional: @@ -322,7 +322,7 @@ Scenarios (`extensions/qa-lab/src/live-transports/discord/discord-live.runtime.t - `discord-canary` - `discord-mention-gating` - `discord-native-help-command-registration` -- `discord-status-reactions-tool-only` — opt-in Mantis scenario. Runs by itself because it switches the SUT to always-on, tool-only guild replies with `messages.statusReactions.enabled=true`, then captures a REST reaction timeline plus HTML/PNG visual artifacts. Mantis before/after reports also preserve scenario-provided MP4 artifacts as `baseline.mp4` and `candidate.mp4`. +- `discord-status-reactions-tool-only` - opt-in Mantis scenario. Runs by itself because it switches the SUT to always-on, tool-only guild replies with `messages.statusReactions.enabled=true`, then captures a REST reaction timeline plus HTML/PNG visual artifacts. Mantis before/after reports also preserve scenario-provided MP4 artifacts as `baseline.mp4` and `candidate.mp4`. Run the Mantis status-reaction scenario explicitly: @@ -339,7 +339,7 @@ Output artifacts: - `discord-qa-report.md` - `discord-qa-summary.json` -- `discord-qa-observed-messages.json` — bodies redacted unless `OPENCLAW_QA_DISCORD_CAPTURE_CONTENT=1`. +- `discord-qa-observed-messages.json` - bodies redacted unless `OPENCLAW_QA_DISCORD_CAPTURE_CONTENT=1`. - `discord-qa-reaction-timelines.json` and `discord-status-reactions-tool-only-timeline.png` when the status-reaction scenario runs. 
### Slack QA @@ -375,16 +375,16 @@ Output artifacts: - `slack-qa-report.md` - `slack-qa-summary.json` -- `slack-qa-observed-messages.json` — bodies redacted unless `OPENCLAW_QA_SLACK_CAPTURE_CONTENT=1`. +- `slack-qa-observed-messages.json` - bodies redacted unless `OPENCLAW_QA_SLACK_CAPTURE_CONTENT=1`. #### Setting up the Slack workspace The lane needs two distinct Slack apps in one workspace, plus a channel both bots are members of: -- `channelId` — the `Cxxxxxxxxxx` id of a channel both bots have been invited to. Use a dedicated channel; the lane posts on every run. -- `driverBotToken` — bot token (`xoxb-...`) of the **Driver** app. -- `sutBotToken` — bot token (`xoxb-...`) of the **SUT** app, which must be a separate Slack app from the driver so its bot user id is distinct. -- `sutAppToken` — app-level token (`xapp-...`) of the SUT app with `connections:write`, used by Socket Mode so the SUT app can receive events. +- `channelId` - the `Cxxxxxxxxxx` id of a channel both bots have been invited to. Use a dedicated channel; the lane posts on every run. +- `driverBotToken` - bot token (`xoxb-...`) of the **Driver** app. +- `sutBotToken` - bot token (`xoxb-...`) of the **SUT** app, which must be a separate Slack app from the driver so its bot user id is distinct. +- `sutAppToken` - app-level token (`xapp-...`) of the SUT app with `connections:write`, used by Socket Mode so the SUT app can receive events. Prefer a Slack workspace dedicated to QA over reusing a production workspace. @@ -417,7 +417,7 @@ Go to [api.slack.com/apps](https://api.slack.com/apps) → _Create New App_ → } ``` -Copy the _Bot User OAuth Token_ (`xoxb-...`) — that becomes `driverBotToken`. The driver only needs to post messages and identify itself; no events, no Socket Mode. +Copy the _Bot User OAuth Token_ (`xoxb-...`) - that becomes `driverBotToken`. The driver only needs to post messages and identify itself; no events, no Socket Mode. **2. 
Create the SUT app** @@ -504,7 +504,7 @@ In the QA workspace, create a channel (e.g. `#openclaw-qa`) and invite both bots /invite @OpenClaw QA SUT ``` -Copy the `Cxxxxxxxxxx` id from _channel info → About → Channel ID_ — that becomes `channelId`. A public channel works; if you use a private channel both apps already have `groups:history` so the harness's history reads will still succeed. +Copy the `Cxxxxxxxxxx` id from _channel info → About → Channel ID_ - that becomes `channelId`. A public channel works; if you use a private channel both apps already have `groups:history` so the harness's history reads will still succeed. **4. Register the credentials** @@ -545,7 +545,7 @@ pnpm openclaw qa slack \ --output-dir .artifacts/qa-e2e/slack-local ``` -A green run completes in well under 30 seconds and `slack-qa-report.md` shows both `slack-canary` and `slack-mention-gating` at status `pass`. If the lane hangs for ~90 seconds and exits with `Convex credential pool exhausted for kind "slack"`, either the pool is empty or every row is leased — `qa credentials list --kind slack --status all --json` will tell you which. +A green run completes in well under 30 seconds and `slack-qa-report.md` shows both `slack-canary` and `slack-mention-gating` at status `pass`. If the lane hangs for ~90 seconds and exits with `Convex credential pool exhausted for kind "slack"`, either the pool is empty or every row is leased - `qa credentials list --kind slack --status all --json` will tell you which. ### Convex credential pool @@ -553,9 +553,9 @@ Telegram, Discord, and Slack lanes can lease credentials from a shared Convex po Payload shapes the broker validates on `admin/add`: -- Telegram (`kind: "telegram"`): `{ groupId: string, driverToken: string, sutToken: string }` — `groupId` must be a numeric chat-id string. +- Telegram (`kind: "telegram"`): `{ groupId: string, driverToken: string, sutToken: string }` - `groupId` must be a numeric chat-id string. 
- Discord (`kind: "discord"`): `{ guildId: string, channelId: string, driverBotToken: string, sutBotToken: string, sutApplicationId: string }`. -- Slack (`kind: "slack"`): `{ channelId: string, driverBotToken: string, sutBotToken: string, sutAppToken: string }` — `channelId` must match `^[A-Z][A-Z0-9]+$` (a Slack id like `Cxxxxxxxxxx`). See [Setting up the Slack workspace](#setting-up-the-slack-workspace) for app and scope provisioning. +- Slack (`kind: "slack"`): `{ channelId: string, driverBotToken: string, sutBotToken: string, sutAppToken: string }` - `channelId` must match `^[A-Z][A-Z0-9]+$` (a Slack id like `Cxxxxxxxxxx`). See [Setting up the Slack workspace](#setting-up-the-slack-workspace) for app and scope provisioning. Operational env vars and the Convex broker endpoint contract live in [Testing → Shared Telegram credentials via Convex](/help/testing#shared-telegram-credentials-via-convex-v1) (the section name predates Discord support; the broker semantics are identical for both kinds). @@ -690,7 +690,7 @@ Preferred generic helpers for new scenarios: - `formatTransportTranscript` - `resetTransport` -Compatibility aliases remain available for existing scenarios — `waitForQaChannelReady`, `waitForOutboundMessage`, `waitForNoOutbound`, `formatConversationTranscript`, `resetBus` — but new scenario authoring should use the generic names. The aliases exist to avoid a flag-day migration, not as the model going forward. +Compatibility aliases remain available for existing scenarios - `waitForQaChannelReady`, `waitForOutboundMessage`, `waitForNoOutbound`, `formatConversationTranscript`, `resetBus` - but new scenario authoring should use the generic names. The aliases exist to avoid a flag-day migration, not as the model going forward. 
## Reporting @@ -702,7 +702,7 @@ The report should answer: - What stayed blocked - What follow-up scenarios are worth adding -For the inventory of available scenarios — useful when sizing follow-up work or wiring a new transport — run `pnpm openclaw qa coverage` (add `--json` for machine-readable output). +For the inventory of available scenarios - useful when sizing follow-up work or wiring a new transport - run `pnpm openclaw qa coverage` (add `--json` for machine-readable output). For character and style checks, run the same scenario across multiple live model refs and write a judged Markdown report: diff --git a/docs/concepts/qa-matrix.md b/docs/concepts/qa-matrix.md index add045f4307..606b672b9ec 100644 --- a/docs/concepts/qa-matrix.md +++ b/docs/concepts/qa-matrix.md @@ -9,7 +9,7 @@ title: "Matrix QA" The Matrix QA lane runs the bundled `@openclaw/matrix` plugin against a disposable Tuwunel homeserver in Docker, with temporary driver, SUT, and observer accounts plus seeded rooms. It is the live transport-real coverage for Matrix. -This is maintainer-only tooling. Packaged OpenClaw releases intentionally omit `qa-lab`, so `openclaw qa` is only available from a source checkout. Source checkouts load the bundled runner directly — no plugin install step is needed. +This is maintainer-only tooling. Packaged OpenClaw releases intentionally omit `qa-lab`, so `openclaw qa` is only available from a source checkout. Source checkouts load the bundled runner directly - no plugin install step is needed. For broader QA framework context, see [QA overview](/concepts/qa-e2e-automation). @@ -24,7 +24,7 @@ Plain `pnpm openclaw qa matrix` runs `--profile all` and does not stop on first ## What the lane does 1. Provisions a disposable Tuwunel homeserver in Docker (default image `ghcr.io/matrix-construct/tuwunel:v1.5.1`, server name `matrix-qa.test`, port `28008`). -2. 
Registers three temporary users — `driver` (sends inbound traffic), `sut` (the OpenClaw Matrix account under test), `observer` (third-party traffic capture). +2. Registers three temporary users - `driver` (sends inbound traffic), `sut` (the OpenClaw Matrix account under test), `observer` (third-party traffic capture). 3. Seeds rooms required by the selected scenarios (main, threading, media, restart, secondary, allowlist, E2EE, verification DM, etc.). 4. Starts a child OpenClaw gateway with the real Matrix plugin scoped to the SUT account; `qa-channel` is not loaded in the child. 5. Runs scenarios in sequence, observing events through the driver/observer Matrix clients. @@ -42,7 +42,7 @@ pnpm openclaw qa matrix [options] | --------------------- | --------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------- | | `--profile ` | `all` | Scenario profile. See [Profiles](#profiles). | | `--fail-fast` | off | Stop after the first failed check or scenario. | -| `--scenario ` | — | Run only this scenario. Repeatable. See [Scenarios](#scenarios). | +| `--scenario ` | - | Run only this scenario. Repeatable. See [Scenarios](#scenarios). | | `--output-dir ` | `/.artifacts/qa-e2e/matrix-` | Where reports, summary, observed events, and the output log are written. Relative paths resolve against `--repo-root`. | | `--repo-root ` | `process.cwd()` | Repository root when invoking from a neutral working directory. | | `--sut-account ` | `sut` | Matrix account id inside the QA gateway config. | @@ -70,7 +70,7 @@ The selected profile decides which scenarios run. | `fast` | Release-gate subset that exercises the live transport contract: canary, mention gating, allowlist block, reply shape, restart resume, thread follow-up, thread isolation, reaction observation, and exec approval metadata delivery. 
| | `transport` | Transport-level threading, DM, room, autojoin, mention/allowlist, approval, and reaction scenarios. | | `media` | Image, audio, video, PDF, EPUB attachment coverage. | -| `e2ee-smoke` | Minimum E2EE coverage — basic encrypted reply, thread follow-up, bootstrap success. | +| `e2ee-smoke` | Minimum E2EE coverage - basic encrypted reply, thread follow-up, bootstrap success. | | `e2ee-deep` | Exhaustive E2EE state-loss, backup, key, and recovery scenarios. | | `e2ee-cli` | `openclaw matrix encryption setup` and `verify *` CLI scenarios driven through the QA harness. | @@ -80,17 +80,17 @@ The exact mapping lives in `extensions/qa-matrix/src/runners/contract/scenario-c The full scenario id list is the `MatrixQaScenarioId` union in `extensions/qa-matrix/src/runners/contract/scenario-catalog.ts:15`. Categories include: -- threading — `matrix-thread-*`, `matrix-subagent-thread-spawn` -- top-level / DM / room — `matrix-top-level-reply-shape`, `matrix-room-*`, `matrix-dm-*` -- streaming and tool progress — `matrix-room-partial-streaming-preview`, `matrix-room-quiet-streaming-preview`, `matrix-room-tool-progress-*`, `matrix-room-block-streaming` -- media — `matrix-media-type-coverage`, `matrix-room-image-understanding-attachment`, `matrix-attachment-only-ignored`, `matrix-unsupported-media-safe` -- routing — `matrix-room-autojoin-invite`, `matrix-secondary-room-*` -- reactions — `matrix-reaction-*` -- approvals — `matrix-approval-*` (exec/plugin metadata, chunked fallback, deny reactions, threads, and `target: "both"` routing) -- restart and replay — `matrix-restart-*`, `matrix-stale-sync-replay-dedupe`, `matrix-room-membership-loss`, `matrix-homeserver-restart-resume`, `matrix-initial-catchup-then-incremental` -- mention gating, bot-to-bot, and allowlists — `matrix-mention-*`, `matrix-allowbots-*`, `matrix-allowlist-*`, `matrix-multi-actor-ordering`, `matrix-inbound-edit-*`, `matrix-mxid-prefixed-command-block`, `matrix-observer-allowlist-override` -- E2EE — 
`matrix-e2ee-*` (basic reply, thread follow-up, bootstrap, recovery key lifecycle, state-loss variants, server backup behavior, device hygiene, SAS / QR / DM verification, restart, artifact redaction) -- E2EE CLI — `matrix-e2ee-cli-*` (encryption setup, idempotent setup, bootstrap failure, recovery-key lifecycle, multi-account, gateway-reply round-trip, self-verification) +- threading - `matrix-thread-*`, `matrix-subagent-thread-spawn` +- top-level / DM / room - `matrix-top-level-reply-shape`, `matrix-room-*`, `matrix-dm-*` +- streaming and tool progress - `matrix-room-partial-streaming-preview`, `matrix-room-quiet-streaming-preview`, `matrix-room-tool-progress-*`, `matrix-room-block-streaming` +- media - `matrix-media-type-coverage`, `matrix-room-image-understanding-attachment`, `matrix-attachment-only-ignored`, `matrix-unsupported-media-safe` +- routing - `matrix-room-autojoin-invite`, `matrix-secondary-room-*` +- reactions - `matrix-reaction-*` +- approvals - `matrix-approval-*` (exec/plugin metadata, chunked fallback, deny reactions, threads, and `target: "both"` routing) +- restart and replay - `matrix-restart-*`, `matrix-stale-sync-replay-dedupe`, `matrix-room-membership-loss`, `matrix-homeserver-restart-resume`, `matrix-initial-catchup-then-incremental` +- mention gating, bot-to-bot, and allowlists - `matrix-mention-*`, `matrix-allowbots-*`, `matrix-allowlist-*`, `matrix-multi-actor-ordering`, `matrix-inbound-edit-*`, `matrix-mxid-prefixed-command-block`, `matrix-observer-allowlist-override` +- E2EE - `matrix-e2ee-*` (basic reply, thread follow-up, bootstrap, recovery key lifecycle, state-loss variants, server backup behavior, device hygiene, SAS / QR / DM verification, restart, artifact redaction) +- E2EE CLI - `matrix-e2ee-cli-*` (encryption setup, idempotent setup, bootstrap failure, recovery-key lifecycle, multi-account, gateway-reply round-trip, self-verification) Pass `--scenario ` (repeatable) to run a hand-picked set; combine with `--profile all` to 
ignore profile gating. @@ -112,10 +112,10 @@ Pass `--scenario ` (repeatable) to run a hand-picked set; combine with `--pr Written to `--output-dir`: -- `matrix-qa-report.md` — Markdown protocol report (what passed, failed, was skipped, and why). -- `matrix-qa-summary.json` — Structured summary suitable for CI parsing and dashboards. -- `matrix-qa-observed-events.json` — Observed Matrix events from the driver and observer clients. Bodies are redacted unless `OPENCLAW_QA_MATRIX_CAPTURE_CONTENT=1`; approval metadata is summarized with selected safe fields and truncated command preview. -- `matrix-qa-output.log` — Combined stdout/stderr from the run. If `OPENCLAW_RUN_NODE_OUTPUT_LOG` is set, the outer launcher's log is reused instead. +- `matrix-qa-report.md` - Markdown protocol report (what passed, failed, was skipped, and why). +- `matrix-qa-summary.json` - Structured summary suitable for CI parsing and dashboards. +- `matrix-qa-observed-events.json` - Observed Matrix events from the driver and observer clients. Bodies are redacted unless `OPENCLAW_QA_MATRIX_CAPTURE_CONTENT=1`; approval metadata is summarized with selected safe fields and truncated command preview. +- `matrix-qa-output.log` - Combined stdout/stderr from the run. If `OPENCLAW_RUN_NODE_OUTPUT_LOG` is set, the outer launcher's log is reused instead. The default output dir is `/.artifacts/qa-e2e/matrix-` so successive runs do not overwrite each other. 
@@ -133,7 +133,7 @@ Matrix is one of three live transport lanes (Matrix, Telegram, Discord) that sha ## Related -- [QA overview](/concepts/qa-e2e-automation) — overall QA stack and live transport contract -- [QA Channel](/channels/qa-channel) — synthetic channel adapter for repo-backed scenarios -- [Testing](/help/testing) — running tests and adding QA coverage -- [Matrix](/channels/matrix) — the channel plugin under test +- [QA overview](/concepts/qa-e2e-automation) - overall QA stack and live transport contract +- [QA Channel](/channels/qa-channel) - synthetic channel adapter for repo-backed scenarios +- [Testing](/help/testing) - running tests and adding QA coverage +- [Matrix](/channels/matrix) - the channel plugin under test diff --git a/docs/help/gpt55-codex-agentic-parity.md b/docs/help/gpt55-codex-agentic-parity.md index 38af2ef3c5b..d833a556a23 100644 --- a/docs/help/gpt55-codex-agentic-parity.md +++ b/docs/help/gpt55-codex-agentic-parity.md @@ -7,8 +7,6 @@ read_when: - Reviewing the strict-agentic, tool-schema, elevation, and replay fixes --- -# GPT-5.5 / Codex Agentic Parity in OpenClaw - OpenClaw already worked well with tool-using frontier models, but GPT-5.5 and Codex-style models were still underperforming in a few practical ways: - they could stop after planning instead of doing the work @@ -25,11 +23,11 @@ This parity program fixes those gaps in four reviewable slices. This slice adds an opt-in `strict-agentic` execution contract for embedded Pi GPT-5 runs. -When enabled, OpenClaw stops accepting plan-only turns as “good enough” completion. If the model only says what it intends to do and does not actually use tools or make progress, OpenClaw retries with an act-now steer and then fails closed with an explicit blocked state instead of silently ending the task. +When enabled, OpenClaw stops accepting plan-only turns as "good enough" completion. 
If the model only says what it intends to do and does not actually use tools or make progress, OpenClaw retries with an act-now steer and then fails closed with an explicit blocked state instead of silently ending the task. This improves the GPT-5.5 experience most on: -- short “ok do it” follow-ups +- short "ok do it" follow-ups - code tasks where the first step is obvious - flows where `update_plan` should be progress tracking rather than filler text @@ -86,21 +84,21 @@ The goal is not to make GPT-5.5 imitate Opus. The goal is to give GPT-5.5 a runt That changes the user experience from: -- “the model had a good plan but stopped” +- "the model had a good plan but stopped" to: -- “the model either acted, or OpenClaw surfaced the exact reason it could not” +- "the model either acted, or OpenClaw surfaced the exact reason it could not" ## Before vs after for GPT-5.5 users | Before this program | After PR A-D | | ---------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------- | -| GPT-5.5 could stop after a reasonable plan without taking the next tool step | PR A turns “plan only” into “act now or surface a blocked state” | +| GPT-5.5 could stop after a reasonable plan without taking the next tool step | PR A turns "plan only" into "act now or surface a blocked state" | | Strict tool schemas could reject parameter-free or OpenAI/Codex-shaped tools in confusing ways | PR C makes provider-owned tool registration and invocation more predictable | | `/elevated full` guidance could be vague or wrong in blocked runtimes | PR B gives GPT-5.5 and the user truthful runtime and permission hints | | Replay or compaction failures could feel like the task silently disappeared | PR C surfaces paused, blocked, abandoned, and replay-invalid outcomes explicitly | -| “GPT-5.5 feels worse than Opus” was mostly anecdotal | PR D turns that into the same scenario pack, 
the same metrics, and a hard pass/fail gate | +| "GPT-5.5 feels worse than Opus" was mostly anecdotal | PR D turns that into the same scenario pack, the same metrics, and a hard pass/fail gate | ## Architecture @@ -142,7 +140,7 @@ The first-wave parity pack currently covers five scenarios: ### `approval-turn-tool-followthrough` -Checks that the model does not stop at “I’ll do that” after a short approval. It should take the first concrete action in the same turn. +Checks that the model does not stop at "I'll do that" after a short approval. It should take the first concrete action in the same turn. ### `model-switch-tool-continuity` @@ -210,8 +208,8 @@ Use the verdict in `qa-agentic-parity-summary.json` as the final machine-readabl - `pass` means GPT-5.5 covered the same scenarios as Opus 4.6 and did not regress on the agreed aggregate metrics. - `fail` means at least one hard gate tripped: weaker completion, worse unintended stops, weaker valid tool use, any fake-success case, or mismatched scenario coverage. -- “shared/base CI issue” is not itself a parity result. If CI noise outside PR D blocks a run, the verdict should wait for a clean merged-runtime execution instead of being inferred from branch-era logs. -- Auth, proxy, DNS, and `/elevated full` truthfulness still come from PR B’s deterministic suites, so the final release claim needs both: a passing PR D parity verdict and green PR B truthfulness coverage. +- "shared/base CI issue" is not itself a parity result. If CI noise outside PR D blocks a run, the verdict should wait for a clean merged-runtime execution instead of being inferred from branch-era logs. +- Auth, proxy, DNS, and `/elevated full` truthfulness still come from PR B's deterministic suites, so the final release claim needs both: a passing PR D parity verdict and green PR B truthfulness coverage. 
## Who should enable `strict-agentic` @@ -219,7 +217,7 @@ Use `strict-agentic` when: - the agent is expected to act immediately when a next step is obvious - GPT-5.5 or Codex-family models are the primary runtime -- you prefer explicit blocked states over “helpful” recap-only replies +- you prefer explicit blocked states over "helpful" recap-only replies Keep the default contract when: diff --git a/docs/platforms/mac/menu-bar.md b/docs/platforms/mac/menu-bar.md index 057c0f40e77..33094c0f9ad 100644 --- a/docs/platforms/mac/menu-bar.md +++ b/docs/platforms/mac/menu-bar.md @@ -5,22 +5,20 @@ read_when: title: "Menu bar" --- -# Menu Bar Status Logic - ## What is shown - We surface the current agent work state in the menu bar icon and in the first status row of the menu. - Health status is hidden while work is active; it returns when all sessions are idle. -- A root “Context” submenu contains recent sessions instead of expanding them directly in the root menu. -- The “Nodes” block in the root menu lists **devices** only (paired nodes via `node.list`), not client/presence entries. -- A root “Usage” section appears below Context when provider usage snapshots are available, followed by usage-cost details when available. +- A root "Context" submenu contains recent sessions instead of expanding them directly in the root menu. +- The "Nodes" block in the root menu lists **devices** only (paired nodes via `node.list`), not client/presence entries. +- A root "Usage" section appears below Context when provider usage snapshots are available, followed by usage-cost details when available. ## State model -- Sessions: events arrive with `runId` (per-run) plus `sessionKey` in the payload. The “main” session is the key `main`; if absent, we fall back to the most recently updated session. -- Priority: main always wins. If main is active, its state is shown immediately. If main is idle, the most recently active non‑main session is shown. 
We do not flip‑flop mid‑activity; we only switch when the current session goes idle or main becomes active. +- Sessions: events arrive with `runId` (per-run) plus `sessionKey` in the payload. The "main" session is the key `main`; if absent, we fall back to the most recently updated session. +- Priority: main always wins. If main is active, its state is shown immediately. If main is idle, the most recently active non-main session is shown. We do not flip-flop mid-activity; we only switch when the current session goes idle or main becomes active. - Activity kinds: - - `job`: high‑level command execution (`state: started|streaming|done|error`). + - `job`: high-level command execution (`state: started|streaming|done|error`). - `tool`: `phase: start|result` with `toolName` and `meta/args`. ## IconState enum (Swift) @@ -42,13 +40,13 @@ title: "Menu bar" ### Visual mapping - `idle`: normal critter. -- `workingMain`: badge with glyph, full tint, leg “working” animation. +- `workingMain`: badge with glyph, full tint, leg "working" animation. - `workingOther`: badge with glyph, muted tint, no scurry. - `overridden`: uses the chosen glyph/tint regardless of activity. ## Context submenu -- The root menu shows one “Context” row with a session count/status and opens a submenu. +- The root menu shows one "Context" row with a session count/status and opens a submenu. - The Context submenu header shows the active session count for the last 24 hours. - Each session row keeps its token bar, age, preview, thinking/verbose, reset, compact, and delete actions. - Loading, disconnected, and session-load error messages appear inside the Context submenu. @@ -62,7 +60,7 @@ title: "Menu bar" ## Event ingestion -- Source: control‑channel `agent` events (`ControlChannel.handleAgentEvent`). +- Source: control-channel `agent` events (`ControlChannel.handleAgentEvent`). - Parsed fields: - `stream: "job"` with `data.state` for start/stop. 
- `stream: "tool"` with `data.phase`, `name`, optional `meta`/`args`. @@ -74,7 +72,7 @@ title: "Menu bar" ## Debug override -- Settings ▸ Debug ▸ “Icon override” picker: +- Settings ▸ Debug ▸ "Icon override" picker: - `System (auto)` (default) - `Working: main` (per tool kind) - `Working: other` (per tool kind) @@ -84,7 +82,7 @@ title: "Menu bar" ## Testing checklist - Trigger main session job: verify icon switches immediately and status row shows main label. -- Trigger non‑main session job while main idle: icon/status shows non‑main; stays stable until it finishes. +- Trigger non-main session job while main idle: icon/status shows non-main; stays stable until it finishes. - Start main while other active: icon flips to main instantly. - Rapid tool bursts: ensure badge does not flicker (TTL grace on tool results). - Health row reappears once all sessions idle. diff --git a/docs/tools/acp-agents.md b/docs/tools/acp-agents.md index af01ba89b7b..8726b636c06 100644 --- a/docs/tools/acp-agents.md +++ b/docs/tools/acp-agents.md @@ -77,7 +77,7 @@ an unavailable backend. - The target id is allowed by `acp.allowedAgents` when that allowlist is set. - The harness command can start on the Gateway host. - Provider auth is present for that harness (`claude`, `codex`, `gemini`, `opencode`, `droid`, etc.). - - The selected model exists for that harness — model ids are not portable across harnesses. + - The selected model exists for that harness - model ids are not portable across harnesses. - The requested `cwd` exists and is accessible, or omit `cwd` and let the backend use its default. - Permission mode matches the work. Non-interactive sessions cannot click native permission prompts, so write/exec-heavy coding runs usually need an ACPX permission profile that can proceed headlessly. @@ -86,7 +86,7 @@ an unavailable backend. OpenClaw plugin tools and built-in OpenClaw tools are **not** exposed to ACP harnesses by default. 
Enable the explicit MCP bridges in -[ACP agents — setup](/tools/acp-agents-setup) only when the harness +[ACP agents - setup](/tools/acp-agents-setup) only when the harness should call those tools directly. ## Supported harness targets @@ -182,10 +182,10 @@ Quick `/acp` flow from chat: - - `openai-codex/*` — PI Codex OAuth/subscription route. - - `openai/*` plus `agentRuntime.id: "codex"` — native Codex app-server embedded runtime. - - `/codex ...` — native Codex conversation control. - - `/acp ...` or `runtime: "acp"` — explicit ACP/acpx control. + - `openai-codex/*` - PI Codex OAuth/subscription route. + - `openai/*` plus `agentRuntime.id: "codex"` - native Codex app-server embedded runtime. + - `/codex ...` - native Codex conversation control. + - `/acp ...` or `runtime: "acp"` - explicit ACP/acpx control. @@ -244,7 +244,7 @@ For Claude Code through ACP, the stack is: ACP Claude is a **harness session** with ACP controls, session resume, background-task tracking, and optional conversation/thread binding. -CLI backends are separate text-only local fallback runtimes — see +CLI backends are separate text-only local fallback runtimes - see [CLI Backends](/gateway/cli-backends). For operators, the practical rule is: @@ -256,15 +256,15 @@ For operators, the practical rule is: ### Mental model -- **Chat surface** — where people keep talking (Discord channel, Telegram topic, iMessage chat). -- **ACP session** — the durable Codex/Claude/Gemini runtime state OpenClaw routes to. -- **Child thread/topic** — an optional extra messaging surface created only by `--thread ...`. -- **Runtime workspace** — the filesystem location (`cwd`, repo checkout, backend workspace) where the harness runs. Independent of the chat surface. +- **Chat surface** - where people keep talking (Discord channel, Telegram topic, iMessage chat). +- **ACP session** - the durable Codex/Claude/Gemini runtime state OpenClaw routes to. 
+- **Child thread/topic** - an optional extra messaging surface created only by `--thread ...`. +- **Runtime workspace** - the filesystem location (`cwd`, repo checkout, backend workspace) where the harness runs. Independent of the chat surface. ### Current-conversation binds `/acp spawn --bind here` pins the current conversation to the -spawned ACP session — no child thread, same chat surface. OpenClaw keeps +spawned ACP session - no child thread, same chat surface. OpenClaw keeps owning transport, auth, safety, and delivery. Follow-up messages in that conversation route to the same session; `/new` and `/reset` reset the session in place; `/acp close` removes the binding. @@ -284,9 +284,9 @@ Examples: - `--bind here` and `--thread ...` are mutually exclusive. - `--bind here` only works on channels that advertise current-conversation binding; OpenClaw returns a clear unsupported message otherwise. Bindings persist across gateway restarts. - - On Discord, `spawnSessions` gates child thread creation for `--thread auto|here` — not `--bind here`. + - On Discord, `spawnSessions` gates child thread creation for `--thread auto|here` - not `--bind here`. - If you spawn to a different ACP agent without `--cwd`, OpenClaw inherits the **target agent's** workspace by default. Missing inherited paths (`ENOENT`/`ENOTDIR`) fall back to the backend default; other access errors (e.g. `EACCES`) surface as spawn errors. - - Gateway management commands stay local in bound conversations — `/acp ...` commands are handled by OpenClaw even when normal follow-up text routes to the bound ACP session; `/status` and `/unfocus` also stay local whenever command handling is enabled for that surface. + - Gateway management commands stay local in bound conversations - `/acp ...` commands are handled by OpenClaw even when normal follow-up text routes to the bound ACP session; `/status` and `/unfocus` also stay local whenever command handling is enabled for that surface. 
@@ -676,7 +676,7 @@ background work. The delivery path depends on that shape. ```json { - "task": "Continue where we left off — fix the remaining test failures", + "task": "Continue where we left off - fix the remaining test failures", "runtime": "acp", "agentId": "codex", "resumeSessionId": "" @@ -685,7 +685,7 @@ background work. The delivery path depends on that shape. Common use cases: - - Hand off a Codex session from your laptop to your phone — tell your agent to pick up where you left off. + - Hand off a Codex session from your laptop to your phone - tell your agent to pick up where you left off. - Continue a coding session you started interactively in the CLI, now headlessly through your agent. - Pick up work that was interrupted by a gateway restart or idle timeout. @@ -696,7 +696,7 @@ background work. The delivery path depends on that shape. - `resumeSessionId` is a host-local ACP/harness resume id, not an OpenClaw channel session key; OpenClaw still checks ACP spawn policy and target agent policy before dispatch, while the ACP backend or harness owns authorization for loading that upstream id. - `resumeSessionId` restores the upstream ACP conversation history; `thread` and `mode` still apply normally to the new OpenClaw session you are creating, so `mode: "session"` still requires `thread: true`. - The target agent must support `session/load` (Codex and Claude Code do). - - If the session id is not found, the spawn fails with a clear error — no silent fallback to a new session. + - If the session id is not found, the spawn fails with a clear error - no silent fallback to a new session. @@ -709,7 +709,7 @@ background work. The delivery path depends on that shape. 4. Verify `accepted=yes`, a real `childSessionKey`, and no validator error. 5. Clean up the temporary bridge session. 
- Keep the gate on `mode: "run"` and skip `streamTo: "parent"` — + Keep the gate on `mode: "run"` and skip `streamTo: "parent"` - thread-bound `mode: "session"` and stream-relay paths are separate richer integration passes. @@ -793,18 +793,18 @@ operations: | ---------------------------- | ------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | `/acp model ` | runtime config key `model` | For Codex ACP, OpenClaw normalizes `openai-codex/` to the adapter model id and maps slash reasoning suffixes such as `openai-codex/gpt-5.4/high` to `reasoning_effort`. | | `/acp set thinking ` | runtime config key `thinking` | For Codex ACP, OpenClaw sends the corresponding `reasoning_effort` where the adapter supports one. | -| `/acp permissions ` | runtime config key `approval_policy` | — | -| `/acp timeout ` | runtime config key `timeout` | — | +| `/acp permissions ` | runtime config key `approval_policy` | - | +| `/acp timeout ` | runtime config key `timeout` | - | | `/acp cwd ` | runtime cwd override | Direct update. | | `/acp set ` | generic | `key=cwd` uses the cwd override path. | -| `/acp reset-options` | clears all runtime overrides | — | +| `/acp reset-options` | clears all runtime overrides | - | ## acpx harness, plugin setup, and permissions For acpx harness configuration (Claude Code / Codex / Gemini CLI aliases), the plugin-tools and OpenClaw-tools MCP bridges, and ACP permission modes, see -[ACP agents — setup](/tools/acp-agents-setup). +[ACP agents - setup](/tools/acp-agents-setup). 
## Troubleshooting @@ -835,7 +835,7 @@ permission modes, see ## Related -- [ACP agents — setup](/tools/acp-agents-setup) +- [ACP agents - setup](/tools/acp-agents-setup) - [Agent send](/tools/agent-send) - [CLI Backends](/gateway/cli-backends) - [Codex harness](/plugins/codex-harness) From 0bdba47a3e89fb2f17a64a0f64d3619da8b13b3d Mon Sep 17 00:00:00 2001 From: Brad Hallett <53977268+bradhallett@users.noreply.github.com> Date: Tue, 5 May 2026 22:35:47 -0400 Subject: [PATCH 2/5] fix: disable Pi auto-compaction when safeguard mode is active (#73839) Merged via squash. Prepared head SHA: d554201343a7aba4a861b78d4897071cfc68e635 Co-authored-by: bradhallett <53977268+bradhallett@users.noreply.github.com> Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com> Reviewed-by: @jalehman --- CHANGELOG.md | 3 +- src/agents/command/cli-compaction.ts | 8 +- src/agents/pi-embedded-runner/extensions.ts | 13 +-- src/agents/pi-embedded-runner/run/attempt.ts | 2 + src/agents/pi-settings.test.ts | 86 ++++++++++++++++++++ src/agents/pi-settings.ts | 26 ++++-- 6 files changed, 120 insertions(+), 18 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index ecf4124c65f..4d0b9be450e 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -392,7 +392,8 @@ Docs: https://docs.openclaw.ai - Agents/sessions: after embedded Pi runs, append assistant-visible reply text to session JSONL only when Pi did not already persist an equivalent tail assistant entry, without re-mirroring the user prompt Pi owns. Fixes #77823. (#77839) Thanks @neeravmakwana. - Plugins/CLI: load the install-records ledger when listing channel-catalog entries, so npm-installed third-party channel plugins resolve through `openclaw channels login`/`channels add` instead of failing with `Unsupported channel`. (#77269) Thanks @pumpkinxing1. - Memory wiki/Security: enforce session visibility on shared-memory `wiki_search` and `wiki_get` so sandboxed subagents cannot read transcript content from sibling or parent sessions. 
Fixes GHSA-72fw-cqh5-f324. Thanks @zsxsoft. -- Exec approvals: enforce allowlist `argPattern` argument restrictions on Linux and macOS as well as Windows, so an entry like `{ pattern: "python3", argPattern: "^safe\\.py$" }` no longer silently relaxes to a path-only match on non-Windows hosts. (#75143) Thanks @eleqtrizit. +- Exec approvals: enforce allowlist `argPattern` argument restrictions on Linux and macOS as well as Windows, so an entry like `{ pattern: "python3", argPattern: "^safe\.py$" }` no longer silently relaxes to a path-only match on non-Windows hosts. (#75143) Thanks @eleqtrizit. +- Agents/compaction: disable Pi auto-compaction whenever OpenClaw effectively owns safeguard compaction, including provider-backed safeguard mode, so Pi and OpenClaw no longer fight over long-session compaction. Fixes #73003. (#73839) Thanks @bradhallett. ## 2026.5.3-1 diff --git a/src/agents/command/cli-compaction.ts b/src/agents/command/cli-compaction.ts index 3cd0c702b41..a3370e35e59 100644 --- a/src/agents/command/cli-compaction.ts +++ b/src/agents/command/cli-compaction.ts @@ -1,6 +1,7 @@ import type { AgentMessage } from "@mariozechner/pi-agent-core"; import { SessionManager } from "@mariozechner/pi-coding-agent"; import type { SessionEntry } from "../../config/sessions/types.js"; +import type { AgentCompactionMode } from "../../config/types.agent-defaults.js"; import type { OpenClawConfig } from "../../config/types.openclaw.js"; import { resolveContextEngine as resolveContextEngineImpl } from "../../context-engine/registry.js"; import type { ContextEngine } from "../../context-engine/types.js"; @@ -10,7 +11,10 @@ import { runContextEngineMaintenance as runContextEngineMaintenanceImpl } from " import { shouldPreemptivelyCompactBeforePrompt as shouldPreemptivelyCompactBeforePromptImpl } from "../pi-embedded-runner/run/preemptive-compaction.js"; import { resolveLiveToolResultMaxChars as resolveLiveToolResultMaxCharsImpl } from 
"../pi-embedded-runner/tool-result-truncation.js"; import { createPreparedEmbeddedPiSettingsManager as createPreparedEmbeddedPiSettingsManagerImpl } from "../pi-project-settings.js"; -import { applyPiAutoCompactionGuard as applyPiAutoCompactionGuardImpl } from "../pi-settings.js"; +import { + applyPiAutoCompactionGuard as applyPiAutoCompactionGuardImpl, + resolveEffectiveCompactionMode, +} from "../pi-settings.js"; import type { SkillSnapshot } from "../skills.js"; import { recordCliCompactionInStore as recordCliCompactionInStoreImpl } from "./session-store.js"; @@ -38,6 +42,7 @@ type CliCompactionDeps = { applyPiAutoCompactionGuard: (params: { settingsManager: SettingsManagerLike; contextEngineInfo?: ContextEngine["info"]; + compactionMode?: AgentCompactionMode; }) => unknown; shouldPreemptivelyCompactBeforePrompt: typeof shouldPreemptivelyCompactBeforePromptImpl; resolveLiveToolResultMaxChars: typeof resolveLiveToolResultMaxCharsImpl; @@ -207,6 +212,7 @@ export async function runCliTurnCompactionLifecycle(params: { await cliCompactionDeps.applyPiAutoCompactionGuard({ settingsManager, contextEngineInfo: contextEngine.info, + compactionMode: resolveEffectiveCompactionMode(params.cfg), }); const preemptiveCompaction = cliCompactionDeps.shouldPreemptivelyCompactBeforePrompt({ diff --git a/src/agents/pi-embedded-runner/extensions.ts b/src/agents/pi-embedded-runner/extensions.ts index c639d58ea14..fa88f47d934 100644 --- a/src/agents/pi-embedded-runner/extensions.ts +++ b/src/agents/pi-embedded-runner/extensions.ts @@ -12,7 +12,7 @@ import contextPruningExtension from "../pi-hooks/context-pruning.js"; import { setContextPruningRuntime } from "../pi-hooks/context-pruning/runtime.js"; import { computeEffectiveSettings } from "../pi-hooks/context-pruning/settings.js"; import { makeToolPrunablePredicate } from "../pi-hooks/context-pruning/tools.js"; -import { ensurePiCompactionReserveTokens } from "../pi-settings.js"; +import { ensurePiCompactionReserveTokens, 
resolveEffectiveCompactionMode } from "../pi-settings.js"; import { resolveTranscriptPolicy } from "../transcript-policy.js"; import { isCacheTtlEligibleProvider, readLastCacheTtlTimestamp } from "./cache-ttl.js"; @@ -123,15 +123,6 @@ function buildContextPruningFactory(params: { return contextPruningExtension; } -function resolveCompactionMode(cfg?: OpenClawConfig): "default" | "safeguard" { - const compaction = cfg?.agents?.defaults?.compaction; - // A registered compaction provider requires the safeguard extension path - if (compaction?.provider) { - return "safeguard"; - } - return compaction?.mode === "safeguard" ? "safeguard" : "default"; -} - export function buildEmbeddedExtensionFactories(params: { cfg: OpenClawConfig | undefined; sessionManager: SessionManager; @@ -140,7 +131,7 @@ export function buildEmbeddedExtensionFactories(params: { model: ProviderRuntimeModel | undefined; }): ExtensionFactory[] { const factories: ExtensionFactory[] = []; - if (resolveCompactionMode(params.cfg) === "safeguard") { + if (resolveEffectiveCompactionMode(params.cfg) === "safeguard") { const compactionCfg = params.cfg?.agents?.defaults?.compaction; const qualityGuardCfg = compactionCfg?.qualityGuard; const contextWindowInfo = resolveContextWindowInfo({ diff --git a/src/agents/pi-embedded-runner/run/attempt.ts b/src/agents/pi-embedded-runner/run/attempt.ts index 30084da735e..d55caa0e407 100644 --- a/src/agents/pi-embedded-runner/run/attempt.ts +++ b/src/agents/pi-embedded-runner/run/attempt.ts @@ -107,6 +107,7 @@ import { applyPiAutoCompactionGuard, applyPiCompactionSettingsFromConfig, isSilentOverflowProneModel, + resolveEffectiveCompactionMode, } from "../../pi-settings.js"; import { createClientToolNameConflictError, @@ -1453,6 +1454,7 @@ export async function runEmbeddedAttempt( const piAutoCompactionGuardArgs = { settingsManager, contextEngineInfo: activeContextEngine?.info, + compactionMode: resolveEffectiveCompactionMode(params.config), silentOverflowProneProvider: 
isSilentOverflowProneModel({ provider: params.provider, modelId: params.modelId, diff --git a/src/agents/pi-settings.test.ts b/src/agents/pi-settings.test.ts index 81c05e12c6c..bca472c9b14 100644 --- a/src/agents/pi-settings.test.ts +++ b/src/agents/pi-settings.test.ts @@ -5,7 +5,9 @@ import { applyPiCompactionSettingsFromConfig, DEFAULT_PI_COMPACTION_RESERVE_TOKENS_FLOOR, isSilentOverflowProneModel, + resolveEffectiveCompactionMode, resolveCompactionReserveTokensFloor, + shouldDisablePiAutoCompaction, } from "./pi-settings.js"; describe("applyPiCompactionSettingsFromConfig", () => { @@ -347,6 +349,40 @@ describe("resolveCompactionReserveTokensFloor", () => { ).toBe(0); }); }); +describe("resolveEffectiveCompactionMode", () => { + it("defaults to default compaction mode", () => { + expect(resolveEffectiveCompactionMode()).toBe("default"); + expect(resolveEffectiveCompactionMode({ agents: { defaults: { compaction: {} } } })).toBe( + "default", + ); + expect( + resolveEffectiveCompactionMode({ + agents: { defaults: { compaction: { mode: "default" } } }, + }), + ).toBe("default"); + }); + + it("returns safeguard for explicit safeguard mode", () => { + expect( + resolveEffectiveCompactionMode({ + agents: { defaults: { compaction: { mode: "safeguard" } } }, + }), + ).toBe("safeguard"); + }); + + it("returns safeguard when a compaction provider is configured", () => { + expect( + resolveEffectiveCompactionMode({ + agents: { defaults: { compaction: { provider: "deepseek" } } }, + }), + ).toBe("safeguard"); + expect( + resolveEffectiveCompactionMode({ + agents: { defaults: { compaction: { mode: "default", provider: "deepseek" } } }, + }), + ).toBe("safeguard"); + }); +}); describe("isSilentOverflowProneModel", () => { // Reporter's repro shape: openrouter routing to z-ai/glm. 
Both the bare @@ -432,6 +468,36 @@ describe("isSilentOverflowProneModel", () => { }); }); +describe("shouldDisablePiAutoCompaction", () => { + it("returns false with no owner, default mode, and ordinary provider behavior", () => { + expect(shouldDisablePiAutoCompaction({})).toBe(false); + expect(shouldDisablePiAutoCompaction({ compactionMode: "default" })).toBe(false); + expect( + shouldDisablePiAutoCompaction({ + contextEngineInfo: { id: "legacy", name: "Legacy", ownsCompaction: false }, + compactionMode: "default", + silentOverflowProneProvider: false, + }), + ).toBe(false); + }); + + it("returns true when a context engine owns compaction", () => { + expect( + shouldDisablePiAutoCompaction({ + contextEngineInfo: { id: "third-party", name: "Third-party", ownsCompaction: true }, + }), + ).toBe(true); + }); + + it("returns true when effective compaction mode is safeguard", () => { + expect(shouldDisablePiAutoCompaction({ compactionMode: "safeguard" })).toBe(true); + }); + + it("returns true for silent-overflow-prone providers", () => { + expect(shouldDisablePiAutoCompaction({ silentOverflowProneProvider: true })).toBe(true); + }); +}); + describe("applyPiAutoCompactionGuard", () => { // Direct repro of openclaw#75799: pi-ai's silent-overflow detection misfires // on a successful turn against z.ai-style providers, triggering Pi's @@ -481,6 +547,26 @@ describe("applyPiAutoCompactionGuard", () => { expect(setCompactionEnabled).toHaveBeenCalledWith(false); }); + it("disables Pi auto-compaction when provider config forces safeguard mode", () => { + const setCompactionEnabled = vi.fn(); + const settingsManager = { + getCompactionReserveTokens: () => 20_000, + getCompactionKeepRecentTokens: () => 4_000, + applyOverrides: () => {}, + setCompactionEnabled, + }; + + const result = applyPiAutoCompactionGuard({ + settingsManager, + compactionMode: resolveEffectiveCompactionMode({ + agents: { defaults: { compaction: { provider: "deepseek" } } }, + }), + }); + + 
expect(result).toEqual({ supported: true, disabled: true }); + expect(setCompactionEnabled).toHaveBeenCalledWith(false); + }); + // Default-mode runs against ordinary providers must keep Pi's auto-compaction // enabled. Disabling it across the board would silently remove Pi's // overflow-recovery path inside Session.prompt() for users who are not diff --git a/src/agents/pi-settings.ts b/src/agents/pi-settings.ts index 4daedcad832..8beb54d94b0 100644 --- a/src/agents/pi-settings.ts +++ b/src/agents/pi-settings.ts @@ -1,3 +1,4 @@ +import type { AgentCompactionMode } from "../config/types.agent-defaults.js"; import type { OpenClawConfig } from "../config/types.openclaw.js"; import type { ContextEngineInfo } from "../context-engine/types.js"; import { MIN_PROMPT_BUDGET_RATIO, MIN_PROMPT_BUDGET_TOKENS } from "./pi-compaction-constants.js"; @@ -124,6 +125,15 @@ export function applyPiCompactionSettingsFromConfig(params: { }; } +/** Resolve the compaction mode after provider-backed safeguard promotion. */ +export function resolveEffectiveCompactionMode(cfg?: OpenClawConfig): AgentCompactionMode { + const compaction = cfg?.agents?.defaults?.compaction; + if (compaction?.provider) { + return "safeguard"; + } + return compaction?.mode === "safeguard" ? "safeguard" : "default"; +} + /** * Detect providers whose pi-ai `isContextOverflow` Case 2 (silent overflow) * fires on a successful turn and triggers Pi's `_runAutoCompaction` from @@ -171,16 +181,20 @@ export function isSilentOverflowProneModel(model: { * Disable Pi's `_checkCompaction → _runAutoCompaction` (which would otherwise * fire from inside `Session.prompt()` and reassign `agent.state.messages` * before the provider call) when OpenClaw or a plugin owns compaction: - * `contextEngineInfo.ownsCompaction === true`, or the active model is - * silent-overflow-prone (openclaw#75799). Default-mode runs against ordinary - * providers keep Pi's auto-compaction as the existing baseline. 
+ * `contextEngineInfo.ownsCompaction === true`, effective safeguard compaction, + * or an active model that is silent-overflow-prone (openclaw#75799). + * Default-mode runs against ordinary providers keep Pi's auto-compaction as + * the existing baseline. */ -function shouldDisablePiAutoCompaction(params: { +export function shouldDisablePiAutoCompaction(params: { contextEngineInfo?: ContextEngineInfo; + compactionMode?: AgentCompactionMode; silentOverflowProneProvider?: boolean; }): boolean { return ( - params.contextEngineInfo?.ownsCompaction === true || params.silentOverflowProneProvider === true + params.contextEngineInfo?.ownsCompaction === true || + params.compactionMode === "safeguard" || + params.silentOverflowProneProvider === true ); } @@ -194,10 +208,12 @@ function shouldDisablePiAutoCompaction(params: { export function applyPiAutoCompactionGuard(params: { settingsManager: PiSettingsManagerLike; contextEngineInfo?: ContextEngineInfo; + compactionMode?: AgentCompactionMode; silentOverflowProneProvider?: boolean; }): { supported: boolean; disabled: boolean } { const disable = shouldDisablePiAutoCompaction({ contextEngineInfo: params.contextEngineInfo, + compactionMode: params.compactionMode, silentOverflowProneProvider: params.silentOverflowProneProvider, }); const hasMethod = typeof params.settingsManager.setCompactionEnabled === "function"; From ea391c6df2800d65b81d46f8c0e3e9a02cc6a732 Mon Sep 17 00:00:00 2001 From: Peter Steinberger Date: Wed, 6 May 2026 03:36:41 +0100 Subject: [PATCH 3/5] test: stabilize cron and pairing shard hangs --- ...runs-one-shot-main-job-disables-it.test.ts | 192 +----------------- src/infra/device-pairing.test.ts | 16 +- 2 files changed, 25 insertions(+), 183 deletions(-) diff --git a/src/cron/service.runs-one-shot-main-job-disables-it.test.ts b/src/cron/service.runs-one-shot-main-job-disables-it.test.ts index eb29c51022d..1a73ac2ac47 100644 --- a/src/cron/service.runs-one-shot-main-job-disables-it.test.ts +++ 
b/src/cron/service.runs-one-shot-main-job-disables-it.test.ts @@ -1,5 +1,4 @@ -import path from "node:path"; -import { beforeEach, describe, expect, it, vi } from "vitest"; +import { describe, expect, it, vi } from "vitest"; import { HEARTBEAT_SKIP_CRON_IN_PROGRESS, HEARTBEAT_SKIP_REQUESTS_IN_FLIGHT, @@ -7,184 +6,17 @@ import { } from "../infra/heartbeat-wake.js"; import type { CronEvent, CronServiceDeps } from "./service.js"; import { CronService } from "./service.js"; -import { createDeferred, createNoopLogger, installCronTestHooks } from "./service.test-harness.js"; +import { + createCronStoreHarness, + createDeferred, + createNoopLogger, + installCronTestHooks, +} from "./service.test-harness.js"; const noopLogger = createNoopLogger(); installCronTestHooks({ logger: noopLogger }); - -type FakeFsEntry = - | { kind: "file"; content: string; mtimeMs: number } - | { kind: "dir"; mtimeMs: number }; - -const fsState = vi.hoisted(() => ({ - entries: new Map(), - nowMs: 0, - fixtureCount: 0, -})); - -const abs = (p: string) => path.resolve(p); -const fixturesRoot = abs(path.join("__openclaw_vitest__", "cron", "runs-one-shot")); -const isFixturePath = (p: string) => { - const resolved = abs(p); - const rootPrefix = `${fixturesRoot}${path.sep}`; - return resolved === fixturesRoot || resolved.startsWith(rootPrefix); -}; - -function bumpMtimeMs() { - fsState.nowMs += 1; - return fsState.nowMs; -} - -function ensureDir(dirPath: string) { - let current = abs(dirPath); - while (true) { - if (!fsState.entries.has(current)) { - fsState.entries.set(current, { kind: "dir", mtimeMs: bumpMtimeMs() }); - } - const parent = path.dirname(current); - if (parent === current) { - break; - } - current = parent; - } -} - -function setFile(filePath: string, content: string) { - const resolved = abs(filePath); - ensureDir(path.dirname(resolved)); - fsState.entries.set(resolved, { kind: "file", content, mtimeMs: bumpMtimeMs() }); -} - -async function makeStorePath() { - const dir = 
path.join(fixturesRoot, `case-${fsState.fixtureCount++}`); - ensureDir(dir); - const storePath = path.join(dir, "cron", "jobs.json"); - ensureDir(path.dirname(storePath)); - return { storePath, cleanup: async () => {} }; -} - -vi.mock("node:fs", async () => { - const actual = await vi.importActual("node:fs"); - const pathMod = await import("node:path"); - const absInMock = (p: string) => pathMod.resolve(p); - const isFixtureInMock = (p: string) => { - const resolved = absInMock(p); - const rootPrefix = `${absInMock(fixturesRoot)}${pathMod.sep}`; - return resolved === absInMock(fixturesRoot) || resolved.startsWith(rootPrefix); - }; - - const mkErr = (code: string, message: string) => Object.assign(new Error(message), { code }); - - const promises = { - ...actual.promises, - mkdir: async (p: string) => { - if (!isFixtureInMock(p)) { - return await actual.promises.mkdir(p, { recursive: true }); - } - ensureDir(p); - return undefined; - }, - readFile: async (p: string) => { - if (!isFixtureInMock(p)) { - return await actual.promises.readFile(p, "utf-8"); - } - const entry = fsState.entries.get(absInMock(p)); - if (!entry || entry.kind !== "file") { - throw mkErr("ENOENT", `ENOENT: no such file or directory, open '${p}'`); - } - return entry.content; - }, - writeFile: async (p: string, data: string | Uint8Array) => { - if (!isFixtureInMock(p)) { - return await actual.promises.writeFile(p, data, "utf-8"); - } - const content = typeof data === "string" ? 
data : Buffer.from(data).toString("utf-8"); - setFile(p, content); - }, - rename: async (from: string, to: string) => { - if (!isFixtureInMock(from) || !isFixtureInMock(to)) { - return await actual.promises.rename(from, to); - } - const fromAbs = absInMock(from); - const toAbs = absInMock(to); - const entry = fsState.entries.get(fromAbs); - if (!entry || entry.kind !== "file") { - throw mkErr("ENOENT", `ENOENT: no such file or directory, rename '${from}' -> '${to}'`); - } - ensureDir(pathMod.dirname(toAbs)); - fsState.entries.delete(fromAbs); - fsState.entries.set(toAbs, { ...entry, mtimeMs: bumpMtimeMs() }); - }, - copyFile: async (from: string, to: string) => { - if (!isFixtureInMock(from) || !isFixtureInMock(to)) { - return await actual.promises.copyFile(from, to); - } - const entry = fsState.entries.get(absInMock(from)); - if (!entry || entry.kind !== "file") { - throw mkErr("ENOENT", `ENOENT: no such file or directory, copyfile '${from}' -> '${to}'`); - } - setFile(to, entry.content); - }, - stat: async (p: string) => { - if (!isFixtureInMock(p)) { - return await actual.promises.stat(p); - } - const entry = fsState.entries.get(absInMock(p)); - if (!entry) { - throw mkErr("ENOENT", `ENOENT: no such file or directory, stat '${p}'`); - } - return { - mtimeMs: entry.mtimeMs, - isDirectory: () => entry.kind === "dir", - isFile: () => entry.kind === "file", - }; - }, - access: async (p: string) => { - if (!isFixtureInMock(p)) { - return await actual.promises.access(p); - } - const entry = fsState.entries.get(absInMock(p)); - if (!entry) { - throw mkErr("ENOENT", `ENOENT: no such file or directory, access '${p}'`); - } - }, - unlink: async (p: string) => { - if (!isFixtureInMock(p)) { - return await actual.promises.unlink(p); - } - fsState.entries.delete(absInMock(p)); - }, - } as unknown as typeof actual.promises; - - const wrapped = { ...actual, promises }; - return { ...wrapped, default: wrapped }; -}); - -vi.mock("node:fs/promises", async () => { - const actual = 
await vi.importActual("node:fs/promises"); - const wrapped = { - ...actual, - mkdir: async (p: string, _opts?: unknown) => { - if (!isFixturePath(p)) { - return await actual.mkdir(p, { recursive: true }); - } - ensureDir(p); - return undefined; - }, - writeFile: async (p: string, data: string, _enc?: unknown) => { - if (!isFixturePath(p)) { - return await actual.writeFile(p, data, "utf-8"); - } - setFile(p, data); - }, - }; - return { ...wrapped, default: wrapped }; -}); - -beforeEach(() => { - fsState.entries.clear(); - fsState.nowMs = 0; - ensureDir(fixturesRoot); +const { makeStorePath } = createCronStoreHarness({ + prefix: "openclaw-cron-runs-one-shot-", }); function createCronEventHarness() { @@ -229,7 +61,6 @@ type CronHarnessOptions = { }; async function createCronHarness(options: CronHarnessOptions = {}) { - ensureDir(fixturesRoot); const store = await makeStorePath(); const enqueueSystemEvent = vi.fn(); const requestHeartbeat = vi.fn(); @@ -377,6 +208,7 @@ function expectMainSystemEventPosted(enqueueSystemEvent: unknown, text: string) } async function stopCronAndCleanup(cron: CronService, store: { cleanup: () => Promise }) { + await cron.status(); cron.stop(); await store.cleanup(); } @@ -678,7 +510,6 @@ describe("CronService", () => { }); it("rejects unsupported session/payload combinations", async () => { - ensureDir(fixturesRoot); const store = await makeStorePath(); const cron = createStartedCronService( @@ -712,7 +543,6 @@ describe("CronService", () => { }), ).rejects.toThrow(/isolated.*cron jobs require/); - cron.stop(); - await store.cleanup(); + await stopCronAndCleanup(cron, store); }); }); diff --git a/src/infra/device-pairing.test.ts b/src/infra/device-pairing.test.ts index c98d439606d..9eb26966060 100644 --- a/src/infra/device-pairing.test.ts +++ b/src/infra/device-pairing.test.ts @@ -211,8 +211,20 @@ describe("device pairing tokens", () => { }, baseDir, ); - const originalTs = first.request.ts; - await new Promise((resolve) => 
setTimeout(resolve, 20)); + const originalTs = first.request.ts - 1_000; + const paths = resolvePairingPaths(baseDir, "devices"); + const pendingById = JSON.parse(await readFile(paths.pendingPath, "utf8")) as Record< + string, + { ts: number } + >; + const pending = pendingById[first.request.requestId]; + expect(pending).toBeDefined(); + if (!pending) { + throw new Error("expected pending pairing request"); + } + pending.ts = originalTs; + await writeFile(paths.pendingPath, JSON.stringify(pendingById, null, 2)); + const second = await requestDevicePairing( { deviceId: "device-1", From 8489d0eb6847ff7d6d1b6f22f4a1aa21b76a0c46 Mon Sep 17 00:00:00 2001 From: Peter Steinberger Date: Wed, 6 May 2026 03:43:39 +0100 Subject: [PATCH 4/5] test: update spawn workspace pi settings mock --- .../run/attempt.spawn-workspace.test-support.ts | 1 + 1 file changed, 1 insertion(+) diff --git a/src/agents/pi-embedded-runner/run/attempt.spawn-workspace.test-support.ts b/src/agents/pi-embedded-runner/run/attempt.spawn-workspace.test-support.ts index f9dc87c0e15..c9d4d6e0cb7 100644 --- a/src/agents/pi-embedded-runner/run/attempt.spawn-workspace.test-support.ts +++ b/src/agents/pi-embedded-runner/run/attempt.spawn-workspace.test-support.ts @@ -350,6 +350,7 @@ vi.mock("../../pi-settings.js", () => ({ }, }), isSilentOverflowProneModel: () => false, + resolveEffectiveCompactionMode: () => "default", })); vi.mock("../extensions.js", () => ({ From 4395f1dd6644a08d97c23c98adb5b5c3bfa2a261 Mon Sep 17 00:00:00 2001 From: Vincent Koc Date: Tue, 5 May 2026 19:45:44 -0700 Subject: [PATCH 5/5] docs: typography hygiene + drop one in-body H1 across 5 pages Replaced 98 typography characters (curly quotes, apostrophes, em/en dashes, non-breaking hyphens) with ASCII equivalents per docs/CLAUDE.md heading and content hygiene rules. 
- docs/plugins/sdk-migration.md: 20 chars - docs/help/testing.md: 20 chars - docs/automation/tasks.md: 20 chars - docs/plugins/sdk-channel-plugins.md: 19 chars - docs/channels/yuanbao.md: 19 chars; removed the duplicate '# Yuanbao' H1 (Mintlify renders title from frontmatter). --- docs/automation/tasks.md | 40 ++++++++++++++--------------- docs/channels/yuanbao.md | 40 ++++++++++++++--------------- docs/help/testing.md | 26 +++++++++---------- docs/plugins/sdk-channel-plugins.md | 36 +++++++++++++------------- docs/plugins/sdk-migration.md | 40 ++++++++++++++--------------- 5 files changed, 90 insertions(+), 92 deletions(-) diff --git a/docs/automation/tasks.md b/docs/automation/tasks.md index 47c87d77dd6..a3075542590 100644 --- a/docs/automation/tasks.md +++ b/docs/automation/tasks.md @@ -14,7 +14,7 @@ Looking for scheduling? See [Automation and tasks](/automation) for choosing the Background tasks track work that runs **outside your main conversation session**: ACP runs, subagent spawns, isolated cron job executions, and CLI-initiated operations. -Tasks do **not** replace sessions, cron jobs, or heartbeats — they are the **activity ledger** that records what detached work happened, when, and whether it succeeded. +Tasks do **not** replace sessions, cron jobs, or heartbeats - they are the **activity ledger** that records what detached work happened, when, and whether it succeeded. Not every agent run creates a task. Heartbeat turns and normal interactive chat do not. All cron executions, ACP spawns, subagent spawns, and CLI agent commands do. @@ -22,7 +22,7 @@ Not every agent run creates a task. Heartbeat turns and normal interactive chat ## TL;DR -- Tasks are **records**, not schedulers — cron and heartbeat decide _when_ work runs, tasks track _what happened_. +- Tasks are **records**, not schedulers - cron and heartbeat decide _when_ work runs, tasks track _what happened_. - ACP, subagents, all cron jobs, and CLI operations create tasks. Heartbeat turns do not. 
- Each task moves through `queued → running → terminal` (succeeded, failed, timed_out, cancelled, or lost). - Cron tasks stay live while the cron runtime still owns the job; if the @@ -100,7 +100,7 @@ Not every agent run creates a task. Heartbeat turns and normal interactive chat - Main-session cron tasks use `silent` notify policy by default — they create records for tracking but do not generate notifications. Isolated cron tasks also default to `silent` but are more visible because they run in their own session. + Main-session cron tasks use `silent` notify policy by default - they create records for tracking but do not generate notifications. Isolated cron tasks also default to `silent` but are more visible because they run in their own session. Session-backed `music_generate` and `video_generate` runs also use `silent` notify policy. They still create task records, but completion is handed back to the original agent session as an internal wake so the agent can write the follow-up message and attach the finished media itself. Group/channel completions follow the normal visible-reply policy, so the agent uses the message tool when source delivery requires it. If the completion agent fails to produce message-tool delivery evidence in a tool-only route, OpenClaw sends the completion fallback directly to the original channel instead of leaving the media private. @@ -109,7 +109,7 @@ Not every agent run creates a task. Heartbeat turns and normal interactive chat While a session-backed `video_generate` task is still active, the tool also acts as a guardrail: repeated `video_generate` calls in that same session return the active task status instead of starting a second concurrent generation. Use `action: "status"` when you want an explicit progress/status lookup from the agent side. 
- - Heartbeat turns — main-session; see [Heartbeat](/gateway/heartbeat) + - Heartbeat turns - main-session; see [Heartbeat](/gateway/heartbeat) - Normal interactive chat turns - Direct `/command` responses @@ -140,7 +140,7 @@ stateDiagram-v2 | `cancelled` | Stopped by the operator via `openclaw tasks cancel` | | `lost` | The runtime lost authoritative backing state after a 5-minute grace period | -Transitions happen automatically — when the associated agent run ends, the task status updates to match. +Transitions happen automatically - when the associated agent run ends, the task status updates to match. Agent run completion is authoritative for active task records. A successful detached run finalizes as `succeeded`, ordinary run errors finalize as `failed`, and timeout or abort outcomes finalize as `timed_out`. If an operator already cancelled the task, or the runtime already recorded a stronger terminal state such as `failed`, `timed_out`, or `lost`, a later success signal does not downgrade that terminal status. @@ -161,12 +161,12 @@ Agent run completion is authoritative for active task records. A successful deta When a task reaches a terminal state, OpenClaw notifies you. There are two delivery paths: -**Direct delivery** — if the task has a channel target (the `requesterOrigin`), the completion message goes straight to that channel (Telegram, Discord, Slack, etc.). For subagent completions, OpenClaw also preserves bound thread/topic routing when available and can fill a missing `to` / account from the requester session's stored route (`lastChannel` / `lastTo` / `lastAccountId`) before giving up on direct delivery. +**Direct delivery** - if the task has a channel target (the `requesterOrigin`), the completion message goes straight to that channel (Telegram, Discord, Slack, etc.). 
For subagent completions, OpenClaw also preserves bound thread/topic routing when available and can fill a missing `to` / account from the requester session's stored route (`lastChannel` / `lastTo` / `lastAccountId`) before giving up on direct delivery. -**Session-queued delivery** — if direct delivery fails or no origin is set, the update is queued as a system event in the requester's session and surfaces on the next heartbeat. +**Session-queued delivery** - if direct delivery fails or no origin is set, the update is queued as a system event in the requester's session and surfaces on the next heartbeat. -Task completion triggers an immediate heartbeat wake so you see the result quickly — you do not have to wait for the next scheduled heartbeat tick. +Task completion triggers an immediate heartbeat wake so you see the result quickly - you do not have to wait for the next scheduled heartbeat tick. That means the usual workflow is push-based: start detached work once, then let the runtime wake or notify you on completion. Poll task state only when you need debugging, intervention, or an explicit audit. @@ -177,7 +177,7 @@ Control how much you hear about each task: | Policy | What is delivered | | --------------------- | ----------------------------------------------------------------------- | -| `done_only` (default) | Only terminal state (succeeded, failed, etc.) — **this is the default** | +| `done_only` (default) | Only terminal state (succeeded, failed, etc.) 
- **this is the default** | | `state_changes` | Every state transition and progress update | | `silent` | Nothing at all | @@ -290,9 +290,9 @@ Tasks: 3 queued · 2 running · 1 issues The summary reports: -- **active** — count of `queued` + `running` -- **failures** — count of `failed` + `timed_out` + `lost` -- **byRuntime** — breakdown by `acp`, `subagent`, `cron`, `cli` +- **active** - count of `queued` + `running` +- **failures** - count of `failed` + `timed_out` + `lost` +- **byRuntime** - breakdown by `acp`, `subagent`, `cron`, `cli` Both `/status` and the `session_status` tool use a cleanup-aware task snapshot: active tasks are preferred, stale completed rows are hidden, and recent failures only surface when no active work remains. This keeps the status card focused on what matters right now. @@ -343,13 +343,13 @@ A sweeper runs every **60 seconds** and handles four things: - A cron job **definition** lives in `~/.openclaw/cron/jobs.json`; runtime execution state lives beside it in `~/.openclaw/cron/jobs-state.json`. **Every** cron execution creates a task record — both main-session and isolated. Main-session cron tasks default to `silent` notify policy so they track without generating notifications. + A cron job **definition** lives in `~/.openclaw/cron/jobs.json`; runtime execution state lives beside it in `~/.openclaw/cron/jobs-state.json`. **Every** cron execution creates a task record - both main-session and isolated. Main-session cron tasks default to `silent` notify policy so they track without generating notifications. See [Cron Jobs](/automation/cron-jobs). - Heartbeat runs are main-session turns — they do not create task records. When a task completes, it can trigger a heartbeat wake so you see the result promptly. + Heartbeat runs are main-session turns - they do not create task records. When a task completes, it can trigger a heartbeat wake so you see the result promptly. See [Heartbeat](/gateway/heartbeat). 
@@ -358,14 +358,14 @@ A sweeper runs every **60 seconds** and handles four things: A task may reference a `childSessionKey` (where work runs) and a `requesterSessionKey` (who started it). Sessions are conversation context; tasks are activity tracking on top of that. - A task's `runId` links to the agent run doing the work. Agent lifecycle events (start, end, error) automatically update the task status — you do not need to manage the lifecycle manually. + A task's `runId` links to the agent run doing the work. Agent lifecycle events (start, end, error) automatically update the task status - you do not need to manage the lifecycle manually. ## Related -- [Automation & Tasks](/automation) — all automation mechanisms at a glance -- [CLI: Tasks](/cli/tasks) — CLI command reference -- [Heartbeat](/gateway/heartbeat) — periodic main-session turns -- [Scheduled Tasks](/automation/cron-jobs) — scheduling background work -- [Task Flow](/automation/taskflow) — flow orchestration above tasks +- [Automation & Tasks](/automation) - all automation mechanisms at a glance +- [CLI: Tasks](/cli/tasks) - CLI command reference +- [Heartbeat](/gateway/heartbeat) - periodic main-session turns +- [Scheduled Tasks](/automation/cron-jobs) - scheduling background work +- [Task Flow](/automation/taskflow) - flow orchestration above tasks diff --git a/docs/channels/yuanbao.md b/docs/channels/yuanbao.md index 88bd49bfb6c..9c493a5d065 100644 --- a/docs/channels/yuanbao.md +++ b/docs/channels/yuanbao.md @@ -6,8 +6,6 @@ read_when: title: Yuanbao --- -# Yuanbao - Tencent Yuanbao is Tencent's AI assistant platform. The OpenClaw channel plugin connects Yuanbao bots to OpenClaw over WebSocket so they can interact with users through direct messages and group chats. @@ -53,10 +51,10 @@ Follow the prompts to enter your App ID and App Secret. 
Configure `dmPolicy` to control who can DM the bot: -- `"pairing"` — unknown users receive a pairing code; approve via CLI -- `"allowlist"` — only users listed in `allowFrom` can chat -- `"open"` — allow all users (default) -- `"disabled"` — disable all DMs +- `"pairing"` - unknown users receive a pairing code; approve via CLI +- `"allowlist"` - only users listed in `allowFrom` can chat +- `"open"` - allow all users (default) +- `"disabled"` - disable all DMs **Approve a pairing request:** @@ -69,8 +67,8 @@ openclaw pairing approve yuanbao **Mention requirement** (`channels.yuanbao.requireMention`): -- `true` — require @mention (default) -- `false` — respond without @mention +- `true` - require @mention (default) +- `false` - respond without @mention Replying to the bot's message in a group chat is treated as an implicit mention. @@ -228,9 +226,9 @@ Replying to the bot's message in a group chat is treated as an implicit mention. ### Message limits -- `maxChars` — single message max character count (default: `3000` chars) -- `mediaMaxMb` — media upload/download limit (default: `20` MB) -- `overflowPolicy` — behavior when message exceeds limit: `"split"` (default) or `"stop"` +- `maxChars` - single message max character count (default: `3000` chars) +- `mediaMaxMb` - media upload/download limit (default: `20` MB) +- `overflowPolicy` - behavior when message exceeds limit: `"split"` (default) or `"stop"` ### Streaming @@ -358,13 +356,13 @@ Full configuration: [Gateway configuration](/gateway/configuration) | ------------------------------------------ | ------------------------------------------------- | -------------------------------------- | | `channels.yuanbao.enabled` | Enable/disable the channel | `true` | | `channels.yuanbao.defaultAccount` | Default account for outbound routing | `default` | -| `channels.yuanbao.accounts..appKey` | App Key (used for signing and ticket generation) | — | -| `channels.yuanbao.accounts..appSecret` | App Secret (used for signing) | — 
| -| `channels.yuanbao.accounts..token` | Pre-signed token (skips automatic ticket signing) | — | -| `channels.yuanbao.accounts..name` | Account display name | — | +| `channels.yuanbao.accounts..appKey` | App Key (used for signing and ticket generation) | - | +| `channels.yuanbao.accounts..appSecret` | App Secret (used for signing) | - | +| `channels.yuanbao.accounts..token` | Pre-signed token (skips automatic ticket signing) | - | +| `channels.yuanbao.accounts..name` | Account display name | - | | `channels.yuanbao.accounts..enabled` | Enable/disable a specific account | `true` | | `channels.yuanbao.dm.policy` | DM policy | `open` | -| `channels.yuanbao.dm.allowFrom` | DM allowlist (user ID list) | — | +| `channels.yuanbao.dm.allowFrom` | DM allowlist (user ID list) | - | | `channels.yuanbao.requireMention` | Require @mention in groups | `true` | | `channels.yuanbao.overflowPolicy` | Long message handling (`split` or `stop`) | `split` | | `channels.yuanbao.replyToMode` | Group reply-to strategy (`off`, `first`, `all`) | `first` | @@ -411,8 +409,8 @@ Full configuration: [Gateway configuration](/gateway/configuration) ## Related -- [Channels Overview](/channels) — all supported channels -- [Pairing](/channels/pairing) — DM authentication and pairing flow -- [Groups](/channels/groups) — group chat behavior and mention gating -- [Channel Routing](/channels/channel-routing) — session routing for messages -- [Security](/gateway/security) — access model and hardening +- [Channels Overview](/channels) - all supported channels +- [Pairing](/channels/pairing) - DM authentication and pairing flow +- [Groups](/channels/groups) - group chat behavior and mention gating +- [Channel Routing](/channels/channel-routing) - session routing for messages +- [Security](/gateway/security) - access model and hardening diff --git a/docs/help/testing.md b/docs/help/testing.md index 9a3d6df6de9..9ef6597fc8f 100644 --- a/docs/help/testing.md +++ b/docs/help/testing.md @@ -18,9 +18,9 @@ of 
Docker runners. This doc is a "how we test" guide: **QA stack (qa-lab, qa-channel, live transport lanes)** is documented separately: -- [QA overview](/concepts/qa-e2e-automation) — architecture, command surface, scenario authoring. -- [Matrix QA](/concepts/qa-matrix) — reference for `pnpm openclaw qa matrix`. -- [QA channel](/channels/qa-channel) — the synthetic transport plugin used by repo-backed scenarios. +- [QA overview](/concepts/qa-e2e-automation) - architecture, command surface, scenario authoring. +- [Matrix QA](/concepts/qa-matrix) - reference for `pnpm openclaw qa matrix`. +- [QA channel](/channels/qa-channel) - the synthetic transport plugin used by repo-backed scenarios. This page covers running the regular test suites and Docker/Parallels runners. The QA-specific runners section below ([QA-specific runners](#qa-specific-runners)) lists the concrete `qa` invocations and points back at the references above. @@ -301,7 +301,7 @@ gh workflow run package-acceptance.yml --ref main \ - Starts only the local AIMock provider server for direct protocol smoke testing. - `pnpm openclaw qa matrix` - - Runs the Matrix live QA lane against a disposable Docker-backed Tuwunel homeserver. Source-checkout only — packaged installs do not ship `qa-lab`. + - Runs the Matrix live QA lane against a disposable Docker-backed Tuwunel homeserver. Source-checkout only - packaged installs do not ship `qa-lab`. - Full CLI, profile/scenario catalog, env vars, and artifact layout: [Matrix QA](/concepts/qa-matrix). - `pnpm openclaw qa telegram` - Runs the Telegram live QA lane against a real private group using the driver and SUT bot tokens from env. 
@@ -399,7 +399,7 @@ The architecture and scenario-helper names for new channel adapters live in [QA ## Test suites (what runs where) -Think of the suites as “increasing realism” (and increasing flakiness/cost): +Think of the suites as "increasing realism" (and increasing flakiness/cost): ### Unit / integration (default) @@ -578,12 +578,12 @@ Think of the suites as “increasing realism” (and increasing flakiness/cost): - Files: `src/**/*.live.test.ts`, `test/**/*.live.test.ts`, and bundled-plugin live tests under `extensions/` - Default: **enabled** by `pnpm test:live` (sets `OPENCLAW_LIVE_TEST=1`) - Scope: - - “Does this provider/model actually work _today_ with real creds?” + - "Does this provider/model actually work _today_ with real creds?" - Catch provider format changes, tool-calling quirks, auth issues, and rate limit behavior - Expectations: - Not CI-stable by design (real networks, real provider policies, quotas, outages) - Costs money / uses rate limits - - Prefer running narrowed subsets instead of “everything” + - Prefer running narrowed subsets instead of "everything" - Live runs source `~/.profile` to pick up missing API keys. - By default, live runs still isolate `HOME` and copy config/auth material into a temp test home so unit fixtures cannot mutate your real `~/.openclaw`. - Set `OPENCLAW_LIVE_USE_REAL_HOME=1` only when you intentionally need live tests to use your real home directory. 
@@ -601,13 +601,13 @@ Use this decision table: - Editing logic/tests: run `pnpm test` (and `pnpm test:coverage` if you changed a lot) - Touching gateway networking / WS protocol / pairing: add `pnpm test:e2e` -- Debugging “my bot is down” / provider-specific failures / tool calling: run a narrowed `pnpm test:live` +- Debugging "my bot is down" / provider-specific failures / tool calling: run a narrowed `pnpm test:live` ## Live (network-touching) tests For the live model matrix, CLI backend smokes, ACP smokes, Codex app-server harness, and all media-provider live tests (Deepgram, BytePlus, ComfyUI, image, -music, video, media harness) — plus credential handling for live runs — see +music, video, media harness) - plus credential handling for live runs - see [Testing live suites](/help/testing-live). For the dedicated update and plugin validation checklist, see [Testing updates and plugins](/help/testing-updates-plugins). @@ -744,19 +744,19 @@ Run full Mintlify anchor validation when you need in-page heading checks too: `p ## Offline regression (CI-safe) -These are “real pipeline” regressions without real providers: +These are "real pipeline" regressions without real providers: - Gateway tool calling (mock OpenAI, real gateway + agent loop): `src/gateway/gateway.test.ts` (case: "runs a mock OpenAI tool call end-to-end via gateway agent loop") - Gateway wizard (WS `wizard.start`/`wizard.next`, writes config + auth enforced): `src/gateway/gateway.test.ts` (case: "runs wizard over ws and writes auth token config") ## Agent reliability evals (skills) -We already have a few CI-safe tests that behave like “agent reliability evals”: +We already have a few CI-safe tests that behave like "agent reliability evals": - Mock tool-calling through the real gateway + agent loop (`src/gateway/gateway.test.ts`). - End-to-end wizard flows that validate session wiring and config effects (`src/gateway/gateway.test.ts`). 
-What’s still missing for skills (see [Skills](/tools/skills)): +What's still missing for skills (see [Skills](/tools/skills)): - **Decisioning:** when skills are listed in the prompt, does the agent pick the right skill (or avoid irrelevant ones)? - **Compliance:** does the agent read `SKILL.md` before use and follow required steps/args? @@ -829,7 +829,7 @@ Contract tests run in CI and do not require real API keys. When you fix a provider/model issue discovered in live: - Add a CI-safe regression if possible (mock/stub provider, or capture the exact request-shape transformation) -- If it’s inherently live-only (rate limits, auth policies), keep the live test narrow and opt-in via env vars +- If it's inherently live-only (rate limits, auth policies), keep the live test narrow and opt-in via env vars - Prefer targeting the smallest layer that catches the bug: - provider request conversion/replay bug → direct models test - gateway session/history/tool pipeline bug → gateway live smoke or CI-safe gateway mock test diff --git a/docs/plugins/sdk-channel-plugins.md b/docs/plugins/sdk-channel-plugins.md index e7c4aa61e95..d8a9ed97e12 100644 --- a/docs/plugins/sdk-channel-plugins.md +++ b/docs/plugins/sdk-channel-plugins.md @@ -23,13 +23,13 @@ pairing, reply threading, and outbound messaging. Channel plugins do not need their own send/edit/react tools. OpenClaw keeps one shared `message` tool in core. 
Your plugin owns: -- **Config** — account resolution and setup wizard -- **Security** — DM policy and allowlists -- **Pairing** — DM approval flow -- **Session grammar** — how provider-specific conversation ids map to base chats, thread ids, and parent fallbacks -- **Outbound** — sending text, media, and polls to the platform -- **Threading** — how replies are threaded -- **Heartbeat typing** — optional typing/busy signals for heartbeat delivery targets +- **Config** - account resolution and setup wizard +- **Security** - DM policy and allowlists +- **Pairing** - DM approval flow +- **Session grammar** - how provider-specific conversation ids map to base chats, thread ids, and parent fallbacks +- **Outbound** - sending text, media, and polls to the platform +- **Threading** - how replies are threaded +- **Heartbeat typing** - optional typing/busy signals for heartbeat delivery targets Core owns the shared message tool, prompt wiring, the outer session-key shape, generic `:thread:` bookkeeping, and dispatch. @@ -145,11 +145,11 @@ Most channel plugins do not need approval-specific code. - If a channel needs native approval delivery, keep channel code focused on target normalization plus transport/presentation facts. Use `createChannelExecApprovalProfile`, `createChannelNativeOriginTargetResolver`, `createChannelApproverDmTargetResolver`, and `createApproverRestrictedNativeApprovalCapability` from `openclaw/plugin-sdk/approval-runtime`. Put the channel-specific facts behind `approvalCapability.nativeRuntime`, ideally via `createChannelApprovalNativeRuntimeAdapter(...)` or `createLazyChannelApprovalNativeRuntimeAdapter(...)`, so core can assemble the handler and own request filtering, routing, dedupe, expiry, gateway subscription, and routed-elsewhere notices. `nativeRuntime` is split into a few smaller seams: - `createChannelNativeOriginTargetResolver` uses the shared channel-route matcher by default for `{ to, accountId, threadId }` targets. 
Pass `targetsMatch` only when a channel has provider-specific equivalence rules, such as Slack timestamp prefix matching. - Pass `normalizeTargetForMatch` to `createChannelNativeOriginTargetResolver` when the channel needs to canonicalize provider ids before the default route matcher or a custom `targetsMatch` callback runs, while preserving the original target for delivery. Use `normalizeTarget` only when the resolved delivery target itself should be canonicalized. -- `availability` — whether the account is configured and whether a request should be handled -- `presentation` — map the shared approval view model into pending/resolved/expired native payloads or final actions -- `transport` — prepare targets plus send/update/delete native approval messages -- `interactions` — optional bind/unbind/clear-action hooks for native buttons or reactions -- `observe` — optional delivery diagnostics hooks +- `availability` - whether the account is configured and whether a request should be handled +- `presentation` - map the shared approval view model into pending/resolved/expired native payloads or final actions +- `transport` - prepare targets plus send/update/delete native approval messages +- `interactions` - optional bind/unbind/clear-action hooks for native buttons or reactions +- `observe` - optional delivery diagnostics hooks - If the channel needs runtime-owned objects such as a client, token, Bolt app, or webhook receiver, register them through `openclaw/plugin-sdk/channel-runtime-context`. The generic runtime-context registry lets core bootstrap capability-driven handlers from channel startup state without adding approval-specific wrapper glue. - Reach for the lower-level `createChannelApprovalHandler` or `createChannelNativeApprovalRuntime` only when the capability-driven seam is not expressive enough yet. - Native approval channels must route both `accountId` and `approvalKind` through those helpers. 
`accountId` keeps multi-account approval policy scoped to the right bot account, and `approvalKind` keeps exec vs plugin approval behavior available to the channel without hardcoded branches in core. @@ -424,7 +424,7 @@ should use `resolveInboundMentionDecision({ facts, policy })`. The `ChannelPlugin` interface has many optional adapter surfaces. Start with - the minimum — `id` and `setup` — and add adapters as you need them. + the minimum - `id` and `setup` - and add adapters as you need them. Create `src/channel.ts`: @@ -631,7 +631,7 @@ should use `resolveInboundMentionDecision({ facts, policy })`. const event = parseWebhookPayload(req); // Your inbound handler dispatches the message to OpenClaw. - // The exact wiring depends on your platform SDK — + // The exact wiring depends on your platform SDK - // see a real example in the bundled Microsoft Teams or Google Chat plugin package. await handleAcmeChatInbound(api, event); @@ -742,10 +742,10 @@ surface unless you are maintaining that bundled plugin family directly. ## Next steps -- [Provider Plugins](/plugins/sdk-provider-plugins) — if your plugin also provides models -- [SDK Overview](/plugins/sdk-overview) — full subpath import reference -- [SDK Testing](/plugins/sdk-testing) — test utilities and contract tests -- [Plugin Manifest](/plugins/manifest) — full manifest schema +- [Provider Plugins](/plugins/sdk-provider-plugins) - if your plugin also provides models +- [SDK Overview](/plugins/sdk-overview) - full subpath import reference +- [SDK Testing](/plugins/sdk-testing) - test utilities and contract tests +- [Plugin Manifest](/plugins/manifest) - full manifest schema ## Related diff --git a/docs/plugins/sdk-migration.md b/docs/plugins/sdk-migration.md index 8ddb9f322a8..7f68c6185ef 100644 --- a/docs/plugins/sdk-migration.md +++ b/docs/plugins/sdk-migration.md @@ -19,18 +19,18 @@ the new architecture, this guide helps you migrate. 
The old plugin system provided two wide-open surfaces that let plugins import anything they needed from a single entry point: -- **`openclaw/plugin-sdk/compat`** — a single import that re-exported dozens of +- **`openclaw/plugin-sdk/compat`** - a single import that re-exported dozens of helpers. It was introduced to keep older hook-based plugins working while the new plugin architecture was being built. -- **`openclaw/plugin-sdk/infra-runtime`** — a broad runtime helper barrel that +- **`openclaw/plugin-sdk/infra-runtime`** - a broad runtime helper barrel that mixed system events, heartbeat state, delivery queues, fetch/proxy helpers, file helpers, approval types, and unrelated utilities. -- **`openclaw/plugin-sdk/config-runtime`** — a broad config compatibility barrel +- **`openclaw/plugin-sdk/config-runtime`** - a broad config compatibility barrel that still carries deprecated direct load/write helpers during the migration window. -- **`openclaw/extension-api`** — a bridge that gave plugins direct access to +- **`openclaw/extension-api`** - a bridge that gave plugins direct access to host-side helpers like the embedded agent runner. -- **`api.registerEmbeddedExtensionFactory(...)`** — a removed Pi-only bundled +- **`api.registerEmbeddedExtensionFactory(...)`** - a removed Pi-only bundled extension hook that could observe embedded-runner events such as `tool_result`. @@ -55,9 +55,9 @@ registration behavior. 
The old approach caused problems: -- **Slow startup** — importing one helper loaded dozens of unrelated modules -- **Circular dependencies** — broad re-exports made it easy to create import cycles -- **Unclear API surface** — no way to tell which exports were stable vs internal +- **Slow startup** - importing one helper loaded dozens of unrelated modules +- **Circular dependencies** - broad re-exports made it easy to create import cycles +- **Unclear API surface** - no way to tell which exports were stable vs internal The modern plugin SDK fixes this: each import path (`openclaw/plugin-sdk/\`) is a small, self-contained module with a clear purpose and documented contract. @@ -679,7 +679,7 @@ canonical replacement. `buildCommandsMessagePaginated`, `buildHelpMessage`. **New (`openclaw/plugin-sdk/command-status`)**: same signatures, same - exports — just imported from the narrower subpath. `command-auth` + exports - just imported from the narrower subpath. `command-auth` re-exports them as compat stubs. ```typescript @@ -698,7 +698,7 @@ canonical replacement. `openclaw/plugin-sdk/channel-inbound` or `openclaw/plugin-sdk/channel-mention-gating`. - **New**: `resolveInboundMentionDecision({ facts, policy })` — returns a + **New**: `resolveInboundMentionDecision({ facts, policy })` - returns a single decision object instead of two split calls. Downstream channel plugins (Slack, Discord, Matrix, MS Teams) have already @@ -714,7 +714,7 @@ canonical replacement. `channelActions*` helpers in `openclaw/plugin-sdk/channel-actions` are deprecated alongside raw "actions" channel exports. Expose capabilities - through the semantic `presentation` surface instead — channel plugins + through the semantic `presentation` surface instead - channel plugins declare what they render (cards, buttons, selects) rather than which raw action names they accept. @@ -756,7 +756,7 @@ canonical replacement. 
| `ProviderDiscoveryResult` | `ProviderCatalogResult` | | `ProviderPluginDiscovery` | `ProviderPluginCatalog` | - Plus the legacy `ProviderCapabilities` static bag — provider plugins + Plus the legacy `ProviderCapabilities` static bag - provider plugins should use explicit provider hooks such as `buildReplayPolicy`, `normalizeToolSchemas`, and `wrapStreamFn` rather than a static object. @@ -809,12 +809,12 @@ canonical replacement. - **Old**: three separate calls — + **Old**: three separate calls - `api.registerMemoryPromptSection(...)`, `api.registerMemoryFlushPlan(...)`, `api.registerMemoryRuntime(...)`. - **New**: one call on the memory-state API — + **New**: one call on the memory-state API - `registerMemoryCapability(pluginId, { promptBuilder, flushPlanResolver, runtime })`. Same slots, single registration call. Additive memory helpers @@ -906,9 +906,9 @@ This is a temporary escape hatch, not a permanent solution. ## Related -- [Getting Started](/plugins/building-plugins) — build your first plugin -- [SDK Overview](/plugins/sdk-overview) — full subpath import reference -- [Channel Plugins](/plugins/sdk-channel-plugins) — building channel plugins -- [Provider Plugins](/plugins/sdk-provider-plugins) — building provider plugins -- [Plugin Internals](/plugins/architecture) — architecture deep dive -- [Plugin Manifest](/plugins/manifest) — manifest schema reference +- [Getting Started](/plugins/building-plugins) - build your first plugin +- [SDK Overview](/plugins/sdk-overview) - full subpath import reference +- [Channel Plugins](/plugins/sdk-channel-plugins) - building channel plugins +- [Provider Plugins](/plugins/sdk-provider-plugins) - building provider plugins +- [Plugin Internals](/plugins/architecture) - architecture deep dive +- [Plugin Manifest](/plugins/manifest) - manifest schema reference
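
Reviewer note: the typography replacements described in the commit messages above (curly quotes, apostrophes, em/en dashes, non-breaking hyphens mapped to ASCII) could be reproduced with a small helper along these lines. This is only a sketch under stated assumptions: `normalizeTypography` and `ASCII_REPLACEMENTS` are hypothetical names, not part of the OpenClaw repo, and the character set is inferred from the commit message rather than from `docs/CLAUDE.md` itself. Arrows such as `→` are deliberately left alone, since the hunks above keep them.

```typescript
// Hypothetical helper, not taken from the OpenClaw repo.
// Maps the typography characters named in the commit message to ASCII.
const ASCII_REPLACEMENTS: ReadonlyArray<readonly [RegExp, string]> = [
  [/[\u2018\u2019]/g, "'"], // curly single quotes and apostrophes
  [/[\u201C\u201D]/g, '"'], // curly double quotes
  [/[\u2011\u2013\u2014]/g, "-"], // non-breaking hyphen, en dash, em dash
];

function normalizeTypography(text: string): { text: string; replaced: number } {
  let replaced = 0;
  let out = text;
  for (const [pattern, ascii] of ASCII_REPLACEMENTS) {
    // Use a callback so every individual substitution is counted.
    out = out.replace(pattern, () => {
      replaced += 1;
      return ascii;
    });
  }
  return { text: out, replaced };
}

// Example: one em dash becomes a plain hyphen; the arrow is untouched.
const sample = normalizeTypography("records, not schedulers \u2014 queued \u2192 running");
console.log(sample.text, sample.replaced);
```

The `replaced` count is returned so a run over a page can be compared against the per-file character counts quoted in the commit message; whether the original change was produced this way is not stated in the patch.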