Files
openclaw/docs/concepts/queue.md
Peter Steinberger bb46b79d3c refactor: internalize OpenClaw agent runtime (#85341)
* refactor: extract agent core package

Introduce packages/agent-core as the OpenClaw-owned home for reusable agent loop, harness, session, prompt, and runtime dependency contracts.

* refactor: extract shared llm runtime

Move provider model registries, stream wrappers, OAuth helpers, and LLM utilities into src/llm with plugin-sdk barrels instead of depending on the old embedded runtime layout.

* refactor: remove pi runtime internals

Rename remaining Pi-shaped agent surfaces to OpenClaw agent runtime names, delete obsolete Pi docs and package graph checks, and add the third-party notice for incorporated code.

* refactor: tighten agent session runtime

Make agent-core/runtime dependencies explicit, consolidate compaction and session transcript helpers, and move model/session helpers behind OpenClaw-owned contracts.

* refactor: remove static model and pi auth paths

Drop static model catalogs and Pi auth bridges, move model/provider facts to manifest-owned runtime contracts, and harden internal embedded-agent utilities.

* refactor: remove legacy provider compat paths

* docs: remove agent parity notes

* fix: skip provider wildcard metadata parsing

* refactor: share session extension sdk loading

* refactor: inline acpx proxy error formatter

* refactor: fold edit recovery into edit tool

* fix: accept extension batch separator

* test: align startup provider plugin expectations

* fix: restore provider-scoped release discovery

* test: align static asset packaging expectations

* fix: run static provider catalogs during scoped discovery

* fix: add provider entry catalogs for scoped live discovery

* fix: load lightweight provider catalog entries

* fix: refresh provider-scoped plugin metadata

* fix: keep provider catalog entries on release live path

* fix: keep static manifest models in release live checks

* fix: harden release model discovery

* fix: reduce OpenAI live cache probe reasoning

* fix: disable OpenAI cache probe reasoning

* ci: extend OpenAI gateway live timeout

* fix: extend live gateway model budget

* fix: stabilize release validation regressions

* fix: honor provider aliases in model rows

* fix: stabilize release validation lanes

* fix: stabilize release memory qa

* ci: stabilize release validation lanes

* ci: prefer ipv4 for live docker node calls

* fix: restore shared tool-call stream wrapper

* ci: remove legacy pi test shard alias

* fix: clean up embedded agent test drift

* fix: stabilize runtime alias status

* fix: clean up embedded agent ci drift

* fix: restore release ci invariants

* fix: clean up post-rebase runtime drift

* fix: restore release ci checks

* fix: restore release ci after rebase

* fix: remove stale pi runtime path

* test: align compaction runtime expectations

* test: update plugin prerelease expectations

* fix: handle claude live tool approvals

* fix: stabilize release validation gates

* fix: finish agent runtime import

* test: finish post-rebase agent runtime mocks

* fix: keep codex compaction native

* fix: stabilize codex app-server hook tests

* test: isolate codex diagnostic active run

* test: remove codex diagnostic completion race

# Conflicts:
#	extensions/codex/src/app-server/run-attempt.test.ts

* ci: fix full release manifest performance run id

* refactor: narrow llm plugin sdk boundary

* chore: drop generated google boundary stamps

* fix: repair rebase fallout

* fix: clean up rebased runtime references

* fix: decode codex jwt payloads as base64url

* fix: preserve shipped pi runtime alias

* fix: add scoped sdk virtual modules

* fix: decode llm codex oauth jwt as base64url

* fix: avoid stale vertex adc negative cache

* fix: harden tool arg decoding and codeql path

* fix: keep vertex adc negative checks live

* refactor: consolidate codex jwt and edit helpers

* fix: await codex oauth node runtime imports

* fix: preserve sdk tool and notice contracts

* fix: preserve shipped compat config boundaries

* fix: align codex oauth callback host

* fix: terminate agent-core loop streams on failure

* fix: keep codex oauth callback alive during fallback

* ci: include session tools in critical codeql scans

* fix: keep Cloudflare Anthropic provider auth header

* docs: redirect legacy pi runtime pages

* fix: honor bundled web provider compat discovery

* fix: protect session output spill files

* fix: keep legacy agent dir env blocked

* fix: contain auto-discovered skill symlinks

* fix: harden agent core sdk proxy surfaces

* fix: restore approval reaction sdk compat

* fix: keep live docker runs bounded

* fix: keep codex oauth redirect host aligned

* fix: resolve post-rebase agent runtime drift

* fix: redact anthropic oauth parse failures

* fix: preserve responses strict tool shaping

* fix: repair agent runtime rebase cleanup

* docs: redirect retired parity pages

* fix: bound auto-discovered resources to roots

* fix: repair post-rebase agent test drift

* fix: preserve bundled provider allowlist migration

* fix: preserve manifest-owned provider aliases

* fix: declare photon image dependency

* fix: keep provider headers out of proxy body

* fix: preserve shipped env aliases

* fix: refresh control ui i18n generated state

* fix: quote read fallback paths

* fix: preview edits through configured backend

* test: satisfy core test typecheck

* fix: preserve ZAI usage auth fallback

* test: repair codex diagnostic test

* fix: repair agent runtime rebase drift

* test: finish embedded runner import rename

* fix: repair agent runtime rebase integrations

* test: align compaction oauth fallback expectations

* fix: allow sdk-auth session models

* fix: update doctor tool schema import

* fix: preserve bedrock plugin region

* fix: stream harmony-like prose immediately

* ci: include session runtime in codeql shards

* fix: repair latest rebase integrations

* fix: honor explicit codex websocket transport

* fix: keep openai-compatible credentials provider-scoped

* fix: refresh sdk api baseline after rebase

* fix: route cli runtime aliases through openclaw harness

* test: rename stale harness mock expectation

* test: rename embedded agent overflow calls

* test: clean embedded auth test wording

* test: use openclaw stream types in deepinfra cache test

* fix: refresh sdk api baseline on latest main

* fix: honor bundled discovery compat allowlists

* fix: refresh sdk api baseline after latest rebase

* fix: remove stale rebase imports

* test: rename stale model catalog mock

* test: mock renamed doctor runtime modules

* fix: map canonical kimi env auth

* fix: use internal model registry in bench script

* fix: migrate deepinfra provider catalog entry

* fix: enforce builtin tool suppression

* fix: route compaction auth and proxy payloads safely

* refactor: prune unused llm registry leftovers

* test: update codex hooks session import

* test: fix model picker ci coverage

* test: align model picker auth mock types
2026-05-27 19:24:04 +01:00

7.2 KiB

summary, read_when, title
summary read_when title
Auto-reply queue modes, defaults, and per-session overrides
Changing auto-reply execution or concurrency
Explaining /queue modes or message steering behavior
Command queue

We serialize inbound auto-reply runs (all channels) through a tiny in-process queue to prevent multiple agent runs from colliding, while still allowing safe parallelism across sessions.

Why

  • Auto-reply runs can be expensive (LLM calls) and can collide when multiple inbound messages arrive close together.
  • Serializing avoids competing for shared resources (session files, logs, CLI stdin) and reduces the chance of upstream rate limits.

How it works

  • A lane-aware FIFO queue drains each lane with a configurable concurrency cap (default 1 for unconfigured lanes; main defaults to 4, subagent to 8).
  • runEmbeddedAgent enqueues by session key (lane session:<key>) to guarantee only one active run per session.
  • Each session run is then queued into a global lane (main by default) so overall parallelism is capped by agents.defaults.maxConcurrent.
  • When verbose logging is enabled, queued runs emit a short notice if they waited more than ~2s before starting.
  • Typing indicators still fire immediately on enqueue (when supported by the channel) so user experience is unchanged while we wait our turn.

Defaults

When unset, all inbound channel surfaces use:

  • mode: "steer"
  • debounceMs: 500
  • cap: 20
  • drop: "summarize"

Same-turn steering is the default. A prompt that arrives mid-run is injected into the active runtime when the run can accept steering, so no second session run is started. If the active run cannot accept steering, OpenClaw waits for the active run to finish before starting the prompt.

Queue modes

/queue controls what normal inbound messages do while a session already has an active run:

  • steer: inject messages into the active runtime. OpenClaw delivers all pending steering messages after the current assistant turn finishes executing its tool calls, before the next LLM call; Codex app-server receives one batched turn/steer. If the run is not actively streaming or steering is unavailable, OpenClaw waits until the active run ends before starting the prompt.
  • followup: do not steer. Enqueue each message for a later agent turn after the current run ends.
  • collect: do not steer. Coalesce queued messages into a single followup turn after the quiet window. If messages target different channels/threads, they drain individually to preserve routing.
  • interrupt: abort the active run for that session, then run the newest message.

For runtime-specific timing and dependency behavior, see Steering queue. For the explicit /steer <message> command, see Steer.

Configure globally or per channel via messages.queue:

{
  messages: {
    queue: {
      mode: "steer",
      debounceMs: 500,
      cap: 20,
      drop: "summarize",
      byChannel: { discord: "collect" },
    },
  },
}

Queue options

Options apply to queued delivery. debounceMs also sets the Codex steering quiet window in steer mode:

  • debounceMs: quiet window before draining queued followups or collect batches; in Codex steer mode, quiet window before sending batched turn/steer. Bare numbers are milliseconds; units ms, s, m, h, and d are accepted by /queue options.
  • cap: max queued messages per session. Values below 1 are ignored.
  • drop: "summarize": default. Drop the oldest queued entries as needed, keep compact summaries, and inject them as a synthetic followup prompt.
  • drop: "old": drop the oldest queued entries as needed, without preserving summaries.
  • drop: "new": reject the newest message when the queue is already full.

Defaults: debounceMs: 500, cap: 20, drop: summarize.

Steer and streaming

When channel streaming is partial or block, steering can look like several short visible replies while the active run reaches runtime boundaries:

  • partial: the preview may finalize early, then a new preview starts after steering is accepted.
  • block: draft-sized blocks can create the same sequential appearance.
  • Without streaming, steering falls back to a followup after the active run when the runtime cannot accept same-turn steering.

steer does not abort in-flight tools. Use /queue interrupt when the newest message should abort the current run.

Precedence

For mode selection, OpenClaw resolves:

  1. Inline or stored per-session /queue override.
  2. messages.queue.byChannel.<channel>.
  3. messages.queue.mode.
  4. Default steer.

For options, inline or stored /queue options win over config. Then channel-specific debounce (messages.queue.debounceMsByChannel), plugin debounce defaults, global messages.queue options, and built-in defaults are applied. cap and drop are global/session options, not per-channel config keys.

Per-session overrides

  • Send /queue <steer|followup|collect|interrupt> as a standalone command to store the queue mode for the current session.
  • Options can be combined: /queue collect debounce:0.5s cap:25 drop:summarize
  • /queue default or /queue reset clears the session override.

Scope and guarantees

  • Applies to auto-reply agent runs across all inbound channels that use the gateway reply pipeline (WhatsApp web, Telegram, Slack, Discord, Signal, iMessage, webchat, etc.).
  • Default lane (main) is process-wide for inbound + main heartbeats; set agents.defaults.maxConcurrent to allow multiple sessions in parallel.
  • Additional lanes may exist (e.g. cron, cron-nested, nested, subagent) so background jobs can run in parallel without blocking inbound replies. Isolated cron agent turns hold a cron slot while their inner agent execution uses cron-nested; both use cron.maxConcurrentRuns. Shared non-cron nested flows keep their own lane behavior. These detached runs are tracked as background tasks.
  • Per-session lanes guarantee that only one agent run touches a given session at a time.
  • No external dependencies or background worker threads; pure TypeScript + promises.

Troubleshooting

  • If commands seem stuck, enable verbose logs and look for "queued for ...ms" lines to confirm the queue is draining.
  • If you need queue depth, enable verbose logs and watch for queue timing lines.
  • Codex app-server runs that accept a turn and then stop emitting progress are interrupted by the Codex adapter so the active session lane can release instead of waiting for the outer run timeout.
  • When diagnostics are enabled, sessions that remain in processing past diagnostics.stuckSessionWarnMs with no observed reply, tool, status, block, or ACP progress are classified by current activity. Active work logs as session.long_running; active work with no recent progress logs as session.stalled; session.stuck is reserved for recoverable stale session bookkeeping, including idle queued sessions with stale ownerless model/tool activity, and only that path can release the affected session lane so queued work drains. Repeated session.stuck diagnostics back off while the session remains unchanged.