openclaw/docs/concepts/queue.md at c5ea7c4d0fa4a26439a9e95739624e7b35930a45


Vincent Koc c5ea7c4d0f docs: typography hygiene across 6 pages

2026-05-05 21:04:19 -07:00

7.0 KiB


---
summary: "Auto-reply queue modes, defaults, and per-session overrides"
read_when:
  - Changing auto-reply execution or concurrency
  - Explaining /queue modes or message steering behavior
title: "Command queue"
---

We serialize inbound auto-reply runs (all channels) through a tiny in-process queue to prevent multiple agent runs from colliding, while still allowing safe parallelism across sessions.

Why

Auto-reply runs can be expensive (LLM calls) and can collide when multiple inbound messages arrive close together.
Serializing avoids competing for shared resources (session files, logs, CLI stdin) and reduces the chance of upstream rate limits.

How it works

A lane-aware FIFO queue drains each lane with a configurable concurrency cap (default 1 for unconfigured lanes; main defaults to 4, subagent to 8).
runEmbeddedPiAgent enqueues by session key (lane session:<key>) to guarantee only one active run per session.
Each session run is then queued into a global lane (main by default) so overall parallelism is capped by agents.defaults.maxConcurrent.
When verbose logging is enabled, queued runs emit a short notice if they waited more than ~2s before starting.
Typing indicators still fire immediately on enqueue (when supported by the channel) so user experience is unchanged while we wait our turn.
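The lane model above can be sketched in TypeScript, the language the queue is written in. This is a minimal illustration under stated assumptions, not OpenClaw's implementation: the LaneQueue class and runForSession helper are hypothetical names. Only the session:<key> lane naming, the cap of 1 for unconfigured lanes, and the main/subagent caps come from the text. A session run first takes its session lane (cap 1), then waits for a slot in main.

```typescript
// Hypothetical sketch of a lane-aware FIFO queue (not OpenClaw's source).
type Task<T> = () => Promise<T>;

class LaneQueue {
  private readonly caps: Record<string, number>;
  private readonly pending = new Map<string, Array<() => void>>();
  private readonly active = new Map<string, number>();

  constructor(caps: Record<string, number>) {
    this.caps = caps;
  }

  // Lanes without a configured cap run one task at a time.
  private capFor(lane: string): number {
    return this.caps[lane] ?? 1;
  }

  enqueue<T>(lane: string, task: Task<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const start = (): void => {
        this.active.set(lane, (this.active.get(lane) ?? 0) + 1);
        Promise.resolve()
          .then(task)
          .then(resolve, reject)
          .finally(() => {
            // Free the slot and let the next queued task in this lane start.
            this.active.set(lane, (this.active.get(lane) ?? 1) - 1);
            this.drain(lane);
          });
      };
      const queue = this.pending.get(lane) ?? [];
      queue.push(start);
      this.pending.set(lane, queue);
      this.drain(lane);
    });
  }

  // Start queued tasks in FIFO order while the lane is under its cap.
  private drain(lane: string): void {
    const queue = this.pending.get(lane);
    while (queue !== undefined && queue.length > 0 && (this.active.get(lane) ?? 0) < this.capFor(lane)) {
      const next = queue.shift();
      if (next) next();
    }
  }
}

// One active run per session (session:<key> lane, default cap 1), and every
// session run also needs a slot in the shared "main" lane.
function runForSession<T>(queue: LaneQueue, sessionKey: string, run: Task<T>): Promise<T> {
  return queue.enqueue(`session:${sessionKey}`, () => queue.enqueue("main", run));
}

// Caps from the text: main 4, subagent 8; any other lane defaults to 1.
const lanes = new LaneQueue({ main: 4, subagent: 8 });
```

Because a run keeps its session slot while it waits for main, a second message for the same session cannot overtake the first, and the main cap bounds total parallelism. Tasks in main never wait on a session lane, so the nesting cannot deadlock.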

Defaults

When unset, all inbound channel surfaces use:

mode: "steer"
debounceMs: 500
cap: 20
drop: "summarize"

steer is the default because it keeps the active model turn responsive without starting a second session run. It drains all steering messages that arrived before the next model boundary. If the current run cannot accept steering, OpenClaw falls back to a followup queue entry.

Queue modes

Inbound messages can steer the current run, wait for a followup turn, or do both:

steer: queue steering messages into the active runtime. Pi delivers all pending steering messages after the current assistant turn finishes executing its tool calls, before the next LLM call; Codex app-server receives one batched turn/steer. If the run is not actively streaming or steering is unavailable, OpenClaw falls back to a followup queue entry.
queue (legacy): old one-at-a-time steering. Pi delivers one queued steering message at each model boundary; Codex app-server receives separate turn/steer requests. Prefer steer unless you need the previous serialized behavior.
followup: enqueue each message for a later agent turn after the current run ends.
collect: coalesce queued messages into a single followup turn after the quiet window. If messages target different channels/threads, they drain individually to preserve routing.
steer-backlog (aka steer+backlog): steer now and preserve the same message for a followup turn.
interrupt (legacy): abort the active run for that session, then run the newest message.
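A minimal TypeScript sketch of how the modes above could route one inbound message while a run is active. The routeInbound and steeringBatches names and the canSteer flag are hypothetical; the fallback rules come from the list, and the steer-backlog behavior when steering is unavailable is an assumption.

```typescript
// Hypothetical routing sketch for the /queue modes (not OpenClaw's source).
type QueueMode = "steer" | "queue" | "followup" | "collect" | "steer-backlog" | "interrupt";
type Action = "steer" | "followup" | "abort-then-run";

// canSteer: the active run is streaming and its runtime accepts steering.
function routeInbound(mode: QueueMode, canSteer: boolean): Action[] {
  switch (mode) {
    case "steer":
    case "queue":
      // Both inject into the active run and fall back to a followup entry.
      return canSteer ? ["steer"] : ["followup"];
    case "followup":
    case "collect":
      // collect differs only in how queued followups are drained (coalesced).
      return ["followup"];
    case "steer-backlog":
      // Assumption: without steering, only the followup copy remains.
      return canSteer ? ["steer", "followup"] : ["followup"];
    case "interrupt":
      return ["abort-then-run"];
  }
}

// Group pending steering messages into the batches delivered at model boundaries.
function steeringBatches(mode: "steer" | "queue", pending: string[]): string[][] {
  if (pending.length === 0) return [];
  // steer: everything that arrived before the next boundary goes in one batch.
  // queue (legacy): one message per boundary.
  return mode === "steer" ? [pending.slice()] : pending.map((m) => [m]);
}
```

The split into two functions mirrors the text: routing decides whether a message steers, waits, or both, while steer and legacy queue differ only in how many steering messages each model boundary receives.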

Steer-backlog means you can get a followup response after the steered run, so streaming surfaces can look like duplicates. Prefer collect/steer if you want one response per inbound message.
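The routing rule for collect mode can be illustrated with a small sketch. QueuedMessage, routeOf, and drainCollect are hypothetical names, and treating a route as channel plus optional thread id is an assumption; the sketch only shows the choice between one coalesced followup turn and individual drains.

```typescript
// Hypothetical shapes for illustration only.
interface QueuedMessage {
  channel: string;
  threadId?: string;
  text: string;
}

// Route key: channel plus optional thread (assumption about what "route" means).
function routeOf(m: QueuedMessage): string {
  return `${m.channel}:${m.threadId ?? ""}`;
}

// Called once the debounce quiet window has elapsed; returns the followup turns to run.
function drainCollect(queued: QueuedMessage[]): QueuedMessage[][] {
  if (queued.length === 0) return [];
  const first = routeOf(queued[0]);
  const sameRoute = queued.every((m) => routeOf(m) === first);
  // Same route: coalesce into one turn. Mixed routes: drain each message on its own.
  return sameRoute ? [queued.slice()] : queued.map((m) => [m]);
}
```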

For runtime-specific timing and dependency behavior, see Steering queue. For the explicit /steer <message> command, see Steer.

Configure globally or per channel via messages.queue:

```json5
{
  messages: {
    queue: {
      mode: "steer",
      debounceMs: 500,
      cap: 20,
      drop: "summarize",
      byChannel: { discord: "collect" },
    },
  },
}
```

Queue options

Options apply to followup, collect, and steer-backlog (and to steer or legacy queue when steering falls back to followup):

debounceMs: quiet window before draining queued followups. Bare numbers are milliseconds; units ms, s, m, h, and d are accepted by /queue options.
cap: max queued messages per session. Values below 1 are ignored.
drop: "summarize": default. Drop the oldest queued entries as needed, keep compact summaries, and inject them as a synthetic followup prompt.
drop: "old": drop the oldest queued entries as needed, without preserving summaries.
drop: "new": reject the newest message when the queue is already full.

Defaults: debounceMs: 500, cap: 20, drop: "summarize".
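The cap and drop options can be sketched as a single enqueue step on a per-session queue. Everything here is hypothetical: the SessionQueue shape, the 40-character placeholder summarizer, and the wording and placement of the synthetic prompt are assumptions, while the three policies, the "below 1 is ignored" rule, and the default cap of 20 come from the text.

```typescript
// Hypothetical sketch of cap and drop handling for one session's followup queue.
type DropPolicy = "summarize" | "old" | "new";

interface SessionQueue {
  items: string[];     // queued followup messages, oldest first
  summaries: string[]; // compact notes for entries dropped under "summarize"
}

const DEFAULT_CAP = 20;

// Placeholder summarizer (assumption): keeps the first 40 characters.
function summarize(text: string): string {
  return text.length > 40 ? `${text.slice(0, 40)}...` : text;
}

// Returns false when the incoming message is rejected ("new" policy, queue full).
function enqueueWithCap(q: SessionQueue, message: string, cap: number, drop: DropPolicy): boolean {
  // Caps below 1 are ignored; this sketch falls back to the default.
  const limit = cap >= 1 ? cap : DEFAULT_CAP;
  if (q.items.length >= limit) {
    if (drop === "new") return false;
    // "old" and "summarize" both evict the oldest entries to make room.
    while (q.items.length >= limit) {
      const oldest = q.items.shift();
      if (oldest !== undefined && drop === "summarize") q.summaries.push(summarize(oldest));
    }
  }
  q.items.push(message);
  return true;
}

// Summaries become one synthetic followup prompt ahead of the surviving messages
// (the prompt wording and ordering here are assumptions).
function buildFollowupPrompts(q: SessionQueue): string[] {
  const synthetic =
    q.summaries.length > 0 ? [`Dropped earlier messages (summarized):\n${q.summaries.join("\n")}`] : [];
  return [...synthetic, ...q.items];
}
```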

Precedence

For mode selection, OpenClaw resolves:

1. Inline or stored per-session /queue override.
2. messages.queue.byChannel.<channel>.
3. messages.queue.mode.
4. Default: steer.

For options, inline or stored /queue options win over config. Below that, OpenClaw applies, in order: channel-specific debounce (messages.queue.debounceMsByChannel), plugin debounce defaults, global messages.queue options, and finally built-in defaults. cap and drop are global/session options, not per-channel config keys.
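The precedence rules above can be written as a chain of fallbacks. This is a sketch under assumptions: the QueueConfig and SessionOverride shapes mirror the config keys named on this page, debounceMsByChannel is assumed to map channel names to milliseconds, and pluginDebounceMs is a hypothetical stand-in for plugin debounce defaults.

```typescript
// Hypothetical config shapes mirroring the keys on this page (not OpenClaw's types).
type QueueMode = "steer" | "queue" | "followup" | "collect" | "steer-backlog" | "interrupt";
type DropPolicy = "summarize" | "old" | "new";

interface QueueConfig {
  mode?: QueueMode;
  debounceMs?: number;
  cap?: number;
  drop?: DropPolicy;
  byChannel?: Record<string, QueueMode>;
  debounceMsByChannel?: Record<string, number>;
}

interface SessionOverride {
  mode?: QueueMode;
  debounceMs?: number;
  cap?: number;
  drop?: DropPolicy;
}

interface ResolvedQueue {
  mode: QueueMode;
  debounceMs: number;
  cap: number;
  drop: DropPolicy;
}

function resolveQueue(
  channel: string,
  config: QueueConfig,
  session: SessionOverride = {},
  pluginDebounceMs?: number,
): ResolvedQueue {
  return {
    // Mode: session override, then byChannel, then global mode, then "steer".
    mode: session.mode ?? config.byChannel?.[channel] ?? config.mode ?? "steer",
    // Debounce: session, then per-channel, then plugin default, then global, then 500.
    debounceMs:
      session.debounceMs ?? config.debounceMsByChannel?.[channel] ?? pluginDebounceMs ?? config.debounceMs ?? 500,
    // cap and drop have no per-channel keys.
    cap: session.cap ?? config.cap ?? 20,
    drop: session.drop ?? config.drop ?? "summarize",
  };
}
```

Using ?? rather than || keeps an explicit debounce of 0 from falling through to a lower-priority source.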

Per-session overrides

Send /queue <mode> as a standalone command to store the mode for the current session.
Options can be combined: /queue collect debounce:0.5s cap:25 drop:summarize
/queue default or /queue reset clears the session override.
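A sketch of parsing the per-session command forms shown above. parseQueueCommand and parseDurationMs are hypothetical helpers, not OpenClaw exports; the duration units (ms, s, m, h, d, with bare numbers as milliseconds) and the rule that caps below 1 are ignored come from the Queue options section, and accepting steer+backlog as typed input is an assumption.

```typescript
// Hypothetical /queue parser for the command forms above (not OpenClaw's parser).
const MODES = ["steer", "queue", "followup", "collect", "steer-backlog", "interrupt"] as const;
type QueueMode = (typeof MODES)[number];
type DropPolicy = "summarize" | "old" | "new";

const UNIT_MS: Record<string, number> = { ms: 1, s: 1_000, m: 60_000, h: 3_600_000, d: 86_400_000 };

// Bare numbers are milliseconds; otherwise a number followed by ms, s, m, h, or d.
function parseDurationMs(raw: string): number | undefined {
  const match = /^(\d+(?:\.\d+)?)(ms|s|m|h|d)?$/.exec(raw);
  if (match === null) return undefined;
  const unit = match[2] ?? "ms";
  return Math.round(Number(match[1]) * UNIT_MS[unit]);
}

interface SetCommand {
  kind: "set";
  mode?: QueueMode;
  debounceMs?: number;
  cap?: number;
  drop?: DropPolicy;
}
type QueueCommand = SetCommand | { kind: "reset" };

function parseQueueCommand(input: string): QueueCommand | undefined {
  const [head, ...args] = input.trim().split(/\s+/);
  if (head !== "/queue") return undefined;
  // "/queue default" and "/queue reset" clear the stored session override.
  if (args[0] === "default" || args[0] === "reset") return { kind: "reset" };
  const cmd: SetCommand = { kind: "set" };
  for (const arg of args) {
    const sep = arg.indexOf(":");
    if (sep === -1) {
      // Assumption: accept the steer+backlog spelling as an alias.
      const mode = arg === "steer+backlog" ? "steer-backlog" : arg;
      if ((MODES as readonly string[]).includes(mode)) cmd.mode = mode as QueueMode;
      continue;
    }
    const key = arg.slice(0, sep);
    const value = arg.slice(sep + 1);
    if (key === "debounce") {
      cmd.debounceMs = parseDurationMs(value);
    } else if (key === "cap") {
      const cap = Number(value);
      if (cap >= 1) cmd.cap = cap; // values below 1 are ignored
    } else if (key === "drop" && (value === "summarize" || value === "old" || value === "new")) {
      cmd.drop = value;
    }
  }
  return cmd;
}
```

For example, parsing "/queue collect debounce:0.5s cap:25 drop:summarize" yields mode collect, a 500 ms debounce, cap 25, and drop summarize.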

Scope and guarantees

Applies to auto-reply agent runs across all inbound channels that use the gateway reply pipeline (WhatsApp web, Telegram, Slack, Discord, Signal, iMessage, webchat, etc.).
Default lane (main) is process-wide for inbound + main heartbeats; set agents.defaults.maxConcurrent to allow multiple sessions in parallel.
Additional lanes may exist (e.g. cron, cron-nested, nested, subagent) so background jobs can run in parallel without blocking inbound replies. Isolated cron agent turns hold a cron slot while their inner agent execution uses cron-nested; both use cron.maxConcurrentRuns. Shared non-cron nested flows keep their own lane behavior. These detached runs are tracked as background tasks.
Per-session lanes guarantee that only one agent run touches a given session at a time.
No external dependencies or background worker threads; pure TypeScript + promises.

Troubleshooting

If commands seem stuck, enable verbose logs and look for "queued for ...ms" lines to confirm the queue is draining.
If you need queue depth, enable verbose logs and watch for queue timing lines.
Codex app-server runs that accept a turn and then stop emitting progress are interrupted by the Codex adapter so the active session lane can release instead of waiting for the outer run timeout.
When diagnostics are enabled, a session that stays in processing past diagnostics.stuckSessionWarnMs with no observed reply, tool, status, block, or ACP progress is classified by its current activity. Active work logs as session.long_running, and active work with no recent progress logs as session.stalled. session.stuck is reserved for stale session bookkeeping with no active work; only that path can release the affected session lane so queued work drains. Repeated session.stuck diagnostics back off while the session remains unchanged.
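The classification above can be sketched as a small decision function. The SessionState fields are assumptions: this page does not say how OpenClaw tracks active work or decides what counts as recent progress, so those signals are passed in as plain values, and the backoff for repeated session.stuck diagnostics is left out.

```typescript
// Hypothetical input shape; these signals are assumptions, not OpenClaw fields.
interface SessionState {
  processing: boolean;
  msSinceLastProgress: number; // since the last reply, tool, status, block, or ACP progress
  hasActiveWork: boolean; // a run is still registered for the session
  activeWorkRecentlyProgressed: boolean;
}

type Diagnostic = "session.long_running" | "session.stalled" | "session.stuck";

function classifySession(s: SessionState, stuckSessionWarnMs: number): Diagnostic | undefined {
  // Only sessions still processing past the warning threshold are classified.
  if (!s.processing || s.msSinceLastProgress <= stuckSessionWarnMs) return undefined;
  if (s.hasActiveWork) {
    return s.activeWorkRecentlyProgressed ? "session.long_running" : "session.stalled";
  }
  // No active work: the processing state is stale bookkeeping.
  return "session.stuck";
}

// Only the stuck path may release the session lane so queued work can drain.
function mayReleaseLane(d: Diagnostic | undefined): boolean {
  return d === "session.stuck";
}
```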
