Files
openclaw/docs/concepts/streaming.md
2026-03-02 23:04:12 +00:00

156 lines
6.8 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
summary: "Streaming + chunking behavior (block replies, channel preview streaming, mode mapping)"
read_when:
- Explaining how streaming or chunking works on channels
- Changing block streaming or channel chunking behavior
- Debugging duplicate/early block replies or channel preview streaming
title: "Streaming and Chunking"
---
# Streaming + chunking
OpenClaw has two separate streaming layers:
- **Block streaming (channels):** emit completed **blocks** as the assistant writes. These are normal channel messages (not token deltas).
- **Preview streaming (Telegram/Discord/Slack):** update a temporary **preview message** while generating.
There is **no true token-delta streaming** to channel messages today. Preview streaming is message-based (send + edits/appends).
## Block streaming (channel messages)
Block streaming sends assistant output in coarse chunks as it becomes available.
```
Model output
└─ text_delta/events
├─ (blockStreamingBreak=text_end)
│ └─ chunker emits blocks as buffer grows
└─ (blockStreamingBreak=message_end)
└─ chunker flushes at message_end
└─ channel send (block replies)
```
Legend:
- `text_delta/events`: model stream events (may be sparse for non-streaming models).
- `chunker`: `EmbeddedBlockChunker` applying min/max bounds + break preference.
- `channel send`: actual outbound messages (block replies).
**Controls:**
- `agents.defaults.blockStreamingDefault`: `"on"`/`"off"` (default off).
- Channel overrides: `*.blockStreaming` (and per-account variants) to force `"on"`/`"off"` per channel.
- `agents.defaults.blockStreamingBreak`: `"text_end"` or `"message_end"`.
- `agents.defaults.blockStreamingChunk`: `{ minChars, maxChars, breakPreference? }`.
- `agents.defaults.blockStreamingCoalesce`: `{ minChars?, maxChars?, idleMs? }` (merge streamed blocks before send).
- Channel hard cap: `*.textChunkLimit` (e.g., `channels.whatsapp.textChunkLimit`).
- Channel chunk mode: `*.chunkMode` (`length` default, `newline` splits on blank lines (paragraph boundaries) before length chunking).
- Discord soft cap: `channels.discord.maxLinesPerMessage` (default 17) splits tall replies to avoid UI clipping.
**Boundary semantics:**
- `text_end`: stream blocks as soon as chunker emits; flush on each `text_end`.
- `message_end`: wait until assistant message finishes, then flush buffered output.
`message_end` still uses the chunker if the buffered text exceeds `maxChars`, so it can emit multiple chunks at the end.
## Chunking algorithm (low/high bounds)
Block chunking is implemented by `EmbeddedBlockChunker`:
- **Low bound:** dont emit until buffer >= `minChars` (unless forced).
- **High bound:** prefer splits before `maxChars`; if forced, split at `maxChars`.
- **Break preference:** `paragraph``newline``sentence``whitespace` → hard break.
- **Code fences:** never split inside fences; when forced at `maxChars`, close + reopen the fence to keep Markdown valid.
`maxChars` is clamped to the channel `textChunkLimit`, so you cant exceed per-channel caps.
## Coalescing (merge streamed blocks)
When block streaming is enabled, OpenClaw can **merge consecutive block chunks**
before sending them out. This reduces “single-line spam” while still providing
progressive output.
- Coalescing waits for **idle gaps** (`idleMs`) before flushing.
- Buffers are capped by `maxChars` and will flush if they exceed it.
- `minChars` prevents tiny fragments from sending until enough text accumulates
(final flush always sends remaining text).
- Joiner is derived from `blockStreamingChunk.breakPreference`
(`paragraph``\n\n`, `newline``\n`, `sentence` → space).
- Channel overrides are available via `*.blockStreamingCoalesce` (including per-account configs).
- Default coalesce `minChars` is bumped to 1500 for Signal/Slack/Discord unless overridden.
## Human-like pacing between blocks
When block streaming is enabled, you can add a **randomized pause** between
block replies (after the first block). This makes multi-bubble responses feel
more natural.
- Config: `agents.defaults.humanDelay` (override per agent via `agents.list[].humanDelay`).
- Modes: `off` (default), `natural` (8002500ms), `custom` (`minMs`/`maxMs`).
- Applies only to **block replies**, not final replies or tool summaries.
## “Stream chunks or everything”
This maps to:
- **Stream chunks:** `blockStreamingDefault: "on"` + `blockStreamingBreak: "text_end"` (emit as you go). Non-Telegram channels also need `*.blockStreaming: true`.
- **Stream everything at end:** `blockStreamingBreak: "message_end"` (flush once, possibly multiple chunks if very long).
- **No block streaming:** `blockStreamingDefault: "off"` (only final reply).
**Channel note:** Block streaming is **off unless**
`*.blockStreaming` is explicitly set to `true`. Channels can stream a live preview
(`channels.<channel>.streaming`) without block replies.
Config location reminder: the `blockStreaming*` defaults live under
`agents.defaults`, not the root config.
## Preview streaming modes
Canonical key: `channels.<channel>.streaming`
Modes:
- `off`: disable preview streaming.
- `partial`: single preview that is replaced with latest text.
- `block`: preview updates in chunked/appended steps.
- `progress`: progress/status preview during generation, final answer at completion.
### Channel mapping
| Channel | `off` | `partial` | `block` | `progress` |
| -------- | ----- | --------- | ------- | ----------------- |
| Telegram | ✅ | ✅ | ✅ | maps to `partial` |
| Discord | ✅ | ✅ | ✅ | maps to `partial` |
| Slack | ✅ | ✅ | ✅ | ✅ |
Slack-only:
- `channels.slack.nativeStreaming` toggles Slack native streaming API calls when `streaming=partial` (default: `true`).
Legacy key migration:
- Telegram: `streamMode` + boolean `streaming` auto-migrate to `streaming` enum.
- Discord: `streamMode` + boolean `streaming` auto-migrate to `streaming` enum.
- Slack: `streamMode` auto-migrates to `streaming` enum; boolean `streaming` auto-migrates to `nativeStreaming`.
### Runtime behavior
Telegram:
- Uses Bot API `sendMessageDraft` in DMs when available, and `sendMessage` + `editMessageText` for group/topic preview updates.
- Preview streaming is skipped when Telegram block streaming is explicitly enabled (to avoid double-streaming).
- `/reasoning stream` can write reasoning to preview.
Discord:
- Uses send + edit preview messages.
- `block` mode uses draft chunking (`draftChunk`).
- Preview streaming is skipped when Discord block streaming is explicitly enabled.
Slack:
- `partial` can use Slack native streaming (`chat.startStream`/`append`/`stop`) when available.
- `block` uses append-style draft previews.
- `progress` uses status preview text, then final answer.