openclaw/docs/web/webchat.md
Peter Steinberger 68359cacbf feat(webchat): add server-side dictation (#76021)
Summary:
- This PR adds WebChat server-side dictation through a new authenticated `chat.transcribeAudio` Gateway RPC, MediaRecorder composer controls, docs/changelog updates, and focused gateway/UI tests.
- Reproducibility: yes. Current main reproduces the missing feature by inspection: the Gateway method list, write scopes, docs, and WebChat voice-control test have no `chat.transcribeAudio` server-dictation path.

ClawSweeper fixups:
- Included follow-up commit: feat(webchat): add server-side dictation
- Included follow-up commit: fix(clawsweeper): address review for automerge-openclaw-openclaw-7602…

Validation:
- ClawSweeper review passed for head 850571380a.
- Required merge gates passed before the squash merge.

Prepared head SHA: 850571380a
Review: https://github.com/openclaw/openclaw/pull/76021#issuecomment-4363514226

Co-authored-by: Peter Steinberger <steipete@gmail.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
2026-05-02 23:09:23 +00:00


summary: Loopback WebChat static host and Gateway WS usage for chat UI
read_when: Debugging or configuring WebChat access
title: WebChat

Status: the macOS/iOS SwiftUI chat UI talks directly to the Gateway WebSocket.

What it is

  • A native chat UI for the gateway (no embedded browser and no local static server).
  • Uses the same sessions and routing rules as other channels.
  • Deterministic routing: replies always go back to WebChat.

Quick start

  1. Start the gateway.
  2. Open the WebChat UI (macOS/iOS app) or the Control UI chat tab.
  3. Make sure a valid gateway auth method is configured (shared-secret auth by default, even on loopback).

How it works (behavior)

  • The UI connects to the Gateway WebSocket and uses chat.history, chat.send, chat.inject, and chat.transcribeAudio.
  • chat.history is bounded for stability: Gateway may truncate long text fields, omit heavy metadata, and replace oversized entries with [chat.history omitted: message too large].
  • chat.history follows the active transcript branch for modern append-only session files, so abandoned rewrite branches and superseded prompt copies are not rendered in WebChat.
  • Control UI remembers the backing Gateway sessionId returned by chat.history and includes it on follow-up chat.send calls, so reconnects and page refreshes continue the same stored conversation unless the user starts or resets a session.
  • Control UI coalesces duplicate in-flight submits for the same session, message, and attachments before generating a new chat.send run id; the Gateway still dedupes repeated requests that reuse the same idempotency key.
  • chat.history is also display-normalized:
    • Runtime-only OpenClaw context, inbound envelope wrappers, inline delivery directive tags such as [[reply_to_*]] and [[audio_as_voice]], plain-text tool-call XML payloads (including <tool_call>...</tool_call>, <function_call>...</function_call>, <tool_calls>...</tool_calls>, <function_calls>...</function_calls>, and truncated tool-call blocks), and leaked ASCII/full-width model control tokens are stripped from visible text.
    • Assistant entries whose entire visible text is exactly the silent token NO_REPLY / no_reply are omitted.
  • Reasoning-flagged reply payloads (isReasoning: true) are excluded from WebChat assistant content, transcript replay text, and audio content blocks, so thinking-only payloads do not surface as visible assistant messages or playable audio.
  • chat.transcribeAudio powers server-side dictation in the Control UI chat composer. The browser records microphone audio, sends it as base64 to the Gateway, and the Gateway runs the configured tools.media.audio pipeline. The returned transcript is inserted into the draft; no agent run is started until the user sends it.
  • chat.inject appends an assistant note directly to the transcript and broadcasts it to the UI (no agent run).
  • Aborted runs can keep partial assistant output visible in the UI.
  • Gateway persists aborted partial assistant text into transcript history when buffered output exists, and marks those entries with abort metadata.
  • History is always fetched from the gateway (no local file watching).
  • If the gateway is unreachable, WebChat is read-only.
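The browser side of server-side dictation can be sketched as follows. This is an illustration under stated assumptions, not the Control UI's code: only the method name chat.transcribeAudio, the base64 transport, and the "insert into the draft, do not send" behavior come from this page. The request field names audioBase64 and mimeType, and inserting at the cursor position, are hypothetical.

```typescript
// Sketch only: the real chat.transcribeAudio parameter names are not given on
// this page, so audioBase64 and mimeType are hypothetical placeholders.
interface TranscribeAudioRequest {
  method: "chat.transcribeAudio";
  params: { audioBase64: string; mimeType: string };
}

const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard RFC 4648 base64, written out so it runs unchanged in browsers and Node.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i];
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    const triple = (b0 << 16) | (b1 << 8) | b2;
    out += BASE64_ALPHABET[(triple >> 18) & 63];
    out += BASE64_ALPHABET[(triple >> 12) & 63];
    // Pad with "=" when the final group has fewer than three input bytes.
    out += i + 1 < bytes.length ? BASE64_ALPHABET[(triple >> 6) & 63] : "=";
    out += i + 2 < bytes.length ? BASE64_ALPHABET[triple & 63] : "=";
  }
  return out;
}

// Wrap recorded audio bytes in the hypothetical request shape above.
function buildTranscribeRequest(audio: Uint8Array, mimeType: string): TranscribeAudioRequest {
  return {
    method: "chat.transcribeAudio",
    params: { audioBase64: toBase64(audio), mimeType },
  };
}

// Insert the returned transcript into the composer draft at the cursor.
// Nothing is sent here: an agent run starts only when the user submits the draft.
function insertTranscript(draft: string, cursor: number, transcript: string): string {
  const before = draft.slice(0, cursor);
  const after = draft.slice(cursor);
  const separator = before.length > 0 && !/\s$/.test(before) ? " " : "";
  return before + separator + transcript.trim() + after;
}
```

In a browser, the bytes would come from the MediaRecorder output, for example new Uint8Array(await blob.arrayBuffer()), with blob.type as the MIME type.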
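Client-side coalescing of duplicate in-flight submits can be pictured as a map of pending requests keyed by session, message, and attachments. This is a minimal sketch, not the Control UI's implementation: SubmitCoalescer, SendInput, Attachment, and the JSON key format are hypothetical, run ids are injected so the sketch stays deterministic, and the Gateway-side dedupe by idempotency key is not modeled.

```typescript
// Hypothetical attachment descriptor; the real Control UI attachment type differs.
interface Attachment { name: string; sizeBytes: number }

interface SendInput { sessionKey: string; message: string; attachments: Attachment[] }

type SendFn<T> = (input: SendInput, runId: string) => Promise<T>;

class SubmitCoalescer<T> {
  private readonly inFlight = new Map<string, Promise<T>>();
  private readonly newRunId: () => string;
  private readonly send: SendFn<T>;

  constructor(newRunId: () => string, send: SendFn<T>) {
    this.newRunId = newRunId;
    this.send = send;
  }

  // Same session + message + attachments produce the same key.
  private keyOf(input: SendInput): string {
    return JSON.stringify([
      input.sessionKey,
      input.message,
      input.attachments.map((a) => [a.name, a.sizeBytes]),
    ]);
  }

  submit(input: SendInput): Promise<T> {
    const key = this.keyOf(input);
    const pending = this.inFlight.get(key);
    // A duplicate submit while the first is still in flight reuses that
    // promise, so no second chat.send run id is generated.
    if (pending) return pending;

    const runId = this.newRunId();
    const request = this.send(input, runId).finally(() => {
      // Once the request settles, the same submit may start a new run.
      this.inFlight.delete(key);
    });
    this.inFlight.set(key, request);
    return request;
  }
}
```

Coalescing before the run id is generated matters because a fresh run id per click would defeat idempotency-key dedupe further down the stack.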
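Part of the chat.history display normalization can be approximated with a few regular expressions. This is a hypothetical sketch, not the Gateway's rules: HistoryEntry and normalizeForDisplay are illustrative names, and it only models directive tags, tool-call XML blocks (closed or truncated), and the NO_REPLY / no_reply omission. Runtime context, envelope wrappers, and control tokens are left out.

```typescript
// Directive tags named on this page: [[reply_to_*]] and [[audio_as_voice]].
const DIRECTIVE_TAG = /\[\[(?:reply_to_[^\]]*|audio_as_voice)\]\]/g;
// Closed tool-call blocks; \1 requires the closing tag to match the opening one.
const TOOL_CALL_BLOCK = /<(tool_calls?|function_calls?)>[\s\S]*?<\/\1>/g;
// A tool-call block cut off before its closing tag: drop it through end of text.
const TRUNCATED_TOOL_CALL = /<(?:tool_calls?|function_calls?)>[\s\S]*$/;
// The exact silent tokens; partial matches do not count.
const SILENT_TOKEN = /^(?:NO_REPLY|no_reply)$/;

interface HistoryEntry { role: "user" | "assistant"; text: string }

function normalizeForDisplay(entries: HistoryEntry[]): HistoryEntry[] {
  const out: HistoryEntry[] = [];
  for (const entry of entries) {
    const text = entry.text
      .replace(DIRECTIVE_TAG, "")
      .replace(TOOL_CALL_BLOCK, "")
      .replace(TRUNCATED_TOOL_CALL, "")
      .trim();
    // Assistant entries whose whole visible text is the silent token are omitted.
    if (entry.role === "assistant" && SILENT_TOKEN.test(text)) continue;
    out.push({ ...entry, text });
  }
  return out;
}
```

In this sketch, stripping runs before the silent-token check, so an assistant entry that is NO_REPLY plus a directive tag is still omitted.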

Control UI agents tools panel

  • The Control UI /agents Tools panel has two separate views:
    • Available Right Now uses tools.effective(sessionKey=...) and shows what the current session can actually use at runtime, including core, plugin, and channel-owned tools.
    • Tool Configuration uses tools.catalog and stays focused on profiles, overrides, and catalog semantics.
  • Runtime availability is session-scoped. Switching sessions on the same agent can change the Available Right Now list.
  • The config editor does not imply runtime availability; effective access still follows policy precedence (allow/deny, per-agent and provider/channel overrides).

Remote use

  • Remote mode tunnels the gateway WebSocket over SSH/Tailscale.
  • You do not need to run a separate WebChat server.

Configuration reference (WebChat)

Full configuration: Configuration

WebChat options:

  • gateway.webchat.chatHistoryMaxChars: maximum character count for text fields in chat.history responses. When a transcript entry exceeds this limit, Gateway truncates long text fields and may replace oversized messages with the [chat.history omitted: message too large] placeholder. Clients can also send a per-request maxChars to override this default for a single chat.history call.
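How the configured limit and a per-request maxChars might combine can be sketched as below. This is hypothetical: resolveMaxChars and boundEntry are illustrative names, and both the "positive values only" override rule and the rule that text longer than oversizedFactor times the limit becomes the placeholder are assumptions. The page only says long text fields are truncated and oversized messages may be replaced.

```typescript
// Placeholder text as shown on this page for oversized chat.history entries.
const OMITTED_PLACEHOLDER = "[chat.history omitted: message too large]";

interface HistoryText { text: string }

// A per-request maxChars overrides the configured
// gateway.webchat.chatHistoryMaxChars default for that call.
// Assumption: only positive values are honored.
function resolveMaxChars(configDefault: number, requestMaxChars?: number): number {
  return requestMaxChars !== undefined && requestMaxChars > 0 ? requestMaxChars : configDefault;
}

// Hypothetical rule: truncate text over the limit, and replace it entirely with
// the placeholder when it is more than oversizedFactor times the limit.
function boundEntry(entry: HistoryText, maxChars: number, oversizedFactor = 4): HistoryText {
  if (entry.text.length <= maxChars) return entry;
  if (entry.text.length > maxChars * oversizedFactor) return { text: OMITTED_PLACEHOLDER };
  return { text: entry.text.slice(0, maxChars) };
}
```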

Related global options:

  • gateway.port, gateway.bind: WebSocket host/port.
  • gateway.auth.mode, gateway.auth.token, gateway.auth.password: shared-secret WebSocket auth.
  • gateway.auth.allowTailscale: when enabled, the browser Control UI chat tab can authenticate with Tailscale Serve identity headers.
  • gateway.auth.mode: "trusted-proxy": reverse-proxy auth for browser clients behind an identity-aware non-loopback proxy source (see Trusted Proxy Auth).
  • gateway.remote.url, gateway.remote.token, gateway.remote.password: remote gateway target.
  • session.*: session storage and main key defaults.