mirror of
https://github.com/openclaw/openclaw.git
synced 2026-05-06 16:20:43 +00:00
fix: reduce WebUI session latency churn (#76277) thanks @BunsDev
Reduce WebUI/Gateway latency churn by avoiding redundant session reloads, carrying session keys through transcript update events, and deferring explicit media provider discovery. Includes changelog attribution and closes the referenced runtime latency issues.
@@ -17,7 +17,6 @@ title: "Audio and voice notes"
 5. On success, it replaces `Body` with an `[Audio]` block and sets `{{Transcript}}`.

 - **Command parsing**: When transcription succeeds, `CommandBody`/`RawBody` are set to the transcript so slash commands still work.
 - **Verbose logging**: In `--verbose`, we log when transcription runs and when it replaces the body.
-- **Control UI dictation**: The Chat composer can send a browser-recorded microphone clip to `chat.transcribeAudio`. That Gateway RPC writes the clip to a temporary local file, runs this same audio transcription pipeline, returns draft text to the browser, and deletes the temporary file. It does not create an agent run by itself.

 ## Auto-detection (default)
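The body-replacement behavior described in the bullets above can be sketched as a pure transform. This is a minimal illustration, not OpenClaw's actual implementation; the `InboundMessage` shape and the `applyTranscript` helper are hypothetical names built from the fields the bullets mention.

```typescript
// Hypothetical message shape mirroring the fields named in the doc.
interface InboundMessage {
  Body: string;        // original text, later replaced by an [Audio] block
  CommandBody: string; // what slash-command parsing sees
  RawBody: string;
  Transcript?: string; // backs the {{Transcript}} template variable
}

// On successful transcription, replace Body with an [Audio] block and
// route the transcript into the command-parsing fields so slash commands
// still work on spoken input.
function applyTranscript(msg: InboundMessage, transcript: string): InboundMessage {
  return {
    ...msg,
    Body: `[Audio]\n${transcript}`,
    CommandBody: transcript,
    RawBody: transcript,
    Transcript: transcript,
  };
}
```

With this shape, a spoken "/status please" ends up in `CommandBody`, so command parsing behaves the same as for typed input.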
@@ -98,24 +98,24 @@ Arcee AI models can be accessed directly via the Arcee platform or through [Open
 OpenClaw currently ships this bundled Arcee catalog:

-| Model ref | Name | Input | Context | Cost (in/out per 1M) | Notes |
-| ------------------------------ | ---------------------- | ----- | ------- | -------------------- | ------------------------------------------ |
-| `arcee/trinity-large-thinking` | Trinity Large Thinking | text | 256K | $0.25 / $0.90 | Default model; reasoning enabled; no tools |
-| `arcee/trinity-large-preview` | Trinity Large Preview | text | 128K | $0.25 / $1.00 | General-purpose; 400B params, 13B active |
-| `arcee/trinity-mini` | Trinity Mini 26B | text | 128K | $0.045 / $0.15 | Fast and cost-efficient; function calling |
+| Model ref | Name | Input | Context | Cost (in/out per 1M) | Notes |
+| ------------------------------ | ---------------------- | ----- | ------- | -------------------- | ----------------------------------------- |
+| `arcee/trinity-large-thinking` | Trinity Large Thinking | text | 256K | $0.25 / $0.90 | Default model; reasoning enabled |
+| `arcee/trinity-large-preview` | Trinity Large Preview | text | 128K | $0.25 / $1.00 | General-purpose; 400B params, 13B active |
+| `arcee/trinity-mini` | Trinity Mini 26B | text | 128K | $0.045 / $0.15 | Fast and cost-efficient; function calling |

 <Tip>
-The onboarding preset sets `arcee/trinity-large-thinking` as the default model. It is reasoning/text-only and does not support tool use or function calling.
+The onboarding preset sets `arcee/trinity-large-thinking` as the default model.
 </Tip>

 ## Supported features

-| Feature | Supported |
-| --------------------------------------------- | ------------------------------------------- |
-| Streaming | Yes |
-| Tool use / function calling | Model-dependent; not Trinity Large Thinking |
-| Structured output (JSON mode and JSON schema) | Yes |
-| Extended thinking | Yes (Trinity Large Thinking) |
+| Feature | Supported |
+| --------------------------------------------- | ---------------------------- |
+| Streaming | Yes |
+| Tool use / function calling | Yes |
+| Structured output (JSON mode and JSON schema) | Yes |
+| Extended thinking | Yes (Trinity Large Thinking) |

 <AccordionGroup>
 <Accordion title="Environment note">
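As a quick sanity check on the pricing columns, the per-1M-token rates multiply out as below. This is illustrative arithmetic only; the rates are taken from the table above and the token counts are made up.

```typescript
// Cost in USD for one call, given per-1M-token input/output rates.
function costUsd(inputTokens: number, outputTokens: number, inRate: number, outRate: number): number {
  return (inputTokens / 1_000_000) * inRate + (outputTokens / 1_000_000) * outRate;
}

// arcee/trinity-mini: $0.045 in / $0.15 out per 1M tokens.
// 200K input + 50K output tokens ≈ $0.0165 per call.
const c = costUsd(200_000, 50_000, 0.045, 0.15);
```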
@@ -96,7 +96,6 @@ Imported themes are stored only in the current browser profile. They are not wri
 <AccordionGroup>
 <Accordion title="Chat and Talk">
 - Chat with the model via Gateway WS (`chat.history`, `chat.send`, `chat.abort`, `chat.inject`).
-- Dictate into the Chat composer with server-side STT (`chat.transcribeAudio`). The browser records a short microphone clip and sends it to the Gateway, which runs the configured `tools.media.audio` transcription pipeline and returns draft text without exposing provider credentials to the browser.
 - Talk through browser realtime sessions. OpenAI uses direct WebRTC, Google Live uses a constrained one-use browser token over WebSocket, and backend-only realtime voice plugins use the Gateway relay transport. The relay keeps provider credentials on the Gateway while the browser streams microphone PCM through `talk.realtime.relay*` RPCs and sends `openclaw_agent_consult` tool calls back through `chat.send` for the larger configured OpenClaw model.
 - Stream tool calls + live tool output cards in Chat (agent events).
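The dictation upload path described above amounts to: base64-encode the recorded clip and refuse payloads that would not fit in one Gateway WebSocket frame. The sketch below is a rough approximation; the 1 MiB budget, the request shape, and the `buildDictationRequest` helper are assumptions, not OpenClaw's actual constants or RPC schema.

```typescript
// Assumed frame budget; the real Gateway limit may differ.
const MAX_FRAME_BYTES = 1 * 1024 * 1024;

// Build a chat.transcribeAudio-style request from raw audio bytes,
// or return null when the encoded payload would exceed one frame.
function buildDictationRequest(audio: Uint8Array, mimeType: string): string | null {
  const b64 = Buffer.from(audio).toString("base64"); // base64 inflates size ~4/3
  const payload = JSON.stringify({
    method: "chat.transcribeAudio",
    params: { audio: b64, mimeType },
  });
  if (Buffer.byteLength(payload, "utf8") > MAX_FRAME_BYTES) return null;
  return payload;
}
```

The caller would record with `MediaRecorder`, pass the resulting bytes here, and surface an error instead of sending when `null` comes back.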
@@ -150,7 +149,6 @@ Imported themes are stored only in the current browser profile. They are not wri
 <AccordionGroup>
 <Accordion title="Send and history semantics">
 - `chat.send` is **non-blocking**: it acks immediately with `{ runId, status: "started" }` and the response streams via `chat` events.
-- `chat.transcribeAudio` is a one-shot dictation helper for Chat drafts. It accepts browser-recorded base64 audio, keeps uploads below the Gateway WebSocket frame limit, writes a temporary local file, runs media-understanding audio transcription with the active Gateway config, returns `{ text, provider, model }`, and removes the temporary file. It does not create an agent run and is separate from realtime Talk.
 - Chat uploads accept images plus non-video files. Images keep the native image path; other files are stored as managed media and shown in history as attachment links.
 - Re-sending with the same `idempotencyKey` returns `{ status: "in_flight" }` while running, and `{ status: "ok" }` after completion.
 - `chat.history` responses are size-bounded for UI safety. When transcript entries are too large, Gateway may truncate long text fields, omit heavy metadata blocks, and replace oversized messages with a placeholder (`[chat.history omitted: message too large]`).
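The `idempotencyKey` semantics above can be modeled with a tiny in-memory run registry. This is a sketch under assumed shapes; the real Gateway tracks runs server-side and streams results via `chat` events.

```typescript
type SendStatus = "started" | "in_flight" | "ok";

class RunRegistry {
  private runs = new Map<string, { runId: string; done: boolean }>();
  private nextId = 1;

  // First send with a key starts a run; re-sends with the same key
  // report its progress instead of starting a duplicate run.
  send(idempotencyKey: string): { runId: string; status: SendStatus } {
    const existing = this.runs.get(idempotencyKey);
    if (existing) {
      return { runId: existing.runId, status: existing.done ? "ok" : "in_flight" };
    }
    const runId = `run-${this.nextId++}`;
    this.runs.set(idempotencyKey, { runId, done: false });
    return { runId, status: "started" };
  }

  // Called when the agent run finishes streaming.
  complete(idempotencyKey: string): void {
    const r = this.runs.get(idempotencyKey);
    if (r) r.done = true;
  }
}
```

The point of the pattern is that a client can safely retry `chat.send` after a reconnect: the same key never spawns a second run.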
@@ -22,7 +22,7 @@ Status: the macOS/iOS SwiftUI chat UI talks directly to the Gateway WebSocket.
 ## How it works (behavior)

-- The UI connects to the Gateway WebSocket and uses `chat.history`, `chat.send`, `chat.inject`, and `chat.transcribeAudio`.
+- The UI connects to the Gateway WebSocket and uses `chat.history`, `chat.send`, and `chat.inject`.
 - `chat.history` is bounded for stability: Gateway may truncate long text fields, omit heavy metadata, and replace oversized entries with `[chat.history omitted: message too large]`.
 - `chat.history` follows the active transcript branch for modern append-only session files, so abandoned rewrite branches and superseded prompt copies are not rendered in WebChat.
 - Control UI remembers the backing Gateway `sessionId` returned by `chat.history` and includes it on follow-up `chat.send` calls, so reconnects and page refreshes continue the same stored conversation unless the user starts or resets a session.
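The session-stickiness rule in the last bullet amounts to: remember the `sessionId` that `chat.history` reports and attach it to later `chat.send` payloads until the user resets. A minimal client-side sketch, with illustrative shapes rather than the actual Control UI code:

```typescript
// Minimal client-side session memory illustrating the described behavior.
class SessionMemory {
  private sessionId: string | null = null;

  // Remember the backing Gateway session reported by chat.history.
  onHistory(resp: { sessionId: string }): void {
    this.sessionId = resp.sessionId;
  }

  // Include the remembered id so reconnects and page refreshes
  // continue the same stored conversation.
  buildSend(text: string): { text: string; sessionId?: string } {
    return this.sessionId ? { text, sessionId: this.sessionId } : { text };
  }

  // User started or reset a session: forget the old id.
  reset(): void {
    this.sessionId = null;
  }
}
```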
@@ -37,7 +37,6 @@ Status: the macOS/iOS SwiftUI chat UI talks directly to the Gateway WebSocket.
   and assistant entries whose whole visible text is only the exact silent
   token `NO_REPLY` / `no_reply` are omitted.
 - Reasoning-flagged reply payloads (`isReasoning: true`) are excluded from WebChat assistant content, transcript replay text, and audio content blocks, so thinking-only payloads do not surface as visible assistant messages or playable audio.
-- `chat.transcribeAudio` powers server-side dictation in the Control UI chat composer. The browser records microphone audio, sends it as base64 to the Gateway, and the Gateway runs the configured `tools.media.audio` pipeline. The returned transcript is inserted into the draft; no agent run is started until the user sends it.
 - `chat.inject` appends an assistant note directly to the transcript and broadcasts it to the UI (no agent run).
 - Aborted runs can keep partial assistant output visible in the UI.
 - Gateway persists aborted partial assistant text into transcript history when buffered output exists, and marks those entries with abort metadata.
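The visibility rules above (silent `NO_REPLY` tokens and reasoning-flagged payloads) can be expressed as a single filter predicate. The entry shape below is illustrative, not the actual transcript schema; surrounding-whitespace trimming is an assumption about what counts as the "whole visible text".

```typescript
interface AssistantEntry {
  text: string;
  isReasoning?: boolean;
}

// True when the entry should be rendered as visible assistant content.
function isVisible(e: AssistantEntry): boolean {
  if (e.isReasoning) return false; // thinking-only payloads never surface
  const t = e.text.trim();
  if (t === "NO_REPLY" || t === "no_reply") return false; // exact silent token
  return true;
}
```

A message that merely *contains* the token (e.g. "I will send NO_REPLY next time") stays visible; only entries whose entire text is the token are dropped.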