* fix(gateway): increase WebSocket max payload to 5 MB for image uploads
The 512 KB limit was too small for base64-encoded images — a 400 KB
image becomes ~532 KB after encoding, exceeding the limit and closing
the connection with code 1006.
Bump MAX_PAYLOAD_BYTES to 5 MB and MAX_BUFFERED_BYTES to 8 MB to
support standard image uploads via webchat.
Closes#14400
* fix: align gateway WS limits with 5MB image uploads (#14486) (thanks @0xRaini)
* docs: fix changelog conflict for #14486
---------
Co-authored-by: 0xRaini <0xRaini@users.noreply.github.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
* fix(cli): exit with non-zero code when configure/agents-add wizards are cancelled
Follow-up to the onboard cancel fix. The configure wizard and
agents add wizard also caught WizardCancelledError and exited with
code 0, which signals success to callers. Change to exit(1) for
consistency — user cancellation is not a successful completion.
This ensures scripts that chain these commands with set -e will
correctly stop when the user cancels.
* fix(cli): make wizard cancellations exit non-zero (#14156) (thanks @0xRaini)
---------
Co-authored-by: Rain <rain@Rains-MBA-M4.local>
Co-authored-by: Sebastian <19554889+sebslight@users.noreply.github.com>
* feat: add LiteLLM provider types, env var, credentials, and auth choice
Add litellm-api-key auth choice, LITELLM_API_KEY env var mapping,
setLitellmApiKey() credential storage, and LITELLM_DEFAULT_MODEL_REF.
* feat: add LiteLLM onboarding handler and provider config
Add applyLitellmProviderConfig which properly registers
models.providers.litellm with baseUrl, api type, and model definitions.
This fixes the critical bug from PR #6488 where the provider entry was
never created, causing model resolution to fail at runtime.
* docs: add LiteLLM provider documentation
Add setup guide covering onboarding, manual config, virtual keys,
model routing, and usage tracking. Link from provider index.
* docs: add LiteLLM to sidebar navigation in docs.json
Add providers/litellm to both English and Chinese provider page lists
so the docs page appears in the sidebar navigation.
* test: add LiteLLM non-interactive onboarding test
Wire up litellmApiKey flag inference and auth-choice handler for the
non-interactive onboarding path, and add an integration test covering
profile, model default, and credential storage.
* fix: register --litellm-api-key CLI flag and add preferred provider mapping
Wire up the missing Commander CLI option, action handler mapping, and
help text for --litellm-api-key. Add litellm-api-key to the preferred
provider map for consistency with other providers.
* fix: remove zh-CN sidebar entry for litellm (no localized page yet)
* style: format buildLitellmModelDefinition return type
* fix(onboarding): harden LiteLLM provider setup (#12823)
* refactor(onboarding): keep auth-choice provider dispatcher under size limit
---------
Co-authored-by: Peter Steinberger <steipete@gmail.com>
* fix(browser): prevent permanent timeout after stuck evaluate
Thread AbortSignal from client-fetch through dispatcher to Playwright
operations. When a timeout fires, force-disconnect the Playwright CDP
connection to unblock the serialized command queue, allowing the next
call to reconnect transparently.
Key changes:
- client-fetch.ts: proper AbortController with signal propagation
- pw-session.ts: new forceDisconnectPlaywrightForTarget()
- pw-tools-core.interactions.ts: accept signal, align inner timeout
to outer-500ms, inject in-browser Promise.race for async evaluates
- routes/dispatcher.ts + types.ts: propagate signal through dispatch
- server.ts + bridge-server.ts: Express middleware creates AbortSignal
from request lifecycle
- client-actions-core.ts: add timeoutMs to evaluate type
Fixes#10994
* fix(browser): v2 - force-disconnect via Connection.close() instead of browser.close()
When page.evaluate() is stuck on a hung CDP transport, browser.close() also
hangs because it tries to send a close command through the same stuck pipe.
v2 fix: forceDisconnectPlaywrightForTarget now directly calls Playwright's
internal Connection.close() which locally rejects all pending callbacks and
emits 'disconnected' without touching the network. This instantly unblocks
all stuck Playwright operations.
closePlaywrightBrowserConnection (clean shutdown) now also has a 3s timeout
fallback that drops to forceDropConnection if browser.close() hangs.
Fixes permanent browser timeout after stuck evaluate.
* fix(browser): v3 - fire-and-forget browser.close() instead of Connection.close()
v2's forceDropConnection called browser._connection.close() which corrupts
the entire Playwright instance because Connection is shared across all
objects (BrowserType, Browser, Page, etc.). This prevented reconnection
with cascading 'connectOverCDP: Force-disconnected' errors.
v3 fix: forceDisconnectPlaywrightForTarget now:
1. Nulls cached connection immediately
2. Fire-and-forgets browser.close() (doesn't await — it may hang)
3. Next connectBrowser() creates a fresh connectOverCDP WebSocket
Each connectOverCDP creates an independent WebSocket to the CDP endpoint,
so the new connection is unaffected by the old one's pending close.
The old browser.close() eventually resolves when the in-browser evaluate
timeout fires, or the old connection gets GC'd.
* fix(browser): v4 - clear connecting state and remove stale disconnect listeners
The reconnect was failing because:
1. forceDisconnectPlaywrightForTarget nulled cached but not connecting,
so subsequent calls could await a stale promise
2. The old browser's 'disconnected' event handler raced with new
connections, nulling the fresh cached reference
Fix: null both cached and connecting, and removeAllListeners on the
old browser before fire-and-forget close.
* fix(browser): v5 - use raw CDP Runtime.terminateExecution to kill stuck evaluate
When forceDisconnectPlaywrightForTarget fires, open a raw WebSocket
to the stuck page's CDP endpoint and send Runtime.terminateExecution.
This kills running JS without navigating away or crashing the page.
Also clear connecting state and remove stale disconnect listeners.
* fix(browser): abort cancels stuck evaluate
* Browser: always cleanup evaluate abort listener
* Chore: remove Playwright debug scripts
* Docs: add CDP evaluate refactor plan
* Browser: refactor Playwright force-disconnect
* Browser: abort stops evaluate promptly
* Node host: extract withTimeout helper
* Browser: remove disconnected listener safely
* Changelog: note act:evaluate hang fix
---------
Co-authored-by: Bob <bob@dutifulbob.com>
Discussion: https://github.com/openclaw/openclaw/discussions/13528
## Checklist
- [x] **Mark as AI-assisted in the PR title or description** - Implemented by 🤖, reviewed by 👨💻
- [x] **Note the degree of testing** - fully tested and I use it myself
- [x] **Include prompts or session logs if possible (super helpful!)** - I can try doing a "resume" on a few sessions, but don't think it'll provide value. Lmk if this is a blocker.
- [x] **Confirm you understand what the code does** - It's simple :)
## Summary of changes
- **ClawDock** - Shell helpers replace verbose `docker-compose` commands with simple `clawdock-*` shortcuts
- **Zero-config setup** - First run auto-detects the OpenClaw project directory from common paths and saves the config for future use
- **No extra dependencies** - Just bash
- **Built-in auth & device pairing helpers** - `clawdock-fix-token`, `clawdock-dashboard`, etc to handle gateay setup, streamline web UI, etc...
- **Updated Docker docs** - Installation docs now include the optional ClawDock helper setup for users who want simplified container management
## Example Usage
```bash
$ clawdock-help
🦞 ClawDock - Docker Helpers for OpenClaw
⚡ Basic Operations
clawdock-start Start the gateway
clawdock-stop Stop the gateway
clawdock-restart Restart the gateway
clawdock-status Check container status
clawdock-logs View live logs (follows)
🐚 Container Access
clawdock-shell Shell into container (openclaw alias ready)
clawdock-cli Run CLI commands (e.g., clawdock-cli status)
clawdock-exec <cmd> Execute command in gateway container
🌐 Web UI & Devices
clawdock-dashboard Open web UI in browser (auto-guides you)
clawdock-devices List device pairings (auto-guides you)
clawdock-approve <id> Approve device pairing (with examples)
⚙️ Setup & Configuration
clawdock-fix-token Configure gateway token (run once)
🔧 Maintenance
clawdock-rebuild Rebuild Docker image
clawdock-clean ⚠️ Remove containers & volumes (nuclear)
🛠️ Utilities
clawdock-health Run health check
clawdock-token Show gateway auth token
clawdock-cd Jump to openclaw project directory
clawdock-config Open config directory (~/.openclaw)
clawdock-workspace Open workspace directory
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🚀 First Time Setup
1. clawdock-start # Start the gateway
2. clawdock-fix-token # Configure token
3. clawdock-dashboard # Open web UI
4. clawdock-devices # If pairing needed
5. clawdock-approve <id> # Approve pairing
💬 WhatsApp Setup
clawdock-shell
> openclaw channels login --channel whatsapp
> openclaw status
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 All commands guide you through next steps!
📚 Docs: https://docs.openclaw.ai
```\n\nCo-authored-by: Gustavo Madeira Santana <gumadeiras@gmail.com>
* fix: prune stale session entries, cap entry count, and rotate sessions.json
The sessions.json file grows unbounded over time. Every heartbeat tick (default: 30m)
triggers multiple full rewrites, and session keys from groups, threads, and DMs
accumulate indefinitely with large embedded objects (skillsSnapshot,
systemPromptReport). At >50MB the synchronous JSON parse blocks the event loop,
causing Telegram webhook timeouts and effectively taking the bot down.
Three mitigations, all running inside saveSessionStoreUnlocked() on every write:
1. Prune stale entries: remove entries with updatedAt older than 30 days
(configurable via session.maintenance.pruneDays in openclaw.json)
2. Cap entry count: keep only the 500 most recently updated entries
(configurable via session.maintenance.maxEntries). Entries without updatedAt
are evicted first.
3. File rotation: if the existing sessions.json exceeds 10MB before a write,
rename it to sessions.json.bak.{timestamp} and keep only the 3 most recent
backups (configurable via session.maintenance.rotateBytes).
All three thresholds are configurable under session.maintenance in openclaw.json
with Zod validation. No env vars.
Existing tests updated to use Date.now() instead of epoch-relative timestamps
(1, 2, 3) that would be incorrectly pruned as stale.
27 new tests covering pruning, capping, rotation, and integration scenarios.
* feat: auto-prune expired cron run sessions (#12289)
Add TTL-based reaper for isolated cron run sessions that accumulate
indefinitely in sessions.json.
New config option:
cron.sessionRetention: string | false (default: '24h')
The reaper runs piggy-backed on the cron timer tick, self-throttled
to sweep at most every 5 minutes. It removes session entries matching
the pattern cron:<jobId>:run:<uuid> whose updatedAt + retention < now.
Design follows the Kubernetes ttlSecondsAfterFinished pattern:
- Sessions are persisted normally (observability/debugging)
- A periodic reaper prunes expired entries
- Configurable retention with sensible default
- Set to false to disable pruning entirely
Files changed:
- src/config/types.cron.ts: Add sessionRetention to CronConfig
- src/config/zod-schema.ts: Add Zod validation for sessionRetention
- src/cron/session-reaper.ts: New reaper module (sweepCronRunSessions)
- src/cron/session-reaper.test.ts: 12 tests covering all paths
- src/cron/service/state.ts: Add cronConfig/sessionStorePath to deps
- src/cron/service/timer.ts: Wire reaper into onTimer tick
- src/gateway/server-cron.ts: Pass config and session store path to deps
Closes#12289
* fix: sweep cron session stores per agent
* docs: add changelog for session maintenance (#13083) (thanks @skyfallsin, @Glucksberg)
* fix: add warn-only session maintenance mode
* fix: warn-only maintenance defaults to active session
* fix: deliver maintenance warnings to active session
* docs: add session maintenance examples
* fix: accept duration and size maintenance thresholds
* refactor: share cron run session key check
* fix: format issues and replace defaultRuntime.warn with console.warn
---------
Co-authored-by: Pradeep Elankumaran <pradeepe@gmail.com>
Co-authored-by: Glucksberg <markuscontasul@gmail.com>
Co-authored-by: max <40643627+quotentiroler@users.noreply.github.com>
Co-authored-by: quotentiroler <max.nussbaumer@maxhealth.tech>
* fix(tools): correct Grok response parsing for xAI Responses API
The xAI Responses API returns content in output[0].content[0].text,
not in output_text field. Updated GrokSearchResponse type and
runGrokSearch to extract content from the correct path.
Fixes the 'No response' issue when using Grok web search.
* fix(tools): harden Grok web_search parsing (#13049) (thanks @ereid7)
---------
Co-authored-by: erai <erai@erais-Mac-mini.local>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
* fix: cap Discord gateway reconnect attempts to prevent infinite loop
The Discord GatewayPlugin was configured with maxAttempts: Infinity,
which causes an unbounded reconnection loop when the Discord gateway
enters a persistent failure state (e.g. code 1005 with stalled HELLO).
In production, this manifested as 2,483+ reconnection attempts in a
single log file, starving the Node.js event loop and preventing cron,
heartbeat, and other subsystems from functioning.
Cap maxAttempts at 50, which provides ~25 minutes of retry time
(with 30s HELLO timeout between attempts) before cleanly exiting
via the existing "Max reconnect attempts" error handler.
Closes#11836
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Changelog: note Discord gateway reconnect cap (#12230) (thanks @Yida-Dev)
---------
Co-authored-by: Yida-Dev <reyifeijun@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Shadow <shadow@clawd.bot>