When preferSetupRuntimeForChannelPlugins is active, gateway boot performs
two plugin loads: a setup-runtime pass and a full reload after listen.
The initial pin captured the setup-entry snapshot. The deferred reload now
re-pins so getChannelPlugin() resolves against the full implementations.
Channel plugin resolution fails with 'Channel is unavailable: <channel>'
after the active plugin registry is replaced at runtime. The root cause is
that getChannelPlugin() resolves against the live registry snapshot, which
is replaced when non-primary registry loads (e.g., config-schema reads)
call loadOpenClawPlugins(). If the replacement registry does not carry the
same channel entries, outbound message delivery and subagent announce
silently break.
This mirrors the existing pinActivePluginHttpRouteRegistry pattern: the
channel registry is pinned at gateway startup and released on shutdown.
Subsequent setActivePluginRegistry calls no longer evict the channel
snapshot, so getChannelPlugin() always resolves against the registry that
was active when the gateway booted.
emitChatFinal frees buffers on clean run completion, and the
maintenance timer sweeps abortedRuns after ABORTED_RUN_TTL_MS. But
runs that get stuck (e.g. LLM timeout without triggering clean
lifecycle end) are never aborted and their string buffers persist
indefinitely. This is the direct trigger for the StringAdd_CheckNone
OOM crash reported in the issue.
Add a stale buffer sweep in the maintenance timer that cleans up
buffers, deltaSentAt, and deltaLastBroadcastLen for any run not
updated within ABORTED_RUN_TTL_MS, regardless of abort status.
Closes#51821
* feat(gateway): add auth rate-limiting & brute-force protection
Add a per-IP sliding-window rate limiter to Gateway authentication
endpoints (HTTP, WebSocket upgrade, and WS message-level auth).
When gateway.auth.rateLimit is configured, failed auth attempts are
tracked per client IP. Once the threshold is exceeded within the
sliding window, further attempts are blocked with HTTP 429 + Retry-After
until the lockout period expires. Loopback addresses are exempt by
default so local CLI sessions are never locked out.
The limiter is only created when explicitly configured (undefined
otherwise), keeping the feature fully opt-in and backward-compatible.
* fix(gateway): isolate auth rate-limit scopes and normalize 429 responses
---------
Co-authored-by: buerbaumer <buerbaumer@users.noreply.github.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
Add a new `/v1/responses` endpoint implementing the OpenResponses API
standard for agentic workflows. This provides:
- Item-based input (messages, function_call_output, reasoning)
- Semantic streaming events (response.created, response.output_text.delta,
response.completed, etc.)
- Full SSE event support with both event: and data: lines
- Configuration via gateway.http.endpoints.responses.enabled
The endpoint is disabled by default and can be enabled independently
from the existing Chat Completions endpoint.
Phase 1 implementation supports:
- String or ItemParam[] input
- system/developer/user/assistant message roles
- function_call_output items
- instructions parameter
- Agent routing via headers or model parameter
- Session key management
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>