134 Commits

Author SHA1 Message Date
Tak Hoffman
1be39d4250 fix(gateway): synthesize lifecycle robustness for restart and startup probes (#33831)
* fix(gateway): correct launchctl command sequence for gateway restart (closes #20030)

* fix(restart): expand HOME and escape label in launchctl plist path

* fix(restart): poll port free after SIGKILL to prevent EADDRINUSE restart loop

When cleanStaleGatewayProcessesSync() kills a stale gateway process,
the kernel may not immediately release the TCP port. Previously the
function returned after a fixed 500ms sleep (300ms SIGTERM + 200ms
SIGKILL), allowing triggerOpenClawRestart() to hand off to systemd
before the port was actually free. The new systemd process then raced
the dying socket for port 18789, hit EADDRINUSE, and exited with
status 1, causing systemd to retry indefinitely — the zombie restart
loop reported in #33103.

Fix: add waitForPortFreeSync() that polls lsof at 50ms intervals for
up to 2 seconds after SIGKILL. cleanStaleGatewayProcessesSync() now
blocks until the port is confirmed free (or the budget expires with a
warning) before returning. The increased SIGTERM/SIGKILL wait budgets
(600ms / 400ms) also give slow processes more time to exit cleanly.
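
The polling behavior described above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `waitForPortFreeSketch`, `PollOptions`, and the injected `isPortFree` / `sleepSync` callbacks are hypothetical names, and the real function is described as probing via lsof rather than a callback.

```typescript
// Hypothetical sketch: names and signatures are illustrative assumptions.
interface PollOptions {
  intervalMs: number; // delay between probes (the commit message cites 50ms)
  budgetMs: number; // total wait budget (the commit message cites 2 seconds)
}

type PortProbe = () => boolean; // returns true once the port is free
type SleepSync = (ms: number) => void; // blocking sleep supplied by the caller

function waitForPortFreeSketch(
  isPortFree: PortProbe,
  sleepSync: SleepSync,
  opts: PollOptions = { intervalMs: 50, budgetMs: 2000 },
): boolean {
  let waitedMs = 0;
  while (true) {
    // Probe first so an already-free port returns without sleeping.
    if (isPortFree()) {
      return true;
    }
    // Budget exhausted: the caller is expected to log a warning and continue.
    if (waitedMs >= opts.budgetMs) {
      return false;
    }
    sleepSync(opts.intervalMs);
    waitedMs += opts.intervalMs;
  }
}
```

Injecting the probe and the sleep keeps the loop testable without real sockets or timers, which is one plausible reason to structure it this way.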

Fixes #33103
Related: #28134

* fix: add EADDRINUSE retry and TIME_WAIT port-bind checks for gateway startup

* fix(ports): treat EADDRNOTAVAIL as non-retryable and fix flaky test

* fix(gateway): hot-reload agents.defaults.models allowlist changes

The reload plan had a rule for `agents.defaults.model` (singular) but
not `agents.defaults.models` (plural — the allowlist array). Because
`agents.defaults.models` does not prefix-match `agents.defaults.model.`,
it fell through to the catch-all `agents` tail rule (kind=none), so
allowlist edits in openclaw.json were silently ignored at runtime.

Add a dedicated reload rule so changes to the models allowlist trigger
a heartbeat restart, which re-reads the config and serves the updated
list to clients.
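
The fall-through can be illustrated with a small sketch of first-match prefix rules. The rule shape, the `resolveReloadKind` helper, and the kind assigned to the singular `agents.defaults.model` rule are assumptions made for illustration; only the `agents` catch-all with kind `none` and the heartbeat restart for the new rule come from the message above.

```typescript
// Hypothetical rule shape; the real reload plan may be structured differently.
interface ReloadRule {
  prefix: string;
  kind: string;
}

// A path matches a rule when it equals the prefix or continues it with a dot.
function matchesRule(path: string, prefix: string): boolean {
  return path === prefix || path.indexOf(prefix + ".") === 0;
}

// First matching rule wins, so catch-all rules belong at the tail.
function resolveReloadKind(path: string, rules: ReloadRule[]): string | undefined {
  for (const rule of rules) {
    if (matchesRule(path, rule.prefix)) {
      return rule.kind;
    }
  }
  return undefined;
}

const rulesBefore: ReloadRule[] = [
  { prefix: "agents.defaults.model", kind: "restart-heartbeat" }, // kind assumed
  { prefix: "agents", kind: "none" }, // catch-all tail rule
];

const rulesAfter: ReloadRule[] = [
  { prefix: "agents.defaults.models", kind: "restart-heartbeat" }, // new dedicated rule
  { prefix: "agents.defaults.model", kind: "restart-heartbeat" }, // kind assumed
  { prefix: "agents", kind: "none" },
];
```

With `rulesBefore`, `agents.defaults.models` neither equals `agents.defaults.model` nor starts with `agents.defaults.model.`, so it resolves to the catch-all; with `rulesAfter` it hits the dedicated rule.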

Fixes #33600

Co-authored-by: HCL <chenglunhu@gmail.com>
Signed-off-by: HCL <chenglunhu@gmail.com>

* test(restart): 100% branch coverage — audit round 2

Audit findings fixed:
- remove dead guard: terminateStaleProcessesSync pids.length===0 check was
  unreachable (only caller cleanStaleGatewayProcessesSync already guards)
- expose __testing.callSleepSyncRaw so sleepSync's real Atomics.wait path
  can be unit-tested directly without going through the override
- fix broken sleepSync Atomics.wait test: previous test set override=null
  but cleanStaleGatewayProcessesSync returned before calling sleepSync —
  replaced with direct callSleepSyncRaw calls that actually exercise L36/L42-47
- fix pid collision: two tests used process.pid+304 (EPERM + dead-at-SIGTERM);
  EPERM test changed to process.pid+305
- fix misindented tests: 'deduplicates pids' and 'lsof status 1 container
  edge case' were outside their intended describe blocks; moved to correct
  scopes (findGatewayPidsOnPortSync and pollPortOnce respectively)
- add missing branch tests:
  - status 1 + non-empty stdout with zero openclaw pids → free:true (L145)
  - mid-loop non-openclaw cmd in &&-chain (L67)
  - consecutive p-lines without c-line between them (L67)
  - invalid PID in p-line (p0 / pNaN) — ternary false branch (L67)
  - unknown lsof output line (else-if false branch L69)
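
The branch cases listed above concern parsing lsof field output, where `p` lines carry a PID and `c` lines carry the command name. A rough sketch of such a parser follows; `parseLsofFieldOutputSketch` and the substring match on `openclaw` are assumptions for illustration and may not mirror the actual parsing code.

```typescript
// Hypothetical parser for `lsof -F pc` style output (one field per line).
function parseLsofFieldOutputSketch(stdout: string, commandNeedle: string = "openclaw"): number[] {
  const pids: number[] = [];
  let currentPid: number | undefined;
  for (const line of stdout.split("\n")) {
    const tag = line.charAt(0);
    if (tag === "p") {
      // Invalid PIDs (p0, pNaN) clear the current PID instead of being kept.
      const parsed = parseInt(line.slice(1), 10);
      currentPid = parsed > 0 ? parsed : undefined;
    } else if (tag === "c") {
      const command = line.slice(1).toLowerCase();
      const isMatch = command.indexOf(commandNeedle) !== -1;
      // Deduplicate: a PID can appear once per open socket.
      if (currentPid !== undefined && isMatch && pids.indexOf(currentPid) === -1) {
        pids.push(currentPid);
      }
    }
    // Any other line type is ignored.
  }
  return pids;
}
```

Consecutive `p` lines without a `c` line between them simply overwrite the current PID, so only the PID immediately preceding a matching command is recorded.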

Coverage: 100% stmts / 100% branch / 100% funcs / 100% lines (36 tests)

* test(restart): fix stale-pid test typing for tsgo

* fix(gateway): address lifecycle review findings

* test(update): make restart-helper path assertions windows-safe

---------

Signed-off-by: HCL <chenglunhu@gmail.com>
Co-authored-by: Glucksberg <markuscontasul@gmail.com>
Co-authored-by: Efe Büken <efe@arven.digital>
Co-authored-by: Riccardo Marino <rmarino@apple.com>
Co-authored-by: HCL <chenglunhu@gmail.com>
2026-03-03 21:31:12 -06:00
Liu Xiaopai
ae29842158 Gateway: fix stale self version in status output (#32655)
Merged via squash.

Prepared head SHA: b9675d1f90
Co-authored-by: liuxiaopai-ai <73659136+liuxiaopai-ai@users.noreply.github.com>
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Reviewed-by: @gumadeiras
2026-03-03 02:41:52 -05:00
Peter Steinberger
8768487aee refactor(shared): dedupe protocol schema typing and session/media helpers 2026-03-02 19:57:33 +00:00
Peter Steinberger
7a7eee920a refactor(gateway): harden plugin http route contracts 2026-03-02 16:48:00 +00:00
Peter Steinberger
33e76db12a refactor(gateway): scope ws origin fallback metrics to runtime 2026-03-02 16:47:00 +00:00
Peter Steinberger
d5ae4b8337 fix(gateway): require local client for loopback origin fallback 2026-03-02 16:37:45 +00:00
Peter Steinberger
2fd8264ab0 refactor(gateway): hard-break plugin wildcard http handlers 2026-03-02 16:24:06 +00:00
Peter Steinberger
93b0724025 fix(gateway): fail closed plugin auth path canonicalization 2026-03-02 15:55:32 +00:00
Peter Steinberger
d3e0c0b29c test(gateway): dedupe gateway and infra test scaffolds 2026-03-02 07:13:10 +00:00
Sid
e1e715c53d fix(gateway): skip device pairing for local backend self-connections (#30801)
* fix(gateway): skip device pairing for local backend self-connections

When gateway.tls is enabled, sessions_spawn (and other internal
callGateway operations) creates a new WebSocket to the gateway.
The gateway treated this self-connection like any external client
and enforced device pairing, rejecting it with "pairing required"
(close code 1008). This made sub-agent spawning impossible when
TLS was enabled in Docker with bind: "lan".

Skip pairing for connections that are gateway-client self-connections
from localhost with valid shared auth (token/password). These are
internal backend calls (e.g. sessions_spawn, subagent-announce) that
already have valid credentials and connect from the same host.
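
A minimal sketch of the bypass condition described above, assuming a hypothetical `ConnectContextSketch` shape; the field names and the helper `canSkipPairingForSelfConnection` are illustrative, not the gateway's actual API.

```typescript
// Hypothetical connection context; field names are assumptions.
interface ConnectContextSketch {
  clientId: string; // "gateway-client" marks internal backend calls
  isLocalClient: boolean; // connection originates from localhost
  sharedAuthOk: boolean; // valid shared token or password was presented
}

// All three conditions must hold; any external or unauthenticated client still pairs.
function canSkipPairingForSelfConnection(ctx: ConnectContextSketch): boolean {
  return ctx.clientId === "gateway-client" && ctx.isLocalClient && ctx.sharedAuthOk;
}
```

Requiring locality and valid shared auth together keeps the bypass narrow: a remote client that merely claims to be `gateway-client` would still go through pairing.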

Closes #30740

* gateway: tighten backend self-pair bypass guard

* tests: cover backend self-pairing local-vs-remote auth path

* changelog: add gateway tls pairing fix credit

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-03-01 21:46:33 -08:00
Peter Steinberger
2d31126e6a refactor(shared): extract reused path and normalization helpers 2026-03-02 05:20:19 +00:00
Peter Steinberger
cef5fae0a2 refactor(gateway): dedupe origin seeding and plugin route auth matching 2026-03-02 00:42:22 +00:00
Peter Steinberger
53d10f8688 fix(gateway): land access/auth/config migration cluster
Land #28960 by @Glucksberg (Tailscale origin auto-allowlist).
Land #29394 by @synchronic1 (allowedOrigins upgrade migration).
Land #29198 by @Mariana-Codebase (plugin HTTP auth guard + route precedence).
Land #30910 by @liuxiaopai-ai (tailscale bind/config.patch guard).

Co-authored-by: Glucksberg <markuscontasul@gmail.com>
Co-authored-by: synchronic1 <synchronic1@users.noreply.github.com>
Co-authored-by: Mariana Sinisterra <mariana.data@outlook.com>
Co-authored-by: liuxiaopai-ai <73659136+liuxiaopai-ai@users.noreply.github.com>
2026-03-02 00:10:51 +00:00
Ayaan Zaidi
54eaf17327 feat(gateway): add node canvas capability refresh flow 2026-02-27 12:16:36 +05:30
Vincent Koc
cb9374a2a1 Gateway: improve device-auth v2 migration diagnostics (#28305)
* Gateway: add device-auth detail code resolver

* Gateway: emit specific device-auth detail codes

* Gateway tests: cover nonce and signature detail codes

* Docs: add gateway device-auth migration diagnostics

* Docs: add device-auth v2 troubleshooting signatures
2026-02-26 21:05:43 -08:00
Peter Steinberger
081b1aa1ed refactor(gateway): unify v3 auth payload builders and vectors 2026-02-26 15:08:50 +01:00
Kevin Shenghui
9c142993b8 fix: preserve operator scopes for shared auth connections
When connecting via shared gateway token (no device identity),
the operator scopes were being cleared, causing API operations
to fail with 'missing scope' errors.

This fix preserves scopes when sharedAuthOk is true, allowing
headless/API operator clients to retain their requested scopes.
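
The scope handling can be sketched as below. `resolveOperatorScopesSketch` and its parameters are hypothetical; the sketch only assumes that scopes were previously dropped whenever device identity was absent.

```typescript
// Hypothetical helper: decides which requested scopes survive the handshake.
function resolveOperatorScopesSketch(
  requestedScopes: string[],
  hasDeviceIdentity: boolean,
  sharedAuthOk: boolean,
): string[] {
  // Keep scopes for device-identified clients and, after the fix,
  // for clients authenticated with the shared gateway token.
  if (hasDeviceIdentity || sharedAuthOk) {
    return requestedScopes.slice();
  }
  // No identity and no shared auth: nothing is granted.
  return [];
}
```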

Fixes #27494

(cherry picked from commit c71c8948bd)
2026-02-26 13:40:58 +00:00
Peter Steinberger
7d8aeaaf06 fix(gateway): pin paired reconnect metadata for node policy 2026-02-26 14:11:04 +01:00
Peter Steinberger
4b71de384c fix(core): unify session-key normalization and plugin boundary checks 2026-02-26 12:41:23 +00:00
Peter Steinberger
0cc3e8137c refactor(gateway): centralize trusted-proxy control-ui bypass policy 2026-02-26 02:26:52 +01:00
Peter Steinberger
ec45c317f5 fix(gateway): block trusted-proxy control-ui node bypass 2026-02-26 01:54:19 +01:00
Peter Steinberger
20c2db2103 refactor(gateway): split browser auth hardening paths 2026-02-26 01:37:00 +01:00
Peter Steinberger
c736f11a16 fix(gateway): harden browser websocket auth chain 2026-02-26 01:22:49 +01:00
Peter Steinberger
8d1481cb4a fix(gateway): require pairing for unpaired operator device auth 2026-02-26 00:52:50 +01:00
SidQin-cyber
20523b918a fix(gateway): allow trusted-proxy control-ui auth to skip device pairing
Control UI connections authenticated via gateway.auth.mode=trusted-proxy were
still forced through device pairing because pairing bypass only considered
shared token/password auth (sharedAuthOk). In trusted-proxy deployments,
this produced persistent "pairing required" failures despite valid trusted
proxy headers.

Treat authenticated trusted-proxy control-ui connections as pairing-bypass
eligible and allow missing device identity in that mode.

Fixes #25293

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-24 14:17:54 +00:00
Marco Di Dionisio
83689fc838 fix: include trusted-proxy in sharedAuthOk check
In trusted-proxy mode, sharedAuthResult is null because hasSharedAuth
only triggers for token/password in connectParams.auth. But the primary
auth (authResult) already validated the trusted-proxy — the connection
came from a CIDR in trustedProxies with a valid userHeader. This IS
shared auth semantically (the proxy vouches for identity), so operator
connections should be able to skip device identity.

Without this fix, trusted-proxy operator connections are rejected with
"device identity required" because roleCanSkipDeviceIdentity() sees
sharedAuthOk=false.
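
A rough sketch of the widened check, assuming a hypothetical `AuthResultSketch` shape with an `ok` flag and a `method` field; the real result types are not shown in this message.

```typescript
// Hypothetical auth result shape; the real type likely has more fields.
interface AuthResultSketch {
  ok: boolean;
  method: "token" | "password" | "trusted-proxy" | "none";
}

function computeSharedAuthOkSketch(
  authResult: AuthResultSketch,
  sharedAuthResult: AuthResultSketch | null,
): boolean {
  // Original behavior: only explicit token/password shared auth counted.
  const explicitSharedOk = sharedAuthResult !== null && sharedAuthResult.ok;
  // Added behavior: a validated trusted-proxy primary auth also counts.
  const trustedProxyOk = authResult.ok && authResult.method === "trusted-proxy";
  return explicitSharedOk || trustedProxyOk;
}
```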

(cherry picked from commit e87048a6a6)
2026-02-24 04:33:51 +00:00
Peter Steinberger
223d7dc23d feat(gateway)!: require explicit non-loopback control-ui origins 2026-02-24 01:57:11 +00:00
Vincent Koc
7fb69b7cd2 Gateway: stop repeated unauthorized WS request floods per connection (#24294)
* Gateway WS: add unauthorized flood guard primitive

* Gateway WS: close repeated unauthorized post-handshake request floods

* Gateway WS: test unauthorized flood guard behavior

* Changelog: note gateway WS unauthorized flood guard hardening

* Update CHANGELOG.md
2026-02-23 09:58:47 -05:00
Peter Steinberger
2081b3a3c4 refactor(channels): dedupe hook and monitor execution paths 2026-02-22 21:19:09 +00:00
Tak Hoffman
f8171ffcdc Config UI: tag filters and complete schema help/labels coverage (#23796)
* Config UI: add tag filters and complete schema help/labels

* Config UI: finalize tags/help polish and unblock test suite

* Protocol: regenerate Swift gateway models
2026-02-22 15:17:07 -06:00
Peter Steinberger
9165bd7f37 fix(gateway): auto-approve loopback scope upgrades
Co-authored-by: Marcus Widing <245375637+widingmarcus-cyber@users.noreply.github.com>
2026-02-22 22:11:50 +01:00
Peter Steinberger
7eae1933fb refactor(test): extract shared fixture helpers in gateway and outbound tests 2026-02-22 20:18:20 +00:00
Peter Steinberger
bbdfba5694 fix: harden connect auth flow and exec policy diagnostics 2026-02-22 20:22:00 +01:00
Peter Steinberger
0c1f491a02 fix(gateway): clarify pairing and node auth guidance 2026-02-22 19:50:29 +01:00
Peter Steinberger
b13bba9c35 fix(gateway): skip operator pairing on valid shared auth 2026-02-22 19:25:50 +01:00
Jonathan Works
8c089bbe32 fix(hooks): suppress main session events for silent/delivered hook turns (#20678)
* fix(hooks): suppress main session events for silent/delivered hook turns

When a hook agent turn returns NO_REPLY (SILENT_REPLY_TOKEN), mark the
result as delivered so the hooks handler skips enqueueSystemEvent and
requestHeartbeatNow. Without this, every Gmail notification classified
as NO_REPLY still injects a system event into the main agent session,
causing context window growth proportional to email volume.

Two-part fix:
- cron/isolated-agent/run.ts: set delivered:true when synthesizedText
  matches SILENT_REPLY_TOKEN so callers know no notification is needed
- gateway/server/hooks.ts: guard enqueueSystemEvent + requestHeartbeatNow
  with !result.delivered (addresses duplicate delivery, refs #20196)
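
The two parts can be sketched together as follows. The `HookRunResultSketch` type and both helper names are hypothetical, and the callbacks stand in for `enqueueSystemEvent` and `requestHeartbeatNow`; exact-match comparison after trimming is also an assumption.

```typescript
const SILENT_REPLY_TOKEN = "NO_REPLY";

// Hypothetical result shape for an isolated hook agent turn.
interface HookRunResultSketch {
  text: string;
  delivered: boolean;
}

// Part 1: a silent reply is treated as already delivered.
function finalizeHookRunSketch(synthesizedText: string, alreadyDelivered: boolean): HookRunResultSketch {
  const isSilent = synthesizedText.trim() === SILENT_REPLY_TOKEN;
  return { text: synthesizedText, delivered: alreadyDelivered || isSilent };
}

// Part 2: the hooks handler only notifies the main session for undelivered results.
function handleHookResultSketch(
  result: HookRunResultSketch,
  enqueueSystemEvent: (text: string) => void,
  requestHeartbeatNow: () => void,
): void {
  if (!result.delivered) {
    enqueueSystemEvent(result.text);
    requestHeartbeatNow();
  }
}
```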

Refs: https://github.com/openclaw/openclaw/issues/20196

* Changelog: document hook silent-delivery suppression fix

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-02-22 12:47:42 -05:00
Peter Steinberger
66529c7aa5 refactor(gateway): unify auth credential resolution 2026-02-22 18:23:13 +01:00
Peter Steinberger
f14ebd743c refactor(security): unify local-host and tailnet CIDR checks 2026-02-22 17:20:27 +01:00
Peter Steinberger
d116bcfb14 refactor(runtime): consolidate followup, gateway, and provider dedupe paths 2026-02-22 14:08:51 +00:00
Peter Steinberger
2c6dd84718 fix(gateway): remove hello-ok host and commit fields 2026-02-22 10:17:36 +01:00
Peter Steinberger
8887f41d7d refactor(gateway)!: remove legacy v1 device-auth handshake 2026-02-22 09:27:03 +01:00
Marcus Widing
fa4e4efd92 fix(gateway): restore localhost Control UI pairing when allowInsecureAuth is set (#22996)
* fix(gateway): allow localhost Control UI without device identity when allowInsecureAuth is set

* fix(gateway): pass isLocalClient to evaluateMissingDeviceIdentity

* test: add regression tests for localhost Control UI pairing

* fix(gateway): require pairing for legacy metadata upgrades

* test(gateway): fix legacy metadata e2e ws typing

---------

Co-authored-by: Peter Steinberger <steipete@gmail.com>
2026-02-22 00:04:52 +01:00
Peter Steinberger
51149fcaf1 refactor(gateway): extract connect and role policy logic 2026-02-21 19:47:22 +01:00
Peter Steinberger
ddcb2d79b1 fix(gateway): block node role when device identity is missing 2026-02-21 19:34:13 +01:00
Peter Steinberger
be7f825006 refactor(gateway): harden proxy client ip resolution 2026-02-21 13:36:23 +01:00
Peter Steinberger
36a0df423d refactor(gateway): make ws and http auth surfaces explicit 2026-02-21 13:33:09 +01:00
Peter Steinberger
14b0d2b816 refactor: harden control-ui auth flow and add insecure-flag audit summary 2026-02-21 13:18:23 +01:00
Peter Steinberger
356d61aacf fix(gateway): scope tailscale tokenless auth to websocket 2026-02-21 13:03:13 +01:00
Peter Steinberger
99048dbec2 fix(gateway): align insecure-auth toggle messaging 2026-02-21 12:57:22 +01:00
Coy Geek
40a292619e fix: Control UI Insecure Auth Bypass Allows Token-Only Auth Over HTTP (#20684)
Merged via /review-pr -> /prepare-pr -> /merge-pr.

Prepared head SHA: ad9be4b4d6
Co-authored-by: coygeek <65363919+coygeek@users.noreply.github.com>
Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com>
Reviewed-by: @mbelinky
2026-02-20 17:34:34 +00:00