* fix(gateway): correct launchctl command sequence for gateway restart (closes#20030)
* fix(restart): expand HOME and escape label in launchctl plist path
* fix(restart): poll port free after SIGKILL to prevent EADDRINUSE restart loop
When cleanStaleGatewayProcessesSync() kills a stale gateway process,
the kernel may not immediately release the TCP port. Previously the
function returned after a fixed 500ms sleep (300ms SIGTERM + 200ms
SIGKILL), allowing triggerOpenClawRestart() to hand off to systemd
before the port was actually free. The new systemd process then raced
the dying socket for port 18789, hit EADDRINUSE, and exited with
status 1, causing systemd to retry indefinitely — the zombie restart
loop reported in #33103.
Fix: add waitForPortFreeSync() that polls lsof at 50ms intervals for
up to 2 seconds after SIGKILL. cleanStaleGatewayProcessesSync() now
blocks until the port is confirmed free (or the budget expires with a
warning) before returning. The increased SIGTERM/SIGKILL wait budgets
(600ms / 400ms) also give slow processes more time to exit cleanly.
Fixes#33103
Related: #28134
* fix: add EADDRINUSE retry and TIME_WAIT port-bind checks for gateway startup
* fix(ports): treat EADDRNOTAVAIL as non-retryable and fix flaky test
* fix(gateway): hot-reload agents.defaults.models allowlist changes
The reload plan had a rule for `agents.defaults.model` (singular) but
not `agents.defaults.models` (plural — the allowlist array). Because
`agents.defaults.models` does not prefix-match `agents.defaults.model.`,
it fell through to the catch-all `agents` tail rule (kind=none), so
allowlist edits in openclaw.json were silently ignored at runtime.
Add a dedicated reload rule so changes to the models allowlist trigger
a heartbeat restart, which re-reads the config and serves the updated
list to clients.
Fixes#33600
Co-authored-by: HCL <chenglunhu@gmail.com>
Signed-off-by: HCL <chenglunhu@gmail.com>
* test(restart): 100% branch coverage — audit round 2
Audit findings fixed:
- remove dead guard: terminateStaleProcessesSync pids.length===0 check was
unreachable (only caller cleanStaleGatewayProcessesSync already guards)
- expose __testing.callSleepSyncRaw so sleepSync's real Atomics.wait path
can be unit-tested directly without going through the override
- fix broken sleepSync Atomics.wait test: previous test set override=null
but cleanStaleGatewayProcessesSync returned before calling sleepSync —
replaced with direct callSleepSyncRaw calls that actually exercise L36/L42-47
- fix pid collision: two tests used process.pid+304 (EPERM + dead-at-SIGTERM);
EPERM test changed to process.pid+305
- fix misindented tests: 'deduplicates pids' and 'lsof status 1 container
edge case' were outside their intended describe blocks; moved to correct
scopes (findGatewayPidsOnPortSync and pollPortOnce respectively)
- add missing branch tests:
- status 1 + non-empty stdout with zero openclaw pids → free:true (L145)
- mid-loop non-openclaw cmd in &&-chain (L67)
- consecutive p-lines without c-line between them (L67)
- invalid PID in p-line (p0 / pNaN) — ternary false branch (L67)
- unknown lsof output line (else-if false branch L69)
Coverage: 100% stmts / 100% branch / 100% funcs / 100% lines (36 tests)
* test(restart): fix stale-pid test typing for tsgo
* fix(gateway): address lifecycle review findings
* test(update): make restart-helper path assertions windows-safe
---------
Signed-off-by: HCL <chenglunhu@gmail.com>
Co-authored-by: Glucksberg <markuscontasul@gmail.com>
Co-authored-by: Efe Büken <efe@arven.digital>
Co-authored-by: Riccardo Marino <rmarino@apple.com>
Co-authored-by: HCL <chenglunhu@gmail.com>
* fix(gateway): skip device pairing for local backend self-connections
When gateway.tls is enabled, sessions_spawn (and other internal
callGateway operations) creates a new WebSocket to the gateway.
The gateway treated this self-connection like any external client
and enforced device pairing, rejecting it with "pairing required"
(close code 1008). This made sub-agent spawning impossible when
TLS was enabled in Docker with bind: "lan".
Skip pairing for connections that are gateway-client self-connections
from localhost with valid shared auth (token/password). These are
internal backend calls (e.g. sessions_spawn, subagent-announce) that
already have valid credentials and connect from the same host.
Closes#30740
* gateway: tighten backend self-pair bypass guard
* tests: cover backend self-pairing local-vs-remote auth path
* changelog: add gateway tls pairing fix credit
---------
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
When connecting via shared gateway token (no device identity),
the operator scopes were being cleared, causing API operations
to fail with 'missing scope' errors.
This fix preserves scopes when sharedAuthOk is true, allowing
headless/API operator clients to retain their requested scopes.
Fixes#27494
(cherry picked from commit c71c8948bd)
Control UI connections authenticated via gateway.auth.mode=trusted-proxy were
still forced through device pairing because pairing bypass only considered
shared token/password auth (sharedAuthOk). In trusted-proxy deployments,
this produced persistent "pairing required" failures despite valid trusted
proxy headers.
Treat authenticated trusted-proxy control-ui connections as pairing-bypass
eligible and allow missing device identity in that mode.
Fixes#25293
Co-authored-by: Cursor <cursoragent@cursor.com>
In trusted-proxy mode, sharedAuthResult is null because hasSharedAuth
only triggers for token/password in connectParams.auth. But the primary
auth (authResult) already validated the trusted-proxy — the connection
came from a CIDR in trustedProxies with a valid userHeader. This IS
shared auth semantically (the proxy vouches for identity), so operator
connections should be able to skip device identity.
Without this fix, trusted-proxy operator connections are rejected with
"device identity required" because roleCanSkipDeviceIdentity() sees
sharedAuthOk=false.
(cherry picked from commit e87048a6a6)
* Config UI: add tag filters and complete schema help/labels
* Config UI: finalize tags/help polish and unblock test suite
* Protocol: regenerate Swift gateway models
* fix(hooks): suppress main session events for silent/delivered hook turns
When a hook agent turn returns NO_REPLY (SILENT_REPLY_TOKEN), mark the
result as delivered so the hooks handler skips enqueueSystemEvent and
requestHeartbeatNow. Without this, every Gmail notification classified
as NO_REPLY still injects a system event into the main agent session,
causing context window growth proportional to email volume.
Two-part fix:
- cron/isolated-agent/run.ts: set delivered:true when synthesizedText
matches SILENT_REPLY_TOKEN so callers know no notification is needed
- gateway/server/hooks.ts: guard enqueueSystemEvent + requestHeartbeatNow
with !result.delivered (addresses duplicate delivery, refs #20196)
Refs: https://github.com/openclaw/openclaw/issues/20196
* Changelog: document hook silent-delivery suppression fix
---------
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
* fix(gateway): allow localhost Control UI without device identity when allowInsecureAuth is set
* fix(gateway): pass isLocalClient to evaluateMissingDeviceIdentity
* test: add regression tests for localhost Control UI pairing
* fix(gateway): require pairing for legacy metadata upgrades
* test(gateway): fix legacy metadata e2e ws typing
---------
Co-authored-by: Peter Steinberger <steipete@gmail.com>