Add actionable operator guidance when an unauthorized SIGUSR1 gateway restart is ignored because unmanaged restart is disabled.
The change is log-only: restart authorization and scheduling semantics are unchanged, and the existing run-loop test now asserts both the reason warning and the recovery hint.
Refs #79577
Refs #78110
Refs #82433
Co-authored-by: wAngByg <281221101+wAngByg@users.noreply.github.com>
After config.patch writes new values to openclaw.json, a subsequent
SIGUSR1 in-process restart could overwrite them with a stale snapshot.
Root cause: run-loop's onIteration hook resets lanes and task registry,
but leaves the runtimeConfigSnapshot intact. loadConfig() then returns
the old snapshot via loadPinnedRuntimeConfig() instead of re-reading disk.
Fix: clearRuntimeConfigSnapshot() in the restart iteration hook so the
next startup reads fresh config from disk.
Refs #86350
Honor configured restart drain budgets for embedded runs and avoid a second active-work drain after forced deferral timeout restarts.
Includes maintainer changelog entry.
The SIGTERM handler's fire-and-forget IIFE can reject if the graceful
drain or tunnel-teardown throws. Without a catch, this becomes an
unhandled promise rejection. Add .catch() that logs the error and
falls back to a hard stop request. Same treatment for SIGUSR1.
Signed-off-by: Sebastien Tardif <sebtardif@ncf.ca>
Summary:
- Clear stale SIGUSR1 restart state before rejected or externally allowed restart handling can leave an in-flight token stuck.
- Verify the live gateway version after macOS package-update service refreshes and skip redundant restarts when the refreshed LaunchAgent already serves the expected version.
- Set generated LaunchAgents to a 10s throttle plus 20s shutdown window and widen gateway bind retries around supervisor-owned restarts.
Fixes#79577. Refs #78699 and #60885.
Verification:
- pnpm test src/cli/gateway-cli/run-loop.test.ts src/infra/infra-runtime.test.ts
- pnpm test src/cli/update-cli.test.ts src/daemon/launchd.test.ts src/gateway/server/http-listen.test.ts
- pnpm exec oxfmt --check --threads=1 src/cli/gateway-cli/run-loop.ts src/cli/gateway-cli/run-loop.test.ts
- pnpm check:changed
- Crabbox/Blacksmith wrapper smoke passed focused tests plus pnpm check:changed: https://github.com/openclaw/openclaw/actions/runs/25595985603
- PR CI was green before upstream main advanced; the latest rebased heads hit unrelated broad lint failures also reproduced on current main CI (for example https://github.com/openclaw/openclaw/actions/runs/25598671666). No failing lint diagnostics referenced this gateway/update diff.
Simplify plugin installation and runtime loading around package-manager-owned dependencies, with Jiti reserved for local/TS fallback paths.
Also scans npm plugin install roots so hoisted transitive dependencies are covered by dependency denylist and node_modules symlink checks.
Require Control UI updates to observe a real gateway process replacement, surface skipped/error update outcomes, and verify the running gateway version after restart.\n\nAdds update.status restart-sentinel plumbing, docs, generated protocol model updates, and changelog attribution.\n\nLocal verification:\n- pnpm test src/gateway/server-methods/update.test.ts src/cli/gateway-cli/run-loop.test.ts src/infra/restart-sentinel.test.ts src/infra/process-respawn.test.ts src/infra/update-runner.test.ts ui/src/ui/app-gateway.node.test.ts ui/src/ui/controllers/config.test.ts\n- git diff --check\n- pnpm exec oxfmt --check --threads=1 CHANGELOG.md docs/gateway/protocol.md docs/gateway/configuration.md docs/web/control-ui.md\n- pnpm docs:check-mdx
- reject new lane enqueues once gateway drain begins
- always reset lane draining state and isolate onWait callback failures
- persist per-session abort cutoff and skip stale queued messages
- avoid false 600s agentTurn timeout in isolated cron jobs
Fixes#27407Fixes#27332Fixes#27427
Co-authored-by: Kevin Shenghui <shenghuikevin@github.com>
Co-authored-by: zjmy <zhangjunmengyang@gmail.com>
Co-authored-by: suko <miha.sukic@gmail.com>
process.exit() called from inside an async IIFE bypasses the outer
try/finally block that releases the gateway lock. This leaves a stale
lock file pointing to a zombie PID, preventing the spawned child or
systemctl restart from acquiring the lock. Release the lock explicitly
before calling exit in both the restart-spawned and stop code paths.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>