From 88da533714e5f9d6988d4a0af32bb62cd059fb52 Mon Sep 17 00:00:00 2001
From: Peter Steinberger
Date: Fri, 1 May 2026 09:54:57 +0100
Subject: [PATCH] fix: bypass update restart cooldown

---
 CHANGELOG.md                              |  2 +-
 docs/cli/update.md                        |  8 +++----
 docs/gateway/protocol.md                  |  2 +-
 docs/install/updating.md                  | 10 ++++----
 src/gateway/server-methods/update.test.ts |  1 +
 src/gateway/server-methods/update.ts      |  1 +
 src/infra/infra-runtime.test.ts           | 28 +++++++++++++++++++++++
 src/infra/restart.ts                      |  6 ++++-
 8 files changed, 46 insertions(+), 12 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index a9bc5cfd35c..2037b65a58f 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -55,7 +55,7 @@ Docs: https://docs.openclaw.ai
 - Plugins/runtime-deps: recover interrupted bundled runtime-dependency installs whose package sentinels exist but generated materialization is incomplete, forcing npm/pnpm repair in Gateway startup, doctor, and lazy plugin loads instead of leaving channels crash-looping on missing packages. Fixes #75309; refs #75310, #75296, and #75304. Thanks @scottgl9.
 - Plugins/runtime-deps: treat no-main and export-map package sentinels without reachable entry files as incomplete, so Gateway startup, doctor, and lazy plugin loads repair interrupted bundled dependency installs instead of accepting package.json-only partial installs. Fixes #75309; refs #75183. Thanks @shakkernerd.
 - Plugins/runtime-deps: keep runtime inspection and channel maintenance commands from downloading bundled plugin dependencies, route explicit repairs through `openclaw plugins deps --repair`, and still allow Gateway/DO paths to repair missing deps before import. Refs #75069. Thanks @xiaohuaxi.
-- Updates: force non-deferred update restarts after package-manager updates requested through the live Gateway control plane and fail release validation on post-swap stale chunk import crashes, so Telegram/Discord imports do not stay pointed at removed dist files. Fixes #75206. Thanks @xonaman and @faux123.
+- Updates: force non-deferred, no-cooldown update restarts after package-manager updates requested through the live Gateway control plane and fail release validation on post-swap stale chunk import crashes, so Telegram/Discord imports do not stay pointed at removed dist files. Fixes #75206. Thanks @xonaman and @faux123.
 - Agents/tool-result guard: use the resolved runtime context token budget for non-context-engine tool-result overflow checks, so long tool-heavy sessions no longer compact early when `contextTokens` is larger than native `contextWindow`. Fixes #74917. Thanks @kAIborg24.
 - Gateway/systemd: exit with sysexits 78 for supervised lock and `EADDRINUSE` conflicts so `RestartPreventExitStatus=78` stops `Restart=always` restart loops instead of repeatedly reloading plugins against an occupied port. Fixes #75115. Thanks @yhyatt.
 - Agents/runtime: skip blank visible user prompts at the embedded-runner boundary before provider submission while still allowing internal runtime-only turns and media-only prompts, so Telegram/group sessions no longer leak raw empty-input provider errors when replay history exists. Fixes #74137. Thanks @yelog, @Gracker, and @nhaener.
diff --git a/docs/cli/update.md b/docs/cli/update.md
index e3ed3764cf8..b7ae7f5a4bc 100644
--- a/docs/cli/update.md
+++ b/docs/cli/update.md
@@ -84,9 +84,9 @@ install method aligned:

 The Gateway core auto-updater (when enabled via config) launches the CLI update path
 outside the live Gateway request handler. Control-plane `update.run` package-manager
-updates force a non-deferred update restart after the package swap, because the old
-Gateway process may still have in-memory chunks that point at files removed by the
-new package.
+updates force a non-deferred, no-cooldown update restart after the package swap,
+because the old Gateway process may still have in-memory chunks that point at
+files removed by the new package.

 For package-manager installs, `openclaw update` resolves the target package
 version before invoking the package manager. npm global installs use a staged
@@ -155,7 +155,7 @@ If an exact pinned npm plugin update resolves to an artifact whose integrity dif

 Post-update plugin sync failures fail the update result and stop restart follow-up work. Fix the plugin install or update error, then rerun `openclaw update`.

-When the updated Gateway starts, enabled bundled plugin runtime dependencies are staged before plugin activation. Package-manager `update.run` restarts bypass the normal idle deferral after the package tree has been swapped, so the old process cannot keep lazy-loading removed chunks. Service-manager restarts still drain runtime-dependency staging before closing the Gateway.
+When the updated Gateway starts, enabled bundled plugin runtime dependencies are staged before plugin activation. Package-manager `update.run` restarts bypass the normal idle deferral and restart cooldown after the package tree has been swapped, so the old process cannot keep lazy-loading removed chunks. Service-manager restarts still drain runtime-dependency staging before closing the Gateway.

 If pnpm bootstrap still fails, the updater stops early with a package-manager-specific error instead of trying `npm run build` inside the checkout.

diff --git a/docs/gateway/protocol.md b/docs/gateway/protocol.md
index 35459dace87..40751c78fbd 100644
--- a/docs/gateway/protocol.md
+++ b/docs/gateway/protocol.md
@@ -378,7 +378,7 @@ enumeration of `src/gateway/server-methods/*.ts`.
 - `config.apply` validates + replaces the full config payload.
 - `config.schema` returns the live config schema payload used by Control UI and CLI tooling: schema, `uiHints`, version, and generation metadata, including plugin + channel schema metadata when the runtime can load it. The schema includes field `title` / `description` metadata derived from the same labels and help text used by the UI, including nested object, wildcard, array-item, and `anyOf` / `oneOf` / `allOf` composition branches when matching field documentation exists.
 - `config.schema.lookup` returns a path-scoped lookup payload for one config path: normalized path, a shallow schema node, matched hint + `hintPath`, and immediate child summaries for UI/CLI drill-down. Lookup schema nodes keep the user-facing docs and common validation fields (`title`, `description`, `type`, `enum`, `const`, `format`, `pattern`, numeric/string/array/object bounds, and flags like `additionalProperties`, `deprecated`, `readOnly`, `writeOnly`). Child summaries expose `key`, normalized `path`, `type`, `required`, `hasChildren`, plus the matched `hint` / `hintPath`.
-- `update.run` runs the gateway update flow and schedules a restart only when the update itself succeeded. Package-manager updates force a non-deferred update restart after the package swap so the old Gateway process does not keep lazy-loading from a replaced `dist` tree.
+- `update.run` runs the gateway update flow and schedules a restart only when the update itself succeeded. Package-manager updates force a non-deferred, no-cooldown update restart after the package swap so the old Gateway process does not keep lazy-loading from a replaced `dist` tree.
 - `update.status` returns the latest cached update restart sentinel, including the post-restart running version when available.
 - `wizard.start`, `wizard.next`, `wizard.status`, and `wizard.cancel` expose the onboarding wizard over WS RPC.

diff --git a/docs/install/updating.md b/docs/install/updating.md
index bbc8d187848..850db51f019 100644
--- a/docs/install/updating.md
+++ b/docs/install/updating.md
@@ -169,11 +169,11 @@ The gateway also logs an update hint on startup (disable with `update.checkOnSta
 For downgrade or incident recovery, set `OPENCLAW_NO_AUTO_UPDATE=1` in the gateway environment to block automatic applies even when `update.auto.enabled` is configured. Startup update hints can still run unless `update.checkOnStart` is also disabled.

 Package-manager updates requested through the live Gateway control-plane handler
-force a non-deferred update restart after the package swap. That avoids leaving
-an old in-memory process around long enough to lazy-load chunks from a package
-tree that has already been replaced. Shell `openclaw update` remains the
-preferred path for supervised installs because it can stop and restart the
-service around the update.
+force a non-deferred, no-cooldown update restart after the package swap. That
+avoids leaving an old in-memory process around long enough to lazy-load chunks
+from a package tree that has already been replaced. Shell `openclaw update`
+remains the preferred path for supervised installs because it can stop and
+restart the service around the update.

 ## After updating

diff --git a/src/gateway/server-methods/update.test.ts b/src/gateway/server-methods/update.test.ts
index 3dd237ffae8..194d0d405ed 100644
--- a/src/gateway/server-methods/update.test.ts
+++ b/src/gateway/server-methods/update.test.ts
@@ -297,6 +297,7 @@ describe("update.run restart scheduling", () => {
       expect.objectContaining({
         delayMs: 0,
         reason: "update.run",
+        skipCooldown: true,
         skipDeferral: true,
       }),
     );
diff --git a/src/gateway/server-methods/update.ts b/src/gateway/server-methods/update.ts
index dd1cc717320..121772122c1 100644
--- a/src/gateway/server-methods/update.ts
+++ b/src/gateway/server-methods/update.ts
@@ -147,6 +147,7 @@ export const updateHandlers: GatewayRequestHandlers = {
         delayMs: updateWasPackageSwap ? 0 : restartDelayMs,
         reason: "update.run",
         skipDeferral: updateWasPackageSwap,
+        skipCooldown: updateWasPackageSwap,
         audit: {
           actor: actor.actor,
           deviceId: actor.deviceId,
diff --git a/src/infra/infra-runtime.test.ts b/src/infra/infra-runtime.test.ts
index c5c67f4d4a1..2a758236a48 100644
--- a/src/infra/infra-runtime.test.ts
+++ b/src/infra/infra-runtime.test.ts
@@ -425,6 +425,34 @@ describe("infra runtime", () => {
         process.removeListener("SIGUSR1", handler);
       }
     });
+
+    it("bypasses restart cooldown when requested", async () => {
+      const emitSpy = vi.spyOn(process, "emit");
+      const handler = () => {};
+      process.on("SIGUSR1", handler);
+      try {
+        scheduleGatewaySigusr1Restart({ delayMs: 0, reason: "first" });
+        await vi.advanceTimersByTimeAsync(0);
+        expect(consumeGatewaySigusr1RestartAuthorization()).toBe(true);
+        markGatewaySigusr1RestartHandled();
+
+        const forced = scheduleGatewaySigusr1Restart({
+          delayMs: 0,
+          reason: "update.run",
+          skipCooldown: true,
+        });
+
+        expect(forced.coalesced).toBe(false);
+        expect(forced.delayMs).toBe(0);
+        expect(forced.cooldownMsApplied).toBe(0);
+
+        await vi.advanceTimersByTimeAsync(0);
+        expect(emitSpy.mock.calls.filter((args) => args[0] === "SIGUSR1").length).toBe(2);
+        expect(peekGatewaySigusr1RestartReason()).toBe("update.run");
+      } finally {
+        process.removeListener("SIGUSR1", handler);
+      }
+    });
   });

   describe("pre-restart deferral check", () => {
diff --git a/src/infra/restart.ts b/src/infra/restart.ts
index 953fcbc3503..7e508efecac 100644
--- a/src/infra/restart.ts
+++ b/src/infra/restart.ts
@@ -661,6 +661,7 @@ export function scheduleGatewaySigusr1Restart(opts?: {
   audit?: RestartAuditInfo;
   emitHooks?: RestartEmitHooks;
   skipDeferral?: boolean;
+  skipCooldown?: boolean;
 }): ScheduledRestart {
   const delayMsRaw =
     typeof opts?.delayMs === "number" && Number.isFinite(opts.delayMs)
@@ -674,7 +675,10 @@ export function scheduleGatewaySigusr1Restart(opts?: {
   const hasSigusr1Listener = process.listenerCount("SIGUSR1") > 0;
   const mode = hasSigusr1Listener ? "emit" : process.platform === "win32" ? "supervisor" : "signal";
   const nowMs = Date.now();
-  const cooldownMsApplied = Math.max(0, lastRestartEmittedAt + RESTART_COOLDOWN_MS - nowMs);
+  const skipCooldown = opts?.skipCooldown === true;
+  const cooldownMsApplied = skipCooldown
+    ? 0
+    : Math.max(0, lastRestartEmittedAt + RESTART_COOLDOWN_MS - nowMs);
   const requestedDueAt = nowMs + delayMs + cooldownMsApplied;

   const skipDeferral = opts?.skipDeferral === true;
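The core of the fix is the `restart.ts` hunk. When `skipCooldown` is set, `cooldownMsApplied` is forced to 0, so `requestedDueAt` collapses to `nowMs + delayMs`. The sketch below reproduces that arithmetic and the `update.run` flag wiring in isolation. It is not the actual `restart.ts` or `update.ts` code. `computeRestartTiming`, `buildUpdateRestartOptions`, and `EXAMPLE_RESTART_COOLDOWN_MS` are hypothetical names. The 30 s cooldown is an assumed value, because the patch does not show the real `RESTART_COOLDOWN_MS`.

```typescript
// Hypothetical value: the real RESTART_COOLDOWN_MS in src/infra/restart.ts is not shown in the patch.
const EXAMPLE_RESTART_COOLDOWN_MS = 30_000;

type RestartTimingInput = {
  nowMs: number;
  lastRestartEmittedAt: number;
  delayMs: number;
  skipCooldown?: boolean;
};

type RestartTiming = {
  cooldownMsApplied: number;
  requestedDueAt: number;
};

// Same arithmetic as the patched hunk: the cooldown is whatever remains of the
// window since the last emitted restart (clamped at zero), unless skipCooldown is true.
function computeRestartTiming(
  input: RestartTimingInput,
  cooldownMs: number = EXAMPLE_RESTART_COOLDOWN_MS,
): RestartTiming {
  const skipCooldown = input.skipCooldown === true;
  const cooldownMsApplied = skipCooldown
    ? 0
    : Math.max(0, input.lastRestartEmittedAt + cooldownMs - input.nowMs);
  return {
    cooldownMsApplied,
    requestedDueAt: input.nowMs + input.delayMs + cooldownMsApplied,
  };
}

// Hypothetical helper mirroring the update.ts hunk: a package swap zeroes the delay
// and turns on both bypass flags; other updates keep the configured delay.
function buildUpdateRestartOptions(updateWasPackageSwap: boolean, restartDelayMs: number) {
  return {
    delayMs: updateWasPackageSwap ? 0 : restartDelayMs,
    reason: "update.run",
    skipDeferral: updateWasPackageSwap,
    skipCooldown: updateWasPackageSwap,
  };
}

// Usage: a restart was emitted 5 s ago, then a package-swap update arrives.
const exampleNowMs = 1_000_000;
const exampleLastEmittedAt = exampleNowMs - 5_000;
const swapOpts = buildUpdateRestartOptions(true, 2_000);

const normalTiming = computeRestartTiming({
  nowMs: exampleNowMs,
  lastRestartEmittedAt: exampleLastEmittedAt,
  delayMs: 0,
});
const forcedTiming = computeRestartTiming({
  nowMs: exampleNowMs,
  lastRestartEmittedAt: exampleLastEmittedAt,
  delayMs: swapOpts.delayMs,
  skipCooldown: swapOpts.skipCooldown,
});

console.log(normalTiming.cooldownMsApplied); // 25000
console.log(forcedTiming.cooldownMsApplied); // 0
```

The example assumes a restart was emitted 5 s earlier. The normal path then waits out the remaining 25 s of the window, while the package-swap path is due immediately. `skipCooldown` is kept separate from `skipDeferral`, so the idle deferral and the cooldown can be bypassed independently. `update.run` sets both from the same `updateWasPackageSwap` flag.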