fix: force package update restart handoff

Peter Steinberger
2026-05-01 09:25:20 +01:00
parent 6efb44944c
commit e131eaecb5
11 changed files with 279 additions and 38 deletions


@@ -53,6 +53,7 @@ Docs: https://docs.openclaw.ai
- Plugins/runtime-deps: recover interrupted bundled runtime-dependency installs whose package sentinels exist but generated materialization is incomplete, forcing npm/pnpm repair in Gateway startup, doctor, and lazy plugin loads instead of leaving channels crash-looping on missing packages. Fixes #75309; refs #75310, #75296, and #75304. Thanks @scottgl9.
- Plugins/runtime-deps: treat no-main and export-map package sentinels without reachable entry files as incomplete, so Gateway startup, doctor, and lazy plugin loads repair interrupted bundled dependency installs instead of accepting package.json-only partial installs. Fixes #75309; refs #75183. Thanks @shakkernerd.
- Plugins/runtime-deps: keep runtime inspection and channel maintenance commands from downloading bundled plugin dependencies, route explicit repairs through `openclaw plugins deps --repair`, and still allow Gateway/DO paths to repair missing deps before import. Refs #75069. Thanks @xiaohuaxi.
- Updates: force non-deferred update restarts after package-manager updates requested through the live Gateway control plane and fail release validation on post-swap stale chunk import crashes, so Telegram/Discord imports do not stay pointed at removed dist files. Fixes #75206. Thanks @xonaman and @faux123.
- Agents/tool-result guard: use the resolved runtime context token budget for non-context-engine tool-result overflow checks, so long tool-heavy sessions no longer compact early when `contextTokens` is larger than native `contextWindow`. Fixes #74917. Thanks @kAIborg24.
- Gateway/systemd: exit with sysexits 78 for supervised lock and `EADDRINUSE` conflicts so `RestartPreventExitStatus=78` stops `Restart=always` restart loops instead of repeatedly reloading plugins against an occupied port. Fixes #75115. Thanks @yhyatt.
- Agents/runtime: skip blank visible user prompts at the embedded-runner boundary before provider submission while still allowing internal runtime-only turns and media-only prompts, so Telegram/group sessions no longer leak raw empty-input provider errors when replay history exists. Fixes #74137. Thanks @yelog, @Gracker, and @nhaener.


@@ -82,7 +82,11 @@ install method aligned:
- `beta` → prefers npm dist-tag `beta`, but falls back to `latest` when beta is
missing or older than the current stable release.
The Gateway core auto-updater (when enabled via config) reuses this same update path.
The Gateway core auto-updater (when enabled via config) launches the CLI update path
outside the live Gateway request handler. Control-plane `update.run` package-manager
updates force a non-deferred update restart after the package swap, because the old
Gateway process may still have in-memory chunks that point at files removed by the
new package.
For package-manager installs, `openclaw update` resolves the target package
version before invoking the package manager. npm global installs use a staged
@@ -151,7 +155,7 @@ If an exact pinned npm plugin update resolves to an artifact whose integrity dif
<Note>
Post-update plugin sync failures fail the update result and stop restart follow-up work. Fix the plugin install or update error, then rerun `openclaw update`.
When the updated Gateway starts, enabled bundled plugin runtime dependencies are staged before plugin activation. Update-triggered restarts drain any active runtime-dependency staging before closing the Gateway, so service-manager restarts do not interrupt an in-flight npm install.
When the updated Gateway starts, enabled bundled plugin runtime dependencies are staged before plugin activation. Package-manager `update.run` restarts bypass the normal idle deferral after the package tree has been swapped, so the old process cannot keep lazy-loading removed chunks. Service-manager restarts still drain runtime-dependency staging before closing the Gateway.
If pnpm bootstrap still fails, the updater stops early with a package-manager-specific error instead of trying `npm run build` inside the checkout.
</Note>


@@ -378,7 +378,7 @@ enumeration of `src/gateway/server-methods/*.ts`.
- `config.apply` validates + replaces the full config payload.
- `config.schema` returns the live config schema payload used by Control UI and CLI tooling: schema, `uiHints`, version, and generation metadata, including plugin + channel schema metadata when the runtime can load it. The schema includes field `title` / `description` metadata derived from the same labels and help text used by the UI, including nested object, wildcard, array-item, and `anyOf` / `oneOf` / `allOf` composition branches when matching field documentation exists.
- `config.schema.lookup` returns a path-scoped lookup payload for one config path: normalized path, a shallow schema node, matched hint + `hintPath`, and immediate child summaries for UI/CLI drill-down. Lookup schema nodes keep the user-facing docs and common validation fields (`title`, `description`, `type`, `enum`, `const`, `format`, `pattern`, numeric/string/array/object bounds, and flags like `additionalProperties`, `deprecated`, `readOnly`, `writeOnly`). Child summaries expose `key`, normalized `path`, `type`, `required`, `hasChildren`, plus the matched `hint` / `hintPath`.
- `update.run` runs the gateway update flow and schedules a restart only when the update itself succeeded.
- `update.run` runs the gateway update flow and schedules a restart only when the update itself succeeded. Package-manager updates force a non-deferred update restart after the package swap so the old Gateway process does not keep lazy-loading from a replaced `dist` tree.
- `update.status` returns the latest cached update restart sentinel, including the post-restart running version when available.
- `wizard.start`, `wizard.next`, `wizard.status`, and `wizard.cancel` expose the onboarding wizard over WS RPC.


@@ -168,6 +168,13 @@ The auto-updater is off by default. Enable it in `~/.openclaw/openclaw.json`:
The gateway also logs an update hint on startup (disable with `update.checkOnStart: false`).
For downgrade or incident recovery, set `OPENCLAW_NO_AUTO_UPDATE=1` in the gateway environment to block automatic applies even when `update.auto.enabled` is configured. Startup update hints can still run unless `update.checkOnStart` is also disabled.
Package-manager updates requested through the live Gateway control-plane handler
force a non-deferred update restart after the package swap. That avoids leaving
an old in-memory process around long enough to lazy-load chunks from a package
tree that has already been replaced. Shell `openclaw update` remains the
preferred path for supervised installs because it can stop and restart the
service around the update.
## After updating
<Steps>


@@ -0,0 +1,119 @@
# Update run package self-upgrade
```yaml qa-scenario
id: update-run-package-self-upgrade
title: Update run package self-upgrade
surface: runtime
coverage:
  primary:
    - runtime.update-run
  secondary:
    - runtime.gateway-restart
    - runtime.package-update
objective: Verify an agent can self-update an installed OpenClaw package from 2026.4.26 to latest by using the gateway update.run action, then recover through the forced restart.
successCriteria:
  - The agent is explicitly instructed to use the gateway tool action update.run instead of shell package-manager commands.
  - The update request carries a restart note marker that can be observed after the gateway restart.
  - Gateway and qa-channel return healthy after update.run restarts the process.
docsRefs:
  - docs/cli/update.md
  - docs/install/updating.md
  - docs/gateway/protocol.md
codeRefs:
  - src/agents/tools/gateway-tool.ts
  - src/gateway/server-methods/update.ts
  - src/infra/restart.ts
execution:
  kind: flow
  summary: "Opt-in destructive package-update lane: ask the agent to update a 2026.4.26 install to latest via gateway action update.run and verify the restart marker after recovery."
  config:
    requiredProviderMode: live-frontier
    sourceVersion: "2026.4.26"
    targetTag: latest
    allowEnv: OPENCLAW_QA_ALLOW_UPDATE_RUN_SELF
    channelId: qa-room
```
```yaml qa-flow
steps:
  - name: asks the agent to self-update through update.run
    actions:
      - if:
          expr: "env.gateway.runtimeEnv[config.allowEnv] !== '1'"
          then:
            - assert: "true"
          else:
            - call: waitForGatewayHealthy
              args:
                - ref: env
                - 60000
            - call: waitForQaChannelReady
              args:
                - ref: env
                - 60000
            - call: reset
            - set: sessionKey
              value:
                expr: "buildAgentSessionKey({ agentId: 'qa', channel: 'qa-channel', peer: { kind: 'channel', id: config.channelId } })"
            - call: createSession
              args:
                - ref: env
                - Update run package self-upgrade
                - ref: sessionKey
            - call: readEffectiveTools
              saveAs: tools
              args:
                - ref: env
                - ref: sessionKey
            - assert:
                expr: "tools.has('gateway')"
                message: gateway tool not present for update.run self-upgrade scenario
            - set: startIndex
              value:
                expr: state.getSnapshot().messages.length
            - set: marker
              value:
                expr: "`QA-UPDATE-RUN-${randomUUID().slice(0, 8)}`"
            - call: startAgentRun
              saveAs: started
              args:
                - ref: env
                - sessionKey:
                    ref: sessionKey
                  to:
                    expr: "`channel:${config.channelId}`"
                  message:
                    expr: |-
                      `Update-run self-upgrade QA check. The OpenClaw package under test was installed from openclaw@${config.sourceVersion} and must update itself to openclaw@${config.targetTag}. Use the gateway tool with action=update.run. Do not run npm, pnpm, bun, git pull, or shell package-manager commands yourself. Set note exactly to "${marker} update.run complete" and restartDelayMs to 0 so the post-restart channel message proves recovery.`
                  timeoutMs:
                    expr: liveTurnTimeoutMs(env, 180000)
            - call: waitForGatewayHealthy
              args:
                - ref: env
                - 180000
            - call: waitForQaChannelReady
              args:
                - ref: env
                - 180000
            - call: waitForOutboundMessage
              saveAs: outbound
              args:
                - ref: state
                - lambda:
                    params: [candidate]
                    expr: "candidate.text.includes(marker)"
                - expr: liveTurnTimeoutMs(env, 180000)
                - sinceIndex:
                    ref: startIndex
            - call: env.gateway.call
              saveAs: updateStatus
              args:
                - update.status
                - {}
                - timeoutMs: 30000
            - assert:
                expr: "Boolean(updateStatus?.sentinel)"
                message:
                  expr: "`update.status did not report a restart sentinel after update.run: ${JSON.stringify(updateStatus)}`"
    detailsExpr: "env.gateway.runtimeEnv[config.allowEnv] !== '1' ? `skipped destructive package self-update; set ${config.allowEnv}=1 to run` : `runId=${started.runId} marker=${marker} outbound=${outbound.text}`"
```


@@ -1256,30 +1256,11 @@ export function buildRealUpdateEnv(env) {
return updateEnv;
}
export function verifyPackagedUpgradeUpdateResult(result, options) {
export function verifyPackagedUpgradeUpdateResult(result, _options) {
if (result.exitCode === 0) {
return;
}
let payload = null;
try {
payload = JSON.parse(result.stdout);
} catch {
payload = null;
}
const steps = Array.isArray(payload?.steps) ? payload.steps : [];
const allStepsSucceeded = steps.every((step) => step?.exitCode === 0);
const afterVersion = typeof payload?.after?.version === "string" ? payload.after.version : "";
if (
payload?.status === "ok" &&
afterVersion === options.candidateVersion &&
allStepsSucceeded &&
isSelfSwappedPackageProcessExit(result.stderr)
) {
return;
}
throw new Error(
`Packaged upgrade failed (${result.exitCode}): ${trimForSummary(
`${result.stdout}\n${result.stderr}`,
@@ -1287,15 +1268,6 @@ export function verifyPackagedUpgradeUpdateResult(result, options) {
);
}
function isSelfSwappedPackageProcessExit(stderr) {
return (
typeof stderr === "string" &&
stderr.includes("[openclaw] Failed to start CLI:") &&
stderr.includes("ERR_MODULE_NOT_FOUND") &&
/[\\/]node_modules[\\/]openclaw[\\/]dist[\\/]/u.test(stderr)
);
}
export function resolveExplicitBaselineVersion(baselineSpec) {
const trimmed = baselineSpec.trim();
if (!trimmed || trimmed === "openclaw@latest") {


@@ -276,7 +276,34 @@ describe("update.run restart scheduling", () => {
);
});
it("blocks unmanaged global installs before package mutation when restart is unavailable", async () => {
it("forces an immediate restart after successful package-manager updates", async () => {
resolveUpdateInstallSurfaceMock.mockResolvedValueOnce({
kind: "global",
mode: "npm",
root: "/tmp/openclaw-global",
packageRoot: "/tmp/openclaw-global",
});
let payload:
| { ok: boolean; result?: { status?: string; reason?: string; mode?: string } }
| undefined;
await invokeUpdateRun({}, (_ok: boolean, response: unknown) => {
payload = response as typeof payload;
});
expect(runGatewayUpdateMock).toHaveBeenCalledTimes(1);
expect(scheduleGatewaySigusr1RestartMock).toHaveBeenCalledWith(
expect.objectContaining({
delayMs: 0,
reason: "update.run",
skipDeferral: true,
}),
);
expect(payload?.ok).toBe(true);
});
it("blocks global package installs when the gateway cannot restart afterward", async () => {
isRestartEnabledMock.mockReturnValue(false);
detectRespawnSupervisorMock.mockReturnValue(null);
resolveUpdateInstallSurfaceMock.mockResolvedValueOnce({


@@ -140,11 +140,13 @@ export const updateHandlers: GatewayRequestHandlers = {
// Only restart the gateway when the update actually succeeded.
// Restarting after a failed update leaves the process in a broken state
// (corrupted node_modules, partial builds) and causes a crash loop.
const updateWasPackageSwap = result.status === "ok" && result.mode !== "git";
const restart =
result.status === "ok"
? scheduleGatewaySigusr1Restart({
delayMs: restartDelayMs,
delayMs: updateWasPackageSwap ? 0 : restartDelayMs,
reason: "update.run",
skipDeferral: updateWasPackageSwap,
audit: {
actor: actor.actor,
deviceId: actor.deviceId,
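
The restart decision in this hunk can be read as a small pure function: a successful update in any mode other than `git` is treated as a package swap, which forces a zero delay and sets `skipDeferral`. The following is a minimal sketch under that reading, not the handler itself. `UpdateResultSketch`, `RestartPlan`, and `planUpdateRestart` are hypothetical names introduced for illustration, and the result type is reduced to the two fields the hunk inspects.

```typescript
// Hypothetical, reduced shape of the update result; the real type carries more fields.
type UpdateResultSketch = { status: string; mode?: string };

// Hypothetical container for the three options the hunk passes to the scheduler.
type RestartPlan = { delayMs: number; reason: string; skipDeferral: boolean };

// Returns null when no restart should be scheduled (the update did not succeed).
function planUpdateRestart(
  result: UpdateResultSketch,
  restartDelayMs: number,
): RestartPlan | null {
  if (result.status !== "ok") {
    return null;
  }
  // Mirrors the hunk: any successful non-git update counts as a package swap.
  const updateWasPackageSwap = result.mode !== "git";
  return {
    // Package swaps restart immediately; git updates keep the caller's delay.
    delayMs: updateWasPackageSwap ? 0 : restartDelayMs,
    reason: "update.run",
    // Package swaps also bypass the idle deferral check.
    skipDeferral: updateWasPackageSwap,
  };
}
```

Under these assumptions, a caller-supplied `restartDelayMs` only takes effect for git-mode updates, which matches the docs change stating that package-manager updates force a non-deferred restart.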


@@ -483,6 +483,85 @@ describe("infra runtime", () => {
}
});
it("bypasses the pre-restart deferral check when requested", async () => {
const emitSpy = vi.spyOn(process, "emit");
const pendingCheck = vi.fn(() => 5);
const handler = () => {};
process.on("SIGUSR1", handler);
try {
setPreRestartDeferralCheck(pendingCheck);
scheduleGatewaySigusr1Restart({
delayMs: 0,
reason: "update.run",
skipDeferral: true,
});
await vi.advanceTimersByTimeAsync(0);
expect(pendingCheck).not.toHaveBeenCalled();
expect(emitSpy).toHaveBeenCalledWith("SIGUSR1");
expect(peekGatewaySigusr1RestartReason()).toBe("update.run");
} finally {
process.removeListener("SIGUSR1", handler);
}
});
it("upgrades an already scheduled restart to bypass deferral", async () => {
const emitSpy = vi.spyOn(process, "emit");
const pendingCheck = vi.fn(() => 5);
const handler = () => {};
process.on("SIGUSR1", handler);
try {
setPreRestartDeferralCheck(pendingCheck);
scheduleGatewaySigusr1Restart({ delayMs: 1_000, reason: "config.patch" });
const forced = scheduleGatewaySigusr1Restart({
delayMs: 1_000,
reason: "update.run",
skipDeferral: true,
});
expect(forced.coalesced).toBe(false);
await vi.advanceTimersByTimeAsync(1_000);
expect(pendingCheck).not.toHaveBeenCalled();
expect(emitSpy).toHaveBeenCalledWith("SIGUSR1");
expect(peekGatewaySigusr1RestartReason()).toBe("update.run");
} finally {
process.removeListener("SIGUSR1", handler);
}
});
it("bypasses an active restart deferral when a forced restart arrives", async () => {
const emitSpy = vi.spyOn(process, "emit");
const staleBeforeEmit = vi.fn(async () => {});
const handler = () => {};
process.on("SIGUSR1", handler);
try {
setPreRestartDeferralCheck(() => 5);
scheduleGatewaySigusr1Restart({
delayMs: 0,
reason: "config.patch",
emitHooks: { beforeEmit: staleBeforeEmit },
});
await vi.advanceTimersByTimeAsync(0);
expect(emitSpy).not.toHaveBeenCalledWith("SIGUSR1");
const forced = scheduleGatewaySigusr1Restart({
delayMs: 0,
reason: "update.run",
skipDeferral: true,
});
expect(forced.coalesced).toBe(false);
expect(emitSpy).toHaveBeenCalledWith("SIGUSR1");
expect(staleBeforeEmit).not.toHaveBeenCalled();
expect(peekGatewaySigusr1RestartReason()).toBe("update.run");
} finally {
process.removeListener("SIGUSR1", handler);
}
});
it("emits SIGUSR1 after the default deferral timeout while work is still pending", async () => {
const emitSpy = vi.spyOn(process, "emit");
const handler = () => {};


@@ -44,6 +44,7 @@ let pendingRestartTimer: ReturnType<typeof setTimeout> | null = null;
let pendingRestartDueAt = 0;
let pendingRestartReason: string | undefined;
let pendingRestartEmitHooks: RestartEmitHooks | undefined;
let pendingRestartSkipDeferral = false;
let pendingRestartPreparing = false;
const activeDeferralPolls = new Set<ReturnType<typeof setInterval>>();
@@ -63,6 +64,7 @@ function clearPendingScheduledRestart(): void {
pendingRestartDueAt = 0;
pendingRestartReason = undefined;
pendingRestartEmitHooks = undefined;
pendingRestartSkipDeferral = false;
pendingRestartPreparing = false;
}
@@ -658,6 +660,7 @@ export function scheduleGatewaySigusr1Restart(opts?: {
reason?: string;
audit?: RestartAuditInfo;
emitHooks?: RestartEmitHooks;
skipDeferral?: boolean;
}): ScheduledRestart {
const delayMsRaw =
typeof opts?.delayMs === "number" && Number.isFinite(opts.delayMs)
@@ -673,6 +676,7 @@ export function scheduleGatewaySigusr1Restart(opts?: {
const nowMs = Date.now();
const cooldownMsApplied = Math.max(0, lastRestartEmittedAt + RESTART_COOLDOWN_MS - nowMs);
const requestedDueAt = nowMs + delayMs + cooldownMsApplied;
const skipDeferral = opts?.skipDeferral === true;
if (hasUnconsumedRestartSignal()) {
if (shouldPreferRestartReason(reason, emittedRestartReason)) {
@@ -695,7 +699,29 @@ export function scheduleGatewaySigusr1Restart(opts?: {
if (pendingRestartTimer || pendingRestartPreparing) {
const remainingMs = pendingRestartPreparing ? 0 : Math.max(0, pendingRestartDueAt - nowMs);
const shouldPullEarlier = !pendingRestartPreparing && requestedDueAt < pendingRestartDueAt;
if (pendingRestartPreparing && skipDeferral && activeDeferralPolls.size > 0) {
restartLog.warn(
`restart request bypassed active deferral reason=${reason ?? "unspecified"} pendingReason=${pendingRestartReason ?? "unspecified"} ${formatRestartAudit(opts?.audit)}`,
);
clearActiveDeferralPolls();
pendingRestartReason = reason;
pendingRestartEmitHooks = opts?.emitHooks;
void emitPreparedGatewayRestart(undefined, reason);
return {
ok: true,
pid: process.pid,
signal: "SIGUSR1",
delayMs: 0,
reason,
mode,
coalesced: false,
cooldownMsApplied,
};
}
const shouldUpgradeToSkipDeferral = skipDeferral && !pendingRestartSkipDeferral;
const shouldPullEarlier =
!pendingRestartPreparing &&
(requestedDueAt < pendingRestartDueAt || shouldUpgradeToSkipDeferral);
if (shouldPullEarlier) {
restartLog.warn(
`restart request rescheduled earlier reason=${reason ?? "unspecified"} pendingReason=${pendingRestartReason ?? "unspecified"} oldDelayMs=${remainingMs} newDelayMs=${Math.max(0, requestedDueAt - nowMs)} ${formatRestartAudit(opts?.audit)}`,
@@ -705,6 +731,7 @@ export function scheduleGatewaySigusr1Restart(opts?: {
if (shouldPreferRestartReason(reason, pendingRestartReason)) {
pendingRestartReason = reason;
}
pendingRestartSkipDeferral = pendingRestartSkipDeferral || skipDeferral;
restartLog.warn(
`restart request coalesced (already scheduled) reason=${reason ?? "unspecified"} pendingReason=${pendingRestartReason ?? "unspecified"} delayMs=${remainingMs} ${formatRestartAudit(opts?.audit)}`,
);
@@ -725,15 +752,18 @@ export function scheduleGatewaySigusr1Restart(opts?: {
pendingRestartDueAt = requestedDueAt;
pendingRestartReason = reason;
pendingRestartEmitHooks = opts?.emitHooks;
pendingRestartSkipDeferral = skipDeferral;
pendingRestartTimer = setTimeout(
() => {
const scheduledReason = pendingRestartReason;
const scheduledSkipDeferral = pendingRestartSkipDeferral;
pendingRestartTimer = null;
pendingRestartDueAt = 0;
pendingRestartReason = undefined;
pendingRestartSkipDeferral = false;
pendingRestartPreparing = true;
const pendingCheck = preRestartCheck;
if (!pendingCheck) {
if (scheduledSkipDeferral || !pendingCheck) {
void emitPreparedGatewayRestart(undefined, scheduledReason);
return;
}
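
The coalescing rules these hunks add are easier to follow without timers. The sketch below is a simplified, timer-free model, assuming only what the hunks show: a pending restart remembers whether it should skip deferral, a later `skipDeferral` request upgrades an existing schedule instead of being coalesced, and the flag decides at fire time whether the pre-restart check is consulted. `RestartQueueModel` and its methods are hypothetical names. The model leaves out cooldowns, audit logging, emit hooks, reason preference, and the active-deferral bypass path.

```typescript
// Hypothetical pending-restart record; the real module keeps these as separate variables.
type PendingRestart = { dueAt: number; reason: string; skipDeferral: boolean };

class RestartQueueModel {
  private pending: PendingRestart | null = null;

  // Returns true when the request was coalesced into the existing schedule.
  request(nowMs: number, delayMs: number, reason: string, skipDeferral = false): boolean {
    const dueAt = nowMs + delayMs;
    if (!this.pending) {
      this.pending = { dueAt, reason, skipDeferral };
      return false;
    }
    // A forced request upgrades a schedule that would otherwise honor deferral.
    const upgradesToSkipDeferral = skipDeferral && !this.pending.skipDeferral;
    if (dueAt < this.pending.dueAt || upgradesToSkipDeferral) {
      // Simplification: the new request replaces the old schedule outright.
      this.pending = { dueAt, reason, skipDeferral };
      return false;
    }
    // Coalesced: keep timing and reason, but the skip flag stays sticky once set.
    this.pending.skipDeferral = this.pending.skipDeferral || skipDeferral;
    return true;
  }

  // Reason of the currently scheduled restart, if any.
  peekReason(): string | undefined {
    return this.pending?.reason;
  }

  // Models the timer callback: emit now, or defer while work is still pending.
  fire(pendingWorkCount: number): "emit" | "defer" | "idle" {
    const current = this.pending;
    if (!current) {
      return "idle";
    }
    this.pending = null;
    return current.skipDeferral || pendingWorkCount === 0 ? "emit" : "defer";
  }
}
```

In this model a `config.patch` restart followed by a forced `update.run` request with the same delay is rescheduled rather than coalesced, and it emits even when work is pending, which is the behavior the "upgrades an already scheduled restart to bypass deferral" test asserts.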


@@ -566,7 +566,7 @@ describe("scripts/openclaw-cross-os-release-checks", () => {
});
});
it("accepts a successful packaged update followed by the old self-swapped process import miss", () => {
it("rejects a successful packaged update followed by an old self-swapped process import miss", () => {
expect(() =>
verifyPackagedUpgradeUpdateResult(
{
@@ -581,7 +581,7 @@ describe("scripts/openclaw-cross-os-release-checks", () => {
},
{ candidateVersion: "2026.4.27" },
),
).not.toThrow();
).toThrow(/Packaged upgrade failed/u);
});
it("rejects packaged update failures before the candidate package lands", () => {