fix(update-cli): capture macOS launchctl stderr to a log file instead of /dev/null

The macOS restart helper emitted by `openclaw update` (darwin branch of
`prepareRestartScript`) wrote the gateway restart script with every
`launchctl` stderr redirected to `/dev/null` and the final fallback
`kickstart` chained with `|| true`. When bootstrap/kickstart failed
(plist-on-disk race, schema rejection, stale job, bootout recovery
edge cases), the script exited 0, the updater declared success, and
the gateway silently stayed offline.

The reporter saw a ~25 minute production outage before noticing the
messages going unanswered across Telegram/Discord/Feishu.

Route stderr to `~/.openclaw/logs/update-restart.log` via `exec 2>>`,
drop `2>/dev/null` on every launchctl call, and remove the `|| true`
swallow on the fallback kickstart so a genuine failure exits non-zero
and leaves a durable audit trail. Log directory creation is best-effort
via `mkdir -p ... 2>/dev/null || true` since it normally already exists
from the gateway's own logging path. Self-cleanup of the script file
via `rm -f "$0"` is retained because the log, not the script, is the
useful artifact after the fact.

Adds a targeted regression test `captures macOS launchctl stderr to
~/.openclaw/logs/update-restart.log` alongside the existing darwin
restart-script test. The existing test's assertions about the
kickstart/enable/bootstrap fallback chain + self-cleanup all still pass.

Fixes #68486
This commit is contained in:
HCL
2026-04-18 17:20:15 +08:00
committed by Peter Steinberger
parent 90c1ab2cef
commit 4a870300dd
2 changed files with 41 additions and 6 deletions

View File

@@ -111,6 +111,28 @@ describe("restart-helper", () => {
await cleanupScript(scriptPath);
});
it("captures macOS launchctl stderr to ~/.openclaw/logs/update-restart.log (#68486)", async () => {
// Silent failure in macOS update restart helper: previously every
// launchctl call redirected stderr to /dev/null and the final kickstart
// was chained with `|| true`, so bootstrap/kickstart failures were
// invisible and the gateway stayed offline while the updater reported
// success. The script should now route stderr to a durable log file and
// stop swallowing the final exit code.
Object.defineProperty(process, "platform", { value: "darwin" });
process.getuid = () => 501;
const { scriptPath, content } = await prepareAndReadScript({
OPENCLAW_PROFILE: "default",
HOME: "/Users/testuser",
});
expect(content).toContain("exec 2>>'/Users/testuser/.openclaw/logs/update-restart.log'");
// Every launchctl call should allow stderr through now (no `2>/dev/null`)
// and the final kickstart must not swallow its exit code.
expect(content).not.toMatch(/launchctl[^\n]*2>\/dev\/null/);
expect(content).not.toMatch(/launchctl kickstart[^\n]*\|\| true/);
await cleanupScript(scriptPath);
});
it("uses OPENCLAW_LAUNCHD_LABEL override on macOS", async () => {
Object.defineProperty(process, "platform", { value: "darwin" });
process.getuid = () => 501;

View File

@@ -90,20 +90,33 @@ rm -f "$0"
const home = normalizeOptionalString(env.HOME) || process.env.HOME || os.homedir();
const plistPath = path.join(home, "Library", "LaunchAgents", `${label}.plist`);
const escapedPlistPath = shellEscape(plistPath);
const logPath = path.join(home, ".openclaw", "logs", "update-restart.log");
const escapedLogPath = shellEscape(logPath);
filename = `openclaw-restart-${timestamp}.sh`;
scriptContent = `#!/bin/sh
# Standalone restart script — survives parent process termination.
# Wait briefly to ensure file locks are released after update.
sleep 1
# Capture launchctl stderr so bootstrap/kickstart failures leave a durable
# audit trail. Without this, transient launchctl errors (plist-on-disk race,
# schema rejection, stale job) disappear into /dev/null and the updater
# reports success while the gateway is silently offline — see #68486.
mkdir -p "$(dirname '${escapedLogPath}')" 2>/dev/null || true
exec 2>>'${escapedLogPath}'
echo "[$(date -u +%FT%TZ)] openclaw update restart attempt (label=${escaped})" >&2
# Try kickstart first (works when the service is still registered).
# If it fails (e.g. after bootout), clear any persisted disabled state,
# then re-register via bootstrap and kickstart.
if ! launchctl kickstart -k 'gui/${uid}/${escaped}' 2>/dev/null; then
launchctl enable 'gui/${uid}/${escaped}' 2>/dev/null
launchctl bootstrap 'gui/${uid}' '${escapedPlistPath}' 2>/dev/null
launchctl kickstart -k 'gui/${uid}/${escaped}' 2>/dev/null || true
# then re-register via bootstrap and kickstart. Any step's stderr now
# lands in update-restart.log; the final kickstart no longer swallows
# its exit code with \`|| true\`, so a genuine failure exits non-zero and
# is inspectable.
if ! launchctl kickstart -k 'gui/${uid}/${escaped}'; then
launchctl enable 'gui/${uid}/${escaped}'
launchctl bootstrap 'gui/${uid}' '${escapedPlistPath}'
launchctl kickstart -k 'gui/${uid}/${escaped}'
fi
# Self-cleanup
echo "[$(date -u +%FT%TZ)] openclaw update restart done" >&2
# Self-cleanup (log is retained under ~/.openclaw/logs/update-restart.log).
rm -f "$0"
`;
} else if (platform === "win32") {