Address Codex P1 + Greptile P2:
- Move config validation before the restart attempt so invalid config
is caught in the stop→start path (not just the already-loaded path)
- Derive service.loaded from actual isLoaded() after restart instead
of hardcoded true
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: HCL <chenglunhu@gmail.com>
After `gateway stop` (which runs `launchctl bootout`), `gateway start`
checks `isLoaded` → false → prints "not loaded" hints and exits.
The service is never re-bootstrapped, so `start` cannot recover from
`stop` — only `gateway install` works.
Root cause: src/cli/daemon-cli/lifecycle-core.ts:208-217 — runServiceStart
calls handleServiceNotLoaded which only prints hints, never attempts
service.restart() (which already handles bootstrap via
bootstrapLaunchAgentOrThrow at launchd.ts:598).
Fix: when service is not loaded, attempt service.restart() first (which
handles re-bootstrapping on all platforms). If restart fails (e.g. plist
was deleted, not just booted out), fall back to the existing hints.
The restart path is already proven: restartLaunchAgent (launchd.ts:556)
handles "not loaded" via bootstrapLaunchAgentOrThrow. This fix routes
the start command through the same recovery path.
Closes#53878
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: HCL <chenglunhu@gmail.com>
Bug 1 (high): replace fixed sleep 1 with caller-PID polling in both
kickstart and start-after-exit handoff modes. The helper now waits until
kill -0 $caller_pid fails before issuing launchctl kickstart -k.
Bug 2 (medium): gate enable+bootstrap fallback on isLaunchctlNotLoaded().
Only attempt re-registration when kickstart -k fails because the job is
absent; all other kickstart failures now re-throw the original error.
Follows up on 3c0fd3dffe.
Fixes#43311, #43406, #43035, #43049
- Remove dead 'return false' in runServiceStart (Greptile)
- Include stack trace in run-loop crash guard error log (Greptile)
- Only catch startup errors on subsequent restarts, not initial start (Codex P1)
- Add JSDoc note about env var false positive edge case (Codex P1)
When 'openclaw gateway restart' is run with an invalid config, the new
process crashes on startup due to config validation failure. On macOS,
this causes Full Disk Access (TCC) permissions to be lost because the
respawned process has a different PID.
Add getConfigValidationError() helper and pre-flight config validation
in both runServiceRestart() and runServiceStart(). If config is invalid,
abort with a clear error message instead of crashing.
The config watcher's hot-reload path already had this guard
(handleInvalidSnapshot), but the CLI restart/start commands did not.
AI-assisted (OpenClaw agent, fully tested)
Address review feedback - when --json mode is used, the drift warning
was completely suppressed. Now it's included in the warnings array
of the DaemonActionResponse so programmatic consumers can surface it.
When the gateway token in config differs from the token embedded in the
service plist/unit file, restart will not apply the new token. This can
cause silent auth failures after OAuth token switches.
Changes:
- Add checkTokenDrift() to service-audit.ts
- Call it in runServiceRestart() before restarting
- Warn user with suggestion to run 'openclaw gateway install --force'
Closes#18018