Route invalid-config recovery output for source-only installed plugin packages to plugin packaging guidance instead of openclaw doctor --fix.
Validated with focused config/CLI/gateway/plugin tests, autoreview, Crabbox/Testbox E2E tbx_01ksgr80tnvvc13kv6t126yv78, and green PR CI on 3b3ce73d0f.
Thanks @brokemac79.
Normalize Google Gemini 3.1 Flash Lite routing to the GA model id and keep the retired preview spelling as a compatibility alias. Align default alias docs, FAQ guidance, and deprecated-model manifest recommendations with the GA id.
Fixes#86151.
Co-authored-by: Sebastien Tardif <sebtardif@ncf.ca>
* fix(doctor): skip restart prompt when gateway is healthy after recent restart
`openclaw doctor` unconditionally prompted "Restart gateway service now?"
with default=Yes whenever the gateway was running, even if it had just
restarted via SIGUSR1 after an update. This caused restart loops on macOS
where the prompt raced with launchctl KeepAlive.
Changes:
- Probe gateway health before the restart prompt when a restart handoff
exists (deep doctor mode). If healthy, skip the prompt entirely.
- Change `initialValue` from `true` to `false` as a safety net so users
don't accidentally confirm a restart by pressing Enter.
- Update existing test that expected a single `readGatewayRestartHandoffSync`
call (now called twice: diagnostic display + health-probe check).
Fixes#86518
* fix(doctor): correct GatewayRestartHandoff mock types in tests
Add explicit literal types + satisfies constraint so the mock handoff
objects match the exact GatewayRestartHandoff type expected by the
type-check CI.
* fix(doctor): apply recent-restart skip to normal doctor flow
* test(doctor): align normal-flow handoff expectation
* chore: add doctor restart prompt changelog
---------
Co-authored-by: OpenClaw Contributor <openclaw-contributor@example.com>
Co-authored-by: liaoyl830 <267396060+liaoyl830@users.noreply.github.com>
Co-authored-by: sallyom <somalley@redhat.com>
* fix(sessions): stop doctor OOM on large session stores and reclaim stale store temps
`openclaw doctor` loaded the full sessions.json via loadSessionStore with the
default cache-write plus return clone, materializing a multi-hundred-MB
monolithic store several times and exhausting the heap (#56827). The read-only
doctor checks (state integrity, heartbeat target, codex route scan) now load
with { skipCache: true, clone: false } so the store is materialized once.
Orphaned session-store atomic-write temps were also never reclaimed: the store
write went through the generic atomic writer, staging a shared
.fs-safe-replace.<pid>.<uuid>.tmp not identifiable as a store temp. Give the
store write a store-specific tempPrefix so its temps stage as
sessions.json.<pid>.<uuid>.tmp, classify them (isSessionStoreTempArtifactName),
and reclaim stale ones via the disk-budget sweep and the unreferenced-artifact
prune on a short staleness window so in-flight temps are preserved.
Fixes#56827
* docs(changelog): note large session store doctor fix
* test(qa): preserve WhatsApp RTT source literal
---------
Co-authored-by: Peter Steinberger <steipete@gmail.com>
Catch non-ENOENT load failures inside maybeRepairLegacyCronStore so an
unreadable ~/.openclaw/cron/jobs.json (e.g. root-owned 0600 inside
Docker) no longer aborts the rest of the doctor health checks. The
scheduler-side loadCronStore keeps its strict throw-on-read-failure
contract.
Closes#86102
Co-authored-by: 1052326311 <1052326311@users.noreply.github.com>
* fix(doctor): repair stale contextWindow for DeepSeek V4 Flash
Problem:
- Older releases configured deepseek-v4-flash with contextWindow: 200000
- Official DeepSeek V4 Flash context window is 1,000,000 (1M)
- Users switching from smaller models see incorrect progress bar (e.g.,
50% instead of 10%) because stale config value overrides catalog
Fix:
- Add 'models.providers.*.models.*.contextWindow-stale' migration
- Detects deepseek-v4-flash models with 200K contextWindow
- Repairs to 1M to match catalog default
- Handles both bare and provider-prefixed model IDs
- 7 unit tests covering repair, passthrough, edge cases
Fixes: #85834
* fix(doctor): preserve custom DeepSeek context windows
* fix(doctor): detect stale DeepSeek context windows
* fix(doctor): scope DeepSeek context repair
---------
Co-authored-by: Peter Steinberger <steipete@gmail.com>
* fix(doctor): skip empty entries and memoize routes in plugin session repairs
runPluginSessionStateDoctorRepairs called resolveConfiguredDoctorSessionStateRoute
once per session-store key, even for entries that carry no plugin route state
fields. On stores with many CLI sessions (observed ~800 entries), each call
takes ~1.5s due to resolveAgentHarnessPolicy walking config and provider
metadata, so the doctor's state-integrity contribution hangs for minutes
and the surrounding 'openclaw doctor' run effectively never completes.
scanEntryForOwner can only produce repair/manual-review findings when the
entry exposes one of the fields covered by entryMayContainPluginSessionRouteState
(providerOverride/modelOverride/agentHarnessId/cliSessionBindings/etc.), so
the route resolution for empty entries was pure waste. The route itself is
also a function of agentId (sessionKey is only used to derive agentId), so
sessions sharing an agent can reuse one resolved route.
Filter the store by entryMayContainPluginSessionRouteState before resolving,
and memoize resolveConfiguredDoctorSessionStateRoute by agentId within the
remaining entries. On the repro store this drops the contribution from
'never completes' to <100ms.
Adds a guard test that builds a 200-entry store with 2 route-state-carrying
entries and asserts (a) the repair fires exactly once on the codex owner
and (b) the run completes in under 2s (pre-fix would take >5 minutes).
* fix(doctor): skip manifest model-id normalization in plugin session repairs
After the previous filter+memoize fix, runPluginSessionStateDoctorRepairs was
still ~38s on a 230-entry store because every scanned entry calls parseModelRef
on its runtime model. That implicitly enters manifest-driven model-id
normalization via normalizeStaticProviderModelId, which calls
loadPluginMetadataSnapshot when no current snapshot is bound to process state.
loadPluginMetadataSnapshot is filesystem-heavy and is only memoized when a
'current' snapshot is bound (it is not, during doctor), so each parseModelRef
call paid ~40ms of fresh plugin-metadata loading. 672 calls × ~40ms = ~27s
of doctor wall-clock, all of it useless for doctor's purposes: the scan only
needs the normalized provider id of the configured runtime/route to compare
against an owner's providerIds, never the manifest-normalized model id.
Pass allowManifestNormalization: false alongside the existing
allowPluginNormalization: false on all three parseModelRef call sites in
this file. normalizeStaticProviderModelId short-circuits to
normalizeBuiltInProviderModelId when allowManifestNormalization is false,
which is what doctor wants here.
On the same 230-entry store doctor:state-integrity drops from ~38s to ~2.4s
and total openclaw doctor wall-clock drops from ~91s to ~56s.
* docs(auth): document named OAuth profile logins
* feat(auth): support --profile-id in models auth login
* docs: note named model login profiles
---------
Co-authored-by: Peter Steinberger <steipete@gmail.com>
* fix(twitch): preserve newer message handler during cleanup
Fixes#83888.
`TwitchClientManager.onMessage` returns a cleanup closure that called
`messageHandlers.delete(key)` unconditionally. When a second onMessage()
for the same account replaced the handler, running the earlier cleanup
deleted the newer handler, leaving the account with no handler and
silently dropping all inbound messages.
Guard the delete with a referential check so the cleanup only removes
the handler it registered. Adds regression tests covering both the
stale-cleanup case (newer handler must survive) and the normal case
(current handler is still removed).
* fix(twitch): distinguish handler registrations
* fix(signal): avoid dangling test export name
* test(meeting-notes): use public sdk imports
* test(sdk): classify meeting-notes subpath
* fix(discord): keep channel entrypoint imports narrow
---------
Co-authored-by: Peter Steinberger <steipete@gmail.com>
* fix(gateway): include openclaw bin in service PATH
* fix(doctor): accept expected service PATH
* docs(changelog): mention managed service PATH bin fix
---------
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>