fix(doctor): repair legacy Codex route config

Repair legacy openai-codex route config and session pins safely.
This commit is contained in:
Vincent Koc
2026-05-05 17:42:41 -07:00
committed by GitHub
parent 8fb797c2c6
commit bca6709203
10 changed files with 1401 additions and 128 deletions


@@ -46,6 +46,7 @@ Notes:
- Doctor also scans `~/.openclaw/cron/jobs.json` (or `cron.store`) for legacy cron job shapes and can rewrite them in place before the scheduler has to auto-normalize them at runtime.
- On Linux, doctor warns when the user's crontab still runs legacy `~/.openclaw/bin/ensure-whatsapp.sh`; that script is no longer maintained and can log false WhatsApp gateway outages when cron lacks the systemd user-bus environment.
- When WhatsApp is enabled, doctor checks for a degraded Gateway event loop with local `openclaw-tui` clients still running. `doctor --fix` stops only verified local TUI clients so WhatsApp replies are not queued behind stale TUI refresh loops.
- Doctor rewrites legacy `openai-codex/*` model refs to canonical `openai/*` refs across primary models, fallbacks, heartbeat/subagent/compaction overrides, hooks, channel model overrides, and stale session route pins. `--fix` selects `agentRuntime.id: "codex"` only when the Codex plugin is installed, enabled, contributes the `codex` harness, and has usable OAuth; otherwise it selects `agentRuntime.id: "pi"` so the route stays on the default OpenClaw runner.
- Doctor cleans legacy plugin dependency staging state created by older OpenClaw versions. It also repairs missing downloadable plugins that are referenced by config, such as `plugins.entries`, configured channels, configured provider/search settings, or configured agent runtimes. During package updates, doctor skips package-manager plugin repair until the package swap is complete; rerun `openclaw doctor --fix` afterward if a configured plugin still needs recovery. If the download fails, doctor reports the install error and preserves the configured plugin entry for the next repair attempt.
- Doctor repairs stale plugin config by removing missing plugin ids from `plugins.allow`/`plugins.entries`, plus matching dangling channel config, heartbeat targets, and channel model overrides when plugin discovery is healthy.
- Doctor quarantines invalid plugin config by disabling the affected `plugins.entries.<id>` entry and removing its invalid `config` payload. Gateway startup already skips only that bad plugin so other plugins and channels can keep running.
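As an illustrative sketch of the legacy-route rewrite (the `agents.defaults` nesting and the `fallbacks` key name are assumptions for illustration; only the `openai-codex/*` to `openai/*` rewrite and the `agentRuntime.id` values are documented here), a fragment like this:

```json
{
  "agents": {
    "defaults": {
      "model": "openai-codex/gpt-5.5",
      "fallbacks": ["openai-codex/gpt-5.4"]
    }
  }
}
```

would be rewritten by `doctor --fix` to `openai/gpt-5.5` and `openai/gpt-5.4`, with `agentRuntime.id` set to `"codex"` when the Codex plugin passes every check above, and to `"pi"` otherwise.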


@@ -108,6 +108,7 @@ cat ~/.openclaw/openclaw.json
- Gateway runtime checks (service installed but not running; cached launchd label).
- Channel status warnings (probed from the running gateway).
- WhatsApp responsiveness checks for degraded Gateway event-loop health with local TUI clients still running; `--fix` stops only verified local TUI clients.
- Codex route repair for legacy `openai-codex/*` model refs in primary models, fallbacks, heartbeat/subagent/compaction overrides, hooks, channel model overrides, and session route pins; `--fix` rewrites them to `openai/*` and selects `agentRuntime.id: "codex"` only when the Codex plugin is installed, enabled, contributes the `codex` harness, and has usable OAuth. Otherwise it selects `agentRuntime.id: "pi"`.
- Supervisor config audit (launchd/systemd/schtasks) with optional repair.
- Embedded proxy environment cleanup for gateway services that captured shell `HTTP_PROXY` / `HTTPS_PROXY` / `NO_PROXY` values during install or update.
- Gateway runtime best-practice checks (Node vs Bun, version-manager paths).
@@ -260,21 +261,22 @@ That stages grounded durable candidates into the short-term dreaming store while
<Accordion title="2e. Codex OAuth provider overrides">
If you previously added legacy OpenAI transport settings under `models.providers.openai-codex`, they can shadow the built-in Codex OAuth provider path that newer releases use automatically. Doctor warns when it sees those old transport settings alongside Codex OAuth so you can remove or rewrite the stale transport override and get the built-in routing/fallback behavior back. Custom proxies and header-only overrides are still supported and do not trigger this warning.
</Accordion>
<Accordion title="2f. Codex route repair">
Doctor checks for legacy `openai-codex/*` model refs. Native Codex harness routing uses canonical `openai/*` model refs plus `agentRuntime.id: "codex"` so the turn goes through the Codex app-server harness instead of the OpenClaw PI OpenAI path.
In `--fix` / `--repair` mode, doctor rewrites affected default-agent and per-agent refs, including primary models, fallbacks, heartbeat/subagent/compaction overrides, hooks, channel model overrides, and stale persisted session route state:
- `openai-codex/gpt-*` becomes `openai/gpt-*`.
- The matching agent runtime becomes `agentRuntime.id: "codex"` only when Codex is installed, enabled, contributes the `codex` harness, and has usable OAuth.
- Otherwise the matching agent runtime becomes `agentRuntime.id: "pi"`.
- Existing model fallback lists are preserved with their legacy entries rewritten; copied per-model settings move from the legacy key to the canonical `openai/*` key.
- Persisted session `modelProvider`/`providerOverride`, `model`/`modelOverride`, fallback notices, auth-profile pins, and Codex harness pins are repaired across all discovered agent session stores.
</Accordion>
<Accordion title="2g. Session route cleanup">
Doctor also scans discovered agent session stores for stale auto-created route state after you move configured models or runtimes away from a plugin-owned route such as Codex.
`openclaw doctor --fix` can clear auto-created stale state such as `modelOverrideSource: "auto"` model pins, runtime model metadata, pinned harness ids, CLI session bindings, and auto auth-profile overrides when their owning route is no longer configured. Explicit user or legacy session model choices are reported for manual review and left untouched; switch them with `/model ...`, `/new`, or reset the session when that route is no longer intended.
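As a hypothetical sketch of the stale state this check targets (the field names `modelProvider`, `modelOverride`, and `modelOverrideSource` come from the description above; the surrounding session-record shape is illustrative):

```json
{
  "modelProvider": "openai-codex",
  "modelOverride": "openai-codex/gpt-5.5",
  "modelOverrideSource": "auto"
}
```

`doctor --fix` clears a pin like this because `modelOverrideSource` is `"auto"`; an explicit user-chosen pin is only reported for manual review.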


@@ -87,9 +87,10 @@ If your config uses `plugins.allow`, include `codex` there too:
}
```
Do not use `openai-codex/gpt-*` in config. That prefix is a legacy route that
`openclaw doctor --fix` rewrites to `openai/gpt-*` across primary models,
fallbacks, heartbeat/subagent/compaction overrides, hooks, channel overrides,
and stale persisted session route pins.
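For native Codex routing, a minimal config sketch looks like this (the exact `agents.defaults` nesting is an assumption for illustration; the `openai/gpt-*` ref, `agentRuntime.id`, and the `plugins.allow` entry are taken from this guide):

```json
{
  "agents": {
    "defaults": {
      "model": "openai/gpt-5.5",
      "agentRuntime": { "id": "codex" }
    }
  },
  "plugins": {
    "allow": ["codex"]
  }
}
```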
## What this plugin changes
@@ -106,7 +107,9 @@ The bundled `codex` plugin contributes several separate capabilities:
Enabling the plugin makes those capabilities available. It does **not**:
- start using Codex for every OpenAI model
- convert `openai-codex/*` model refs into the native runtime without doctor
verifying that Codex is installed, enabled, contributes the `codex` harness,
and is OAuth-ready
- make ACP/acpx the default Codex path
- hot-switch existing sessions that already recorded a PI runtime
- replace OpenClaw channel delivery, session files, auth-profile storage, or
@@ -145,10 +148,10 @@ want native app-server execution. Legacy `codex/*` model refs still auto-select
the harness for compatibility, but runtime-backed legacy provider prefixes are
not shown as normal model/provider choices.
If any configured model route is still `openai-codex/*`, `openclaw doctor --fix`
rewrites it to `openai/*`. For matching agent routes, it sets the agent runtime
to `codex` only when the Codex plugin is installed, enabled, contributes the
`codex` harness, and has usable OAuth; otherwise it sets the runtime to `pi`.
## Route map
@@ -158,15 +161,18 @@ Use this table before changing config:
| ---------------------------------------------------- | -------------------------- | -------------------------------------- | ---------------------------- | ------------------------------ |
| ChatGPT/Codex subscription with native Codex runtime | `openai/gpt-*` | `agentRuntime.id: "codex"` | Codex OAuth or Codex account | `Runtime: OpenAI Codex` |
| OpenAI API through normal OpenClaw runner | `openai/gpt-*` | omitted or `runtime: "pi"` | OpenAI API key | `Runtime: OpenClaw Pi Default` |
| Legacy config that needs doctor repair | `openai-codex/gpt-*` | repaired to `codex` or `pi` | Existing configured auth | Recheck after `doctor --fix` |
| Mixed providers with conservative auto mode | provider-specific refs | `agentRuntime.id: "auto"` | Per selected provider | Depends on selected runtime |
| Explicit Codex ACP adapter session | ACP prompt/model dependent | `sessions_spawn` with `runtime: "acp"` | ACP backend auth | ACP task/session status |
The important split is provider versus runtime:
- `openai-codex/*` is a legacy route that doctor rewrites.
- `agentRuntime.id: "codex"` requires the Codex harness and fails closed if it
is unavailable.
- `agentRuntime.id: "auto"` lets registered harnesses claim matching provider
routes, but canonical OpenAI refs are still PI-owned unless a harness supports
that provider/model pair.
- `/codex ...` answers "which native Codex conversation should this chat bind
or control?"
- ACP answers "which external harness process should acpx launch?"
@@ -175,33 +181,30 @@ The important split is provider versus runtime:
OpenAI-family routes are prefix-specific. For the common subscription plus
native Codex runtime setup, use `openai/*` with `agentRuntime.id: "codex"`.
Treat `openai-codex/*` as legacy config that doctor should rewrite:
| Model ref | Runtime path | Use when |
| --------------------------------------------- | -------------------------------------------- | ------------------------------------------------------------------------- |
| `openai/gpt-5.4` | OpenAI provider through OpenClaw/PI plumbing | You want current direct OpenAI Platform API access with `OPENAI_API_KEY`. |
| `openai-codex/gpt-5.5` | Legacy route repaired by doctor | You are on old config; run `openclaw doctor --fix` to rewrite it. |
| `openai/gpt-5.5` + `agentRuntime.id: "codex"` | Codex app-server harness | You want ChatGPT/Codex subscription auth with native Codex execution. |
GPT-5.5 can appear on both direct OpenAI API-key and Codex subscription routes
when your account exposes them. Use `openai/gpt-5.5` with the Codex app-server
harness for native Codex runtime, or `openai/gpt-5.5` without a Codex runtime
override for direct API-key traffic.
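For the direct API-key route described above, the same canonical ref is used without any Codex runtime override (nesting assumed for illustration; auth comes from `OPENAI_API_KEY`):

```json
{
  "agents": {
    "defaults": {
      "model": "openai/gpt-5.5"
    }
  }
}
```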
Legacy `codex/gpt-*` refs remain accepted as compatibility aliases. Doctor
compatibility migration rewrites legacy runtime refs to canonical model refs
and records the runtime policy separately. New native app-server harness configs
should use `openai/gpt-*` plus `agentRuntime.id: "codex"`.
`agents.defaults.imageModel` follows the same prefix split. Use
`openai/gpt-*` for the normal OpenAI route and `codex/gpt-*` when image
understanding should run through a bounded Codex app-server turn. Do not use
`openai-codex/gpt-*`; doctor rewrites that legacy prefix to `openai/gpt-*`. The
Codex app-server model must advertise image input support; text-only Codex
models fail before the media turn starts.
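A minimal `imageModel` sketch under the same nesting assumption (the `codex/gpt-5.5` value is illustrative; the chosen Codex model must advertise image input support):

```json
{
  "agents": {
    "defaults": {
      "imageModel": "codex/gpt-5.5"
    }
  }
}
```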
Use `/status` to confirm the effective harness for the current session. If the
selection is surprising, enable debug logging for the `agents/harness` subsystem
@@ -211,22 +214,20 @@ in `auto` mode, each plugin candidate's support result.
### What doctor warnings mean
`openclaw doctor` warns when configured model refs or persisted session route
state still use `openai-codex/*`. `openclaw doctor --fix` rewrites those routes
to:
- `openai/<model>`
- `agentRuntime.id: "codex"` when Codex is installed, enabled, contributes the
`codex` harness, and has usable OAuth
- `agentRuntime.id: "pi"` otherwise
The `codex` route forces the native Codex harness. The `pi` route keeps the
agent on the default OpenClaw runner instead of enabling or installing Codex as
a side effect of legacy-route cleanup.
Doctor also repairs stale persisted session pins across discovered agent session
stores so old conversations do not stay wedged on the removed route.
Harness selection is not a live session control. When an embedded turn runs,
OpenClaw records the selected harness id on that session and keeps using it for
@@ -349,7 +350,7 @@ Agents should route user requests by intent, not by the word "Codex" alone:
| "File a support report for a bad Codex run" | `/diagnostics [note]` |
| "Only send Codex feedback for this attached thread" | `/codex diagnostics [note]` |
| "Use my ChatGPT/Codex subscription with Codex runtime" | `openai/*` plus `agentRuntime.id: "codex"` |
| "Repair old `openai-codex/*` config/session pins" | `openclaw doctor --fix` |
| "Run Codex through ACP/acpx" | ACP `sessions_spawn({ runtime: "acp", ... })` |
| "Start Claude Code/Gemini/OpenCode/Cursor in a thread" | ACP/acpx, not `/codex` and not native sub-agents |