fix(models): normalize provider runtime selection (#71259)

* fix(models): normalize provider runtime selection

* fix(models): reverse codex-only runtime migration

* fix(models): default runtime selection to pi

* fix(status): label model runtime clearly

* fix(status): align pi runtime label

* fix(plugins): align tool result middleware runtime naming

* fix(models): validate runtime overrides

This commit is contained in:
Vincent Koc
2026-04-24 16:56:49 -07:00
committed by GitHub
parent 60e7b692cc
commit aa27e27f36
75 changed files with 1422 additions and 414 deletions

View File

@@ -1,2 +1,2 @@
-eb5c790aaa54be7b1380eb5a162db50dd314e052aedb5e608290092c33d999f2 plugin-sdk-api-baseline.json
-0d2fd80f69e0c3488b6bdbbbb035b08ab108637790d1f30b8e4f84c71c5bc8e2 plugin-sdk-api-baseline.jsonl
+f74435d49aa0af2509264d8581e12ffc624b1d6542d250d608ee5c3b41a234f3 plugin-sdk-api-baseline.json
+df33bbe47bb092ed11814576b5386253140f7aa6f8479a5334aff9b988125afc plugin-sdk-api-baseline.jsonl

View File

@@ -305,7 +305,7 @@ By default, components are single use. Set `components.reusable=true` to allow b
To restrict who can click a button, set `allowedUsers` on that button (Discord user IDs, tags, or `*`). When configured, unmatched users receive an ephemeral denial.
-The `/model` and `/models` slash commands open an interactive model picker with provider and model dropdowns plus a Submit step. `/models add` is deprecated and now returns a deprecation message instead of registering models from chat. The picker reply is ephemeral and only the invoking user can use it.
+The `/model` and `/models` slash commands open an interactive model picker with provider, model, and compatible runtime dropdowns plus a Submit step. `/models add` is deprecated and now returns a deprecation message instead of registering models from chat. The picker reply is ephemeral and only the invoking user can use it.
File attachments:

View File

@@ -21,7 +21,7 @@ Notes:
- `--deep` runs live probes (WhatsApp Web + Telegram + Discord + Slack + Signal).
- `--usage` prints normalized provider usage windows as `X% left`.
-- Session status output now separates `Runtime:` from `Runner:`. `Runtime` is the execution path and sandbox state (`direct`, `docker/*`), while `Runner` tells you whether the session is using embedded Pi, a CLI-backed provider, or an ACP harness backend such as `codex (acp/acpx)`.
+- Session status output separates `Execution:` from `Runtime:`. `Execution` is the sandbox path (`direct`, `docker/*`), while `Runtime` tells you whether the session is using `OpenClaw Pi Default`, `OpenAI Codex`, a CLI backend, or an ACP backend such as `codex (acp/acpx)`.
- MiniMax's raw `usage_percent` / `usagePercent` fields are remaining quota, so OpenClaw inverts them before display; count-based fields win when present. `model_remains` responses prefer the chat-model entry, derive the window label from timestamps when needed, and include the model name in the plan label.
- When the current session snapshot is sparse, `/status` can backfill token and cache counters from the most recent transcript usage log. Existing nonzero live values still win over transcript fallback values.
- Transcript fallback can also recover the active runtime model label when the live session entry is missing it. If that transcript model differs from the selected model, status resolves the context window against the recovered runtime model instead of the selected one.

View File

@@ -24,6 +24,12 @@ For model selection rules, see [/concepts/models](/concepts/models).
- Plugin auto-enable follows that same boundary: `openai-codex/<model>` belongs
to the OpenAI plugin, while the Codex plugin is enabled by
`embeddedHarness.runtime: "codex"` or legacy `codex/<model>` refs.
+- CLI runtimes use the same split: choose canonical model refs such as
+  `anthropic/claude-*`, `google/gemini-*`, or `openai/gpt-*`, then set
+  `agents.defaults.embeddedHarness.runtime` to `claude-cli`,
+  `google-gemini-cli`, or `codex-cli` when you want a local CLI backend.
+  Legacy `claude-cli/*`, `google-gemini-cli/*`, and `codex-cli/*` refs migrate
+  back to canonical provider refs with the runtime recorded separately.
- GPT-5.5 is currently available through subscription/OAuth routes:
`openai-codex/gpt-5.5` in PI or `openai/gpt-5.5` with the Codex app-server
harness. The direct API-key route for `openai/gpt-5.5` is supported once
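
As an illustration of the canonical-ref-plus-runtime split in the bullets above, a minimal config sketch. The `agents.defaults.embeddedHarness.runtime` path is from this page; the `model` key name and the concrete model ref are assumptions for illustration:

```json5
{
  agents: {
    defaults: {
      // Canonical provider/model ref, not a legacy claude-cli/* prefix
      // (key name and model ref are illustrative assumptions):
      model: "anthropic/claude-sonnet-4-5",
      embeddedHarness: {
        runtime: "claude-cli", // local CLI backend, recorded separately
      },
    },
  },
}
```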

View File

@@ -316,7 +316,7 @@ Time format in system prompt. Default: `auto` (OS preference).
},
params: { cacheRetention: "long" }, // global default provider params
embeddedHarness: {
-runtime: "auto", // auto | pi | registered harness id, e.g. codex
+runtime: "pi", // pi | auto | registered harness id, e.g. codex
fallback: "pi", // pi | none
},
pdfMaxBytesMb: 10,
@@ -369,14 +369,14 @@ Time format in system prompt. Default: `auto` (OS preference).
- For direct OpenAI Responses models, server-side compaction is enabled automatically. Use `params.responsesServerCompaction: false` to stop injecting `context_management`, or `params.responsesCompactThreshold` to override the threshold. See [OpenAI server-side compaction](/providers/openai#server-side-compaction-responses-api).
- `params`: global default provider parameters applied to all models. Set at `agents.defaults.params` (e.g. `{ cacheRetention: "long" }`).
- `params` merge precedence (config): `agents.defaults.params` (global base) is overridden by `agents.defaults.models["provider/model"].params` (per-model), then `agents.list[].params` (matching agent id) overrides by key. See [Prompt Caching](/reference/prompt-caching) for details.
-- `embeddedHarness`: default low-level embedded agent runtime policy. Use `runtime: "auto"` to let registered plugin harnesses claim supported models, `runtime: "pi"` to force the built-in PI harness, or a registered harness id such as `runtime: "codex"`. Automatic PI fallback defaults to `"pi"` only in `auto` mode. Explicit plugin runtimes such as `codex` default to `"none"` unless you set `fallback: "pi"`. New Codex harness configs should keep model refs canonical as `openai/*` and select the harness here rather than using legacy `codex/*` model refs.
+- `embeddedHarness`: default low-level embedded agent runtime policy. Omitted runtime defaults to OpenClaw Pi. Use `runtime: "pi"` to force the built-in PI harness, `runtime: "auto"` to let registered plugin harnesses claim supported models, or a registered harness id such as `runtime: "codex"`. Set `fallback: "none"` to disable automatic PI fallback. Keep model refs canonical as `provider/model`; select Codex, Claude CLI, Gemini CLI, and other execution backends through runtime config instead of legacy runtime provider prefixes.
- Config writers that mutate these fields (for example `/models set`, `/models set-image`, and fallback add/remove commands) save canonical object form and preserve existing fallback lists when possible.
- `maxConcurrent`: max parallel agent runs across sessions (each session still serialized). Default: 4.
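
As a sketch of that `params` merge precedence (global base → per-model → agent entry), with illustrative values; the shape of an `agents.list[]` entry is an assumption here:

```json5
{
  agents: {
    defaults: {
      params: { cacheRetention: "long" }, // global base
      models: {
        "anthropic/claude-sonnet-4-5": {
          // per-model params override the global base by key
          params: { cacheRetention: "short" },
        },
      },
    },
    list: [
      // a matching agent id overrides again, key by key (entry shape assumed)
      { id: "main", params: { cacheRetention: "long" } },
    ],
  },
}
```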
### `agents.defaults.embeddedHarness`
`embeddedHarness` controls which low-level executor runs embedded agent turns.
-Most deployments should keep the default `{ runtime: "auto", fallback: "pi" }`.
+Most deployments should keep the default OpenClaw Pi runtime.
Use it when a trusted plugin provides a native harness, such as the bundled
Codex app-server harness.
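
For instance, explicitly selecting the bundled Codex harness with PI fallback disabled might look like this (a sketch using the keys documented above):

```json5
{
  agents: {
    defaults: {
      embeddedHarness: {
        runtime: "codex",  // registered harness id; "pi" is the default
        fallback: "none",  // explicit plugin runtimes default to "none"
      },
    },
  },
}
```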

View File

@@ -172,8 +172,8 @@ For the full registration API, see [SDK Overview](/plugins/sdk-overview#registra
Bundled plugins can use `api.registerAgentToolResultMiddleware(...)` when they
need async tool-result rewriting before the model sees the output. Declare the
-targeted harnesses in `contracts.agentToolResultMiddleware`, for example
-`["pi", "codex-app-server"]`. This is a trusted bundled-plugin seam; external
+targeted runtimes in `contracts.agentToolResultMiddleware`, for example
+`["pi", "codex"]`. This is a trusted bundled-plugin seam; external
plugins should prefer regular OpenClaw plugin hooks unless OpenClaw grows an
explicit trust policy for this capability.
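
The registration seam described above can be sketched end to end. Everything below except the `registerAgentToolResultMiddleware` name and its `runtimes` option is an assumption: the event shape, the truncation rule, and the local stand-in for the plugin `api` object are illustrative only.

```typescript
// Hypothetical event shape; the real OpenClaw types are not shown in these docs.
type ToolResultEvent = { toolName: string; result: { text: string } };
type ToolResultMiddleware = (event: ToolResultEvent) => Promise<ToolResultEvent>;

// Local stand-in for api.registerAgentToolResultMiddleware(...), so the
// sketch is runnable without the plugin SDK.
const registered: { fn: ToolResultMiddleware; runtimes: string[] }[] = [];
function registerAgentToolResultMiddleware(
  fn: ToolResultMiddleware,
  opts: { runtimes: string[] },
): void {
  registered.push({ fn, runtimes: opts.runtimes });
}

// Async output reducer: truncate oversized tool results before the model
// sees them, targeting both the PI and Codex runtimes.
registerAgentToolResultMiddleware(
  async (event) => ({
    ...event,
    result: { text: event.result.text.slice(0, 2000) },
  }),
  { runtimes: ["pi", "codex"] },
);
```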

View File

@@ -25,7 +25,7 @@ These are in-process OpenClaw hooks, not Codex `hooks.json` command hooks:
- `before_message_write` for mirrored transcript records
- `agent_end`
-Plugins can also register harness-neutral tool-result middleware to rewrite
+Plugins can also register runtime-neutral tool-result middleware to rewrite
OpenClaw dynamic tool results after OpenClaw executes the tool and before the
result is returned to Codex. This is separate from the public
`tool_result_persist` plugin hook, which transforms OpenClaw-owned transcript
@@ -35,8 +35,8 @@ The harness is off by default. New configs should keep OpenAI model refs
canonical as `openai/gpt-*` and explicitly force
`embeddedHarness.runtime: "codex"` or `OPENCLAW_AGENT_RUNTIME=codex` when they
want native app-server execution. Legacy `codex/*` model refs still auto-select
-the harness for compatibility, but they are not shown as normal model/provider
-choices.
+the harness for compatibility, but runtime-backed legacy provider prefixes are
+not shown as normal model/provider choices.
## Pick the right model prefix
@@ -56,10 +56,12 @@ app-server harness. Direct API-key access for `openai/gpt-5.5` is supported
once OpenAI enables GPT-5.5 on the public API.
Legacy `codex/gpt-*` refs remain accepted as compatibility aliases. Doctor
-compatibility migration rewrites legacy primary `codex/*` refs to `openai/*`
-and records the Codex harness policy separately. New PI Codex OAuth configs
-should use `openai-codex/gpt-*`; new native app-server harness configs should
-use `openai/gpt-*` plus `embeddedHarness.runtime: "codex"`.
+compatibility migration rewrites legacy primary runtime refs to canonical model
+refs and records the runtime policy separately, while fallback-only legacy refs
+are left unchanged because runtime is configured for the whole agent container.
+New PI Codex OAuth configs should use `openai-codex/gpt-*`; new native
+app-server harness configs should use `openai/gpt-*` plus
+`embeddedHarness.runtime: "codex"`.
`agents.defaults.imageModel` follows the same prefix split. Use
`openai-codex/gpt-*` when image understanding should run through the OpenAI
@@ -86,9 +88,9 @@ Legacy sessions created before harness pins are treated as PI-pinned once they
have transcript history. Use `/new` or `/reset` to opt that conversation into
Codex after changing config.
-`/status` shows the effective non-PI harness next to `Fast`, for example
-`Fast · codex`. The default PI harness remains `Runner: pi (embedded)` and does
-not add a separate harness badge.
+`/status` shows the effective model runtime. The default PI harness appears as
+`Runtime: OpenClaw Pi Default`, and the Codex app-server harness appears as
+`Runtime: OpenAI Codex`.
## Requirements

View File

@@ -406,7 +406,7 @@ read without importing the plugin runtime.
```json
{
"contracts": {
-"agentToolResultMiddleware": ["pi", "codex-app-server"],
+"agentToolResultMiddleware": ["pi", "codex"],
"externalAuthProviders": ["acme-ai"],
"speechProviders": ["openai"],
"realtimeTranscriptionProviders": ["openai"],
@@ -427,7 +427,7 @@ Each list is optional:
| Field | Type | What it means |
| -------------------------------- | ---------- | --------------------------------------------------------------------- |
| `embeddedExtensionFactories` | `string[]` | Deprecated embedded extension factory ids. |
-| `agentToolResultMiddleware` | `string[]` | Harness ids a bundled plugin may register tool-result middleware for. |
+| `agentToolResultMiddleware` | `string[]` | Runtime ids a bundled plugin may register tool-result middleware for. |
| `externalAuthProviders` | `string[]` | Provider ids whose external auth profile hook this plugin owns. |
| `speechProviders` | `string[]` | Speech provider ids this plugin owns. |
| `realtimeTranscriptionProviders` | `string[]` | Realtime-transcription provider ids this plugin owns. |

View File

@@ -146,15 +146,15 @@ OpenClaw only runs against the protocol surface it has been tested with.
### Tool-result middleware
-Bundled plugins can attach harness-neutral tool-result middleware through
+Bundled plugins can attach runtime-neutral tool-result middleware through
`api.registerAgentToolResultMiddleware(...)` when their manifest declares the
-targeted harness ids in `contracts.agentToolResultMiddleware`. This trusted
+targeted runtime ids in `contracts.agentToolResultMiddleware`. This trusted
seam is for async tool-result transforms that must run before PI or Codex feeds
tool output back into the model.
Legacy bundled plugins can still use
`api.registerCodexAppServerExtensionFactory(...)` for Codex app-server-only
-middleware, but new result transforms should use the harness-neutral API.
+middleware, but new result transforms should use the runtime-neutral API.
The Pi-only `api.registerEmbeddedExtensionFactory(...)` hook is deprecated for
tool-result transforms; keep it only for bundled compatibility code that still
needs direct Pi embedded-runner events.

View File

@@ -93,7 +93,7 @@ releases.
<Step title="Migrate Pi tool-result extensions to middleware">
Bundled plugins should replace Pi-only
`api.registerEmbeddedExtensionFactory(...)` tool-result handlers with
-harness-neutral middleware.
+runtime-neutral middleware.
```typescript
// Before: Pi-only compatibility hook
@@ -103,11 +103,11 @@ releases.
});
});
-// After: Pi and Codex app-server dynamic tools
+// After: Pi and Codex runtime dynamic tools
api.registerAgentToolResultMiddleware(async (event) => {
return compactToolResult(event);
}, {
-harnesses: ["pi", "codex-app-server"],
+runtimes: ["pi", "codex"],
});
```
@@ -116,7 +116,7 @@ releases.
```json
{
"contracts": {
-"agentToolResultMiddleware": ["pi", "codex-app-server"]
+"agentToolResultMiddleware": ["pi", "codex"]
}
}
```
@@ -626,7 +626,7 @@ canonical replacement.
Covered in "How to migrate → Migrate Pi tool-result extensions to
middleware" above. Included here for completeness: the Pi-only
`api.registerEmbeddedExtensionFactory(...)` path is deprecated in favor of
-`api.registerAgentToolResultMiddleware(...)` with an explicit harness
+`api.registerAgentToolResultMiddleware(...)` with an explicit runtime
list in `contracts.agentToolResultMiddleware`.
</Accordion>

View File

@@ -99,7 +99,7 @@ methods:
| `api.registerCli(registrar, opts?)` | CLI subcommand |
| `api.registerService(service)` | Background service |
| `api.registerInteractiveHandler(registration)` | Interactive handler |
-| `api.registerAgentToolResultMiddleware(...)` | Harness tool-result middleware |
+| `api.registerAgentToolResultMiddleware(...)` | Runtime tool-result middleware |
| `api.registerEmbeddedExtensionFactory(factory)` | Deprecated PI extension factory |
| `api.registerMemoryPromptSupplement(builder)` | Additive memory-adjacent prompt section |
| `api.registerMemoryCorpusSupplement(adapter)` | Additive memory search/read corpus |
@@ -113,12 +113,12 @@ methods:
<Accordion title="When to use tool-result middleware">
Bundled plugins can use `api.registerAgentToolResultMiddleware(...)` when
-they need to rewrite a tool result after execution and before the harness
-feeds that result back into the model. This is the trusted harness-neutral
+they need to rewrite a tool result after execution and before the runtime
+feeds that result back into the model. This is the trusted runtime-neutral
seam for async output reducers such as tokenjuice.
Bundled plugins must declare `contracts.agentToolResultMiddleware` for each
-targeted harness, for example `["pi", "codex-app-server"]`. External plugins
+targeted runtime, for example `["pi", "codex"]`. External plugins
cannot register this middleware; keep normal OpenClaw plugin hooks for work
that does not need pre-model tool-result timing.
</Accordion>

View File

@@ -13,7 +13,8 @@ Gemini Grounding.
- Provider: `google`
- Auth: `GEMINI_API_KEY` or `GOOGLE_API_KEY`
- API: Google Gemini API
-- Alternative provider: `google-gemini-cli` (OAuth)
+- Runtime option: `agents.defaults.embeddedHarness.runtime: "google-gemini-cli"`
+  reuses Gemini CLI OAuth while keeping model refs canonical as `google/*`.
## Getting started
@@ -92,12 +93,13 @@ Choose your preferred auth method and follow the setup steps.
</Step>
<Step title="Verify the model is available">
```bash
-openclaw models list --provider google-gemini-cli
+openclaw models list --provider google
```
</Step>
</Steps>
-- Default model: `google-gemini-cli/gemini-3-flash-preview`
+- Default model: `google/gemini-3.1-pro-preview`
+- Runtime: `google-gemini-cli`
- Alias: `gemini-cli`
**Environment variables:**
@@ -117,9 +119,9 @@ Choose your preferred auth method and follow the setup steps.
command is installed and on `PATH`.
</Note>
-The OAuth-only `google-gemini-cli` provider is a separate text-inference
-surface. Image generation, media understanding, and Gemini Grounding stay on
-the `google` provider id.
+`google-gemini-cli/*` model refs are legacy compatibility aliases. New
+configs should use `google/*` model refs plus the `google-gemini-cli`
+runtime when they want local Gemini CLI execution.
</Tab>
</Tabs>
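
Putting the pieces in this tab together, a hedged config sketch. The provider ref and runtime id are from the docs above; the `model` key name is an assumption:

```json5
{
  agents: {
    defaults: {
      model: "google/gemini-3.1-pro-preview", // canonical google/* ref (key name assumed)
      embeddedHarness: {
        runtime: "google-gemini-cli", // reuse Gemini CLI OAuth locally
      },
    },
  },
}
```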

View File

@@ -174,11 +174,10 @@ Choose your preferred auth method and follow the setup steps.
### Status indicator
-Chat `/status` shows which embedded harness is active for the current
-session. The default PI harness appears as `Runner: pi (embedded)` and does
-not add a separate badge. When the bundled Codex app-server harness is
-selected, `/status` appends the non-PI harness id next to `Fast`, for example
-`Fast · codex`. Existing sessions keep their recorded harness id, so use
+Chat `/status` shows which model runtime is active for the current session.
+The default PI harness appears as `Runtime: OpenClaw Pi Default`. When the
+bundled Codex app-server harness is selected, `/status` shows
+`Runtime: OpenAI Codex`. Existing sessions keep their recorded harness id, so use
`/new` or `/reset` after changing `embeddedHarness` if you want `/status` to
reflect a new PI/Codex choice.

View File

@@ -106,7 +106,7 @@ Built-in commands available today:
- `/help` shows the short help summary.
- `/commands` shows the generated command catalog.
- `/tools [compact|verbose]` shows what the current agent can use right now.
-- `/status` shows runtime status, including `Runtime`/`Runner` labels and provider usage/quota when available.
+- `/status` shows execution/runtime status, including `Execution`/`Runtime` labels and provider usage/quota when available.
- `/tasks` lists active/recent background tasks for the current session.
- `/context [list|detail|json]` explains how context is assembled.
- `/export-session [path]` exports the current session to HTML. Alias: `/export`.
@@ -227,7 +227,7 @@ of treating `/tools` as a static catalog.
- **Provider usage/quota** (example: “Claude 80% left”) shows up in `/status` for the current model provider when usage tracking is enabled. OpenClaw normalizes provider windows to `% left`; for MiniMax, remaining-only percent fields are inverted before display, and `model_remains` responses prefer the chat-model entry plus a model-tagged plan label.
- **Token/cache lines** in `/status` can fall back to the latest transcript usage entry when the live session snapshot is sparse. Existing nonzero live values still win, and transcript fallback can also recover the active runtime model label plus a larger prompt-oriented total when stored totals are missing or smaller.
-- **Runtime vs runner:** `/status` reports `Runtime` for the effective execution path and sandbox state, and `Runner` for who is actually running the session: embedded Pi, a CLI-backed provider, or an ACP harness/backend.
+- **Execution vs runtime:** `/status` reports `Execution` for the effective sandbox path and `Runtime` for who is actually running the session: `OpenClaw Pi Default`, `OpenAI Codex`, a CLI backend, or an ACP backend.
- **Per-response tokens/cost** is controlled by `/usage off|tokens|full` (appended to normal replies).
- `/model status` is about **models/auth/endpoints**, not usage.