fix: centralize provider thinking profiles

2026-05-06 13:50:49 +00:00 · 2026-04-21 09:04:37 +01:00
parent 1cc2fc82ca
commit f1805ab54d
57 changed files with 718 additions and 572 deletions
--- a/docs/channels/slack.md
+++ b/docs/channels/slack.md
@@ -327,7 +327,7 @@ Surface different features that extend the above defaults.
      {
        "command": "/think",
        "description": "Set the thinking level",
-        "usage_hint": "<off|minimal|low|medium|high|xhigh>"
+        "usage_hint": "<level>"
      },
      {
        "command": "/verbose",
@@ -448,7 +448,7 @@ Surface different features that extend the above defaults.
      {
        "command": "/think",
        "description": "Set the thinking level",
-        "usage_hint": "<off|minimal|low|medium|high|xhigh>",
+        "usage_hint": "<level>",
        "url": "https://gateway-host.example.com/slack/events"
      },
      {
--- a/docs/cli/agent.md
+++ b/docs/cli/agent.md
@@ -26,7 +26,7 @@ Related:
 - `-t, --to <dest>`: recipient used to derive the session key
 - `--session-id <id>`: explicit session id
 - `--agent <id>`: agent id; overrides routing bindings
-- `--thinking <off|minimal|low|medium|high|xhigh>`: agent thinking level
+- `--thinking <level>`: agent thinking level (`off`, `minimal`, `low`, `medium`, `high`, plus provider-supported custom levels such as `xhigh`, `adaptive`, or `max`)
 - `--verbose <on|off>`: persist verbose level for the session
 - `--channel <channel>`: delivery channel; omit to use the main session channel
 - `--reply-to <target>`: delivery target override
--- a/docs/cli/index.md
+++ b/docs/cli/index.md
@@ -994,7 +994,7 @@ Options:
 - `-t, --to <dest>` (for session key and optional delivery)
 - `--session-id <id>`
 - `--agent <id>` (agent id; overrides routing bindings)
-- `--thinking <off|minimal|low|medium|high|xhigh>` (provider support varies; not model-gated at CLI level)
+- `--thinking <level>` (validated against the selected model's provider profile)
 - `--verbose <on|off>`
 - `--channel <channel>` (delivery channel; omit to use the main session channel)
 - `--reply-to <target>` (delivery target override, separate from session routing)
--- a/docs/concepts/model-providers.md
+++ b/docs/concepts/model-providers.md
@@ -42,9 +42,9 @@ For model selection rules, see [/concepts/models](/concepts/models).
  `buildAuthDoctorHint`,
  `matchesContextOverflowError`, `classifyFailoverReason`,
  `isCacheTtlEligible`, `buildMissingAuthMessage`, `suppressBuiltInModel`,
-  `augmentModelCatalog`, `isBinaryThinking`, `supportsXHighThinking`,
-  `supportsAdaptiveThinking`, `supportsMaxThinking`,
-  `resolveDefaultThinkingLevel`, `applyConfigDefaults`, `isModernModelRef`,
+  `augmentModelCatalog`, `resolveThinkingProfile`, `isBinaryThinking`,
+  `supportsXHighThinking`, `resolveDefaultThinkingLevel`,
+  `applyConfigDefaults`, `isModernModelRef`,
  `prepareRuntimeAuth`, `resolveUsageAuth`, `fetchUsageSnapshot`, and
  `onModelSelected`.
 - Note: provider runtime `capabilities` is shared runner metadata (provider
@@ -132,12 +132,11 @@ Typical split:
  vendor-owned error for direct resolution failures
 - `augmentModelCatalog`: provider appends synthetic/final catalog rows after
  discovery and config merging
-- `isBinaryThinking`: provider owns binary on/off thinking UX
-- `supportsXHighThinking`: provider opts selected models into `xhigh`
-- `supportsAdaptiveThinking`: provider opts selected models into `adaptive`
-- `supportsMaxThinking`: provider opts selected models into `max`
-- `resolveDefaultThinkingLevel`: provider owns default `/think` policy for a
-  model family
+- `resolveThinkingProfile`: provider owns the exact `/think` level set,
+  optional display labels, and default level for a selected model
+- `isBinaryThinking`: compatibility hook for binary on/off thinking UX
+- `supportsXHighThinking`: compatibility hook for selected `xhigh` models
+- `resolveDefaultThinkingLevel`: compatibility hook for default `/think` policy
 - `applyConfigDefaults`: provider applies provider-specific global defaults
  during config materialization based on auth mode, env, or model family
 - `isModernModelRef`: provider owns live/smoke preferred-model matching
--- a/docs/plugins/architecture.md
+++ b/docs/plugins/architecture.md
@@ -658,8 +658,7 @@ Provider plugins now have two layers:
  `buildAuthDoctorHint`, `matchesContextOverflowError`,
  `classifyFailoverReason`, `isCacheTtlEligible`,
  `buildMissingAuthMessage`, `suppressBuiltInModel`, `augmentModelCatalog`,
-  `isBinaryThinking`, `supportsXHighThinking`, `supportsAdaptiveThinking`,
-  `supportsMaxThinking`,
+  `resolveThinkingProfile`, `isBinaryThinking`, `supportsXHighThinking`,
  `resolveDefaultThinkingLevel`, `isModernModelRef`, `prepareRuntimeAuth`,
  `resolveUsageAuth`, `fetchUsageSnapshot`, `createEmbeddingProvider`,
  `buildReplayPolicy`,
@@ -723,20 +722,19 @@ The "When to use" column is the quick decision guide.
 | 30  | `buildMissingAuthMessage`         | Replacement for the generic missing-auth recovery message                                                      | Provider needs a provider-specific missing-auth recovery hint                                                                               |
 | 31  | `suppressBuiltInModel`            | Stale upstream model suppression plus optional user-facing error hint                                          | Provider needs to hide stale upstream rows or replace them with a vendor hint                                                               |
 | 32  | `augmentModelCatalog`             | Synthetic/final catalog rows appended after discovery                                                          | Provider needs synthetic forward-compat rows in `models list` and pickers                                                                   |
-| 33  | `isBinaryThinking`                | On/off reasoning toggle for binary-thinking providers                                                          | Provider exposes only binary thinking on/off                                                                                                |
-| 34  | `supportsXHighThinking`           | `xhigh` reasoning support for selected models                                                                  | Provider wants `xhigh` on only a subset of models                                                                                           |
-| 35  | `supportsAdaptiveThinking`        | `adaptive` thinking support for selected models                                                                | Provider wants `adaptive` shown only for models with provider-managed adaptive thinking                                                     |
-| 36  | `supportsMaxThinking`             | `max` reasoning support for selected models                                                                    | Provider wants `max` shown only for models with provider max thinking                                                                       |
-| 37  | `resolveDefaultThinkingLevel`     | Default `/think` level for a specific model family                                                             | Provider owns default `/think` policy for a model family                                                                                    |
-| 38  | `isModernModelRef`                | Modern-model matcher for live profile filters and smoke selection                                              | Provider owns live/smoke preferred-model matching                                                                                           |
-| 39  | `prepareRuntimeAuth`              | Exchange a configured credential into the actual runtime token/key just before inference                       | Provider needs a token exchange or short-lived request credential                                                                           |
-| 40  | `resolveUsageAuth`                | Resolve usage/billing credentials for `/usage` and related status surfaces                                     | Provider needs custom usage/quota token parsing or a different usage credential                                                             |
-| 41  | `fetchUsageSnapshot`              | Fetch and normalize provider-specific usage/quota snapshots after auth is resolved                             | Provider needs a provider-specific usage endpoint or payload parser                                                                         |
-| 42  | `createEmbeddingProvider`         | Build a provider-owned embedding adapter for memory/search                                                     | Memory embedding behavior belongs with the provider plugin                                                                                  |
-| 43  | `buildReplayPolicy`               | Return a replay policy controlling transcript handling for the provider                                        | Provider needs custom transcript policy (for example, thinking-block stripping)                                                             |
-| 44  | `sanitizeReplayHistory`           | Rewrite replay history after generic transcript cleanup                                                        | Provider needs provider-specific replay rewrites beyond shared compaction helpers                                                           |
-| 45  | `validateReplayTurns`             | Final replay-turn validation or reshaping before the embedded runner                                           | Provider transport needs stricter turn validation after generic sanitation                                                                  |
-| 46  | `onModelSelected`                 | Run provider-owned post-selection side effects                                                                 | Provider needs telemetry or provider-owned state when a model becomes active                                                                |
+| 33  | `resolveThinkingProfile`          | Model-specific `/think` level set, display labels, and default                                                 | Provider exposes a custom thinking ladder or binary label for selected models                                                               |
+| 34  | `isBinaryThinking`                | On/off reasoning toggle compatibility hook                                                                     | Provider exposes only binary thinking on/off                                                                                                |
+| 35  | `supportsXHighThinking`           | `xhigh` reasoning support compatibility hook                                                                   | Provider wants `xhigh` on only a subset of models                                                                                           |
+| 36  | `resolveDefaultThinkingLevel`     | Default `/think` level compatibility hook                                                                      | Provider owns default `/think` policy for a model family                                                                                    |
+| 37  | `isModernModelRef`                | Modern-model matcher for live profile filters and smoke selection                                              | Provider owns live/smoke preferred-model matching                                                                                           |
+| 38  | `prepareRuntimeAuth`              | Exchange a configured credential into the actual runtime token/key just before inference                       | Provider needs a token exchange or short-lived request credential                                                                           |
+| 39  | `resolveUsageAuth`                | Resolve usage/billing credentials for `/usage` and related status surfaces                                     | Provider needs custom usage/quota token parsing or a different usage credential                                                             |
+| 40  | `fetchUsageSnapshot`              | Fetch and normalize provider-specific usage/quota snapshots after auth is resolved                             | Provider needs a provider-specific usage endpoint or payload parser                                                                         |
+| 41  | `createEmbeddingProvider`         | Build a provider-owned embedding adapter for memory/search                                                     | Memory embedding behavior belongs with the provider plugin                                                                                  |
+| 42  | `buildReplayPolicy`               | Return a replay policy controlling transcript handling for the provider                                        | Provider needs custom transcript policy (for example, thinking-block stripping)                                                             |
+| 43  | `sanitizeReplayHistory`           | Rewrite replay history after generic transcript cleanup                                                        | Provider needs provider-specific replay rewrites beyond shared compaction helpers                                                           |
+| 44  | `validateReplayTurns`             | Final replay-turn validation or reshaping before the embedded runner                                           | Provider transport needs stricter turn validation after generic sanitation                                                                  |
+| 45  | `onModelSelected`                 | Run provider-owned post-selection side effects                                                                 | Provider needs telemetry or provider-owned state when a model becomes active                                                                |

 `normalizeModelId`, `normalizeTransport`, and `normalizeConfig` first check the
 matched provider plugin, then fall through other hook-capable provider plugins
@@ -808,7 +806,7 @@ api.registerProvider({

 - Anthropic uses `resolveDynamicModel`, `capabilities`, `buildAuthDoctorHint`,
  `resolveUsageAuth`, `fetchUsageSnapshot`, `isCacheTtlEligible`,
-  `supportsAdaptiveThinking`, `supportsMaxThinking`, `resolveDefaultThinkingLevel`, `applyConfigDefaults`, `isModernModelRef`,
+  `resolveThinkingProfile`, `applyConfigDefaults`, `isModernModelRef`,
  and `wrapStreamFn` because it owns Claude 4.6 forward-compat,
  provider-family hints, auth repair guidance, usage endpoint integration,
  prompt-cache eligibility, auth-aware config defaults, Claude
@@ -822,7 +820,7 @@ api.registerProvider({
  provider's beta-header rules.
 - OpenAI uses `resolveDynamicModel`, `normalizeResolvedModel`, and
  `capabilities` plus `buildMissingAuthMessage`, `suppressBuiltInModel`,
-  `augmentModelCatalog`, `supportsXHighThinking`, and `isModernModelRef`
+  `augmentModelCatalog`, `resolveThinkingProfile`, and `isModernModelRef`
  because it owns GPT-5.4 forward-compat, the direct OpenAI
  `openai-completions` -> `openai-responses` normalization, Codex-aware auth
  hints, Spark suppression, synthetic OpenAI list rows, and GPT-5 thinking /
@@ -864,7 +862,7 @@ api.registerProvider({
  `anthropic-by-model` replay family so Claude-specific replay cleanup stays
  scoped to Claude ids instead of every `anthropic-messages` transport.
 - Amazon Bedrock uses `buildReplayPolicy`, `matchesContextOverflowError`,
-  `classifyFailoverReason`, and `resolveDefaultThinkingLevel` because it owns
+  `classifyFailoverReason`, and `resolveThinkingProfile` because it owns
  Bedrock-specific throttle/not-ready/context-overflow error classification
  for Anthropic-on-Bedrock traffic; its replay policy still shares the same
  Claude-only `anthropic-by-model` guard.
@@ -879,7 +877,7 @@ api.registerProvider({
  thinking-block dropping on the Anthropic side while overriding reasoning
  output mode back to native, and the `minimax-fast-mode` stream family owns
  fast-mode model rewrites on the shared stream path.
-- Moonshot uses `catalog` plus `wrapStreamFn` because it still uses the shared
+- Moonshot uses `catalog`, `resolveThinkingProfile`, and `wrapStreamFn` because it still uses the shared
  OpenAI transport but needs provider-owned thinking payload normalization; the
  `moonshot-thinking` stream family maps config plus `/think` state onto its
  native binary thinking payload.
@@ -890,7 +888,7 @@ api.registerProvider({
  injection on the shared proxy stream path while skipping `kilo/auto` and
  other proxy model ids that do not support explicit reasoning payloads.
 - Z.AI uses `resolveDynamicModel`, `prepareExtraParams`, `wrapStreamFn`,
-  `isCacheTtlEligible`, `isBinaryThinking`, `isModernModelRef`,
+  `isCacheTtlEligible`, `resolveThinkingProfile`, `isModernModelRef`,
  `resolveUsageAuth`, and `fetchUsageSnapshot` because it owns GLM-5 fallback,
  `tool_stream` defaults, binary thinking UX, modern-model matching, and both
  usage auth + quota fetching; the `tool-stream-default-on` stream family keeps
--- a/docs/plugins/sdk-provider-plugins.md
+++ b/docs/plugins/sdk-provider-plugins.md
@@ -533,20 +533,19 @@ API key auth, and dynamic model resolution.
      | 29 | `buildMissingAuthMessage` | Custom missing-auth hint |
      | 30 | `suppressBuiltInModel` | Hide stale upstream rows |
      | 31 | `augmentModelCatalog` | Synthetic forward-compat rows |
-      | 32 | `isBinaryThinking` | Binary thinking on/off |
-      | 33 | `supportsXHighThinking` | `xhigh` reasoning support |
-      | 34 | `supportsAdaptiveThinking` | Adaptive thinking support |
-      | 35 | `supportsMaxThinking` | `max` reasoning support |
-      | 36 | `resolveDefaultThinkingLevel` | Default `/think` policy |
-      | 37 | `isModernModelRef` | Live/smoke model matching |
-      | 38 | `prepareRuntimeAuth` | Token exchange before inference |
-      | 39 | `resolveUsageAuth` | Custom usage credential parsing |
-      | 40 | `fetchUsageSnapshot` | Custom usage endpoint |
-      | 41 | `createEmbeddingProvider` | Provider-owned embedding adapter for memory/search |
-      | 42 | `buildReplayPolicy` | Custom transcript replay/compaction policy |
-      | 43 | `sanitizeReplayHistory` | Provider-specific replay rewrites after generic cleanup |
-      | 44 | `validateReplayTurns` | Strict replay-turn validation before the embedded runner |
-      | 45 | `onModelSelected` | Post-selection callback (e.g. telemetry) |
+      | 32 | `resolveThinkingProfile` | Model-specific `/think` option set |
+      | 33 | `isBinaryThinking` | Binary thinking on/off compatibility |
+      | 34 | `supportsXHighThinking` | `xhigh` reasoning support compatibility |
+      | 35 | `resolveDefaultThinkingLevel` | Default `/think` policy compatibility |
+      | 36 | `isModernModelRef` | Live/smoke model matching |
+      | 37 | `prepareRuntimeAuth` | Token exchange before inference |
+      | 38 | `resolveUsageAuth` | Custom usage credential parsing |
+      | 39 | `fetchUsageSnapshot` | Custom usage endpoint |
+      | 40 | `createEmbeddingProvider` | Provider-owned embedding adapter for memory/search |
+      | 41 | `buildReplayPolicy` | Custom transcript replay/compaction policy |
+      | 42 | `sanitizeReplayHistory` | Provider-specific replay rewrites after generic cleanup |
+      | 43 | `validateReplayTurns` | Strict replay-turn validation before the embedded runner |
+      | 44 | `onModelSelected` | Post-selection callback (e.g. telemetry) |

      Prompt tuning note:

--- a/docs/tools/agent-send.md
+++ b/docs/tools/agent-send.md
@@ -65,7 +65,7 @@ programmatic delivery.
 | `--reply-to \<target\>`       | Delivery target override                                    |
 | `--reply-channel \<name\>`    | Delivery channel override                                   |
 | `--reply-account \<id\>`      | Delivery account id override                                |
-| `--thinking \<level\>`        | Set thinking level (off, minimal, low, medium, high, xhigh) |
+| `--thinking \<level\>`        | Set thinking level for the selected model profile           |
 | `--verbose \<on\|full\|off\>` | Set verbose level                                           |
 | `--timeout \<seconds\>`       | Override agent timeout                                      |
 | `--json`                      | Output structured JSON                                      |
--- a/docs/tools/slash-commands.md
+++ b/docs/tools/slash-commands.md
@@ -93,7 +93,7 @@ Built-in commands available today:
 - `/compact [instructions]` compacts the session context. See [/concepts/compaction](/concepts/compaction).
 - `/stop` aborts the current run.
 - `/session idle <duration|off>` and `/session max-age <duration|off>` manage thread-binding expiry.
-- `/think <off|minimal|low|medium|high|xhigh>` sets the thinking level. Aliases: `/thinking`, `/t`.
+- `/think <level>` sets the thinking level. Options come from the active model's provider profile; common levels are `off`, `minimal`, `low`, `medium`, and `high`, with custom levels such as `xhigh`, `adaptive`, `max`, or binary `on` only where supported. Aliases: `/thinking`, `/t`.
 - `/verbose on|off|full` toggles verbose output. Alias: `/v`.
 - `/trace on|off` toggles plugin trace output for the current session.
 - `/fast [status|on|off]` shows or sets fast mode.
--- a/docs/tools/thinking.md
+++ b/docs/tools/thinking.md
@@ -21,8 +21,9 @@ title: "Thinking Levels"
  - `x-high`, `x_high`, `extra-high`, `extra high`, and `extra_high` map to `xhigh`.
  - `highest` maps to `high`.
 - Provider notes:
-  - `adaptive` is only advertised in native command menus and pickers for providers/models that declare adaptive thinking support. It remains accepted as a typed directive for compatibility with existing configs and aliases.
-  - `max` is only advertised in native command menus and pickers for providers/models that declare max thinking support. Existing stored `max` settings are remapped to the largest supported level for the selected model when the model does not support `max`.
+  - Thinking menus and pickers are provider-profile driven. Provider plugins declare the exact level set for the selected model, including labels such as binary `on`.
+  - `adaptive`, `xhigh`, and `max` are only advertised for provider/model profiles that support them. Typed directives for unsupported levels are rejected with that model's valid options.
+  - Existing stored unsupported levels, including old `max` values after switching models, are remapped to the largest supported level for the selected model.
  - Anthropic Claude 4.6 models default to `adaptive` when no explicit thinking level is set.
  - Anthropic Claude Opus 4.7 does not default to adaptive thinking. Its API effort default remains provider-owned unless you explicitly set a thinking level.
  - Anthropic Claude Opus 4.7 maps `/think xhigh` to adaptive thinking plus `output_config.effort: "xhigh"`, because `/think` is a thinking directive and `xhigh` is the Opus 4.7 effort setting.
@@ -38,7 +39,7 @@ title: "Thinking Levels"
 2. Session override (set by sending a directive-only message).
 3. Per-agent default (`agents.list[].thinkingDefault` in config).
 4. Global default (`agents.defaults.thinkingDefault` in config).
-5. Fallback: `adaptive` for Anthropic Claude 4.6 models, `off` for Anthropic Claude Opus 4.7 unless explicitly configured, `low` for other reasoning-capable models, `off` otherwise.
+5. Fallback: provider-declared default when available, `low` for other catalog models marked reasoning-capable, `off` otherwise.

 ## Setting a session default

@@ -111,10 +112,13 @@ title: "Thinking Levels"

 - The web chat thinking selector mirrors the session's stored level from the inbound session store/config when the page loads.
 - Picking another level writes the session override immediately via `sessions.patch`; it does not wait for the next send and it is not a one-shot `thinkingOnce` override.
-- The first option is always `Default (<resolved level>)`, where the resolved default comes from the active session model: `adaptive` for Claude 4.6 on Anthropic, `off` for Anthropic Claude Opus 4.7 unless configured, `low` for other reasoning-capable models, `off` otherwise.
-- The picker stays provider-aware:
-  - most providers show `off | minimal | low | medium | high`
-  - Anthropic/Bedrock Claude 4.6 shows `off | minimal | low | medium | high | adaptive`
-  - Anthropic Claude Opus 4.7 shows `off | minimal | low | medium | high | xhigh | adaptive | max`
-  - Z.AI shows binary `off | on`
+- The first option is always `Default (<resolved level>)`, where the resolved default comes from the active session model's provider thinking profile.
+- The picker uses `thinkingOptions` returned by the gateway session row. The browser UI does not keep its own provider regex list; plugins own model-specific level sets.
 - `/think:<level>` still works and updates the same stored session level, so chat directives and the picker stay in sync.
+
+## Provider profiles
+
+- Provider plugins can expose `resolveThinkingProfile(ctx)` to define the model's supported levels and default.
+- Each profile level has a stored canonical `id` (`off`, `minimal`, `low`, `medium`, `high`, `xhigh`, `adaptive`, or `max`) and may include a display `label`. Binary providers use `{ id: "low", label: "on" }`.
+- Published legacy hooks (`supportsXHighThinking`, `isBinaryThinking`, and `resolveDefaultThinkingLevel`) remain as compatibility adapters, but new custom level sets should use `resolveThinkingProfile`.
+- Gateway rows expose `thinkingOptions` and `thinkingDefault` so ACP/chat clients render the same profile that runtime validation uses.
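
Review note (illustrative, not part of the patch): the "Provider profiles" section above describes a profile as a set of levels with a canonical `id`, an optional display `label`, and a default. The sketch below shows one way that behaviour could fit together: a binary profile using `{ id: "low", label: "on" }`, rejection of unsupported typed directives with the model's valid options, and remapping of a stored unsupported level to the largest supported one. The type names, the `defaultLevel` field, the `parseDirective` and `remapStored` helpers, and the ranking order are all assumptions made for illustration; the diff does not show the real SDK shapes.

```typescript
// Hypothetical types: the actual SDK definitions are not shown in this diff.
type ThinkingLevelId =
  | "off" | "minimal" | "low" | "medium" | "high" | "xhigh" | "adaptive" | "max";

interface ThinkingLevelOption {
  id: ThinkingLevelId; // stored canonical id
  label?: string;      // optional display label, e.g. "on" for binary providers
}

interface ThinkingProfile {
  levels: ThinkingLevelOption[];  // assumed non-empty
  defaultLevel?: ThinkingLevelId; // assumed field name
}

// Assumed ordering, used only to decide which supported level is "largest".
const RANK: ThinkingLevelId[] = [
  "off", "minimal", "low", "medium", "high", "xhigh", "adaptive", "max",
];

// A binary provider profile, following the `{ id: "low", label: "on" }` convention.
const binaryProfile: ThinkingProfile = {
  levels: [{ id: "off" }, { id: "low", label: "on" }],
};

// A conventional five-step ladder with no custom levels.
const standardProfile: ThinkingProfile = {
  levels: [
    { id: "off" }, { id: "minimal" }, { id: "low" }, { id: "medium" }, { id: "high" },
  ],
  defaultLevel: "low",
};

// Resolve a typed directive against the profile; reject anything unsupported
// and list the options the model actually offers.
function parseDirective(profile: ThinkingProfile, input: string): ThinkingLevelId {
  const wanted = input.trim().toLowerCase();
  const match = profile.levels.find((l) => l.id === wanted || l.label === wanted);
  if (!match) {
    const valid = profile.levels.map((l) => l.label ?? l.id).join(", ");
    throw new Error(`Unsupported thinking level "${input}". Valid options: ${valid}`);
  }
  return match.id;
}

// Keep a stored level if the profile supports it; otherwise fall back to the
// highest-ranked level the profile does support.
function remapStored(profile: ThinkingProfile, stored: ThinkingLevelId): ThinkingLevelId {
  if (profile.levels.some((l) => l.id === stored)) return stored;
  return profile.levels
    .map((l) => l.id)
    .reduce((best, id) => (RANK.indexOf(id) > RANK.indexOf(best) ? id : best));
}
```

Under these assumptions, `parseDirective(binaryProfile, "on")` stores `low`, while a stored `max` carried over to a model with the standard ladder is remapped to `high` by `remapStored`.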