fix: centralize provider thinking profiles

2026-05-07 03:30:45 +00:00 · 2026-04-21 09:04:37 +01:00
parent 1cc2fc82ca
commit f1805ab54d
57 changed files with 718 additions and 572 deletions
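
Illustrative note: the docs changes below describe `resolveThinkingProfile` as the hook that returns a model-specific `/think` level set, display labels, and default, and relabel `isBinaryThinking`, `supportsXHighThinking`, and `resolveDefaultThinkingLevel` as compatibility hooks. The sketch below shows one way a caller might consult the new hook first and fall back to the older ones. It is a minimal sketch under stated assumptions, not the implementation in this commit: the type shapes, the `resolveProfile` helper, the base level names, and the fallback order are all hypothetical.

```typescript
// Hypothetical shapes; the real SDK types are not shown in this diff.
type ThinkingLevel = { id: string; label?: string };
type ThinkingProfile = { levels: ThinkingLevel[]; defaultLevel?: string | undefined };
type ModelRef = { provider: string; modelId: string };

interface ProviderThinkingHooks {
  // Hook name from the docs; the signature is an assumption.
  resolveThinkingProfile?: (ref: ModelRef) => ThinkingProfile | undefined;
  // Compatibility hooks (names from the docs; signatures assumed).
  isBinaryThinking?: (ref: ModelRef) => boolean;
  supportsXHighThinking?: (ref: ModelRef) => boolean;
  resolveDefaultThinkingLevel?: (ref: ModelRef) => string | undefined;
}

// Assumed base ladder; only `xhigh` is named in the docs tables.
const BASE_LEVELS: string[] = ["off", "low", "medium", "high"];

// Hypothetical helper: prefer the centralized profile, then fall back
// to the older per-capability hooks.
function resolveProfile(hooks: ProviderThinkingHooks, ref: ModelRef): ThinkingProfile {
  const profile = hooks.resolveThinkingProfile?.(ref);
  if (profile) return profile;

  const defaultLevel = hooks.resolveDefaultThinkingLevel?.(ref);
  if (hooks.isBinaryThinking?.(ref)) {
    return { levels: [{ id: "off" }, { id: "on" }], defaultLevel };
  }
  const ids = hooks.supportsXHighThinking?.(ref) ? [...BASE_LEVELS, "xhigh"] : BASE_LEVELS;
  return { levels: ids.map((id) => ({ id })), defaultLevel };
}

// Usage: a made-up provider that declares a binary ladder through the new hook.
const exampleProvider: ProviderThinkingHooks = {
  resolveThinkingProfile: () => ({
    levels: [{ id: "off" }, { id: "on", label: "thinking" }],
    defaultLevel: "off",
  }),
};
const exampleProfile = resolveProfile(exampleProvider, { provider: "example", modelId: "m1" });
```

Returning a single profile object keeps the level set, labels, and default in one place, which is the centralization the commit subject refers to; the fallback branch only exists here to show how the compatibility hooks could still be honored.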
--- a/docs/plugins/architecture.md
+++ b/docs/plugins/architecture.md
@@ -658,8 +658,7 @@ Provider plugins now have two layers:
  `buildAuthDoctorHint`, `matchesContextOverflowError`,
  `classifyFailoverReason`, `isCacheTtlEligible`,
  `buildMissingAuthMessage`, `suppressBuiltInModel`, `augmentModelCatalog`,
-  `isBinaryThinking`, `supportsXHighThinking`, `supportsAdaptiveThinking`,
-  `supportsMaxThinking`,
+  `resolveThinkingProfile`, `isBinaryThinking`, `supportsXHighThinking`,
  `resolveDefaultThinkingLevel`, `isModernModelRef`, `prepareRuntimeAuth`,
  `resolveUsageAuth`, `fetchUsageSnapshot`, `createEmbeddingProvider`,
  `buildReplayPolicy`,
@@ -723,20 +722,19 @@ The "When to use" column is the quick decision guide.
 | 30  | `buildMissingAuthMessage`         | Replacement for the generic missing-auth recovery message                                                      | Provider needs a provider-specific missing-auth recovery hint                                                                               |
 | 31  | `suppressBuiltInModel`            | Stale upstream model suppression plus optional user-facing error hint                                          | Provider needs to hide stale upstream rows or replace them with a vendor hint                                                               |
 | 32  | `augmentModelCatalog`             | Synthetic/final catalog rows appended after discovery                                                          | Provider needs synthetic forward-compat rows in `models list` and pickers                                                                   |
-| 33  | `isBinaryThinking`                | On/off reasoning toggle for binary-thinking providers                                                          | Provider exposes only binary thinking on/off                                                                                                |
-| 34  | `supportsXHighThinking`           | `xhigh` reasoning support for selected models                                                                  | Provider wants `xhigh` on only a subset of models                                                                                           |
-| 35  | `supportsAdaptiveThinking`        | `adaptive` thinking support for selected models                                                                | Provider wants `adaptive` shown only for models with provider-managed adaptive thinking                                                     |
-| 36  | `supportsMaxThinking`             | `max` reasoning support for selected models                                                                    | Provider wants `max` shown only for models with provider max thinking                                                                       |
-| 37  | `resolveDefaultThinkingLevel`     | Default `/think` level for a specific model family                                                             | Provider owns default `/think` policy for a model family                                                                                    |
-| 38  | `isModernModelRef`                | Modern-model matcher for live profile filters and smoke selection                                              | Provider owns live/smoke preferred-model matching                                                                                           |
-| 39  | `prepareRuntimeAuth`              | Exchange a configured credential into the actual runtime token/key just before inference                       | Provider needs a token exchange or short-lived request credential                                                                           |
-| 40  | `resolveUsageAuth`                | Resolve usage/billing credentials for `/usage` and related status surfaces                                     | Provider needs custom usage/quota token parsing or a different usage credential                                                             |
-| 41  | `fetchUsageSnapshot`              | Fetch and normalize provider-specific usage/quota snapshots after auth is resolved                             | Provider needs a provider-specific usage endpoint or payload parser                                                                         |
-| 42  | `createEmbeddingProvider`         | Build a provider-owned embedding adapter for memory/search                                                     | Memory embedding behavior belongs with the provider plugin                                                                                  |
-| 43  | `buildReplayPolicy`               | Return a replay policy controlling transcript handling for the provider                                        | Provider needs custom transcript policy (for example, thinking-block stripping)                                                             |
-| 44  | `sanitizeReplayHistory`           | Rewrite replay history after generic transcript cleanup                                                        | Provider needs provider-specific replay rewrites beyond shared compaction helpers                                                           |
-| 45  | `validateReplayTurns`             | Final replay-turn validation or reshaping before the embedded runner                                           | Provider transport needs stricter turn validation after generic sanitation                                                                  |
-| 46  | `onModelSelected`                 | Run provider-owned post-selection side effects                                                                 | Provider needs telemetry or provider-owned state when a model becomes active                                                                |
+| 33  | `resolveThinkingProfile`          | Model-specific `/think` level set, display labels, and default                                                 | Provider exposes a custom thinking ladder or binary label for selected models                                                               |
+| 34  | `isBinaryThinking`                | On/off reasoning toggle compatibility hook                                                                     | Provider exposes only binary thinking on/off                                                                                                |
+| 35  | `supportsXHighThinking`           | `xhigh` reasoning support compatibility hook                                                                   | Provider wants `xhigh` on only a subset of models                                                                                           |
+| 36  | `resolveDefaultThinkingLevel`     | Default `/think` level compatibility hook                                                                      | Provider owns default `/think` policy for a model family                                                                                    |
+| 37  | `isModernModelRef`                | Modern-model matcher for live profile filters and smoke selection                                              | Provider owns live/smoke preferred-model matching                                                                                           |
+| 38  | `prepareRuntimeAuth`              | Exchange a configured credential into the actual runtime token/key just before inference                       | Provider needs a token exchange or short-lived request credential                                                                           |
+| 39  | `resolveUsageAuth`                | Resolve usage/billing credentials for `/usage` and related status surfaces                                     | Provider needs custom usage/quota token parsing or a different usage credential                                                             |
+| 40  | `fetchUsageSnapshot`              | Fetch and normalize provider-specific usage/quota snapshots after auth is resolved                             | Provider needs a provider-specific usage endpoint or payload parser                                                                         |
+| 41  | `createEmbeddingProvider`         | Build a provider-owned embedding adapter for memory/search                                                     | Memory embedding behavior belongs with the provider plugin                                                                                  |
+| 42  | `buildReplayPolicy`               | Return a replay policy controlling transcript handling for the provider                                        | Provider needs custom transcript policy (for example, thinking-block stripping)                                                             |
+| 43  | `sanitizeReplayHistory`           | Rewrite replay history after generic transcript cleanup                                                        | Provider needs provider-specific replay rewrites beyond shared compaction helpers                                                           |
+| 44  | `validateReplayTurns`             | Final replay-turn validation or reshaping before the embedded runner                                           | Provider transport needs stricter turn validation after generic sanitation                                                                  |
+| 45  | `onModelSelected`                 | Run provider-owned post-selection side effects                                                                 | Provider needs telemetry or provider-owned state when a model becomes active                                                                |

 `normalizeModelId`, `normalizeTransport`, and `normalizeConfig` first check the
 matched provider plugin, then fall through other hook-capable provider plugins
@@ -808,7 +806,7 @@ api.registerProvider({

 - Anthropic uses `resolveDynamicModel`, `capabilities`, `buildAuthDoctorHint`,
  `resolveUsageAuth`, `fetchUsageSnapshot`, `isCacheTtlEligible`,
-  `supportsAdaptiveThinking`, `supportsMaxThinking`, `resolveDefaultThinkingLevel`, `applyConfigDefaults`, `isModernModelRef`,
+  `resolveThinkingProfile`, `applyConfigDefaults`, `isModernModelRef`,
  and `wrapStreamFn` because it owns Claude 4.6 forward-compat,
  provider-family hints, auth repair guidance, usage endpoint integration,
  prompt-cache eligibility, auth-aware config defaults, Claude
@@ -822,7 +820,7 @@ api.registerProvider({
  provider's beta-header rules.
 - OpenAI uses `resolveDynamicModel`, `normalizeResolvedModel`, and
  `capabilities` plus `buildMissingAuthMessage`, `suppressBuiltInModel`,
-  `augmentModelCatalog`, `supportsXHighThinking`, and `isModernModelRef`
+  `augmentModelCatalog`, `resolveThinkingProfile`, and `isModernModelRef`
  because it owns GPT-5.4 forward-compat, the direct OpenAI
  `openai-completions` -> `openai-responses` normalization, Codex-aware auth
  hints, Spark suppression, synthetic OpenAI list rows, and GPT-5 thinking /
@@ -864,7 +862,7 @@ api.registerProvider({
  `anthropic-by-model` replay family so Claude-specific replay cleanup stays
  scoped to Claude ids instead of every `anthropic-messages` transport.
 - Amazon Bedrock uses `buildReplayPolicy`, `matchesContextOverflowError`,
-  `classifyFailoverReason`, and `resolveDefaultThinkingLevel` because it owns
+  `classifyFailoverReason`, and `resolveThinkingProfile` because it owns
  Bedrock-specific throttle/not-ready/context-overflow error classification
  for Anthropic-on-Bedrock traffic; its replay policy still shares the same
  Claude-only `anthropic-by-model` guard.
@@ -879,7 +877,7 @@ api.registerProvider({
  thinking-block dropping on the Anthropic side while overriding reasoning
  output mode back to native, and the `minimax-fast-mode` stream family owns
  fast-mode model rewrites on the shared stream path.
-- Moonshot uses `catalog` plus `wrapStreamFn` because it still uses the shared
+- Moonshot uses `catalog`, `resolveThinkingProfile`, and `wrapStreamFn` because it still uses the shared
  OpenAI transport but needs provider-owned thinking payload normalization; the
  `moonshot-thinking` stream family maps config plus `/think` state onto its
  native binary thinking payload.
@@ -890,7 +888,7 @@ api.registerProvider({
  injection on the shared proxy stream path while skipping `kilo/auto` and
  other proxy model ids that do not support explicit reasoning payloads.
 - Z.AI uses `resolveDynamicModel`, `prepareExtraParams`, `wrapStreamFn`,
-  `isCacheTtlEligible`, `isBinaryThinking`, `isModernModelRef`,
+  `isCacheTtlEligible`, `resolveThinkingProfile`, `isModernModelRef`,
  `resolveUsageAuth`, and `fetchUsageSnapshot` because it owns GLM-5 fallback,
  `tool_stream` defaults, binary thinking UX, modern-model matching, and both
  usage auth + quota fetching; the `tool-stream-default-on` stream family keeps
--- a/docs/plugins/sdk-provider-plugins.md
+++ b/docs/plugins/sdk-provider-plugins.md
@@ -533,20 +533,19 @@ API key auth, and dynamic model resolution.
      | 29 | `buildMissingAuthMessage` | Custom missing-auth hint |
      | 30 | `suppressBuiltInModel` | Hide stale upstream rows |
      | 31 | `augmentModelCatalog` | Synthetic forward-compat rows |
-      | 32 | `isBinaryThinking` | Binary thinking on/off |
-      | 33 | `supportsXHighThinking` | `xhigh` reasoning support |
-      | 34 | `supportsAdaptiveThinking` | Adaptive thinking support |
-      | 35 | `supportsMaxThinking` | `max` reasoning support |
-      | 36 | `resolveDefaultThinkingLevel` | Default `/think` policy |
-      | 37 | `isModernModelRef` | Live/smoke model matching |
-      | 38 | `prepareRuntimeAuth` | Token exchange before inference |
-      | 39 | `resolveUsageAuth` | Custom usage credential parsing |
-      | 40 | `fetchUsageSnapshot` | Custom usage endpoint |
-      | 41 | `createEmbeddingProvider` | Provider-owned embedding adapter for memory/search |
-      | 42 | `buildReplayPolicy` | Custom transcript replay/compaction policy |
-      | 43 | `sanitizeReplayHistory` | Provider-specific replay rewrites after generic cleanup |
-      | 44 | `validateReplayTurns` | Strict replay-turn validation before the embedded runner |
-      | 45 | `onModelSelected` | Post-selection callback (e.g. telemetry) |
+      | 32 | `resolveThinkingProfile` | Model-specific `/think` option set |
+      | 33 | `isBinaryThinking` | Binary thinking on/off compatibility |
+      | 34 | `supportsXHighThinking` | `xhigh` reasoning support compatibility |
+      | 35 | `resolveDefaultThinkingLevel` | Default `/think` policy compatibility |
+      | 36 | `isModernModelRef` | Live/smoke model matching |
+      | 37 | `prepareRuntimeAuth` | Token exchange before inference |
+      | 38 | `resolveUsageAuth` | Custom usage credential parsing |
+      | 39 | `fetchUsageSnapshot` | Custom usage endpoint |
+      | 40 | `createEmbeddingProvider` | Provider-owned embedding adapter for memory/search |
+      | 41 | `buildReplayPolicy` | Custom transcript replay/compaction policy |
+      | 42 | `sanitizeReplayHistory` | Provider-specific replay rewrites after generic cleanup |
+      | 43 | `validateReplayTurns` | Strict replay-turn validation before the embedded runner |
+      | 44 | `onModelSelected` | Post-selection callback (e.g. telemetry) |

      Prompt tuning note: