fix: centralize provider thinking profiles

Peter Steinberger
2026-04-21 09:04:37 +01:00
parent 1cc2fc82ca
commit f1805ab54d
57 changed files with 718 additions and 572 deletions

@@ -658,8 +658,7 @@ Provider plugins now have two layers:
`buildAuthDoctorHint`, `matchesContextOverflowError`,
`classifyFailoverReason`, `isCacheTtlEligible`,
`buildMissingAuthMessage`, `suppressBuiltInModel`, `augmentModelCatalog`,
-`isBinaryThinking`, `supportsXHighThinking`, `supportsAdaptiveThinking`,
-`supportsMaxThinking`,
+`resolveThinkingProfile`, `isBinaryThinking`, `supportsXHighThinking`,
`resolveDefaultThinkingLevel`, `isModernModelRef`, `prepareRuntimeAuth`,
`resolveUsageAuth`, `fetchUsageSnapshot`, `createEmbeddingProvider`,
`buildReplayPolicy`,
@@ -723,20 +722,19 @@ The "When to use" column is the quick decision guide.
| 30 | `buildMissingAuthMessage` | Replacement for the generic missing-auth recovery message | Provider needs a provider-specific missing-auth recovery hint |
| 31 | `suppressBuiltInModel` | Stale upstream model suppression plus optional user-facing error hint | Provider needs to hide stale upstream rows or replace them with a vendor hint |
| 32 | `augmentModelCatalog` | Synthetic/final catalog rows appended after discovery | Provider needs synthetic forward-compat rows in `models list` and pickers |
-| 33 | `isBinaryThinking` | On/off reasoning toggle for binary-thinking providers | Provider exposes only binary thinking on/off |
-| 34 | `supportsXHighThinking` | `xhigh` reasoning support for selected models | Provider wants `xhigh` on only a subset of models |
-| 35 | `supportsAdaptiveThinking` | `adaptive` thinking support for selected models | Provider wants `adaptive` shown only for models with provider-managed adaptive thinking |
-| 36 | `supportsMaxThinking` | `max` reasoning support for selected models | Provider wants `max` shown only for models with provider max thinking |
-| 37 | `resolveDefaultThinkingLevel` | Default `/think` level for a specific model family | Provider owns default `/think` policy for a model family |
-| 38 | `isModernModelRef` | Modern-model matcher for live profile filters and smoke selection | Provider owns live/smoke preferred-model matching |
-| 39 | `prepareRuntimeAuth` | Exchange a configured credential into the actual runtime token/key just before inference | Provider needs a token exchange or short-lived request credential |
-| 40 | `resolveUsageAuth` | Resolve usage/billing credentials for `/usage` and related status surfaces | Provider needs custom usage/quota token parsing or a different usage credential |
-| 41 | `fetchUsageSnapshot` | Fetch and normalize provider-specific usage/quota snapshots after auth is resolved | Provider needs a provider-specific usage endpoint or payload parser |
-| 42 | `createEmbeddingProvider` | Build a provider-owned embedding adapter for memory/search | Memory embedding behavior belongs with the provider plugin |
-| 43 | `buildReplayPolicy` | Return a replay policy controlling transcript handling for the provider | Provider needs custom transcript policy (for example, thinking-block stripping) |
-| 44 | `sanitizeReplayHistory` | Rewrite replay history after generic transcript cleanup | Provider needs provider-specific replay rewrites beyond shared compaction helpers |
-| 45 | `validateReplayTurns` | Final replay-turn validation or reshaping before the embedded runner | Provider transport needs stricter turn validation after generic sanitation |
-| 46 | `onModelSelected` | Run provider-owned post-selection side effects | Provider needs telemetry or provider-owned state when a model becomes active |
+| 33 | `resolveThinkingProfile` | Model-specific `/think` level set, display labels, and default | Provider exposes a custom thinking ladder or binary label for selected models |
+| 34 | `isBinaryThinking` | On/off reasoning toggle compatibility hook | Provider exposes only binary thinking on/off |
+| 35 | `supportsXHighThinking` | `xhigh` reasoning support compatibility hook | Provider wants `xhigh` on only a subset of models |
+| 36 | `resolveDefaultThinkingLevel` | Default `/think` level compatibility hook | Provider owns default `/think` policy for a model family |
+| 37 | `isModernModelRef` | Modern-model matcher for live profile filters and smoke selection | Provider owns live/smoke preferred-model matching |
+| 38 | `prepareRuntimeAuth` | Exchange a configured credential into the actual runtime token/key just before inference | Provider needs a token exchange or short-lived request credential |
+| 39 | `resolveUsageAuth` | Resolve usage/billing credentials for `/usage` and related status surfaces | Provider needs custom usage/quota token parsing or a different usage credential |
+| 40 | `fetchUsageSnapshot` | Fetch and normalize provider-specific usage/quota snapshots after auth is resolved | Provider needs a provider-specific usage endpoint or payload parser |
+| 41 | `createEmbeddingProvider` | Build a provider-owned embedding adapter for memory/search | Memory embedding behavior belongs with the provider plugin |
+| 42 | `buildReplayPolicy` | Return a replay policy controlling transcript handling for the provider | Provider needs custom transcript policy (for example, thinking-block stripping) |
+| 43 | `sanitizeReplayHistory` | Rewrite replay history after generic transcript cleanup | Provider needs provider-specific replay rewrites beyond shared compaction helpers |
+| 44 | `validateReplayTurns` | Final replay-turn validation or reshaping before the embedded runner | Provider transport needs stricter turn validation after generic sanitation |
+| 45 | `onModelSelected` | Run provider-owned post-selection side effects | Provider needs telemetry or provider-owned state when a model becomes active |
`normalizeModelId`, `normalizeTransport`, and `normalizeConfig` first check the
matched provider plugin, then fall through other hook-capable provider plugins
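Review note: the table rows in this hunk describe `resolveThinkingProfile` as the model-specific source of the `/think` level set, labels, and default, with `isBinaryThinking`, `supportsXHighThinking`, and `resolveDefaultThinkingLevel` kept as compatibility hooks. The sketch below shows one way a caller might combine them. The type shapes, level names, the `resolveProfile` helper, and the "profile first, legacy hooks second" order are all assumptions for illustration; the diff does not show the real signatures.

```typescript
// Hypothetical shapes: the real hook signatures are not part of this diff.
type ThinkingLevel = "off" | "low" | "medium" | "high" | "xhigh";

interface ThinkingProfile {
  levels: ThinkingLevel[];
  labels?: Partial<Record<ThinkingLevel, string>>;
  defaultLevel: ThinkingLevel;
}

interface ThinkingHooks {
  resolveThinkingProfile?: (modelId: string) => ThinkingProfile | undefined;
  isBinaryThinking?: (modelId: string) => boolean;
  supportsXHighThinking?: (modelId: string) => boolean;
  resolveDefaultThinkingLevel?: (modelId: string) => ThinkingLevel | undefined;
}

// Assumed generic ladder used when no hook narrows or extends it.
const BASE_LEVELS: ThinkingLevel[] = ["off", "low", "medium", "high"];

function resolveProfile(hooks: ThinkingHooks, modelId: string): ThinkingProfile {
  // Assumption: the centralized hook wins whenever it returns a profile.
  const profile = hooks.resolveThinkingProfile?.(modelId);
  if (profile) return profile;

  // Otherwise derive a profile from the compatibility hooks.
  let levels: ThinkingLevel[];
  let labels: Partial<Record<ThinkingLevel, string>> | undefined;
  if (hooks.isBinaryThinking?.(modelId)) {
    // Assumed encoding of a binary toggle: "off" plus one level labeled "on".
    levels = ["off", "low"];
    labels = { low: "on" };
  } else {
    levels = BASE_LEVELS.slice();
    if (hooks.supportsXHighThinking?.(modelId)) levels.push("xhigh");
  }

  // Only accept a provider default that is actually in the ladder.
  const preferred = hooks.resolveDefaultThinkingLevel?.(modelId);
  const defaultLevel: ThinkingLevel =
    preferred !== undefined && levels.indexOf(preferred) !== -1 ? preferred : "off";

  return { levels, labels, defaultLevel };
}
```

Under these assumptions a provider that still implements only the compatibility hooks keeps working, while a provider that adopts `resolveThinkingProfile` fully controls the ladder for the models it matches.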
@@ -808,7 +806,7 @@ api.registerProvider({
- Anthropic uses `resolveDynamicModel`, `capabilities`, `buildAuthDoctorHint`,
`resolveUsageAuth`, `fetchUsageSnapshot`, `isCacheTtlEligible`,
-`supportsAdaptiveThinking`, `supportsMaxThinking`, `resolveDefaultThinkingLevel`, `applyConfigDefaults`, `isModernModelRef`,
+`resolveThinkingProfile`, `applyConfigDefaults`, `isModernModelRef`,
and `wrapStreamFn` because it owns Claude 4.6 forward-compat,
provider-family hints, auth repair guidance, usage endpoint integration,
prompt-cache eligibility, auth-aware config defaults, Claude
@@ -822,7 +820,7 @@ api.registerProvider({
provider's beta-header rules.
- OpenAI uses `resolveDynamicModel`, `normalizeResolvedModel`, and
`capabilities` plus `buildMissingAuthMessage`, `suppressBuiltInModel`,
-`augmentModelCatalog`, `supportsXHighThinking`, and `isModernModelRef`
+`augmentModelCatalog`, `resolveThinkingProfile`, and `isModernModelRef`
because it owns GPT-5.4 forward-compat, the direct OpenAI
`openai-completions` -> `openai-responses` normalization, Codex-aware auth
hints, Spark suppression, synthetic OpenAI list rows, and GPT-5 thinking /
@@ -864,7 +862,7 @@ api.registerProvider({
`anthropic-by-model` replay family so Claude-specific replay cleanup stays
scoped to Claude ids instead of every `anthropic-messages` transport.
- Amazon Bedrock uses `buildReplayPolicy`, `matchesContextOverflowError`,
-`classifyFailoverReason`, and `resolveDefaultThinkingLevel` because it owns
+`classifyFailoverReason`, and `resolveThinkingProfile` because it owns
Bedrock-specific throttle/not-ready/context-overflow error classification
for Anthropic-on-Bedrock traffic; its replay policy still shares the same
Claude-only `anthropic-by-model` guard.
@@ -879,7 +877,7 @@ api.registerProvider({
thinking-block dropping on the Anthropic side while overriding reasoning
output mode back to native, and the `minimax-fast-mode` stream family owns
fast-mode model rewrites on the shared stream path.
-- Moonshot uses `catalog` plus `wrapStreamFn` because it still uses the shared
+- Moonshot uses `catalog`, `resolveThinkingProfile`, and `wrapStreamFn` because it still uses the shared
OpenAI transport but needs provider-owned thinking payload normalization; the
`moonshot-thinking` stream family maps config plus `/think` state onto its
native binary thinking payload.
@@ -890,7 +888,7 @@ api.registerProvider({
injection on the shared proxy stream path while skipping `kilo/auto` and
other proxy model ids that do not support explicit reasoning payloads.
- Z.AI uses `resolveDynamicModel`, `prepareExtraParams`, `wrapStreamFn`,
-`isCacheTtlEligible`, `isBinaryThinking`, `isModernModelRef`,
+`isCacheTtlEligible`, `resolveThinkingProfile`, `isModernModelRef`,
`resolveUsageAuth`, and `fetchUsageSnapshot` because it owns GLM-5 fallback,
`tool_stream` defaults, binary thinking UX, modern-model matching, and both
usage auth + quota fetching; the `tool-stream-default-on` stream family keeps
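Review note: several providers in this file (Anthropic, OpenAI, Amazon Bedrock, Moonshot, Z.AI) now list `resolveThinkingProfile` in place of the per-feature thinking hooks. As a rough illustration of what a provider-side implementation could look like, here is a minimal sketch. The model ids, the profile shape, and the `describeThinkOptions` helper are hypothetical and are not taken from the commit.

```typescript
// Hypothetical shapes: the real profile type is not shown in this diff.
type ThinkingLevel = "off" | "low" | "medium" | "high" | "xhigh";

interface ThinkingProfile {
  levels: ThinkingLevel[];
  labels?: Partial<Record<ThinkingLevel, string>>;
  defaultLevel: ThinkingLevel;
}

// Invented model ids, used only to show per-model branching.
function resolveThinkingProfile(modelId: string): ThinkingProfile | undefined {
  if (/^example-binary/.test(modelId)) {
    // Binary thinking UX: one "on" level next to "off".
    return { levels: ["off", "low"], labels: { low: "on" }, defaultLevel: "off" };
  }
  if (/^example-pro/.test(modelId)) {
    // Extended ladder for a subset of models, with a provider-owned default.
    return {
      levels: ["off", "low", "medium", "high", "xhigh"],
      defaultLevel: "medium",
    };
  }
  // Returning undefined leaves the decision to whatever generic policy applies.
  return undefined;
}

// Renders the options a `/think` picker might show, using labels when present.
function describeThinkOptions(profile: ThinkingProfile): string {
  return profile.levels
    .map((level) => {
      const label = profile.labels ? profile.labels[level] : undefined;
      return label !== undefined ? label : level;
    })
    .join(", ");
}

const binaryProfile = resolveThinkingProfile("example-binary-1");
if (binaryProfile) {
  console.log(describeThinkOptions(binaryProfile)); // "off, on"
}
```

Keeping the level set, labels, and default in one returned object is what lets a single hook replace several yes/no hooks; whether the real profile carries exactly these fields is an assumption here.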

@@ -533,20 +533,19 @@ API key auth, and dynamic model resolution.
| 29 | `buildMissingAuthMessage` | Custom missing-auth hint |
| 30 | `suppressBuiltInModel` | Hide stale upstream rows |
| 31 | `augmentModelCatalog` | Synthetic forward-compat rows |
-| 32 | `isBinaryThinking` | Binary thinking on/off |
-| 33 | `supportsXHighThinking` | `xhigh` reasoning support |
-| 34 | `supportsAdaptiveThinking` | Adaptive thinking support |
-| 35 | `supportsMaxThinking` | `max` reasoning support |
-| 36 | `resolveDefaultThinkingLevel` | Default `/think` policy |
-| 37 | `isModernModelRef` | Live/smoke model matching |
-| 38 | `prepareRuntimeAuth` | Token exchange before inference |
-| 39 | `resolveUsageAuth` | Custom usage credential parsing |
-| 40 | `fetchUsageSnapshot` | Custom usage endpoint |
-| 41 | `createEmbeddingProvider` | Provider-owned embedding adapter for memory/search |
-| 42 | `buildReplayPolicy` | Custom transcript replay/compaction policy |
-| 43 | `sanitizeReplayHistory` | Provider-specific replay rewrites after generic cleanup |
-| 44 | `validateReplayTurns` | Strict replay-turn validation before the embedded runner |
-| 45 | `onModelSelected` | Post-selection callback (e.g. telemetry) |
+| 32 | `resolveThinkingProfile` | Model-specific `/think` option set |
+| 33 | `isBinaryThinking` | Binary thinking on/off compatibility |
+| 34 | `supportsXHighThinking` | `xhigh` reasoning support compatibility |
+| 35 | `resolveDefaultThinkingLevel` | Default `/think` policy compatibility |
+| 36 | `isModernModelRef` | Live/smoke model matching |
+| 37 | `prepareRuntimeAuth` | Token exchange before inference |
+| 38 | `resolveUsageAuth` | Custom usage credential parsing |
+| 39 | `fetchUsageSnapshot` | Custom usage endpoint |
+| 40 | `createEmbeddingProvider` | Provider-owned embedding adapter for memory/search |
+| 41 | `buildReplayPolicy` | Custom transcript replay/compaction policy |
+| 42 | `sanitizeReplayHistory` | Provider-specific replay rewrites after generic cleanup |
+| 43 | `validateReplayTurns` | Strict replay-turn validation before the embedded runner |
+| 44 | `onModelSelected` | Post-selection callback (e.g. telemetry) |
Prompt tuning note: