* fix(amazon-bedrock): add known model context windows to discovery

Bedrock's ListFoundationModels API does not expose token limits. Discovery was hardcoding contextWindow: 32000 for every model, causing Claude (1M), Nova (300K), and other models to hit premature 'Context limit exceeded' errors and unnecessary session resets.

Adds a lookup table of known context windows for Bedrock models:

- Anthropic Claude: 200K-1M
- Amazon Nova: 128K-1M
- Meta Llama: 128K
- Mistral: 32K-128K
- DeepSeek: 128K
- Cohere: 128K
- AI21 Jamba: 256K

Inference profile prefixes (us., eu., ap., global.) are stripped before lookup, so us.anthropic.claude-opus-4-6-v1 correctly resolves to 1M.

Also raises the default fallback from 32K to 128K for unknown models, since most modern models have at least 128K of context.

Single-file change; no type-system modifications. Complementary to #65030 (provenance flag for warning on unknown models).

Fixes #64919
Related: #64250

* add KNOWN_MAX_TOKENS map and expand model coverage

- Add a KNOWN_MAX_TOKENS lookup table with Bedrock-optimized values that balance response quality against quota burndown (5x rate for Claude 3.7+)
- Add missing models to KNOWN_CONTEXT_WINDOWS: Opus 4.7 (1M), Opus 4.1/4.5, Sonnet 4, Claude 3/3.5 Haiku, DeepSeek V3/V3.2, Google Gemma 3
- Refactor prefix stripping into a shared resolveKnownValue() helper
- Fix: use !== undefined instead of a truthy check for table lookups
- Wire resolveKnownMaxTokens into toModelDefinition and resolveInferenceProfiles

Quota burndown context: Bedrock reserves input_tokens + max_tokens against the TPM quota at request start, and for Claude 3.7+ output tokens burn at 5x (a worked example follows these commit notes). The values in KNOWN_MAX_TOKENS are intentionally conservative (8-16K for Claude) to maximize concurrent throughput while still allowing useful responses. The thinking budget is added separately by the runtime.

* remove KNOWN_MAX_TOKENS: maxTokens should be handled upstream

Remove the KNOWN_MAX_TOKENS map. Hardcoding maxTokens values in discovery is the wrong layer to solve this: any explicit value still gets reserved against Bedrock's TPM quota at request start. The correct fix is upstream in pi's Bedrock provider: omit maxTokens from inferenceConfig when it is not explicitly set, letting the model use its internal default. This avoids the quota waste entirely. See badlogic/pi-mono#3399 and badlogic/pi-mono#3400.

Keep the expanded KNOWN_CONTEXT_WINDOWS: context windows ARE the right thing to set in discovery, since they affect compaction thresholds and session management rather than API-level quota reservation.

* docs: clarify why hardcoded context windows are needed

Bedrock's ListFoundationModels and GetFoundationModel APIs return no token limit information; there is no Bedrock API that exposes context windows or max output tokens programmatically. If AWS adds token metadata in the future, this table should become a fallback rather than the primary source.

* fix: add au and apac to inference profile prefix regex

Add missing geo prefixes discovered by querying inference profiles across multiple regions:

- au. (Australia/NZ, used in ap-southeast-2/4/6)
- apac. (Asia-Pacific, used for older models in ap-northeast-1)

Both resolveKnownContextWindow and resolveBaseModelId now handle all known prefixes: us, eu, ap, apac, au, jp, global (see the sketch below).

* test: port au. prefix test from #65449 by @alickgithub2, add apac. coverage

Port the Australia/NZ inference profile test from PR #65449 (credit: @alickgithub2) and extend it to also cover the apac. prefix discovered in ap-northeast-1.
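For orientation, a minimal sketch of the lookup these commits describe. The names KNOWN_CONTEXT_WINDOWS, resolveKnownValue, and resolveKnownContextWindow come from the commit messages; the table entries, regex shape, and exact signatures are assumptions, not the actual source:

```ts
// Sketch only. The claude-opus-4-6 entry is taken from the commit
// message's own example; the second entry is hypothetical.
const KNOWN_CONTEXT_WINDOWS: Record<string, number> = {
  "anthropic.claude-opus-4-6-v1": 1_000_000,
  "ai21.jamba-1-5-large-v1:0": 256_000, // hypothetical entry
};

// All geo/inference-profile prefixes named in the commits:
// us, eu, ap, apac, au, jp, global (apac listed before ap for clarity).
const PROFILE_PREFIX = /^(apac|ap|au|eu|global|jp|us)\./;

// Shared helper in the spirit of resolveKnownValue(): strip the profile
// prefix, then look the base model id up in the given table.
function resolveKnownValue(
  modelId: string,
  table: Record<string, number>,
): number | undefined {
  return table[modelId.replace(PROFILE_PREFIX, "")];
}

function resolveKnownContextWindow(modelId: string): number {
  const known = resolveKnownValue(modelId, KNOWN_CONTEXT_WINDOWS);
  // Per the commits: compare with !== undefined rather than truthiness,
  // so a valid table entry is never discarded by accident.
  // The fallback is 128K at this point in the history; a later commit
  // in this PR raises it to 200K.
  return known !== undefined ? known : 128_000;
}

// resolveKnownContextWindow("us.anthropic.claude-opus-4-6-v1") -> 1_000_000
```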
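Likewise a sketch of the ported prefix tests, assuming a vitest-style harness (an assumption; the repo's actual test framework is not stated here) and the resolveKnownContextWindow sketch above. The real tests from #65449 may be structured differently:

```ts
import { describe, expect, it } from "vitest"; // assumed harness

describe("inference profile prefix stripping", () => {
  // Ported idea from PR #65449 (credit: @alickgithub2): au. profiles are
  // used in ap-southeast-2/4/6 and must resolve to the base model.
  it("strips the au. prefix before the context window lookup", () => {
    expect(resolveKnownContextWindow("au.anthropic.claude-opus-4-6-v1"))
      .toBe(1_000_000);
  });

  // Added coverage: apac. profiles seen on older models in ap-northeast-1.
  it("strips the apac. prefix before the context window lookup", () => {
    expect(resolveKnownContextWindow("apac.anthropic.claude-opus-4-6-v1"))
      .toBe(1_000_000);
  });
});
```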
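Finally, the worked quota example promised above. This is one plausible reading of the KNOWN_MAX_TOKENS commit notes: whether the 5x multiplier applies at reservation time or only as output is produced is not specified, and the function is purely illustrative:

```ts
// Hypothetical illustration of Bedrock TPM burndown as described above:
// input_tokens + max_tokens are reserved at request start, and for
// Claude 3.7+ output counts 5x against the quota.
function approxQuotaBurn(
  inputTokens: number,
  maxTokens: number,
  outputMultiplier = 5, // 5x for Claude 3.7+, per the commit notes
): number {
  return inputTokens + maxTokens * outputMultiplier;
}

// With a conservative maxTokens of 16_384 and a 50K-token prompt:
//   50_000 + 16_384 * 5 = 131_920 tokens of TPM consumed up front.
// A generous maxTokens of 64_000 would instead pin 370_000 tokens,
// which is why the map kept Claude values in the 8-16K range.
```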
* expand model coverage: Llama 4, MiniMax, NVIDIA, Mistral 3, GLM, Qwen

Cross-referenced KNOWN_CONTEXT_WINDOWS against the live list-foundation-models API and added the missing models:

- Llama 4 Maverick (1M) and Scout (512K)
- MiniMax M2/M2.1/M2.5 (1M)
- NVIDIA Nemotron Super/Nano variants (128K)
- Mistral Large 3 675B (128K)
- GLM 4.7/4.7-flash/5 (128K)
- Qwen3 Coder/32B/VL (128-256K)

Removed the deprecated deepseek.v3-v1:0 and claude-opus-4-20250514 entries (no longer in the active foundation models list).

* raise default context window from 128K to 200K

200K matches the floor for all current Claude models (the most popular models on Bedrock). Every other active model with a lower actual limit is already in the explicit table, so new Claude models get a correct default without requiring a table update.

* test: update discovery test expectations for known context window values

* test: fix remaining contextWindow expectation (default 200K)

* fix(amazon-bedrock): keep conservative context fallback

* docs(changelog): note Bedrock context window fix

* fix(amazon-bedrock): normalize known context fallback

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>