Simplify plugin installation and runtime loading around package-manager-owned dependencies, with Jiti reserved for local/TS fallback paths.
Also scan npm plugin install roots so that hoisted transitive dependencies are covered by the dependency denylist and the node_modules symlink checks.
* fix(amazon-bedrock): add known model context windows to discovery
Bedrock's ListFoundationModels API does not expose token limits. Discovery
was hardcoding contextWindow: 32000 for every model, causing Claude (1M),
Nova (300K), and other models to hit premature 'Context limit exceeded'
errors and unnecessary session resets.
Adds a lookup table of known context windows for Bedrock models:
- Anthropic Claude: 200K-1M
- Amazon Nova: 128K-1M
- Meta Llama: 128K
- Mistral: 32K-128K
- DeepSeek: 128K
- Cohere: 128K
- AI21 Jamba: 256K
Inference profile prefixes (us., eu., ap., global.) are stripped before
lookup, so us.anthropic.claude-opus-4-6-v1 correctly resolves to 1M.
Also raises the default fallback from 32K to 128K for unknown models —
most modern models have at least 128K context.
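For illustration, the lookup could take roughly this shape (the entries below are a small subset, and everything besides KNOWN_CONTEXT_WINDOWS and resolveKnownContextWindow is illustrative):

```typescript
// Context windows that Bedrock's ListFoundationModels does not report.
// Keys are base model IDs (inference profile prefix already stripped).
const KNOWN_CONTEXT_WINDOWS: Record<string, number> = {
  "anthropic.claude-3-5-sonnet-20240620-v1:0": 200_000,
  "amazon.nova-pro-v1:0": 300_000,
  "meta.llama3-3-70b-instruct-v1:0": 128_000,
  // ...many more entries elided
};

// Fallback for models not in the table (raised from 32K).
const DEFAULT_CONTEXT_WINDOW = 128_000;

function resolveKnownContextWindow(modelId: string): number {
  // Strip inference profile prefixes: "us.anthropic.claude-..." -> "anthropic.claude-..."
  const baseId = modelId.replace(/^(us|eu|ap|global)\./, "");
  return KNOWN_CONTEXT_WINDOWS[baseId] ?? DEFAULT_CONTEXT_WINDOW;
}
```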
Single file change, no type system modifications.
Complementary to #65030 (provenance flag for warning on unknown models).
Fixes #64919
Related: #64250
* add KNOWN_MAX_TOKENS map and expand model coverage
- Add KNOWN_MAX_TOKENS lookup table with Bedrock-optimized values that
balance response quality against quota burndown (5x rate for Claude 3.7+)
- Add missing models to KNOWN_CONTEXT_WINDOWS: Opus 4.7 (1M), Opus 4.1/4.5,
Sonnet 4, Claude 3/3.5 Haiku, DeepSeek V3/V3.2, Google Gemma 3
- Refactor prefix-stripping into shared resolveKnownValue() helper
- Fix: use !== undefined instead of truthy check for table lookups
- Wire resolveKnownMaxTokens into toModelDefinition and resolveInferenceProfiles
Quota burndown context: Bedrock reserves input_tokens + max_tokens from
TPM at request start. For Claude 3.7+, output burns at 5x. The values
in KNOWN_MAX_TOKENS are intentionally conservative (8-16K for Claude)
to maximize concurrent throughput while still allowing useful responses.
Thinking budget is added separately by the runtime.
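A sketch of the shared helper described above, with tiny illustrative tables (the real tables and call sites differ):

```typescript
// Tiny illustrative subsets of the real tables.
const KNOWN_CONTEXT_WINDOWS: Record<string, number> = {
  "anthropic.claude-3-7-sonnet-20250219-v1:0": 200_000,
};
const KNOWN_MAX_TOKENS: Record<string, number> = {
  "anthropic.claude-3-7-sonnet-20250219-v1:0": 8_192, // conservative, quota-friendly
};

// Shared prefix-stripping lookup used by both tables.
function resolveKnownValue(
  modelId: string,
  table: Record<string, number>,
): number | undefined {
  const baseId = modelId.replace(/^(us|eu|ap|global)\./, "");
  return table[baseId];
}

// Call sites compare against undefined rather than truthiness, so a real
// table entry can never be mistaken for "model not in the table".
const modelId = "us.anthropic.claude-3-7-sonnet-20250219-v1:0";
const contextWindow = resolveKnownValue(modelId, KNOWN_CONTEXT_WINDOWS) ?? 128_000;
const maxTokens = resolveKnownValue(modelId, KNOWN_MAX_TOKENS); // may be undefined
```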
* remove KNOWN_MAX_TOKENS — maxTokens should be handled upstream
Remove the KNOWN_MAX_TOKENS map. Hardcoding maxTokens values in
discovery is the wrong layer to solve this — any explicit value
still gets reserved against Bedrock's TPM quota at request start.
The correct fix is upstream in pi's Bedrock provider: omit maxTokens
from inferenceConfig when not explicitly set, letting the model use
its internal default. This avoids quota waste entirely.
See: badlogic/pi-mono#3399 and badlogic/pi-mono#3400
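For reference, the upstream change being asked for is roughly this shape (a sketch of Converse-style inferenceConfig handling, not the actual pi code):

```typescript
interface RequestOptions {
  maxTokens?: number;
  temperature?: number;
}

// Build the Converse inferenceConfig, omitting maxTokens when the caller
// did not set one, so Bedrock applies the model's own default instead of
// reserving input_tokens + maxTokens against the TPM quota up front.
function buildInferenceConfig(options: RequestOptions) {
  return {
    ...(options.maxTokens !== undefined && { maxTokens: options.maxTokens }),
    ...(options.temperature !== undefined && { temperature: options.temperature }),
  };
}
```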
Keep the expanded KNOWN_CONTEXT_WINDOWS (context windows ARE the
right thing to set in discovery — they affect compaction thresholds
and session management, not API-level quota reservation).
* docs: clarify why hardcoded context windows are needed
Bedrock's ListFoundationModels and GetFoundationModel APIs return no
token limit information — there is no Bedrock API to discover context
windows or max output tokens programmatically. If AWS adds token metadata
in the future, this table should be demoted to a fallback.
* fix: add au and apac to inference profile prefix regex
Add missing geo prefixes discovered by querying inference profiles
across multiple regions:
- au. (Australia/NZ, used in ap-southeast-2/4/6)
- apac. (Asia-Pacific, used for older models in ap-northeast-1)
Both resolveKnownContextWindow and resolveBaseModelId now handle
all known prefixes: us, eu, ap, apac, au, jp, global.
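The prefix handling now amounts to roughly this (sketch; the actual regex may be written differently):

```typescript
// Geo / regional / global inference profile prefixes that can precede a
// base model ID, e.g. "apac.anthropic.claude-3-5-sonnet-20240620-v1:0".
const INFERENCE_PROFILE_PREFIX = /^(us|eu|apac|ap|au|jp|global)\./;

function resolveBaseModelId(modelId: string): string {
  return modelId.replace(INFERENCE_PROFILE_PREFIX, "");
}
```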
* test: port au. prefix test from #65449 by @alickgithub2, add apac. coverage
Port the Australia/NZ inference profile test from PR #65449
(credit: @alickgithub2) and extend it to also cover the apac.
prefix discovered in ap-northeast-1.
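Illustratively, the ported coverage looks something like this (assuming a vitest-style harness; the import path and model IDs are placeholders):

```typescript
import { describe, expect, it } from "vitest";
import { resolveBaseModelId } from "./discovery"; // hypothetical import path

describe("inference profile prefix stripping", () => {
  it("strips the au. prefix (Australia/NZ, ap-southeast-2/4/6)", () => {
    expect(resolveBaseModelId("au.anthropic.claude-sonnet-4-20250514-v1:0")).toBe(
      "anthropic.claude-sonnet-4-20250514-v1:0",
    );
  });

  it("strips the apac. prefix (ap-northeast-1)", () => {
    expect(resolveBaseModelId("apac.anthropic.claude-3-5-sonnet-20240620-v1:0")).toBe(
      "anthropic.claude-3-5-sonnet-20240620-v1:0",
    );
  });
});
```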
* expand model coverage: Llama 4, MiniMax, NVIDIA, Mistral 3, GLM, Qwen
Cross-referenced KNOWN_CONTEXT_WINDOWS against the live
list-foundation-models API. Added missing models:
- Llama 4 Maverick (1M) and Scout (512K)
- MiniMax M2/M2.1/M2.5 (1M)
- NVIDIA Nemotron Super/Nano variants (128K)
- Mistral Large 3 675B (128K)
- GLM 4.7/4.7-flash/5 (128K)
- Qwen3 Coder/32B/VL (128-256K)
Removed deprecated deepseek.v3-v1:0 and claude-opus-4-20250514
(not in active foundation models list).
* raise default context window from 128K to 200K
200K matches the floor for all current Claude models (the most
popular on Bedrock). Every other active model with a lower actual
limit is already in the explicit table. This ensures new Claude
models get a correct default without requiring a table update.
* test: update discovery test expectations for known context window values
* test: fix remaining contextWindow expectation (default 200K)
* fix(amazon-bedrock): keep conservative context fallback
* docs(changelog): note Bedrock context window fix
* fix(amazon-bedrock): normalize known context fallback
---------
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
* fix(amazon-bedrock): inject cache points for application inference profile ARNs
pi-ai's internal supportsPromptCaching checks model.id for specific Claude
model name patterns (e.g. "-4-", "claude-3-7-sonnet"), which fails for
application inference profile ARNs that don't contain the model name.
This causes prompt caching to silently break for Bedrock users with
application inference profiles.
Work around this by detecting when pi-ai would miss cache point injection
(via piAiWouldInjectCachePoints mirror) and patching the Converse API
payload via onPayload to add cachePoint blocks to the system prompt and
last user message — matching the same format pi-ai uses natively.
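A trimmed sketch of what that injection looks like (the onPayload hook and the cachePoint block format come from the commit; the types and helper names here are illustrative):

```typescript
// Minimal Converse payload shapes for the parts the workaround touches.
type ContentBlock = { text: string } | { cachePoint: { type: "default" } };

interface ConversePayload {
  system?: ContentBlock[];
  messages: { role: "user" | "assistant"; content: ContentBlock[] }[];
}

const CACHE_POINT: ContentBlock = { cachePoint: { type: "default" } };

const hasCachePoint = (blocks?: ContentBlock[]) =>
  blocks?.some((block) => "cachePoint" in block) ?? false;

// Called from onPayload only when the mirrored supportsPromptCaching check
// says pi-ai itself would NOT inject cache points for this model id.
function injectCachePoints(payload: ConversePayload, cacheRetention = "short") {
  if (cacheRetention === "none") return payload;

  // System prompt: add a trailing cache point unless one already exists.
  if (payload.system?.length && !hasCachePoint(payload.system)) {
    payload.system.push(CACHE_POINT);
  }

  // Last user message: same rule, so the patch never double-injects.
  const lastUser = [...payload.messages].reverse().find((m) => m.role === "user");
  if (lastUser && !hasCachePoint(lastUser.content)) {
    lastUser.content.push(CACHE_POINT);
  }
  return payload;
}
```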
The fix is safe:
- Checks for existing cache points to avoid double-injection
- Respects cacheRetention: "none"
- Defaults to "short" retention (matching pi-ai default)
- Becomes a no-op once upstream pi-mono#2925 is fixed
Fixes #19279
Upstream: https://github.com/badlogic/pi-mono/issues/2925
* fix(amazon-bedrock): tighten app-profile cache injection
---------
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>