Carry over #82973 and fix#81281 by preserving explicit cacheRetention for OpenAI-compatible completions providers that opt into prompt-cache-key support.
The change keeps explicit cacheRetention suppressed for OpenAI-compatible providers without compat.supportsPromptCacheKey, adds regression coverage for both paths, and updates prompt-caching docs for prompt_cache_key / prompt_cache_retention behavior.
Fixes#81281.
Supersedes #82973.
Co-authored-by: lonexreb <reach2shubhankar@gmail.com>
Fix runtime context placement so hidden runtime context is model-visible before the active user turn without persisting as a visible/session message.
Verification:
- git diff --check origin/main...origin/pr/86995-merge
- gh pr checks 86995 --repo openclaw/openclaw --watch=false
- gh run rerun 26493979156 --repo openclaw/openclaw --failed
- gh run watch 26493979156 --repo openclaw/openclaw --exit-status
- CodeQL run 26493979156 attempt 2, Security High (mcp-process-tool-boundary) job 78066719467 passed
Preserve replayability for direct Anthropic sessions whose stored assistant thinking blocks have empty or blank signatures after a newer user turn. Older invalid thinking-only assistant turns are replaced with the existing omitted-reasoning placeholder so the turn shape survives provider replay.
Also keep active tool-use continuations safe: when an assistant tool call is followed by tool results, preserve the latest assistant thinking block so signed-thinking providers can replay the current tool turn unchanged.
Proof:
- node scripts/run-vitest.mjs src/agents/pi-embedded-runner.sanitize-session-history.test.ts src/agents/pi-embedded-runner/thinking.test.ts test/scripts/openclaw-e2e-instance.test.ts
- pnpm check:changed via Blacksmith Testbox through Crabbox, tbx_01ksmfypqet50et92vdm5mmv5v, run https://github.com/openclaw/openclaw/actions/runs/26505947008
- Live Anthropic Messages replay accepted the OpenClaw-sanitized active tool-turn history with a real thinking signature.
- PR CI on 37c2e72d82 completed successfully for relevant checks.
Fixes#86886.
Co-authored-by: Andy Ye <35905412+TurboTheTurtle@users.noreply.github.com>
Forward cache-read token counts through the OpenAI-compatible chat-completions usage shape as prompt_tokens_details.cached_tokens so clients can price cached turns correctly.
Align internal gateway usage typing with the expanded wire shape.
Thanks @caz0075.