fix: forward completions prompt cache keys

Peter Steinberger
2026-04-25 03:51:35 +01:00
parent 24fdfdba6e
commit 26f06afb90
4 changed files with 72 additions and 3 deletions


@@ -105,6 +105,7 @@ Per-agent heartbeat is supported at `agents.list[].heartbeat`.
- Prompt caching is automatic on supported recent models. OpenClaw does not need to inject block-level cache markers.
- OpenClaw sends `prompt_cache_key` to keep cache routing stable across turns, and sends `prompt_cache_retention: "24h"` only when `cacheRetention: "long"` is selected on direct OpenAI hosts.
- OpenAI-compatible Completions providers receive `prompt_cache_key` only when their model config explicitly sets `compat.supportsPromptCacheKey: true`; `cacheRetention: "none"` still suppresses it.
- OpenAI responses expose cached prompt tokens via `usage.prompt_tokens_details.cached_tokens` (or `input_tokens_details.cached_tokens` on Responses API events). OpenClaw maps that to `cacheRead`.
- OpenAI does not expose a separate cache-write token counter, so `cacheWrite` stays `0` on OpenAI paths even when the provider is warming a cache.
- OpenAI returns useful tracing and rate-limit headers such as `x-request-id`, `openai-processing-ms`, and `x-ratelimit-*`, but cache-hit accounting should come from the usage payload, not from headers.
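The gating and accounting rules above can be sketched roughly as follows. This is an illustrative sketch, not OpenClaw's actual implementation: the names `buildCacheParams`, `extractCacheRead`, and the config shape are hypothetical; only the field names (`prompt_cache_key`, `prompt_cache_retention`, `compat.supportsPromptCacheKey`, `cacheRetention`, `usage.prompt_tokens_details.cached_tokens`, `input_tokens_details.cached_tokens`) come from the behavior described here.

```typescript
// Hypothetical sketch of the cache-key forwarding rules described above.
type CacheRetention = "none" | "short" | "long";

interface CacheConfig {
  isDirectOpenAI: boolean;                       // direct OpenAI host vs. OpenAI-compatible provider
  compat?: { supportsPromptCacheKey?: boolean }; // per-model compat flag
  cacheRetention?: CacheRetention;
  promptCacheKey: string;                        // stable key for cache routing across turns
}

// Decide which cache-related fields to attach to an outgoing request.
function buildCacheParams(cfg: CacheConfig): Record<string, string> {
  const params: Record<string, string> = {};
  // `cacheRetention: "none"` suppresses the key entirely, on every path.
  if (cfg.cacheRetention === "none") return params;
  // Compat providers get the key only with an explicit opt-in flag.
  const keyAllowed =
    cfg.isDirectOpenAI || cfg.compat?.supportsPromptCacheKey === true;
  if (keyAllowed) params["prompt_cache_key"] = cfg.promptCacheKey;
  // 24h retention only on direct OpenAI hosts with cacheRetention: "long".
  if (cfg.isDirectOpenAI && cfg.cacheRetention === "long") {
    params["prompt_cache_retention"] = "24h";
  }
  return params;
}

// Map an OpenAI usage payload to cacheRead; cacheWrite stays 0 because
// OpenAI exposes no separate cache-write counter.
function extractCacheRead(usage: {
  prompt_tokens_details?: { cached_tokens?: number };
  input_tokens_details?: { cached_tokens?: number };
}): { cacheRead: number; cacheWrite: number } {
  const cached =
    usage.prompt_tokens_details?.cached_tokens ??
    usage.input_tokens_details?.cached_tokens ??
    0;
  return { cacheRead: cached, cacheWrite: 0 };
}
```

For example, a compat provider without `supportsPromptCacheKey: true` gets an empty parameter set, while a direct OpenAI host with `cacheRetention: "long"` gets both the key and the 24h retention field.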