| summary | title | read_when |
|---|---|---|
| Prompt caching knobs, merge order, provider behavior, and tuning patterns | Prompt caching | |
Prompt caching means the model provider can reuse unchanged prompt prefixes (usually system/developer instructions and other stable context) across turns instead of re-processing them every time. OpenClaw normalizes provider usage into cacheRead and cacheWrite where the upstream API exposes those counters directly.
Status surfaces can also recover cache counters from the most recent transcript
usage log when the live session snapshot is missing them, so /status can keep
showing a cache line after partial session metadata loss. Existing nonzero live
cache values still take precedence over transcript fallback values.
Why this matters: lower token cost, faster responses, and more predictable performance for long-running sessions. Without caching, repeated prompts pay the full prompt cost on every turn even when most input did not change.
The sections below cover every cache-related knob that affects prompt reuse and token cost.
Provider references:
- Anthropic prompt caching: https://platform.claude.com/docs/en/build-with-claude/prompt-caching
- OpenAI prompt caching: https://developers.openai.com/api/docs/guides/prompt-caching
- OpenAI API headers and request IDs: https://developers.openai.com/api/reference/overview
- Anthropic request IDs and errors: https://platform.claude.com/docs/en/api/errors
Primary knobs
cacheRetention (global default, model, and per-agent)
Set cache retention as a global default for all models:
agents:
  defaults:
    params:
      cacheRetention: "long" # none | short | long
Override per-model:
agents:
  defaults:
    models:
      "anthropic/claude-opus-4-6":
        params:
          cacheRetention: "short" # none | short | long
Per-agent override:
agents:
  list:
    - id: "alerts"
      params:
        cacheRetention: "none"
Config merge order:
1. agents.defaults.params (global default; applies to all models)
2. agents.defaults.models["provider/model"].params (per-model override)
3. agents.list[].params (matching agent id; overrides by key)
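The merge order above can be sketched as a key-by-key overlay. This is a minimal illustration, not OpenClaw's actual resolver: `resolveParams` and the `Params` type are hypothetical names, and a shallow merge in which later layers win is an assumption drawn from the "overrides by key" wording.

```typescript
// Hypothetical types: only cacheRetention is taken from the text above.
type CacheRetention = "none" | "short" | "long";

interface Params {
  cacheRetention?: CacheRetention;
  [key: string]: unknown;
}

// Assumption: a shallow merge where later layers override earlier ones key by key.
function resolveParams(
  globalDefaults: Params = {},
  modelParams: Params = {},
  agentParams: Params = {},
): Params {
  return { ...globalDefaults, ...modelParams, ...agentParams };
}

const resolved = resolveParams(
  { cacheRetention: "long" }, // agents.defaults.params
  { cacheRetention: "short" }, // agents.defaults.models["anthropic/claude-opus-4-6"].params
  { cacheRetention: "none" }, // agents.list[] entry with id "alerts"
);
// resolved.cacheRetention is "none": the per-agent layer is applied last.
```

Under this reading, a key set only in the global defaults still applies to an agent that does not override it.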
contextPruning.mode: "cache-ttl"
Prunes old tool-result context after cache TTL windows so post-idle requests do not re-cache oversized history.
agents:
  defaults:
    contextPruning:
      mode: "cache-ttl"
      ttl: "1h"
See Session Pruning for full behavior.
Heartbeat keep-warm
Heartbeat can keep cache windows warm and reduce repeated cache writes after idle gaps.
agents:
  defaults:
    heartbeat:
      every: "55m"
Per-agent heartbeat is supported at agents.list[].heartbeat.
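To see why a 55-minute heartbeat pairs with a 1-hour cache window, here is a small sketch that compares the two durations. The helper names (`parseDurationMs`, `keepsCacheWarm`) and the accepted units are assumptions for illustration; OpenClaw's own duration parsing may differ.

```typescript
// Parse durations such as "55m" or "1h" into milliseconds.
// Assumption: only the units s, m, and h are handled in this sketch.
function parseDurationMs(value: string): number {
  const match = /^(\d+)(s|m|h)$/.exec(value.trim());
  if (match === null) throw new Error(`Unsupported duration: ${value}`);
  const amount = Number(match[1]);
  const unit = match[2];
  const unitMs = unit === "s" ? 1000 : unit === "m" ? 60000 : 3600000;
  return amount * unitMs;
}

// True when the heartbeat fires before the cache window would expire.
function keepsCacheWarm(heartbeatEvery: string, cacheTtl: string): boolean {
  return parseDurationMs(heartbeatEvery) < parseDurationMs(cacheTtl);
}

const warm = keepsCacheWarm("55m", "1h"); // true: 55 minutes is inside the 1-hour window
const cold = keepsCacheWarm("90m", "1h"); // false: the cache would expire between heartbeats
```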
Provider behavior
Anthropic (direct API)
- cacheRetention is supported.
- With Anthropic API-key auth profiles, OpenClaw seeds cacheRetention: "short" for Anthropic model refs when unset.
- Anthropic native Messages responses expose both cache_read_input_tokens and cache_creation_input_tokens, so OpenClaw can show both cacheRead and cacheWrite.
- For native Anthropic requests, cacheRetention: "short" maps to the default 5-minute ephemeral cache, and cacheRetention: "long" upgrades to the 1-hour TTL only on direct api.anthropic.com hosts.
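The retention-to-TTL mapping described above might look roughly like the sketch below. `anthropicCacheControl` is a hypothetical helper. Two behaviors are assumptions rather than documented facts: that "none" produces no cache marker, and that "long" on a non-direct host falls back to the default ephemeral cache.

```typescript
type CacheRetention = "none" | "short" | "long";

// Shape of Anthropic's cache_control marker; ttl is optional and defaults to 5 minutes upstream.
interface CacheControl {
  type: "ephemeral";
  ttl?: "5m" | "1h";
}

function anthropicCacheControl(
  retention: CacheRetention,
  host: string,
): CacheControl | undefined {
  // Assumption: "none" means no cache marker is attached at all.
  if (retention === "none") return undefined;
  // The 1-hour TTL is only requested on the direct Anthropic host.
  if (retention === "long" && host === "api.anthropic.com") {
    return { type: "ephemeral", ttl: "1h" };
  }
  // "short", or "long" on any other host (assumed fallback): provider default.
  return { type: "ephemeral" };
}
```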
OpenAI (direct API)
- Prompt caching is automatic on supported recent models. OpenClaw does not need to inject block-level cache markers.
- OpenClaw uses prompt_cache_key to keep cache routing stable across turns and uses prompt_cache_retention: "24h" only when cacheRetention: "long" is selected on direct OpenAI hosts.
- OpenAI-compatible Completions providers receive prompt_cache_key only when their model config explicitly sets compat.supportsPromptCacheKey: true; cacheRetention: "none" still suppresses it.
- OpenAI responses expose cached prompt tokens via usage.prompt_tokens_details.cached_tokens (or input_tokens_details.cached_tokens on Responses API events). OpenClaw maps that to cacheRead.
- OpenAI does not expose a separate cache-write token counter, so cacheWrite stays 0 on OpenAI paths even when the provider is warming a cache.
- OpenAI returns useful tracing and rate-limit headers such as x-request-id, openai-processing-ms, and x-ratelimit-*, but cache-hit accounting should come from the usage payload, not from headers.
- In practice, OpenAI often behaves like an initial-prefix cache rather than Anthropic-style moving full-history reuse. Stable long-prefix text turns can land near a 4864 cached-token plateau in current live probes, while tool-heavy or MCP-style transcripts often plateau near 4608 cached tokens even on exact repeats.
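As an illustration of the usage mapping in the list above, the following sketch reads the cached-token counter from either usage shape and leaves cacheWrite at 0. `normalizeOpenAiCache` and the interface names are hypothetical; only the field paths come from the text.

```typescript
// Only the fields named in the text are modeled here.
interface OpenAiUsage {
  prompt_tokens_details?: { cached_tokens?: number };
  input_tokens_details?: { cached_tokens?: number };
}

interface CacheCounters {
  cacheRead: number;
  cacheWrite: number;
}

function normalizeOpenAiCache(usage: OpenAiUsage): CacheCounters {
  // Prefer the Completions-style field, then the Responses-style field.
  const cached =
    usage.prompt_tokens_details?.cached_tokens ??
    usage.input_tokens_details?.cached_tokens ??
    0;
  // OpenAI exposes no cache-write counter, so cacheWrite is always 0.
  return { cacheRead: cached, cacheWrite: 0 };
}
```

For example, a usage payload with prompt_tokens_details.cached_tokens set to 4864 would normalize to cacheRead 4864 and cacheWrite 0.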
Anthropic Vertex
- Anthropic models on Vertex AI (anthropic-vertex/*) support cacheRetention the same way as direct Anthropic.
- cacheRetention: "long" maps to the real 1-hour prompt-cache TTL on Vertex AI endpoints.
- Default cache retention for anthropic-vertex matches direct Anthropic defaults.
- Vertex requests are routed through boundary-aware cache shaping so cache reuse stays aligned with what providers actually receive.
Amazon Bedrock
- Anthropic Claude model refs (amazon-bedrock/*anthropic.claude*) support explicit cacheRetention pass-through.
- Non-Anthropic Bedrock models are forced to cacheRetention: "none" at runtime.
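A rough sketch of the Bedrock rule above. `bedrockCacheRetention` is a hypothetical helper, and interpreting the amazon-bedrock/*anthropic.claude* pattern as "prefix plus substring" is an assumption about how the glob is matched.

```typescript
type CacheRetention = "none" | "short" | "long";

function bedrockCacheRetention(
  modelRef: string,
  requested: CacheRetention,
): CacheRetention {
  // Refs for other providers are outside this rule and pass through unchanged.
  if (!modelRef.startsWith("amazon-bedrock/")) return requested;
  // Assumption: the glob matches any Bedrock ref containing "anthropic.claude".
  const isClaude = modelRef.includes("anthropic.claude");
  return isClaude ? requested : "none";
}
```

With this reading, a ref such as amazon-bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0 keeps its requested retention, while amazon-bedrock/amazon.nova-pro-v1:0 is forced to "none". Both model IDs are used here only as illustrative inputs.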
OpenRouter models
For openrouter/anthropic/* model refs, OpenClaw injects Anthropic
cache_control on system/developer prompt blocks to improve prompt-cache
reuse only when the request is still targeting a verified OpenRouter route
(openrouter on its default endpoint, or any provider/base URL that resolves
to openrouter.ai).
For openrouter/deepseek/*, openrouter/moonshot*/*, and openrouter/zai/*
model refs, contextPruning.mode: "cache-ttl" is allowed because OpenRouter
handles provider-side prompt caching automatically. OpenClaw does not inject
Anthropic cache_control markers into those requests.
DeepSeek cache construction is best-effort and can take a few seconds. An
immediate follow-up may still show cached_tokens: 0; verify with a repeated
same-prefix request after a short delay and use usage.prompt_tokens_details.cached_tokens
as the cache-hit signal.
If you repoint the model at an arbitrary OpenAI-compatible proxy URL, OpenClaw stops injecting those OpenRouter-specific Anthropic cache markers.
Other providers
If the provider does not support this cache mode, cacheRetention has no effect.
Google Gemini direct API
- Direct Gemini transport (api: "google-generative-ai") reports cache hits through upstream cachedContentTokenCount; OpenClaw maps that to cacheRead.
- When cacheRetention is set on a direct Gemini model, OpenClaw automatically creates, reuses, and refreshes cachedContents resources for system prompts on Google AI Studio runs. This means you no longer need to pre-create a cached-content handle manually.
- You can still pass a pre-existing Gemini cached-content handle through as params.cachedContent (or legacy params.cached_content) on the configured model.
- This is separate from Anthropic/OpenAI prompt-prefix caching. For Gemini, OpenClaw manages a provider-native cachedContents resource rather than injecting cache markers into the request.
Gemini CLI JSON usage
- Gemini CLI JSON output can also surface cache hits through stats.cached; OpenClaw maps that to cacheRead.
- If the CLI omits a direct stats.input value, OpenClaw derives input tokens from stats.input_tokens - stats.cached.
- This is usage normalization only. It does not mean OpenClaw is creating Anthropic/OpenAI-style prompt-cache markers for Gemini CLI.
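The normalization above can be sketched as follows. `normalizeGeminiCliStats` and the interface names are hypothetical, and clamping the derived value at zero is an added safety assumption rather than documented behavior.

```typescript
// Fields named in the text; all are treated as optional in this sketch.
interface GeminiCliStats {
  input?: number;
  input_tokens?: number;
  cached?: number;
}

interface NormalizedUsage {
  input: number;
  cacheRead: number;
}

function normalizeGeminiCliStats(stats: GeminiCliStats): NormalizedUsage {
  const cacheRead = stats.cached ?? 0;
  // Use stats.input when present; otherwise derive input_tokens - cached.
  // Assumption: the derived value is clamped so it never goes negative.
  const derived = Math.max(0, (stats.input_tokens ?? 0) - cacheRead);
  return { input: stats.input ?? derived, cacheRead };
}
```

For instance, stats of { input_tokens: 5000, cached: 4000 } would yield input 1000 and cacheRead 4000.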
System-prompt cache boundary
OpenClaw splits the system prompt into a stable prefix and a volatile
suffix separated by an internal cache-prefix boundary. Content above the
boundary (tool definitions, skills metadata, workspace files, and other
relatively static context) is ordered so it stays byte-identical across turns.
Content below the boundary (for example HEARTBEAT.md, runtime timestamps, and
other per-turn metadata) is allowed to change without invalidating the cached
prefix.
Key design choices:
- Stable workspace project-context files are ordered before HEARTBEAT.md so heartbeat churn does not bust the stable prefix.
- The boundary is applied across Anthropic-family, OpenAI-family, Google, and CLI transport shaping so all supported providers benefit from the same prefix stability.
- Codex Responses and Anthropic Vertex requests are routed through boundary-aware cache shaping so cache reuse stays aligned with what providers actually receive.
- System-prompt fingerprints are normalized (whitespace, line endings, hook-added context, runtime capability ordering) so semantically unchanged prompts share KV/cache across turns.
If you see unexpected cacheWrite spikes after a config or workspace change,
check whether the change lands above or below the cache boundary. Moving
volatile content below the boundary (or stabilizing it) often resolves the
issue.
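The prefix/suffix split can be illustrated with a toy prompt builder. Everything here is a sketch under assumptions: `buildSystemPrompt`, `PromptParts`, and the `CACHE_BOUNDARY` marker are hypothetical, and the real internal boundary is not shown in this document.

```typescript
interface PromptParts {
  stable: string[]; // content above the boundary
  volatile: string[]; // content below the boundary
}

// Hypothetical marker used only for this illustration.
const CACHE_BOUNDARY = "\n<!-- cache-boundary -->\n";

function buildSystemPrompt(parts: PromptParts): {
  prefix: string;
  suffix: string;
  full: string;
} {
  const prefix = parts.stable.join("\n");
  const suffix = parts.volatile.join("\n");
  return { prefix, suffix, full: prefix + CACHE_BOUNDARY + suffix };
}

const stableSections = ["tool definitions", "skills metadata", "workspace files"];
const turn1 = buildSystemPrompt({
  stable: stableSections,
  volatile: ["HEARTBEAT.md v1", "now=10:00"],
});
const turn2 = buildSystemPrompt({
  stable: stableSections,
  volatile: ["HEARTBEAT.md v2", "now=10:05"],
});
// turn1.prefix and turn2.prefix are identical; only the suffixes differ.
```

Because the volatile inputs sit after the boundary, the two turns share a byte-identical prefix that a provider-side prefix cache can reuse.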
OpenClaw cache-stability guards
OpenClaw also keeps several cache-sensitive payload shapes deterministic before the request reaches the provider:
- Bundle MCP tool catalogs are sorted deterministically before tool registration, so listTools() order changes do not churn the tools block and bust prompt-cache prefixes.
- Legacy sessions with persisted image blocks keep the 3 most recent completed turns intact; older already-processed image blocks may be replaced with a marker so image-heavy follow-ups do not keep re-sending large stale payloads.
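A minimal sketch of the first guard, assuming tools are ordered by name. `sortToolCatalog` and `ToolDef` are hypothetical names, and the actual sort key OpenClaw uses is not stated in this document.

```typescript
interface ToolDef {
  name: string;
  description: string;
}

function sortToolCatalog(tools: readonly ToolDef[]): ToolDef[] {
  // Compare by UTF-16 code units instead of locale rules so the order
  // is the same on every machine. The input array is not mutated.
  return [...tools].sort((a, b) =>
    a.name < b.name ? -1 : a.name > b.name ? 1 : 0,
  );
}

// Two listTools() results with the same tools in a different order.
const firstListing = sortToolCatalog([
  { name: "search", description: "Search the web" },
  { name: "fetch", description: "Fetch a URL" },
]);
const secondListing = sortToolCatalog([
  { name: "fetch", description: "Fetch a URL" },
  { name: "search", description: "Search the web" },
]);
// Both listings now serialize identically, so the tools block stays stable.
```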
Tuning patterns
Mixed traffic (recommended default)
Keep a long-lived baseline on your main agent, disable caching on bursty notifier agents:
agents:
  defaults:
    model:
      primary: "anthropic/claude-opus-4-6"
    models:
      "anthropic/claude-opus-4-6":
        params:
          cacheRetention: "long"
  list:
    - id: "research"
      default: true
      heartbeat:
        every: "55m"
    - id: "alerts"
      params:
        cacheRetention: "none"
Cost-first baseline
- Set baseline cacheRetention: "short".
- Enable contextPruning.mode: "cache-ttl".
- Keep heartbeat below your TTL only for agents that benefit from warm caches.
Cache diagnostics
OpenClaw exposes dedicated cache-trace diagnostics for embedded agent runs.
For normal user-facing diagnostics, /status and other usage summaries can use
the latest transcript usage entry as a fallback source for cacheRead /
cacheWrite when the live session entry does not have those counters.
Live regression tests
OpenClaw keeps one combined live cache regression gate for repeated prefixes, tool turns, image turns, MCP-style tool transcripts, and an Anthropic no-cache control.
- src/agents/live-cache-regression.live.test.ts
- src/agents/live-cache-regression-baseline.ts
Run the narrow live gate with:
OPENCLAW_LIVE_TEST=1 OPENCLAW_LIVE_CACHE_TEST=1 pnpm test:live:cache
The baseline file stores the most recent observed live numbers plus the provider-specific regression floors used by the test. The runner also uses fresh per-run session IDs and prompt namespaces so previous cache state does not pollute the current regression sample.
These tests intentionally do not use identical success criteria across providers.
Anthropic live expectations
- Expect explicit warmup writes via cacheWrite.
- Expect near-full history reuse on repeated turns because Anthropic cache control advances the cache breakpoint through the conversation.
- Current live assertions still use high hit-rate thresholds for stable, tool, and image paths.
OpenAI live expectations
- Expect cacheRead only. cacheWrite remains 0.
- Treat repeated-turn cache reuse as a provider-specific plateau, not as Anthropic-style moving full-history reuse.
- Current live assertions use conservative floor checks derived from observed live behavior on gpt-5.4-mini:
  - stable prefix: cacheRead >= 4608, hit rate >= 0.90
  - tool transcript: cacheRead >= 4096, hit rate >= 0.85
  - image transcript: cacheRead >= 3840, hit rate >= 0.82
  - MCP-style transcript: cacheRead >= 4096, hit rate >= 0.85
Fresh combined live verification on 2026-04-04 landed at:
- stable prefix: cacheRead=4864, hit rate 0.966
- tool transcript: cacheRead=4608, hit rate 0.896
- image transcript: cacheRead=4864, hit rate 0.954
- MCP-style transcript: cacheRead=4608, hit rate 0.891
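To make the floor checks concrete, the sketch below compares the observed numbers above with the OpenAI floors. It is not the real test in src/agents/live-cache-regression.live.test.ts; `failingLanes` and `CacheSample` are hypothetical, and hit rates are taken as reported rather than recomputed.

```typescript
interface CacheSample {
  cacheRead: number;
  hitRate: number;
}

// Floors and observed values copied from the lists above.
const floors: Record<string, CacheSample> = {
  "stable prefix": { cacheRead: 4608, hitRate: 0.9 },
  "tool transcript": { cacheRead: 4096, hitRate: 0.85 },
  "image transcript": { cacheRead: 3840, hitRate: 0.82 },
  "MCP-style transcript": { cacheRead: 4096, hitRate: 0.85 },
};

const observed: Record<string, CacheSample> = {
  "stable prefix": { cacheRead: 4864, hitRate: 0.966 },
  "tool transcript": { cacheRead: 4608, hitRate: 0.896 },
  "image transcript": { cacheRead: 4864, hitRate: 0.954 },
  "MCP-style transcript": { cacheRead: 4608, hitRate: 0.891 },
};

// Returns the lanes that are missing or fall below either floor.
function failingLanes(
  samples: Record<string, CacheSample>,
  minimums: Record<string, CacheSample>,
): string[] {
  return Object.keys(minimums).filter((lane) => {
    const floor = minimums[lane];
    const sample = samples[lane];
    if (floor === undefined || sample === undefined) return true;
    return sample.cacheRead < floor.cacheRead || sample.hitRate < floor.hitRate;
  });
}

const regressions = failingLanes(observed, floors); // empty: every lane clears its floor
```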
Recent local wall-clock time for the combined gate was about 88s.
Why the assertions differ:
- Anthropic exposes explicit cache breakpoints and moving conversation-history reuse.
- OpenAI prompt caching is still exact-prefix sensitive, but the effective reusable prefix in live Responses traffic can plateau earlier than the full prompt.
- Because of that, comparing Anthropic and OpenAI by a single cross-provider percentage threshold creates false regressions.
diagnostics.cacheTrace config
diagnostics:
  cacheTrace:
    enabled: true
    includeMessages: false # default true
    includePrompt: false # default true
    includeSystem: false # default true
- Cache trace events are stored in the SQLite state database.
- includeMessages: true (default)
- includePrompt: true (default)
- includeSystem: true (default)
Env toggles (one-off debugging)
- OPENCLAW_CACHE_TRACE=1 enables cache tracing.
- OPENCLAW_CACHE_TRACE_MESSAGES=0|1 toggles full message payload capture.
- OPENCLAW_CACHE_TRACE_PROMPT=0|1 toggles prompt text capture.
- OPENCLAW_CACHE_TRACE_SYSTEM=0|1 toggles system prompt capture.
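One way to picture how the env toggles could combine with diagnostics.cacheTrace is the sketch below. The precedence (env over config) and the enabled default of false are assumptions based on the "one-off debugging" framing; `resolveCacheTrace` and `readFlag` are hypothetical helpers, not OpenClaw APIs.

```typescript
interface CacheTraceOptions {
  enabled: boolean;
  includeMessages: boolean;
  includePrompt: boolean;
  includeSystem: boolean;
}

type Env = Record<string, string | undefined>;

// "1" turns a flag on, "0" turns it off, anything else keeps the fallback.
function readFlag(env: Env, name: string, fallback: boolean): boolean {
  const raw = env[name];
  if (raw === "1") return true;
  if (raw === "0") return false;
  return fallback;
}

// Assumption: env toggles override config, and the include* options default to true.
function resolveCacheTrace(
  config: Partial<CacheTraceOptions>,
  env: Env,
): CacheTraceOptions {
  return {
    enabled: readFlag(env, "OPENCLAW_CACHE_TRACE", config.enabled ?? false),
    includeMessages: readFlag(env, "OPENCLAW_CACHE_TRACE_MESSAGES", config.includeMessages ?? true),
    includePrompt: readFlag(env, "OPENCLAW_CACHE_TRACE_PROMPT", config.includePrompt ?? true),
    includeSystem: readFlag(env, "OPENCLAW_CACHE_TRACE_SYSTEM", config.includeSystem ?? true),
  };
}
```

Passing the environment as a plain record keeps the sketch independent of any particular runtime; in Node, the caller would hand in process.env.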
What to inspect
- Cache trace events are stored in SQLite by default and include staged snapshots like session:loaded, prompt:before, stream:context, and session:after.
- Per-turn cache token impact is visible in normal usage surfaces via cacheRead and cacheWrite (for example /usage full and session usage summaries).
- For Anthropic, expect both cacheRead and cacheWrite when caching is active.
- For OpenAI, expect cacheRead on cache hits and cacheWrite to remain 0; OpenAI does not publish a separate cache-write token field.
- If you need request tracing, log request IDs and rate-limit headers separately from cache metrics. OpenClaw's current cache-trace output is focused on prompt/session shape and normalized token usage rather than raw provider response headers.
Quick troubleshooting
- High cacheWrite on most turns: check for volatile system-prompt inputs and verify model/provider supports your cache settings.
- High cacheWrite on Anthropic: often means the cache breakpoint is landing on content that changes every request.
- Low OpenAI cacheRead: verify the stable prefix is at the front, the repeated prefix is at least 1024 tokens, and the same prompt_cache_key is reused for turns that should share a cache.
- No effect from cacheRetention: confirm model key matches agents.defaults.models["provider/model"].
- Bedrock Nova/Mistral requests with cache settings: expected runtime force to none.
Related docs: