mirror of
https://github.com/openclaw/openclaw.git
synced 2026-05-05 01:50:24 +00:00
test: add live cache provider probes
This commit is contained in:
@@ -26,7 +26,7 @@ openclaw models scan
`openclaw models status` shows the resolved default/fallbacks plus an auth overview.
When provider usage snapshots are available, the OAuth/token status section includes
provider usage headers.
provider usage windows and quota snapshots.
Add `--probe` to run live auth probes against each configured provider profile.
Probes are real requests (may consume tokens and trigger rate limits).
Use `--agent <id>` to inspect a configured agent’s model/auth state. When omitted,

@@ -9,14 +9,18 @@ read_when:

# Prompt caching

Prompt caching means the model provider can reuse unchanged prompt prefixes (usually system/developer instructions and other stable context) across turns instead of re-processing them every time. The first matching request writes cache tokens (`cacheWrite`), and later matching requests can read them back (`cacheRead`).
Prompt caching means the model provider can reuse unchanged prompt prefixes (usually system/developer instructions and other stable context) across turns instead of re-processing them every time. OpenClaw normalizes provider usage into `cacheRead` and `cacheWrite` where the upstream API exposes those counters directly.

Why this matters: lower token cost, faster responses, and more predictable performance for long-running sessions. Without caching, repeated prompts pay the full prompt cost on every turn even when most input did not change.
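The arithmetic behind that claim can be sketched with made-up prices. The per-token rate and the read/write multipliers below are assumptions modeled on typical provider pricing (cache reads around a tenth of the base input rate, cache writes slightly above it), not OpenClaw values; check your provider's pricing page for real numbers.

```python
# Sketch: estimate per-turn input cost with and without prompt caching.
# All rates here are assumptions for illustration only.

BASE_INPUT = 3.00 / 1_000_000    # assumed $/input token
CACHE_READ = 0.1 * BASE_INPUT    # assumed cached-read rate
CACHE_WRITE = 1.25 * BASE_INPUT  # assumed cache-write rate

def turn_cost(fresh: int, cache_read: int, cache_write: int) -> float:
    """Cost of one turn's input tokens, split by cache accounting."""
    return fresh * BASE_INPUT + cache_read * CACHE_READ + cache_write * CACHE_WRITE

# Turn 1: a 20k-token stable prefix is written to cache, plus 500 fresh tokens.
first = turn_cost(fresh=500, cache_read=0, cache_write=20_000)
# Later turns: the same prefix is read back instead of reprocessed.
later = turn_cost(fresh=500, cache_read=20_000, cache_write=0)
# Without caching, every turn pays full price for all 20,500 tokens.
uncached = turn_cost(fresh=20_500, cache_read=0, cache_write=0)

print(f"first={first:.4f} later={later:.4f} uncached={uncached:.4f}")
```

Under these assumed rates the first turn costs slightly more than an uncached turn (the write premium), and every later turn costs a small fraction of it, which is where long-running sessions win.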

This page covers all cache-related knobs that affect prompt reuse and token cost.

For Anthropic pricing details, see:
[https://docs.anthropic.com/docs/build-with-claude/prompt-caching](https://docs.anthropic.com/docs/build-with-claude/prompt-caching)

Provider references:

- Anthropic prompt caching: [https://platform.claude.com/docs/en/build-with-claude/prompt-caching](https://platform.claude.com/docs/en/build-with-claude/prompt-caching)
- OpenAI prompt caching: [https://developers.openai.com/api/docs/guides/prompt-caching](https://developers.openai.com/api/docs/guides/prompt-caching)
- OpenAI API headers and request IDs: [https://developers.openai.com/api/reference/overview](https://developers.openai.com/api/reference/overview)
- Anthropic request IDs and errors: [https://platform.claude.com/docs/en/api/errors](https://platform.claude.com/docs/en/api/errors)

## Primary knobs

@@ -100,6 +104,16 @@ Per-agent heartbeat is supported at `agents.list[].heartbeat`.

- `cacheRetention` is supported.
- With Anthropic API-key auth profiles, OpenClaw seeds `cacheRetention: "short"` for Anthropic model refs when unset.
- Anthropic native Messages responses expose both `cache_read_input_tokens` and `cache_creation_input_tokens`, so OpenClaw can show both `cacheRead` and `cacheWrite`.
- For native Anthropic requests, `cacheRetention: "short"` maps to the default 5-minute ephemeral cache, and `cacheRetention: "long"` upgrades to the 1-hour TTL only on direct `api.anthropic.com` hosts.
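As a hypothetical config fragment: only the `agents.defaults.models["provider/model"]` key path and the `cacheRetention` values come from this page; the file layout and the example model ref are assumptions. Pinning the 1-hour TTL for an Anthropic model might look like:

```json5
// Hypothetical sketch; surrounding structure and model ref assumed.
{
  agents: {
    defaults: {
      models: {
        "anthropic/claude-opus-4": {
          // "short" = the default 5-minute ephemeral cache (seeded for
          // Anthropic API-key profiles when unset); "long" upgrades to
          // the 1-hour TTL only on direct api.anthropic.com hosts.
          cacheRetention: "long",
        },
      },
    },
  },
}
```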

### OpenAI (direct API)

- Prompt caching is automatic on supported recent models. OpenClaw does not need to inject block-level cache markers.
- OpenClaw uses `prompt_cache_key` to keep cache routing stable across turns and uses `prompt_cache_retention: "24h"` only when `cacheRetention: "long"` is selected on direct OpenAI hosts.
- OpenAI responses expose cached prompt tokens via `usage.prompt_tokens_details.cached_tokens` (or `input_tokens_details.cached_tokens` on Responses API events). OpenClaw maps that to `cacheRead`.
- OpenAI does not expose a separate cache-write token counter, so `cacheWrite` stays `0` on OpenAI paths even when the provider is warming a cache.
- OpenAI returns useful tracing and rate-limit headers such as `x-request-id`, `openai-processing-ms`, and `x-ratelimit-*`, but cache-hit accounting should come from the usage payload, not from headers.
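The OpenAI mapping above can be sketched as a small normalizer. This is an illustration, not the actual OpenClaw implementation; it assumes plain-dict usage payloads shaped like the fields named in the bullets.

```python
# Sketch: normalize an OpenAI usage payload into OpenClaw-style
# cacheRead/cacheWrite counters (assumed shapes, per the notes above).

def normalize_openai_usage(usage: dict) -> dict:
    # Chat Completions: usage.prompt_tokens_details.cached_tokens
    # Responses API:    usage.input_tokens_details.cached_tokens
    details = (usage.get("prompt_tokens_details")
               or usage.get("input_tokens_details")
               or {})
    cached = details.get("cached_tokens", 0)
    # OpenAI exposes no cache-write counter, so cacheWrite stays 0
    # even while the provider is warming a cache.
    return {"cacheRead": cached, "cacheWrite": 0}

print(normalize_openai_usage(
    {"prompt_tokens": 12000, "prompt_tokens_details": {"cached_tokens": 10240}}
))  # → {'cacheRead': 10240, 'cacheWrite': 0}
```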

### Amazon Bedrock

@@ -180,10 +194,15 @@ Defaults:

- Cache trace events are JSONL and include staged snapshots like `session:loaded`, `prompt:before`, `stream:context`, and `session:after`.
- Per-turn cache token impact is visible in normal usage surfaces via `cacheRead` and `cacheWrite` (for example `/usage full` and session usage summaries).
- For Anthropic, expect both `cacheRead` and `cacheWrite` when caching is active.
- For OpenAI, expect `cacheRead` on cache hits and `cacheWrite` to remain `0`; OpenAI does not publish a separate cache-write token field.
- If you need request tracing, log request IDs and rate-limit headers separately from cache metrics. OpenClaw's current cache-trace output is focused on prompt/session shape and normalized token usage rather than raw provider response headers.
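Since the trace output is JSONL, filtering it is a one-liner-per-event affair. A minimal sketch, assuming each event carries its stage under a `"stage"` field (that field name is an assumption; the stage values are the ones listed above):

```python
# Sketch: pull the known stage names out of a cache-trace JSONL stream.
import json

STAGES = {"session:loaded", "prompt:before", "stream:context", "session:after"}

def stages_seen(jsonl_text: str) -> list:
    """Return the recognized stage of each trace event, in order."""
    out = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between events
        event = json.loads(line)
        stage = event.get("stage")
        if stage in STAGES:
            out.append(stage)
    return out

sample = "\n".join([
    json.dumps({"stage": "session:loaded", "tokens": 0}),
    json.dumps({"stage": "prompt:before", "tokens": 1800}),
    json.dumps({"stage": "session:after", "cacheRead": 1800, "cacheWrite": 0}),
])
print(stages_seen(sample))  # → ['session:loaded', 'prompt:before', 'session:after']
```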

## Quick troubleshooting

- High `cacheWrite` on most turns: check for volatile system-prompt inputs and verify the model and provider support your cache settings.
- High `cacheWrite` on Anthropic: often means the cache breakpoint is landing on content that changes every request.
- Low OpenAI `cacheRead`: verify the stable prefix is at the front, the repeated prefix is at least 1024 tokens, and the same `prompt_cache_key` is reused for turns that should share a cache.
- No effect from `cacheRetention`: confirm the model key matches `agents.defaults.models["provider/model"]`.
- Bedrock Nova/Mistral requests with cache settings: the runtime is expected to force them to `none`.
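The first two bullets can be checked mechanically from per-turn usage. A sketch, assuming turn dicts use the normalized counter names from this page; the 0.5 threshold is an arbitrary assumption:

```python
# Sketch: flag turns where cacheWrite dominates cache traffic, which
# (per the notes above) often means the cache breakpoint is landing on
# content that changes every request.

def flag_volatile_turns(turns: list, threshold: float = 0.5) -> list:
    """Indices of turns whose cacheWrite exceeds `threshold` of cache traffic."""
    flagged = []
    for i, t in enumerate(turns):
        write = t.get("cacheWrite", 0)
        total = t.get("cacheRead", 0) + write
        if total and write / total > threshold:
            flagged.append(i)
    return flagged

usage = [
    {"cacheRead": 0, "cacheWrite": 20000},    # turn 1: expected warm-up write
    {"cacheRead": 19500, "cacheWrite": 300},  # healthy reuse
    {"cacheRead": 0, "cacheWrite": 20100},    # breakpoint on volatile content?
]
print(flag_volatile_turns(usage))  # → [0, 2]
```

Note the first turn is flagged too: an initial cache write is expected, so only a *pattern* of write-heavy turns after the first indicates a problem.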