feat(diagnostics-otel): add content capture controls

Add opt-in diagnostics OTEL content capture controls, keep raw content export default-off, and guard the content-capture tests against magic truncation bounds.
This commit is contained in:
Vincent Koc
2026-04-24 16:41:28 -07:00
committed by GitHub
parent fbf8b216c6
commit d4d4a8c14e
12 changed files with 455 additions and 9 deletions

View File

@@ -1,4 +1,4 @@
a608561acecc7cfc5f16a31b7498d7a66001f6655f5a5960a68842c59b7dcaa8 config-baseline.json
2936d2ccf0c1e6e932a0e7c617b809e4b31dbb9a7d5afefbba29b229913b9e50 config-baseline.core.json
52af51e35e05d0cbaa1a79fb415f2c2fe56ad5d52a62efa9cbb9c32489d517f5 config-baseline.json
642b4e2c9891e710790313df097b4e0db75a197ec0908e9c03bdc76f5bbdf9b0 config-baseline.core.json
22d7cd6d8279146b2d79c9531a55b80b52a2c99c81338c508104729154fdd02d config-baseline.channel.json
d47a574045a47356e513ab308d7dcad9fa0b389f50e93c5cf0f820fab858e70e config-baseline.plugin.json

View File

@@ -797,6 +797,14 @@ Notes:
logs: false,
sampleRate: 1.0,
flushIntervalMs: 5000,
captureContent: {
enabled: false,
inputMessages: false,
outputMessages: false,
toolInputs: false,
toolOutputs: false,
systemPrompt: false,
},
},
cacheTrace: {
@@ -821,6 +829,7 @@ Notes:
- `otel.traces` / `otel.metrics` / `otel.logs`: enable trace, metrics, or log export.
- `otel.sampleRate`: trace sampling rate `0``1`.
- `otel.flushIntervalMs`: periodic telemetry flush interval in ms.
- `otel.captureContent`: opt-in raw content capture for OTEL span attributes. Defaults to off. Boolean `true` captures non-system message/tool content; the object form lets you enable `inputMessages`, `outputMessages`, `toolInputs`, `toolOutputs`, and `systemPrompt` explicitly.
- `cacheTrace.enabled`: log cache trace snapshots for embedded runs (default: `false`).
- `cacheTrace.filePath`: output path for cache trace JSONL (default: `$OPENCLAW_STATE_DIR/logs/cache-trace.jsonl`).
- `cacheTrace.includeMessages` / `includePrompt` / `includeSystem`: control what is included in cache trace output (all default: `true`).

View File

@@ -279,7 +279,15 @@ works with any OpenTelemetry collector/backend that accepts OTLP/HTTP.
"metrics": true,
"logs": true,
"sampleRate": 0.2,
"flushIntervalMs": 60000
"flushIntervalMs": 60000,
"captureContent": {
"enabled": false,
"inputMessages": false,
"outputMessages": false,
"toolInputs": false,
"toolOutputs": false,
"systemPrompt": false
}
}
}
}
@@ -293,6 +301,9 @@ Notes:
counters/histograms (webhooks, queueing, session state, queue depth/wait).
- Traces/metrics can be toggled with `traces` / `metrics` (default: on). Traces
include model usage spans plus webhook/message processing spans when enabled.
- Raw model/tool content is not exported by default. Use
`diagnostics.otel.captureContent` only when your collector and retention policy
are approved for prompt, response, tool, or system prompt text.
- Set `headers` when your collector requires auth.
- Environment variables supported: `OTEL_EXPORTER_OTLP_ENDPOINT`,
`OTEL_SERVICE_NAME`, `OTEL_EXPORTER_OTLP_PROTOCOL`.
@@ -341,8 +352,17 @@ Queues + sessions:
- `openclaw.model.usage`
- `openclaw.channel`, `openclaw.provider`, `openclaw.model`
- `openclaw.sessionKey`, `openclaw.sessionId`
- `openclaw.tokens.*` (input/output/cache_read/cache_write/total)
- `openclaw.run`
- `openclaw.outcome`, `openclaw.channel`, `openclaw.provider`,
`openclaw.model`, `openclaw.errorCategory`
- `openclaw.model.call`
- `gen_ai.system`, `gen_ai.request.model`, `gen_ai.operation.name`,
`openclaw.provider`, `openclaw.model`, `openclaw.api`,
`openclaw.transport`
- `openclaw.tool.execution`
- `gen_ai.tool.name`, `openclaw.toolName`, `openclaw.errorCategory`,
`openclaw.tool.params.*`
- `openclaw.webhook.processed`
- `openclaw.channel`, `openclaw.webhook`, `openclaw.chatId`
- `openclaw.webhook.error`
@@ -350,11 +370,13 @@ Queues + sessions:
`openclaw.error`
- `openclaw.message.processed`
- `openclaw.channel`, `openclaw.outcome`, `openclaw.chatId`,
`openclaw.messageId`, `openclaw.sessionKey`, `openclaw.sessionId`,
`openclaw.reason`
`openclaw.messageId`, `openclaw.reason`
- `openclaw.session.stuck`
- `openclaw.state`, `openclaw.ageMs`, `openclaw.queueDepth`,
`openclaw.sessionKey`, `openclaw.sessionId`
- `openclaw.state`, `openclaw.ageMs`, `openclaw.queueDepth`
When content capture is explicitly enabled, model/tool spans can also include
bounded, redacted `openclaw.content.*` attributes for the specific content
classes you opted into.
### Sampling + flushing