feat(diagnostics-otel): add content capture controls

Add opt-in content capture controls for diagnostics OTEL export, keep raw content export off by default, and guard the content-capture tests against magic (hard-coded) truncation bounds.
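One way to keep tests independent of magic truncation bounds is to export the limit and assert against that constant instead of a literal. The following is a minimal sketch under assumptions: `MAX_CONTENT_ATTRIBUTE_CHARS`, its value of 4096, the marker string, and `boundContent` are hypothetical names for illustration, not this change's actual code. Redaction is out of scope here.

```typescript
// Hypothetical limit for a single openclaw.content.* attribute value.
// The real bound (if any) in the codebase may differ.
const MAX_CONTENT_ATTRIBUTE_CHARS = 4096;

// Hypothetical marker appended when content is cut short.
const TRUNCATION_MARKER = "...[truncated]";

// Truncate `text` so the result never exceeds `max` characters.
// Note: length is measured in UTF-16 code units, as with String.prototype.length.
function boundContent(text: string, max: number = MAX_CONTENT_ATTRIBUTE_CHARS): string {
  if (text.length <= max) return text;
  // If the marker itself does not fit, fall back to a hard cut.
  if (max <= TRUNCATION_MARKER.length) return text.slice(0, max);
  // Reserve room for the marker so the total length equals `max`.
  return text.slice(0, max - TRUNCATION_MARKER.length) + TRUNCATION_MARKER;
}

// Test-style check that derives its expectations from the constants
// rather than repeating a hard-coded number such as 4096.
function checkBoundContent(): void {
  const long = "x".repeat(MAX_CONTENT_ATTRIBUTE_CHARS + 1);
  const out = boundContent(long);
  if (out.length !== MAX_CONTENT_ATTRIBUTE_CHARS) throw new Error("length should equal the bound");
  if (!out.endsWith(TRUNCATION_MARKER)) throw new Error("truncation marker missing");
  if (boundContent("short") !== "short") throw new Error("short text should pass through unchanged");
}

checkBoundContent();
```

If the limit changes later, a check written this way keeps passing without edits, which is the point of avoiding the literal in tests.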
2026-05-06 12:30:44 +00:00 · 2026-04-24 16:41:28 -07:00
parent fbf8b216c6
commit d4d4a8c14e
12 changed files with 455 additions and 9 deletions
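A minimal TypeScript sketch of how a config loader might resolve `diagnostics.otel.captureContent` into per-class flags, following the documented rules: off by default, `true` for message and tool content without the system prompt, and an object form for explicit choices. The type and function names are hypothetical. Treating `enabled` as a master switch in the object form is an assumption; the documentation in this change does not spell out how `enabled` interacts with the per-class keys.

```typescript
// Hypothetical types mirroring the documented captureContent keys.
type CaptureContentObject = {
  enabled?: boolean;
  inputMessages?: boolean;
  outputMessages?: boolean;
  toolInputs?: boolean;
  toolOutputs?: boolean;
  systemPrompt?: boolean;
};

// The setting accepts a boolean, the object form, or nothing at all.
type CaptureContentConfig = boolean | CaptureContentObject | undefined;

// One flag per content class that a span exporter would consult.
type ResolvedCaptureContent = {
  inputMessages: boolean;
  outputMessages: boolean;
  toolInputs: boolean;
  toolOutputs: boolean;
  systemPrompt: boolean;
};

function resolveCaptureContent(cfg: CaptureContentConfig): ResolvedCaptureContent {
  // Unset or `false`: export no raw content (the documented default).
  if (cfg === undefined || cfg === false) {
    return { inputMessages: false, outputMessages: false, toolInputs: false, toolOutputs: false, systemPrompt: false };
  }
  // Boolean `true`: message and tool content, but never the system prompt.
  if (cfg === true) {
    return { inputMessages: true, outputMessages: true, toolInputs: true, toolOutputs: true, systemPrompt: false };
  }
  // Object form (assumption): `enabled` acts as a master switch, and each
  // class is captured only when it is also set to `true` explicitly.
  const on = cfg.enabled === true;
  return {
    inputMessages: on && cfg.inputMessages === true,
    outputMessages: on && cfg.outputMessages === true,
    toolInputs: on && cfg.toolInputs === true,
    toolOutputs: on && cfg.toolOutputs === true,
    systemPrompt: on && cfg.systemPrompt === true,
  };
}
```

For example, `resolveCaptureContent({ enabled: true, toolInputs: true })` would enable only tool inputs. The strict `=== true` comparisons mean a missing or non-boolean value never turns capture on, which keeps the sketch fail-closed like the documented default.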
--- a/docs/.generated/config-baseline.sha256
+++ b/docs/.generated/config-baseline.sha256
@@ -1,4 +1,4 @@
-a608561acecc7cfc5f16a31b7498d7a66001f6655f5a5960a68842c59b7dcaa8  config-baseline.json
-2936d2ccf0c1e6e932a0e7c617b809e4b31dbb9a7d5afefbba29b229913b9e50  config-baseline.core.json
+52af51e35e05d0cbaa1a79fb415f2c2fe56ad5d52a62efa9cbb9c32489d517f5  config-baseline.json
+642b4e2c9891e710790313df097b4e0db75a197ec0908e9c03bdc76f5bbdf9b0  config-baseline.core.json
 22d7cd6d8279146b2d79c9531a55b80b52a2c99c81338c508104729154fdd02d  config-baseline.channel.json
 d47a574045a47356e513ab308d7dcad9fa0b389f50e93c5cf0f820fab858e70e  config-baseline.plugin.json
--- a/docs/gateway/configuration-reference.md
+++ b/docs/gateway/configuration-reference.md
@@ -797,6 +797,14 @@ Notes:
      logs: false,
      sampleRate: 1.0,
      flushIntervalMs: 5000,
+      captureContent: {
+        enabled: false,
+        inputMessages: false,
+        outputMessages: false,
+        toolInputs: false,
+        toolOutputs: false,
+        systemPrompt: false,
+      },
    },

    cacheTrace: {
@@ -821,6 +829,7 @@ Notes:
 - `otel.traces` / `otel.metrics` / `otel.logs`: enable trace, metrics, or log export.
 - `otel.sampleRate`: trace sampling rate `0`–`1`.
 - `otel.flushIntervalMs`: periodic telemetry flush interval in ms.
+- `otel.captureContent`: opt-in raw content capture for OTEL span attributes (default: off). Setting it to `true` captures message and tool content but not the system prompt; the object form lets you enable `inputMessages`, `outputMessages`, `toolInputs`, `toolOutputs`, and `systemPrompt` individually.
 - `cacheTrace.enabled`: log cache trace snapshots for embedded runs (default: `false`).
 - `cacheTrace.filePath`: output path for cache trace JSONL (default: `$OPENCLAW_STATE_DIR/logs/cache-trace.jsonl`).
 - `cacheTrace.includeMessages` / `includePrompt` / `includeSystem`: control what is included in cache trace output (all default: `true`).
--- a/docs/logging.md
+++ b/docs/logging.md
@@ -279,7 +279,15 @@ works with any OpenTelemetry collector/backend that accepts OTLP/HTTP.
      "metrics": true,
      "logs": true,
      "sampleRate": 0.2,
-      "flushIntervalMs": 60000
+      "flushIntervalMs": 60000,
+      "captureContent": {
+        "enabled": false,
+        "inputMessages": false,
+        "outputMessages": false,
+        "toolInputs": false,
+        "toolOutputs": false,
+        "systemPrompt": false
+      }
    }
  }
 }
@@ -293,6 +301,9 @@ Notes:
  counters/histograms (webhooks, queueing, session state, queue depth/wait).
 - Traces/metrics can be toggled with `traces` / `metrics` (default: on). Traces
  include model usage spans plus webhook/message processing spans when enabled.
+- Raw model/tool content is not exported by default. Use
+  `diagnostics.otel.captureContent` only when your collector and retention policy
+  are approved for prompt, response, tool, or system prompt text.
 - Set `headers` when your collector requires auth.
 - Environment variables supported: `OTEL_EXPORTER_OTLP_ENDPOINT`,
  `OTEL_SERVICE_NAME`, `OTEL_EXPORTER_OTLP_PROTOCOL`.
@@ -341,8 +352,17 @@ Queues + sessions:

 - `openclaw.model.usage`
  - `openclaw.channel`, `openclaw.provider`, `openclaw.model`
-  - `openclaw.sessionKey`, `openclaw.sessionId`
  - `openclaw.tokens.*` (input/output/cache_read/cache_write/total)
+- `openclaw.run`
+  - `openclaw.outcome`, `openclaw.channel`, `openclaw.provider`,
+    `openclaw.model`, `openclaw.errorCategory`
+- `openclaw.model.call`
+  - `gen_ai.system`, `gen_ai.request.model`, `gen_ai.operation.name`,
+    `openclaw.provider`, `openclaw.model`, `openclaw.api`,
+    `openclaw.transport`
+- `openclaw.tool.execution`
+  - `gen_ai.tool.name`, `openclaw.toolName`, `openclaw.errorCategory`,
+    `openclaw.tool.params.*`
 - `openclaw.webhook.processed`
  - `openclaw.channel`, `openclaw.webhook`, `openclaw.chatId`
 - `openclaw.webhook.error`
@@ -350,11 +370,13 @@ Queues + sessions:
    `openclaw.error`
 - `openclaw.message.processed`
  - `openclaw.channel`, `openclaw.outcome`, `openclaw.chatId`,
-    `openclaw.messageId`, `openclaw.sessionKey`, `openclaw.sessionId`,
-    `openclaw.reason`
+    `openclaw.messageId`, `openclaw.reason`
 - `openclaw.session.stuck`
-  - `openclaw.state`, `openclaw.ageMs`, `openclaw.queueDepth`,
-    `openclaw.sessionKey`, `openclaw.sessionId`
+  - `openclaw.state`, `openclaw.ageMs`, `openclaw.queueDepth`
+
+When content capture is explicitly enabled, model/tool spans can also include
+bounded, redacted `openclaw.content.*` attributes for the specific content
+classes you opted into.

 ### Sampling + flushing