docs: split OpenTelemetry export into its own page under gateway

logging.md had grown to 487 lines, with ~300 lines dedicated to
OpenTelemetry export — wire protocol, full metric/span catalog, env
vars, captureContent shape, sampling, the diagnostic event catalog,
and protocol notes — leaving the genuine logging overview buried
behind exporter reference material.

Move the OTEL surface to a dedicated page and slim logging.md to a
focused logs overview:

- Add docs/gateway/opentelemetry.md (OpenTelemetry export). Same
  content reorganized: how it fits together, quick start, signals,
  configuration reference + env vars table, privacy/captureContent,
  sampling/flushing, full metric and span catalog, diagnostic event
  catalog, no-exporter mode, diagnostics flags pointer, disable.
- docs/logging.md: drop the OTEL section in favor of a short
  'Diagnostics and OpenTelemetry' summary that cross-links the new
  page and the diagnostics-flags page. Drops 273 lines net. Also
  drops the redundant body H1, retitles to 'Logging' (was 'Logging
  overview' which mismatched sidebar usage), and refreshes the
  Related list.
- docs/docs.json: insert gateway/opentelemetry into the
  'Health and diagnostics' sidebar group, reorder pages so the user-
  facing health/run pages come before exporter/internals pages, and
  put logging next to opentelemetry where readers naturally
  associate them.
- docs/gateway/diagnostics.md, docs/gateway/logging.md,
  docs/gateway/configuration-reference.md: cross-link the new page
  and sentence-case stale Title-Cased Related entries on
  diagnostics.md.
Vincent Koc
2026-04-25 16:46:43 -07:00
parent a1090b6043
commit 7741dbb759
6 changed files with 341 additions and 307 deletions


@@ -1436,11 +1436,12 @@
"group": "Health and diagnostics",
"pages": [
"gateway/health",
"gateway/diagnostics",
"gateway/heartbeat",
"gateway/doctor",
"gateway/logging",
"logging",
"gateway/opentelemetry",
"gateway/logging",
"gateway/diagnostics",
"gateway/troubleshooting"
]
},


@@ -909,7 +909,7 @@ Notes:
- `enabled`: master toggle for instrumentation output (default: `true`).
- `flags`: array of flag strings enabling targeted log output (supports wildcards like `"telegram.*"` or `"*"`).
- `stuckSessionWarnMs`: age threshold in ms for emitting stuck-session warnings while a session remains in processing state.
- `otel.enabled`: enables the OpenTelemetry export pipeline (default: `false`).
- `otel.enabled`: enables the OpenTelemetry export pipeline (default: `false`). For the full configuration, signal catalog, and privacy model, see [OpenTelemetry export](/gateway/opentelemetry).
- `otel.endpoint`: collector URL for OTel export.
- `otel.protocol`: `"http/protobuf"` (default) or `"grpc"`.
- `otel.headers`: extra HTTP/gRPC metadata headers sent with OTel export requests.


@@ -129,9 +129,10 @@ diagnostic event collection:
Disabling diagnostics reduces bug-report detail. It does not affect normal
Gateway logging.
## Related docs
## Related
- [Health Checks](/gateway/health)
- [Health checks](/gateway/health)
- [Gateway CLI](/cli/gateway#gateway-diagnostics-export)
- [Gateway Protocol](/gateway/protocol#system-and-identity)
- [Gateway protocol](/gateway/protocol#system-and-identity)
- [Logging](/logging)
- [OpenTelemetry export](/gateway/opentelemetry) — separate flow for streaming diagnostics to a collector


@@ -114,5 +114,6 @@ This keeps existing file logs stable while making interactive output scannable.
## Related
- [Logging overview](/logging)
- [Logging](/logging)
- [OpenTelemetry export](/gateway/opentelemetry)
- [Diagnostics export](/gateway/diagnostics)


@@ -0,0 +1,304 @@
---
summary: "Export OpenClaw diagnostics to any OpenTelemetry collector via the diagnostics-otel plugin (OTLP/HTTP)"
title: "OpenTelemetry export"
read_when:
- You want to send OpenClaw model usage, message flow, or session metrics to an OpenTelemetry collector
- You are wiring traces, metrics, or logs into Grafana, Datadog, Honeycomb, New Relic, Tempo, or another OTLP backend
- You need the exact metric names, span names, or attribute shapes to build dashboards or alerts
---
OpenClaw exports diagnostics through the bundled `diagnostics-otel` plugin
using **OTLP/HTTP (protobuf)**. Any collector or backend that accepts OTLP/HTTP
works without code changes. For local file logs and how to read them, see
[Logging](/logging).
## How it fits together
- **Diagnostics events** are structured, in-process records emitted by the
Gateway and bundled plugins for model runs, message flow, sessions, queues,
and exec.
- **`diagnostics-otel` plugin** subscribes to those events and exports them as
OpenTelemetry **metrics**, **traces**, and **logs** over OTLP/HTTP.
- Exporters only attach when both the diagnostics surface and the plugin are
enabled, so the in-process cost stays near zero by default.
## Quick start
```json5
{
  plugins: {
    allow: ["diagnostics-otel"],
    entries: {
      "diagnostics-otel": { enabled: true },
    },
  },
  diagnostics: {
    enabled: true,
    otel: {
      enabled: true,
      endpoint: "http://otel-collector:4318",
      protocol: "http/protobuf",
      serviceName: "openclaw-gateway",
      traces: true,
      metrics: true,
      logs: true,
      sampleRate: 0.2,
      flushIntervalMs: 60000,
    },
  },
}
```
You can also enable the plugin from the CLI:
```bash
openclaw plugins enable diagnostics-otel
```
<Note>
`protocol` currently supports `http/protobuf` only. `grpc` is ignored.
</Note>
## Signals exported
| Signal | What goes in it |
| ----------- | --------------------------------------------------------------------------------------------------------------------------------- |
| **Metrics** | Counters and histograms for token usage, cost, run duration, message flow, queue lanes, session state, exec, and memory pressure. |
| **Traces** | Spans for model usage, model calls, tool execution, exec, webhook/message processing, context assembly, and tool loops. |
| **Logs** | Structured `logging.file` records exported over OTLP when `diagnostics.otel.logs` is enabled. |
Toggle `traces`, `metrics`, and `logs` independently. All three default to on
when `diagnostics.otel.enabled` is true.
## Configuration reference
```json5
{
  diagnostics: {
    enabled: true,
    otel: {
      enabled: true,
      endpoint: "http://otel-collector:4318",
      protocol: "http/protobuf", // grpc is ignored
      serviceName: "openclaw-gateway",
      headers: { "x-collector-token": "..." },
      traces: true,
      metrics: true,
      logs: true,
      sampleRate: 0.2, // root-span sampler, 0.0..1.0
      flushIntervalMs: 60000, // metric export interval (min 1000ms)
      captureContent: {
        enabled: false,
        inputMessages: false,
        outputMessages: false,
        toolInputs: false,
        toolOutputs: false,
        systemPrompt: false,
      },
    },
  },
}
```
### Environment variables
| Variable | Purpose |
| ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | Override `diagnostics.otel.endpoint`. If the value already contains `/v1/traces`, `/v1/metrics`, or `/v1/logs`, it is used as-is. |
| `OTEL_SERVICE_NAME` | Override `diagnostics.otel.serviceName`. |
| `OTEL_EXPORTER_OTLP_PROTOCOL` | Override the wire protocol (only `http/protobuf` is honored today). |
| `OTEL_SEMCONV_STABILITY_OPT_IN` | Set to `gen_ai_latest_experimental` to emit the latest experimental GenAI span attribute (`gen_ai.provider.name`) instead of the legacy `gen_ai.system`. GenAI metrics always use bounded, low-cardinality semantic attributes regardless. |
| `OPENCLAW_OTEL_PRELOADED` | Set to `1` when another preload or host process already registered the global OpenTelemetry SDK. The plugin then skips its own NodeSDK lifecycle but still wires diagnostic listeners and honors `traces`/`metrics`/`logs`. |
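
The environment rules in the table can be read as a small resolution step. The sketch below is illustrative only and is not the plugin's code: the function names are hypothetical, appending `/v1/<signal>` to a base URL follows the usual OTLP/HTTP convention and is an assumption about what happens when no signal path is present, and treating `OTEL_SEMCONV_STABILITY_OPT_IN` as a comma-separated list is likewise an assumption.

```python
# Hypothetical helpers that mirror the table above; not the plugin's real code.
SIGNAL_PATHS = {
    "traces": "/v1/traces",
    "metrics": "/v1/metrics",
    "logs": "/v1/logs",
}


def effective_endpoint(config_endpoint: str, environ: dict) -> str:
    """The env var overrides diagnostics.otel.endpoint when it is set."""
    return environ.get("OTEL_EXPORTER_OTLP_ENDPOINT") or config_endpoint


def resolve_signal_url(endpoint: str, signal: str) -> str:
    """Return the URL one signal would be posted to."""
    # Documented rule: an endpoint that already contains a signal path
    # is used as-is.
    if any(path in endpoint for path in SIGNAL_PATHS.values()):
        return endpoint
    # Assumption: otherwise the standard OTLP/HTTP path is appended.
    return endpoint.rstrip("/") + SIGNAL_PATHS[signal]


def provider_attribute_key(environ: dict) -> str:
    """Pick the GenAI provider span attribute name."""
    raw = environ.get("OTEL_SEMCONV_STABILITY_OPT_IN", "")
    # Assumption: the variable may hold several comma-separated opt-ins.
    values = {part.strip() for part in raw.split(",")}
    if "gen_ai_latest_experimental" in values:
        return "gen_ai.provider.name"
    return "gen_ai.system"


if __name__ == "__main__":
    base = effective_endpoint("http://otel-collector:4318", {})
    print(resolve_signal_url(base, "traces"))
    print(provider_attribute_key({}))
```

Existing dashboards that read `gen_ai.system` keep working under the default branch of `provider_attribute_key`.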
## Privacy and content capture
Raw model/tool content is **not** exported by default. Spans carry bounded
identifiers (channel, provider, model, error category, hash-only request ids)
and never include prompt text, response text, tool inputs, tool outputs, or
session keys.
Set `diagnostics.otel.captureContent.*` to `true` only when your collector and
retention policy are approved for prompt, response, tool, or system-prompt
text. Each subkey is opt-in independently:
- `inputMessages` — user prompt content.
- `outputMessages` — model response content.
- `toolInputs` — tool argument payloads.
- `toolOutputs` — tool result payloads.
- `systemPrompt` — assembled system/developer prompt.
When any subkey is enabled, model and tool spans get bounded, redacted
`openclaw.content.*` attributes for that class only.
## Sampling and flushing
- **Traces:** `diagnostics.otel.sampleRate` (root-span only, `0.0` drops all,
`1.0` keeps all).
- **Metrics:** `diagnostics.otel.flushIntervalMs` (minimum `1000`).
- **Logs:** OTLP logs respect `logging.level` (file log level). Console
redaction does **not** apply to OTLP logs. High-volume installs should
prefer OTLP collector sampling/filtering over local sampling.
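To make "root-span only" concrete, here is a minimal sketch of ratio sampling in which the decision is made once for the root span and child spans inherit it. It assumes a trace-id-ratio style sampler, which is a common OpenTelemetry approach but is not confirmed by this page as the plugin's mechanism; all function names are hypothetical.

```python
# Illustrative sketch only; names and the sampling strategy are assumptions.
from typing import Optional


def clamp_flush_interval_ms(value: int, minimum: int = 1000) -> int:
    """Enforce the documented 1000 ms floor on the metric export interval."""
    return max(minimum, int(value))


def sample_root_span(trace_id_hex: str, sample_rate: float) -> bool:
    """Decide whether to keep a trace from the low 64 bits of its trace id."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be within 0.0..1.0")
    low64 = int(trace_id_hex[-16:], 16)
    threshold = int(sample_rate * (1 << 64))
    # 0.0 gives threshold 0 (nothing passes); 1.0 gives 2**64 (everything passes).
    return low64 < threshold


def sample_span(trace_id_hex: str, sample_rate: float,
                parent_sampled: Optional[bool] = None) -> bool:
    """Child spans follow the parent's decision; only roots consult the rate."""
    if parent_sampled is not None:
        return parent_sampled
    return sample_root_span(trace_id_hex, sample_rate)


if __name__ == "__main__":
    print(sample_span("0" * 31 + "1", 0.2))
    print(clamp_flush_interval_ms(250))
```

Because the decision depends only on the trace id, every process that sees the same trace would reach the same answer under this scheme.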
## Exported metrics
### Model usage
- `openclaw.tokens` (counter, attrs: `openclaw.token`, `openclaw.channel`, `openclaw.provider`, `openclaw.model`)
- `openclaw.cost.usd` (counter, attrs: `openclaw.channel`, `openclaw.provider`, `openclaw.model`)
- `openclaw.run.duration_ms` (histogram, attrs: `openclaw.channel`, `openclaw.provider`, `openclaw.model`)
- `openclaw.context.tokens` (histogram, attrs: `openclaw.context`, `openclaw.channel`, `openclaw.provider`, `openclaw.model`)
- `gen_ai.client.token.usage` (histogram, GenAI semantic-conventions metric, attrs: `gen_ai.token.type` = `input`/`output`, `gen_ai.provider.name`, `gen_ai.operation.name`, `gen_ai.request.model`)
- `gen_ai.client.operation.duration` (histogram, seconds, GenAI semantic-conventions metric, attrs: `gen_ai.provider.name`, `gen_ai.operation.name`, `gen_ai.request.model`, optional `error.type`)
### Message flow
- `openclaw.webhook.received` (counter, attrs: `openclaw.channel`, `openclaw.webhook`)
- `openclaw.webhook.error` (counter, attrs: `openclaw.channel`, `openclaw.webhook`)
- `openclaw.webhook.duration_ms` (histogram, attrs: `openclaw.channel`, `openclaw.webhook`)
- `openclaw.message.queued` (counter, attrs: `openclaw.channel`, `openclaw.source`)
- `openclaw.message.processed` (counter, attrs: `openclaw.channel`, `openclaw.outcome`)
- `openclaw.message.duration_ms` (histogram, attrs: `openclaw.channel`, `openclaw.outcome`)
- `openclaw.message.delivery.started` (counter, attrs: `openclaw.channel`, `openclaw.delivery.kind`)
- `openclaw.message.delivery.duration_ms` (histogram, attrs: `openclaw.channel`, `openclaw.delivery.kind`, `openclaw.outcome`, `openclaw.errorCategory`)
### Queues and sessions
- `openclaw.queue.lane.enqueue` (counter, attrs: `openclaw.lane`)
- `openclaw.queue.lane.dequeue` (counter, attrs: `openclaw.lane`)
- `openclaw.queue.depth` (histogram, attrs: `openclaw.lane` or `openclaw.channel=heartbeat`)
- `openclaw.queue.wait_ms` (histogram, attrs: `openclaw.lane`)
- `openclaw.session.state` (counter, attrs: `openclaw.state`, `openclaw.reason`)
- `openclaw.session.stuck` (counter, attrs: `openclaw.state`)
- `openclaw.session.stuck_age_ms` (histogram, attrs: `openclaw.state`)
- `openclaw.run.attempt` (counter, attrs: `openclaw.attempt`)
### Exec
- `openclaw.exec.duration_ms` (histogram, attrs: `openclaw.exec.target`, `openclaw.exec.mode`, `openclaw.outcome`, `openclaw.failureKind`)
### Diagnostics internals (memory and tool loop)
- `openclaw.memory.heap_used_bytes` (histogram, attrs: `openclaw.memory.kind`)
- `openclaw.memory.rss_bytes` (histogram)
- `openclaw.memory.pressure` (counter, attrs: `openclaw.memory.level`)
- `openclaw.tool.loop.iterations` (counter, attrs: `openclaw.toolName`, `openclaw.outcome`)
- `openclaw.tool.loop.duration_ms` (histogram, attrs: `openclaw.toolName`, `openclaw.outcome`)
## Exported spans
- `openclaw.model.usage`
  - `openclaw.channel`, `openclaw.provider`, `openclaw.model`
  - `openclaw.tokens.*` (input/output/cache_read/cache_write/total)
  - `gen_ai.system` by default, or `gen_ai.provider.name` when the latest GenAI semantic conventions are opted in
  - `gen_ai.request.model`, `gen_ai.operation.name`, `gen_ai.usage.*`
- `openclaw.run`
  - `openclaw.outcome`, `openclaw.channel`, `openclaw.provider`, `openclaw.model`, `openclaw.errorCategory`
- `openclaw.model.call`
  - `gen_ai.system` by default, or `gen_ai.provider.name` when the latest GenAI semantic conventions are opted in
  - `gen_ai.request.model`, `gen_ai.operation.name`, `openclaw.provider`, `openclaw.model`, `openclaw.api`, `openclaw.transport`
  - `openclaw.provider.request_id_hash` (bounded SHA-based hash of the upstream provider request id; raw ids are not exported)
- `openclaw.tool.execution`
  - `gen_ai.tool.name`, `openclaw.toolName`, `openclaw.errorCategory`, `openclaw.tool.params.*`
- `openclaw.exec`
  - `openclaw.exec.target`, `openclaw.exec.mode`, `openclaw.outcome`, `openclaw.failureKind`, `openclaw.exec.command_length`, `openclaw.exec.exit_code`, `openclaw.exec.timed_out`
- `openclaw.webhook.processed`
  - `openclaw.channel`, `openclaw.webhook`, `openclaw.chatId`
- `openclaw.webhook.error`
  - `openclaw.channel`, `openclaw.webhook`, `openclaw.chatId`, `openclaw.error`
- `openclaw.message.processed`
  - `openclaw.channel`, `openclaw.outcome`, `openclaw.chatId`, `openclaw.messageId`, `openclaw.reason`
- `openclaw.message.delivery`
  - `openclaw.channel`, `openclaw.delivery.kind`, `openclaw.outcome`, `openclaw.errorCategory`, `openclaw.delivery.result_count`
- `openclaw.session.stuck`
  - `openclaw.state`, `openclaw.ageMs`, `openclaw.queueDepth`
- `openclaw.context.assembled`
  - `openclaw.prompt.size`, `openclaw.history.size`, `openclaw.context.tokens`, `openclaw.errorCategory` (no prompt, history, response, or session-key content)
- `openclaw.tool.loop`
  - `openclaw.toolName`, `openclaw.outcome`, `openclaw.iterations`, `openclaw.errorCategory` (no loop messages, params, or tool output)
- `openclaw.memory.pressure`
  - `openclaw.memory.level`, `openclaw.memory.heap_used_bytes`, `openclaw.memory.rss_bytes`
When content capture is explicitly enabled, model and tool spans can also
include bounded, redacted `openclaw.content.*` attributes for the specific
content classes you opted into.
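The `openclaw.provider.request_id_hash` attribute is described as a bounded SHA-based hash. One way such a value could be produced is sketched below. The choice of SHA-256, the 16-character truncation, and the function name are assumptions made for illustration; the page does not specify the exact algorithm or length.

```python
# Illustrative sketch; SHA-256 and the 16-character bound are assumptions.
import hashlib


def request_id_hash(raw_request_id: str, length: int = 16) -> str:
    """Return a fixed-length hex digest prefix so the raw id never leaves the process."""
    digest = hashlib.sha256(raw_request_id.encode("utf-8")).hexdigest()
    return digest[:length]


if __name__ == "__main__":
    # "req_abc123" is a made-up provider request id.
    print(request_id_hash("req_abc123"))
```

A truncated digest still lets you correlate spans that share a request id in a backend, without exposing the upstream identifier itself.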
## Diagnostic event catalog
The events below back the metrics and spans above. Plugins can also subscribe
to them directly without OTLP export.
**Model usage**
- `model.usage` — tokens, cost, duration, context, provider/model/channel,
session ids. `usage` is provider/turn accounting for cost and telemetry;
`context.used` is the current prompt/context snapshot and can be lower than
provider `usage.total` when cached input or tool-loop calls are involved.
**Message flow**
- `webhook.received` / `webhook.processed` / `webhook.error`
- `message.queued` / `message.processed`
- `message.delivery.started` / `message.delivery.completed` / `message.delivery.error`
**Queue and session**
- `queue.lane.enqueue` / `queue.lane.dequeue`
- `session.state` / `session.stuck`
- `run.attempt`
- `diagnostic.heartbeat` (aggregate counters: webhooks/queue/session)
**Exec**
- `exec.process.completed` — terminal outcome, duration, target, mode, exit
code, and failure kind. Command text and working directories are not
included.
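As a rough picture of how an event in this catalog can back a metric, the sketch below wires a hypothetical in-process bus to a counter keyed like `openclaw.message.processed` (attrs `openclaw.channel`, `openclaw.outcome`). The `DiagnosticsBus` class, its methods, and the payload field names are all invented for illustration and are not OpenClaw's plugin API.

```python
# Hypothetical in-process bus; not OpenClaw's actual subscription API.
from collections import Counter
from typing import Callable, Dict, List


class DiagnosticsBus:
    """Minimal publish/subscribe hub keyed by event type."""

    def __init__(self) -> None:
        self._listeners: Dict[str, List[Callable[[dict], None]]] = {}

    def subscribe(self, event_type: str, listener: Callable[[dict], None]) -> None:
        self._listeners.setdefault(event_type, []).append(listener)

    def emit(self, event_type: str, payload: dict) -> None:
        # Events with no subscribers are simply dropped.
        for listener in self._listeners.get(event_type, ()):
            listener(payload)


# Stand-in for a counter with (channel, outcome) attributes.
message_processed = Counter()


def on_message_processed(event: dict) -> None:
    # Assumed payload fields: "channel" and "outcome".
    key = (event.get("channel", "unknown"), event.get("outcome", "unknown"))
    message_processed[key] += 1


bus = DiagnosticsBus()
bus.subscribe("message.processed", on_message_processed)
bus.emit("message.processed", {"channel": "telegram", "outcome": "completed"})
bus.emit("message.processed", {"channel": "telegram", "outcome": "error"})
bus.emit("message.queued", {"channel": "telegram", "source": "webhook"})
```

The last `emit` has no listener, which mirrors the point that events exist in-process whether or not anything consumes them.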
## Without an exporter
You can keep diagnostics events available to plugins or custom sinks without
running `diagnostics-otel`:
```json5
{
  diagnostics: { enabled: true },
}
```
For targeted debug output without raising `logging.level`, use diagnostics
flags. Flags are case-insensitive and support wildcards (e.g. `telegram.*` or
`*`):
```json5
{
  diagnostics: { flags: ["telegram.http"] },
}
```
Or as a one-off env override:
```bash
OPENCLAW_DIAGNOSTICS=telegram.http,telegram.payload openclaw gateway
```
Flag output goes to the standard log file (`logging.file`) and is still
redacted by `logging.redactSensitive`. Full guide:
[Diagnostics flags](/diagnostics/flags).
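The flag rules above (case-insensitive, wildcards, comma-separated env override) can be sketched as follows. This is a simplified model and not the Gateway's matcher: the function names are hypothetical, and using shell-style globbing via `fnmatch` is an assumption, so edge cases such as whether `telegram.*` also matches a bare `telegram` may differ in practice.

```python
# Illustrative sketch; glob semantics via fnmatch are an assumption.
from fnmatch import fnmatchcase
from typing import List


def parse_flags(raw: str) -> List[str]:
    """Split an OPENCLAW_DIAGNOSTICS-style value into normalized flag patterns."""
    return [part.strip().lower() for part in raw.split(",") if part.strip()]


def flag_enabled(name: str, flags: List[str]) -> bool:
    """Return True when any configured pattern matches the flag name."""
    lowered = name.lower()
    return any(fnmatchcase(lowered, pattern.lower()) for pattern in flags)


if __name__ == "__main__":
    flags = parse_flags("telegram.http,telegram.payload")
    print(flags)
    print(flag_enabled("Telegram.HTTP", ["telegram.*"]))
```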
## Disable
```json5
{
  diagnostics: { otel: { enabled: false } },
}
```
You can also leave `diagnostics-otel` out of `plugins.allow`, or run
`openclaw plugins disable diagnostics-otel`.
## Related
- [Logging](/logging) — file logs, console output, CLI tailing, and the Control UI Logs tab
- [Gateway logging internals](/gateway/logging) — WS log styles, subsystem prefixes, and console capture
- [Diagnostics flags](/diagnostics/flags) — targeted debug-log flags
- [Diagnostics export](/gateway/diagnostics) — operator support-bundle tool (separate from OTEL export)
- [Configuration reference](/gateway/configuration-reference#diagnostics) — full `diagnostics.*` field reference


@@ -1,14 +1,12 @@
---
summary: "Logging overview: file logs, console output, CLI tailing, and the Control UI"
summary: "File logs, console output, CLI tailing, and the Control UI Logs tab"
read_when:
- You need a beginner-friendly overview of logging
- You want to configure log levels or formats
- You need a beginner-friendly overview of OpenClaw logging
- You want to configure log levels, formats, or redaction
- You are troubleshooting and need to find logs quickly
title: "Logging overview"
title: "Logging"
---
# Logging
OpenClaw has two main log surfaces:
- **File logs** (JSON lines) written by the Gateway.
@@ -171,308 +169,35 @@ Tool summaries can redact sensitive tokens before they hit the console:
Redaction affects **console output only** and does not alter file logs.
## Diagnostics + OpenTelemetry
## Diagnostics and OpenTelemetry
Diagnostics are structured, machine-readable events for model runs **and**
Diagnostics are structured, machine-readable events for model runs and
message-flow telemetry (webhooks, queueing, session state). They do **not**
replace logs; they exist to feed metrics, traces, and other exporters.
replace logs; they feed metrics, traces, and exporters. Events are emitted
in-process whether or not you export them.
Diagnostics events are emitted in-process, but exporters only attach when
diagnostics + the exporter plugin are enabled.
Two adjacent surfaces:
### OpenTelemetry vs OTLP
- **OpenTelemetry export** — send metrics, traces, and logs over OTLP/HTTP to
any OpenTelemetry-compatible collector or backend (Grafana, Datadog,
Honeycomb, New Relic, Tempo, etc.). Full configuration, signal catalog,
metric/span names, env vars, and privacy model live on a dedicated page:
[OpenTelemetry export](/gateway/opentelemetry).
- **Diagnostics flags** — targeted debug-log flags that route extra logs to
`logging.file` without raising `logging.level`. Flags are case-insensitive
and support wildcards (`telegram.*`, `*`). Configure under `diagnostics.flags`
or via the `OPENCLAW_DIAGNOSTICS=...` env override. Full guide:
[Diagnostics flags](/diagnostics/flags).
- **OpenTelemetry (OTel)**: the data model + SDKs for traces, metrics, and logs.
- **OTLP**: the wire protocol used to export OTel data to a collector/backend.
- OpenClaw exports via **OTLP/HTTP (protobuf)** today.
To enable diagnostics events for plugins or custom sinks without OTLP export:
### Signals exported
- **Metrics**: counters + histograms (token usage, message flow, queueing).
- **Traces**: spans for model usage + webhook/message processing.
- **Logs**: exported over OTLP when `diagnostics.otel.logs` is enabled. Log
volume can be high; keep `logging.level` and exporter filters in mind.
### Diagnostic event catalog
Model usage:
- `model.usage`: tokens, cost, duration, context, provider/model/channel, session ids.
`usage` is provider/turn accounting for cost and telemetry; `context.used`
is the current prompt/context snapshot and can be lower than provider
`usage.total` when cached input or tool-loop calls are involved.
Message flow:
- `webhook.received`: webhook ingress per channel.
- `webhook.processed`: webhook handled + duration.
- `webhook.error`: webhook handler errors.
- `message.queued`: message enqueued for processing.
- `message.processed`: outcome + duration + optional error.
- `message.delivery.started`: outbound delivery attempt started.
- `message.delivery.completed`: outbound delivery attempt finished + duration/result count.
- `message.delivery.error`: outbound delivery attempt failed + duration/bounded error category.
Queue + session:
- `queue.lane.enqueue`: command queue lane enqueue + depth.
- `queue.lane.dequeue`: command queue lane dequeue + wait time.
- `session.state`: session state transition + reason.
- `session.stuck`: session stuck warning + age.
- `run.attempt`: run retry/attempt metadata.
- `diagnostic.heartbeat`: aggregate counters (webhooks/queue/session).
Exec:
- `exec.process.completed`: terminal exec process outcome, duration, target, mode,
exit code, and failure kind. Command text and working directories are not
included.
### Enable diagnostics (no exporter)
Use this if you want diagnostics events available to plugins or custom sinks:
```json
```json5
{
"diagnostics": {
"enabled": true
}
diagnostics: { enabled: true },
}
```
### Diagnostics flags (targeted logs)
Use flags to turn on extra, targeted debug logs without raising `logging.level`.
Flags are case-insensitive and support wildcards (e.g. `telegram.*` or `*`).
```json
{
"diagnostics": {
"flags": ["telegram.http"]
}
}
```
Env override (one-off):
```
OPENCLAW_DIAGNOSTICS=telegram.http,telegram.payload
```
Notes:
- Flag logs go to the standard log file (same as `logging.file`).
- Output is still redacted according to `logging.redactSensitive`.
- Full guide: [/diagnostics/flags](/diagnostics/flags).
### Export to OpenTelemetry
Diagnostics can be exported via the `diagnostics-otel` plugin (OTLP/HTTP). This
works with any OpenTelemetry collector/backend that accepts OTLP/HTTP.
```json
{
"plugins": {
"allow": ["diagnostics-otel"],
"entries": {
"diagnostics-otel": {
"enabled": true
}
}
},
"diagnostics": {
"enabled": true,
"otel": {
"enabled": true,
"endpoint": "http://otel-collector:4318",
"protocol": "http/protobuf",
"serviceName": "openclaw-gateway",
"traces": true,
"metrics": true,
"logs": true,
"sampleRate": 0.2,
"flushIntervalMs": 60000,
"captureContent": {
"enabled": false,
"inputMessages": false,
"outputMessages": false,
"toolInputs": false,
"toolOutputs": false,
"systemPrompt": false
}
}
}
}
```
Notes:
- You can also enable the plugin with `openclaw plugins enable diagnostics-otel`.
- `protocol` currently supports `http/protobuf` only. `grpc` is ignored.
- Metrics include token usage, cost, context size, run duration, and message-flow
counters/histograms (webhooks, queueing, session state, queue depth/wait),
plus GenAI token usage and model-call duration histograms.
- Traces/metrics can be toggled with `traces` / `metrics` (default: on). Traces
include model usage spans plus webhook/message processing spans when enabled.
- Raw model/tool content is not exported by default. Use
`diagnostics.otel.captureContent` only when your collector and retention policy
are approved for prompt, response, tool, or system prompt text.
- Set `headers` when your collector requires auth.
- Environment variables supported: `OTEL_EXPORTER_OTLP_ENDPOINT`,
`OTEL_SERVICE_NAME`, `OTEL_EXPORTER_OTLP_PROTOCOL`.
- Set `OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental` to emit the
latest experimental GenAI provider span attribute (`gen_ai.provider.name`)
instead of the legacy span attribute (`gen_ai.system`). GenAI metrics always
use bounded, low-cardinality semantic attributes.
- Set `OPENCLAW_OTEL_PRELOADED=1` when another preload or host process already
registered the global OpenTelemetry SDK. In that mode the plugin does not start
or shut down its own SDK, but it still wires OpenClaw diagnostic listeners and
honors `diagnostics.otel.traces`, `metrics`, and `logs`.
### Exported metrics (names + types)
Model usage:
- `openclaw.tokens` (counter, attrs: `openclaw.token`, `openclaw.channel`,
`openclaw.provider`, `openclaw.model`)
- `openclaw.cost.usd` (counter, attrs: `openclaw.channel`, `openclaw.provider`,
`openclaw.model`)
- `openclaw.run.duration_ms` (histogram, attrs: `openclaw.channel`,
`openclaw.provider`, `openclaw.model`)
- `openclaw.context.tokens` (histogram, attrs: `openclaw.context`,
`openclaw.channel`, `openclaw.provider`, `openclaw.model`)
- `gen_ai.client.token.usage` (histogram, GenAI semantic-conventions metric,
attrs: `gen_ai.token.type` = `input`/`output`, `gen_ai.provider.name`,
`gen_ai.operation.name`, `gen_ai.request.model`)
- `gen_ai.client.operation.duration` (histogram, seconds, GenAI
semantic-conventions metric, attrs: `gen_ai.provider.name`,
`gen_ai.operation.name`, `gen_ai.request.model`, optional `error.type`)
Message flow:
- `openclaw.webhook.received` (counter, attrs: `openclaw.channel`,
`openclaw.webhook`)
- `openclaw.webhook.error` (counter, attrs: `openclaw.channel`,
`openclaw.webhook`)
- `openclaw.webhook.duration_ms` (histogram, attrs: `openclaw.channel`,
`openclaw.webhook`)
- `openclaw.message.queued` (counter, attrs: `openclaw.channel`,
`openclaw.source`)
- `openclaw.message.processed` (counter, attrs: `openclaw.channel`,
`openclaw.outcome`)
- `openclaw.message.duration_ms` (histogram, attrs: `openclaw.channel`,
`openclaw.outcome`)
- `openclaw.message.delivery.started` (counter, attrs: `openclaw.channel`,
`openclaw.delivery.kind`)
- `openclaw.message.delivery.duration_ms` (histogram, attrs:
`openclaw.channel`, `openclaw.delivery.kind`, `openclaw.outcome`,
`openclaw.errorCategory`)
Queues + sessions:
- `openclaw.queue.lane.enqueue` (counter, attrs: `openclaw.lane`)
- `openclaw.queue.lane.dequeue` (counter, attrs: `openclaw.lane`)
- `openclaw.queue.depth` (histogram, attrs: `openclaw.lane` or
`openclaw.channel=heartbeat`)
- `openclaw.queue.wait_ms` (histogram, attrs: `openclaw.lane`)
- `openclaw.session.state` (counter, attrs: `openclaw.state`, `openclaw.reason`)
- `openclaw.session.stuck` (counter, attrs: `openclaw.state`)
- `openclaw.session.stuck_age_ms` (histogram, attrs: `openclaw.state`)
- `openclaw.run.attempt` (counter, attrs: `openclaw.attempt`)
Exec:
- `openclaw.exec.duration_ms` (histogram, attrs: `openclaw.exec.target`,
`openclaw.exec.mode`, `openclaw.outcome`, `openclaw.failureKind`)
Diagnostics internals (memory + tool loop):
- `openclaw.memory.heap_used_bytes` (histogram, attrs: `openclaw.memory.kind`)
- `openclaw.memory.rss_bytes` (histogram)
- `openclaw.memory.pressure` (counter, attrs: `openclaw.memory.level`)
- `openclaw.tool.loop.iterations` (counter, attrs: `openclaw.toolName`,
`openclaw.outcome`)
- `openclaw.tool.loop.duration_ms` (histogram, attrs: `openclaw.toolName`,
`openclaw.outcome`)
### Exported spans (names + key attributes)
- `openclaw.model.usage`
- `openclaw.channel`, `openclaw.provider`, `openclaw.model`
- `openclaw.tokens.*` (input/output/cache_read/cache_write/total)
- `gen_ai.system` by default, or `gen_ai.provider.name` when latest GenAI
semantic conventions are opted in
- `gen_ai.request.model`, `gen_ai.operation.name`, `gen_ai.usage.*`
- `openclaw.run`
- `openclaw.outcome`, `openclaw.channel`, `openclaw.provider`,
`openclaw.model`, `openclaw.errorCategory`
- `openclaw.model.call`
- `gen_ai.system` by default, or `gen_ai.provider.name` when latest GenAI
semantic conventions are opted in
- `gen_ai.request.model`, `gen_ai.operation.name`,
`openclaw.provider`, `openclaw.model`, `openclaw.api`,
`openclaw.transport`, `openclaw.provider.request_id_hash` (bounded
SHA-based hash of the upstream provider request id; raw ids are not
exported)
- `openclaw.tool.execution`
- `gen_ai.tool.name`, `openclaw.toolName`, `openclaw.errorCategory`,
`openclaw.tool.params.*`
- `openclaw.exec`
- `openclaw.exec.target`, `openclaw.exec.mode`, `openclaw.outcome`,
`openclaw.failureKind`, `openclaw.exec.command_length`,
`openclaw.exec.exit_code`, `openclaw.exec.timed_out`
- `openclaw.webhook.processed`
- `openclaw.channel`, `openclaw.webhook`, `openclaw.chatId`
- `openclaw.webhook.error`
- `openclaw.channel`, `openclaw.webhook`, `openclaw.chatId`,
`openclaw.error`
- `openclaw.message.processed`
- `openclaw.channel`, `openclaw.outcome`, `openclaw.chatId`,
`openclaw.messageId`, `openclaw.reason`
- `openclaw.message.delivery`
- `openclaw.channel`, `openclaw.delivery.kind`, `openclaw.outcome`,
`openclaw.errorCategory`, `openclaw.delivery.result_count`
- `openclaw.session.stuck`
- `openclaw.state`, `openclaw.ageMs`, `openclaw.queueDepth`
- `openclaw.context.assembled`
- `openclaw.prompt.size`, `openclaw.history.size`,
`openclaw.context.tokens`, `openclaw.errorCategory` (no prompt,
history, response, or session-key content)
- `openclaw.tool.loop`
- `openclaw.toolName`, `openclaw.outcome`, `openclaw.iterations`,
`openclaw.errorCategory` (no loop messages, params, or tool output)
- `openclaw.memory.pressure`
- `openclaw.memory.level`, `openclaw.memory.heap_used_bytes`,
`openclaw.memory.rss_bytes`
When content capture is explicitly enabled, model/tool spans can also include
bounded, redacted `openclaw.content.*` attributes for the specific content
classes you opted into.
### Sampling + flushing
- Trace sampling: `diagnostics.otel.sampleRate` (0.0-1.0, root spans only).
- Metric export interval: `diagnostics.otel.flushIntervalMs` (min 1000ms).
### Protocol notes
- OTLP/HTTP endpoints can be set via `diagnostics.otel.endpoint` or
`OTEL_EXPORTER_OTLP_ENDPOINT`.
- If the endpoint already contains `/v1/traces` or `/v1/metrics`, it is used as-is.
- If the endpoint already contains `/v1/logs`, it is used as-is for logs.
- `OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental` controls only the
GenAI span provider attribute shape. Existing dashboards that read
`gen_ai.system` can keep the default until they migrate.
- `OPENCLAW_OTEL_PRELOADED=1` reuses an externally registered OpenTelemetry SDK
for traces/metrics instead of starting a plugin-owned NodeSDK.
- `diagnostics.otel.logs` enables OTLP log export for the main logger output.
### Log export behavior
- OTLP logs use the same structured records written to `logging.file`.
- Respect `logging.level` (file log level). Console redaction does **not** apply
to OTLP logs.
- High-volume installs should prefer OTLP collector sampling/filtering.
For OTLP export to a collector, see [OpenTelemetry export](/gateway/opentelemetry).
## Troubleshooting tips
@@ -483,5 +208,7 @@ classes you opted into.
## Related
- [Gateway Logging Internals](/gateway/logging) — WS log styles, subsystem prefixes, and console capture
- [Diagnostics](/gateway/configuration-reference#diagnostics) — OpenTelemetry export and cache trace config
- [OpenTelemetry export](/gateway/opentelemetry) — OTLP/HTTP export, metric/span catalog, privacy model
- [Diagnostics flags](/diagnostics/flags) — targeted debug-log flags
- [Gateway logging internals](/gateway/logging) — WS log styles, subsystem prefixes, and console capture
- [Configuration reference](/gateway/configuration-reference#diagnostics) — full `diagnostics.*` field reference