openclaw/docs/gateway/prometheus.md
2026-05-02 08:01:37 -07:00


summary: Expose OpenClaw diagnostics as Prometheus text metrics through the diagnostics-prometheus plugin
title: Prometheus metrics
sidebarTitle: Prometheus
read_when:
- You want Prometheus, Grafana, VictoriaMetrics, or another scraper to collect OpenClaw Gateway metrics
- You need the Prometheus metric names and label policy for dashboards or alerts
- You want metrics without running an OpenTelemetry collector

OpenClaw can expose diagnostics metrics through the official diagnostics-prometheus plugin. It listens to trusted internal diagnostics and renders a Prometheus text endpoint at:

`GET /api/diagnostics/prometheus`

The content type is `text/plain; version=0.0.4; charset=utf-8`, which is the standard Prometheus text exposition format.
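
You can sanity-check the exposition text without running a full Prometheus server, because a small parser for the sample lines is enough. This is a minimal sketch and not part of OpenClaw. `parse_samples` is a hypothetical helper. It deliberately ignores timestamps, exemplars, and escaped characters inside label values, all of which the full format allows.

```python
import re
from typing import Dict, List, Tuple

# One sample line: metric name, optional {labels}, then the value.
# Simplified: timestamps, exemplars, and escaped quotes are not handled.
SAMPLE_RE = re.compile(
    r"(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>[^}]*)\})?"
    r"\s+(?P<value>\S+)"
)
LABEL_RE = re.compile(r'(?P<key>[a-zA-Z_][a-zA-Z0-9_]*)="(?P<val>[^"]*)"')


def parse_samples(text: str) -> List[Tuple[str, Dict[str, str], float]]:
    """Return (metric name, labels, value) for every sample line in text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = {
            m.group("key"): m.group("val")
            for m in LABEL_RE.finditer(match.group("labels") or "")
        }
        # float() also accepts "+Inf" and "NaN", which histograms can emit.
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples
```

For real tooling, prefer an established parser such as the one in the official Prometheus client libraries.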

The route uses Gateway authentication (operator scope). Do not expose it as a public unauthenticated `/metrics` endpoint. Scrape it through the same auth path you use for other operator APIs.
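
To scrape from your own tooling instead of Prometheus, the request only needs to carry the Gateway credential as a bearer token. This Python sketch is illustrative and is not an OpenClaw client. `build_scrape_request` and `scrape` are hypothetical helpers. The loopback address, port `18789`, and `OPENCLAW_GATEWAY_TOKEN` variable come from the quick-start `curl` example and may differ in your deployment.

```python
import os
import urllib.request

# Assumed defaults, mirroring the quick-start curl example.
DEFAULT_BASE_URL = "http://127.0.0.1:18789"
METRICS_PATH = "/api/diagnostics/prometheus"


def build_scrape_request(base_url: str, token: str) -> urllib.request.Request:
    """Build a GET request that sends Gateway operator auth as a bearer token."""
    return urllib.request.Request(
        base_url.rstrip("/") + METRICS_PATH,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def scrape(base_url: str = DEFAULT_BASE_URL, timeout: float = 10.0) -> str:
    """Fetch the exposition text; raises urllib.error.HTTPError if auth fails."""
    token = os.environ["OPENCLAW_GATEWAY_TOKEN"]
    request = build_scrape_request(base_url, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```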

For traces, logs, OTLP push, and OpenTelemetry GenAI semantic attributes, see [OpenTelemetry export](/gateway/opentelemetry).

## Quick start

Install the plugin:

```bash
openclaw plugins install clawhub:@openclaw/diagnostics-prometheus
```

Enable the plugin and diagnostics in config:

```json5
{
  plugins: {
    allow: ["diagnostics-prometheus"],
    entries: {
      "diagnostics-prometheus": { enabled: true },
    },
  },
  diagnostics: {
    enabled: true,
  },
}
```

Alternatively, enable the plugin from the CLI:

```bash
openclaw plugins enable diagnostics-prometheus
```

The HTTP route is registered at plugin startup, so reload after enabling. Then query the endpoint with the same gateway auth your operator clients use:
```bash
curl -H "Authorization: Bearer $OPENCLAW_GATEWAY_TOKEN" \
  http://127.0.0.1:18789/api/diagnostics/prometheus
```

Then add a scrape job to Prometheus:

```yaml
# prometheus.yml
scrape_configs:
  - job_name: openclaw
    scrape_interval: 30s
    metrics_path: /api/diagnostics/prometheus
    authorization:
      credentials_file: /etc/prometheus/openclaw-gateway-token
    static_configs:
      - targets: ["openclaw-gateway:18789"]
```

`diagnostics.enabled: true` is required. Without it, the plugin still registers the HTTP route, but no diagnostic events reach the exporter, so the response is empty.

## Metrics exported

| Metric | Type | Labels |
| --- | --- | --- |
| `openclaw_run_completed_total` | counter | channel, model, outcome, provider, trigger |
| `openclaw_run_duration_seconds` | histogram | channel, model, outcome, provider, trigger |
| `openclaw_model_call_total` | counter | api, error_category, model, outcome, provider, transport |
| `openclaw_model_call_duration_seconds` | histogram | api, error_category, model, outcome, provider, transport |
| `openclaw_model_tokens_total` | counter | agent, channel, model, provider, token_type |
| `openclaw_gen_ai_client_token_usage` | histogram | model, provider, token_type |
| `openclaw_model_cost_usd_total` | counter | agent, channel, model, provider |
| `openclaw_tool_execution_total` | counter | error_category, outcome, params_kind, tool |
| `openclaw_tool_execution_duration_seconds` | histogram | error_category, outcome, params_kind, tool |
| `openclaw_harness_run_total` | counter | channel, error_category, harness, model, outcome, phase, plugin, provider |
| `openclaw_harness_run_duration_seconds` | histogram | channel, error_category, harness, model, outcome, phase, plugin, provider |
| `openclaw_message_processed_total` | counter | channel, outcome, reason |
| `openclaw_message_processed_duration_seconds` | histogram | channel, outcome, reason |
| `openclaw_message_delivery_total` | counter | channel, delivery_kind, error_category, outcome |
| `openclaw_message_delivery_duration_seconds` | histogram | channel, delivery_kind, error_category, outcome |
| `openclaw_queue_lane_size` | gauge | lane |
| `openclaw_queue_lane_wait_seconds` | histogram | lane |
| `openclaw_session_state_total` | counter | reason, state |
| `openclaw_session_queue_depth` | gauge | state |
| `openclaw_memory_bytes` | gauge | kind |
| `openclaw_memory_rss_bytes` | histogram | none |
| `openclaw_memory_pressure_total` | counter | level, reason |
| `openclaw_telemetry_exporter_total` | counter | exporter, reason, signal, status |
| `openclaw_prometheus_series_dropped_total` | counter | none |

## Label policy

Prometheus labels stay bounded and low-cardinality. The exporter does not emit raw diagnostic identifiers such as `runId`, `sessionKey`, `sessionId`, `callId`, `toolCallId`, message IDs, chat IDs, or provider request IDs.
Label values are redacted and must match OpenClaw's low-cardinality character policy. Values that fail the policy are replaced with `unknown`, `other`, or `none`, depending on the metric.
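
The exact character policy is not spelled out here, so the regex in this sketch is an assumption. It only illustrates the pattern described above: validate each label value, and substitute a fixed fallback when the value fails. `sanitize_label_value` and `sanitize_labels` are hypothetical helpers and are not OpenClaw's implementation.

```python
import re
from typing import Dict, Optional

# Hypothetical policy: 1-64 characters from a conservative set.
# OpenClaw's real low-cardinality policy may be stricter or different.
LABEL_VALUE_RE = re.compile(r"[A-Za-z0-9_.:-]{1,64}")


def sanitize_label_value(value: Optional[str], fallback: str = "unknown") -> str:
    """Return value if it passes the policy, otherwise the fixed fallback."""
    if value is None or LABEL_VALUE_RE.fullmatch(value) is None:
        return fallback
    return value


def sanitize_labels(
    labels: Dict[str, Optional[str]], fallback: str = "unknown"
) -> Dict[str, str]:
    """Apply the policy to every label; the fallback varies per metric."""
    return {key: sanitize_label_value(val, fallback) for key, val in labels.items()}
```

A fixed fallback keeps a bad value from ever minting a new series: every rejected value collapses into a single `unknown` (or `other`, or `none`) series.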
The exporter caps retained time series in memory at **2048** series across counters, gauges, and histograms combined. New series beyond that cap are dropped, and `openclaw_prometheus_series_dropped_total` increments by one each time.
Watch this counter as a hard signal that an attribute upstream is leaking high-cardinality values. The exporter never lifts the cap automatically; if it climbs, fix the source rather than disabling the cap.
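
The cap works like a fixed-size table keyed by metric name and label set. Here is a minimal Python sketch of that behavior. It is not the plugin's code. `BoundedSeriesStore` is hypothetical and tracks counters only. It also assumes that series already in the table keep updating after the cap is reached, which the text implies but does not state outright.

```python
from typing import Dict, FrozenSet, Tuple

# Identity of one time series: metric name plus its (unordered) label pairs.
SeriesKey = Tuple[str, FrozenSet[Tuple[str, str]]]

MAX_SERIES = 2048  # documented cap, shared across all metric types


class BoundedSeriesStore:
    """Counter store that refuses to create new series past a fixed cap."""

    def __init__(self, max_series: int = MAX_SERIES) -> None:
        self.max_series = max_series
        self.values: Dict[SeriesKey, float] = {}
        # Plays the role of openclaw_prometheus_series_dropped_total.
        self.dropped_total = 0

    def inc(self, name: str, labels: Dict[str, str], amount: float = 1.0) -> bool:
        key: SeriesKey = (name, frozenset(labels.items()))
        if key not in self.values:
            if len(self.values) >= self.max_series:
                # Drop the new series; never evict or relabel existing ones.
                self.dropped_total += 1
                return False
            self.values[key] = 0.0
        # Existing series keep counting even while the store is full.
        self.values[key] += amount
        return True
```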

The following never appear in Prometheus output:

- prompt text, response text, tool inputs, tool outputs, and system prompts
- raw provider request IDs (bounded hashes may appear on spans where applicable, but never on metrics)
- session keys and session IDs
- hostnames, file paths, and secret values

## PromQL recipes

```promql
# Tokens per minute, split by provider (per-second rate over 5m, times 60)
sum by (provider) (rate(openclaw_model_tokens_total[5m])) * 60

# Spend (USD) over the last hour, by model
sum by (model) (increase(openclaw_model_cost_usd_total[1h]))

# 95th percentile run duration
histogram_quantile(
  0.95,
  sum by (le, provider, model)
    (rate(openclaw_run_duration_seconds_bucket[5m]))
)

# Lanes meeting a queue-wait SLO (p95 under 2s); use > 2 to alert on breaches
histogram_quantile(
  0.95,
  sum by (le, lane) (rate(openclaw_queue_lane_wait_seconds_bucket[5m]))
) < 2

# Dropped Prometheus series (cardinality alarm)
increase(openclaw_prometheus_series_dropped_total[15m]) > 0
```
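
The p95 recipes rely on `histogram_quantile`. This function estimates a quantile from cumulative `_bucket` counts by linear interpolation inside the bucket that contains the target rank. As a result, a p95 is only as precise as the bucket boundaries around it. The Python sketch below follows that approach in simplified form. `bucket_quantile` is a hypothetical helper, not Prometheus or OpenClaw code. It takes counts that are already aggregated, and it returns NaN for quantiles outside (0, 1], whereas Prometheus returns infinities.

```python
import math
from typing import List, Tuple


def bucket_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with +Inf. Observations
    are assumed to be spread evenly within each bucket.
    """
    if not 0.0 < q <= 1.0 or len(buckets) < 2 or buckets[-1][0] != math.inf:
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First finite bucket whose cumulative count reaches the rank.
    b = next(
        (i for i, (_, count) in enumerate(buckets[:-1]) if count >= rank),
        len(buckets) - 1,
    )
    if b == len(buckets) - 1:
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[b]
    lower = 0.0
    if b > 0:
        lower, prev_count = buckets[b - 1]
        count -= prev_count
        rank -= prev_count
    # Linear interpolation between the bucket's lower and upper bounds.
    return lower + (upper - lower) * (rank / count)
```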

Prefer `openclaw_gen_ai_client_token_usage` for cross-provider dashboards. It follows the OpenTelemetry GenAI semantic conventions (`gen_ai.client.token.usage`), so it lines up with metrics from non-OpenClaw GenAI services.

## Choosing between Prometheus and OpenTelemetry export

OpenClaw supports both surfaces independently. You can run either, both, or neither.

- **Pull** model: Prometheus scrapes `/api/diagnostics/prometheus`.
  - No external collector required.
  - Authenticated through normal Gateway auth.
  - Surface is metrics only (no traces or logs).
  - Best for stacks already standardized on Prometheus + Grafana.
- **Push** model: OpenClaw sends OTLP/HTTP to a collector or OTLP-compatible backend.
  - Surface includes metrics, traces, and logs.
  - Bridges to Prometheus through an OpenTelemetry Collector (`prometheus` or `prometheusremotewrite` exporter) when you need both.
  - See [OpenTelemetry export](/gateway/opentelemetry) for the full catalog.

## Troubleshooting

- **Empty response.** Check that `diagnostics.enabled: true` is set in config, confirm the plugin is enabled and loaded with `openclaw plugins list --enabled`, and generate some traffic. Counters and histograms only emit lines after at least one event.
- **Authentication failures.** The endpoint requires the Gateway operator scope (`auth: "gateway"` with `gatewayRuntimeScopeSurface: "trusted-operator"`). Configure Prometheus with the same token or password your operator clients use for other Gateway operator routes. There is no public unauthenticated mode.
- **`openclaw_prometheus_series_dropped_total` is increasing.** A new attribute is pushing the exporter past the **2048**-series cap. Inspect recent metrics for an unexpectedly high-cardinality label and fix it at the source. The exporter intentionally drops new series instead of silently rewriting labels.
- **Counters reset after a restart.** The plugin keeps state in memory only. After a Gateway restart, counters reset to zero and gauges restart at their next reported value. Use PromQL `rate()` and `increase()`, which handle counter resets cleanly.
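
The restart note depends on how `rate()` and `increase()` treat a counter that goes backwards: they read any drop as a reset to zero. The Python sketch below shows that core idea in simplified form. `reset_aware_increase` is a hypothetical helper. Real Prometheus also extrapolates to the edges of the range window, and this sketch does not.

```python
from typing import List


def reset_aware_increase(samples: List[float]) -> float:
    """Total increase across consecutive counter samples, tolerating resets."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # Counter went backwards (e.g. Gateway restart): assume it
            # restarted from zero, so the whole new value is increase.
            total += cur
    return total
```

For example, samples `10, 15, 20` followed by a restart and then `3, 8` yield an increase of 18 rather than a negative number.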