diff --git a/docs/gateway/prometheus.md b/docs/gateway/prometheus.md
index 7c408aa4b33..92a4753df66 100644
--- a/docs/gateway/prometheus.md
+++ b/docs/gateway/prometheus.md
@@ -1,47 +1,84 @@
---
summary: "Expose OpenClaw diagnostics as Prometheus text metrics through the diagnostics-prometheus plugin"
title: "Prometheus metrics"
+sidebarTitle: "Prometheus"
read_when:
- You want Prometheus, Grafana, VictoriaMetrics, or another scraper to collect OpenClaw Gateway metrics
- You need the Prometheus metric names and label policy for dashboards or alerts
- You want metrics without running an OpenTelemetry collector
---
-OpenClaw can expose diagnostics metrics through the bundled
-`diagnostics-prometheus` plugin. It listens to trusted internal diagnostics and
-renders a Prometheus text endpoint at:
+OpenClaw can expose diagnostics metrics through the bundled `diagnostics-prometheus` plugin. It listens to trusted internal diagnostics and renders a Prometheus text endpoint at:
```text
-/api/diagnostics/prometheus
+GET /api/diagnostics/prometheus
```
-The route uses Gateway authentication. Do not expose it as a public
-unauthenticated `/metrics` endpoint.
+Content type is `text/plain; version=0.0.4; charset=utf-8`, the standard Prometheus exposition format.
+
+
+The route uses Gateway authentication (operator scope). Do not expose it as a public unauthenticated `/metrics` endpoint. Scrape it through the same auth path you use for other operator APIs.
+
+
+For traces, logs, OTLP push, and OpenTelemetry GenAI semantic attributes, see [OpenTelemetry export](/gateway/opentelemetry).
## Quick start
-```json5
-{
- plugins: {
- allow: ["diagnostics-prometheus"],
- entries: {
- "diagnostics-prometheus": { enabled: true },
- },
- },
- diagnostics: {
- enabled: true,
- },
-}
-```
+
+
+
+
+ ```json5
+ {
+ plugins: {
+ allow: ["diagnostics-prometheus"],
+ entries: {
+ "diagnostics-prometheus": { enabled: true },
+ },
+ },
+ diagnostics: {
+ enabled: true,
+ },
+ }
+ ```
+
+
+ ```bash
+ openclaw plugins enable diagnostics-prometheus
+ ```
+
+
+
+
+ The HTTP route is registered at plugin startup, so restart the Gateway after enabling the plugin.
+
+
+ Send the same Gateway credentials your operator clients use:
-You can also enable the plugin from the CLI:
+ ```bash
+ curl -H "Authorization: Bearer $OPENCLAW_GATEWAY_TOKEN" \
+ http://127.0.0.1:18789/api/diagnostics/prometheus
+ ```
-```bash
-openclaw plugins enable diagnostics-prometheus
-```
+
+
+ ```yaml
+ # prometheus.yml
+ scrape_configs:
+ - job_name: openclaw
+ scrape_interval: 30s
+ metrics_path: /api/diagnostics/prometheus
+ authorization:
+ credentials_file: /etc/prometheus/openclaw-gateway-token
+ static_configs:
+ - targets: ["openclaw-gateway:18789"]
+ ```
+
+
-Then scrape the protected Gateway route with the same Gateway authentication you
-use for operator APIs.
+
+`diagnostics.enabled: true` is required. Without it, the plugin still registers the HTTP route but no diagnostic events flow into the exporter, so the response is empty.
+
## Metrics exported
@@ -74,16 +111,99 @@ use for operator APIs.
## Label policy
-Prometheus labels stay bounded and low-cardinality. The exporter does not emit
-raw diagnostic identifiers such as `runId`, `sessionKey`, `sessionId`, `callId`,
-`toolCallId`, message IDs, chat IDs, or provider request IDs.
+
+
+ Prometheus labels stay bounded and low-cardinality. The exporter does not emit raw diagnostic identifiers such as `runId`, `sessionKey`, `sessionId`, `callId`, `toolCallId`, message IDs, chat IDs, or provider request IDs.
-Label values are redacted and must match OpenClaw's low-cardinality character
-policy. Values that fail the policy are replaced with `unknown`, `other`, or
-`none`, depending on the metric.
+ Label values are redacted and must match OpenClaw's low-cardinality character policy. Values that fail the policy are replaced with `unknown`, `other`, or `none`, depending on the metric.
-The exporter caps retained time series in memory. If the cap is reached, new
-series are dropped and `openclaw_prometheus_series_dropped_total` increments.
+
+
+ The exporter caps retained time series in memory at **2048** series across counters, gauges, and histograms combined. New series beyond that cap are dropped, and `openclaw_prometheus_series_dropped_total` increments by one each time.
-For full traces, logs, OTLP export, and OpenTelemetry GenAI semantic attributes,
-use [OpenTelemetry export](/gateway/opentelemetry).
+ Treat this counter as a hard signal that an upstream attribute is leaking high-cardinality values. The exporter never lifts the cap automatically; if the counter climbs, fix the source rather than disabling the cap.
+
+
+
+ The Prometheus output never contains:
+ - prompt text, response text, tool inputs, tool outputs, system prompts
+ - raw provider request IDs (bounded hashes appear only on spans, where applicable, and never on metrics)
+ - session keys and session IDs
+ - hostnames, file paths, secret values
+
+
+
+## PromQL recipes
+
+```promql
+# Tokens per minute, split by provider (rate() is per-second, so scale by 60)
+sum by (provider) (rate(openclaw_model_tokens_total[1m])) * 60
+
+# Spend (USD) over the last hour, by model
+sum by (model) (increase(openclaw_model_cost_usd_total[1h]))
+
+# 95th percentile model run duration
+histogram_quantile(
+ 0.95,
+ sum by (le, provider, model)
+ (rate(openclaw_run_duration_seconds_bucket[5m]))
+)
+
+# Queue wait time SLO (p95 under 2s)
+histogram_quantile(
+ 0.95,
+ sum by (le, lane) (rate(openclaw_queue_lane_wait_seconds_bucket[5m]))
+) < 2
+
+# Dropped Prometheus series (cardinality alarm)
+increase(openclaw_prometheus_series_dropped_total[15m]) > 0
+```
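+
+The queue-wait and dropped-series recipes above could be wired into Prometheus alerting rules roughly as follows. This is a minimal sketch, not a shipped rule set: the file name, group name, alert names, `for` duration, and `severity` labels are assumptions to adapt. Note that the SLO recipe is true while the system is healthy (`< 2`), so the alert inverts the comparison.
+
+```yaml
+# openclaw-alerts.yml (hypothetical file name; list it under rule_files in prometheus.yml)
+groups:
+  - name: openclaw-gateway            # assumed group name
+    rules:
+      # Fires when p95 queue wait stays above the 2s SLO threshold.
+      - alert: OpenClawQueueWaitP95High   # assumed alert name
+        expr: |
+          histogram_quantile(
+            0.95,
+            sum by (le, lane) (rate(openclaw_queue_lane_wait_seconds_bucket[5m]))
+          ) > 2
+        for: 10m                      # assumed hold time before firing
+        labels:
+          severity: warning
+        annotations:
+          summary: "p95 queue wait on lane {{ $labels.lane }} is above 2s"
+      # Fires as soon as the exporter drops any new series.
+      - alert: OpenClawPrometheusSeriesDropped   # assumed alert name
+        expr: increase(openclaw_prometheus_series_dropped_total[15m]) > 0
+        labels:
+          severity: warning
+        annotations:
+          summary: "OpenClaw exporter dropped new series; look for a high-cardinality label"
+```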
+
+
+Prefer `gen_ai_client_token_usage` for cross-provider dashboards: it follows the OpenTelemetry GenAI semantic conventions and is consistent with metrics from non-OpenClaw GenAI services.
+
+
+## Choosing between Prometheus and OpenTelemetry export
+
+OpenClaw supports both surfaces independently. You can run either, both, or neither.
+
+
+
+ - **Pull** model: Prometheus scrapes `/api/diagnostics/prometheus`.
+ - No external collector required.
+ - Authenticated through normal Gateway auth.
+ - Surface is metrics only (no traces or logs).
+ - Best for stacks already standardized on Prometheus + Grafana.
+
+
+ - **Push** model: OpenClaw sends OTLP/HTTP to a collector or OTLP-compatible backend.
+ - Surface includes metrics, traces, and logs.
+ - Bridges to Prometheus through an OpenTelemetry Collector (`prometheus` or `prometheusremotewrite` exporter) when you need both.
+ - See [OpenTelemetry export](/gateway/opentelemetry) for the full catalog.
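+
+ The Collector bridge mentioned in the list above might look like the following configuration. It is a sketch under assumptions: the listen addresses and ports are illustrative, the Collector build must include the `prometheus` exporter (for example the contrib distribution), and the OpenClaw side must be pointed at the OTLP/HTTP receiver as described in the OpenTelemetry export page.
+
+ ```yaml
+ # otel-collector.yaml (hypothetical file name)
+ receivers:
+   otlp:
+     protocols:
+       http:
+         endpoint: 0.0.0.0:4318    # assumed address; 4318 is the conventional OTLP/HTTP port
+ exporters:
+   prometheus:
+     endpoint: 0.0.0.0:9464        # assumed address that Prometheus then scrapes
+ service:
+   pipelines:
+     metrics:
+       receivers: [otlp]
+       exporters: [prometheus]
+ ```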
+
+
+
+## Troubleshooting
+
+
+
+ If the response body is empty:
+ - Check `diagnostics.enabled: true` in config.
+ - Confirm the plugin is enabled and loaded with `openclaw plugins list --enabled`.
+ - Generate some traffic; counters and histograms only emit lines after at least one event.
+
+
+ If the scrape is rejected as unauthorized: the endpoint requires the Gateway operator scope (`auth: "gateway"` with `gatewayRuntimeScopeSurface: "trusted-operator"`). Configure Prometheus with the same token or password you use for any other Gateway operator route. There is no public unauthenticated mode.
+
+
+ If `openclaw_prometheus_series_dropped_total` is climbing, a new attribute is pushing the exporter past the **2048**-series cap. Inspect recent metrics for an unexpectedly high-cardinality label and fix it at the source. The exporter intentionally drops new series instead of silently rewriting labels.
+
+
+ The plugin keeps state in memory only. After a Gateway restart, counters reset to zero and gauges restart at their next reported value. Use PromQL `rate()` and `increase()` to handle resets cleanly.
+
+
+
+## Related
+
+- [Diagnostics export](/gateway/diagnostics) — local diagnostics zip for support bundles
+- [Health and readiness](/gateway/health) — `/healthz` and `/readyz` probes
+- [Logging](/logging) — file-based logging
+- [OpenTelemetry export](/gateway/opentelemetry) — OTLP push for traces, metrics, and logs