feat(diagnostics-prometheus): add protected metrics exporter

This commit is contained in:
Vincent Koc
2026-04-26 01:05:45 -07:00
parent 6cd047e7c2
commit 0f2e7510cb
19 changed files with 1062 additions and 7 deletions

View File

@@ -1442,6 +1442,7 @@
"gateway/doctor",
"logging",
"gateway/opentelemetry",
"gateway/prometheus",
"gateway/logging",
"gateway/diagnostics",
"gateway/troubleshooting"

View File

@@ -0,0 +1,89 @@
---
summary: "Expose OpenClaw diagnostics as Prometheus text metrics through the diagnostics-prometheus plugin"
title: "Prometheus metrics"
read_when:
- You want Prometheus, Grafana, VictoriaMetrics, or another scraper to collect OpenClaw Gateway metrics
- You need the Prometheus metric names and label policy for dashboards or alerts
- You want metrics without running an OpenTelemetry collector
---
OpenClaw can expose diagnostics metrics through the bundled
`diagnostics-prometheus` plugin. It listens to trusted internal diagnostics and
renders a Prometheus text endpoint at:
```text
/api/diagnostics/prometheus
```
The route uses Gateway authentication. Do not expose it as a public
unauthenticated `/metrics` endpoint.
## Quick start
```json5
{
plugins: {
allow: ["diagnostics-prometheus"],
entries: {
"diagnostics-prometheus": { enabled: true },
},
},
diagnostics: {
enabled: true,
},
}
```
You can also enable the plugin from the CLI:
```bash
openclaw plugins enable diagnostics-prometheus
```
Then scrape the protected Gateway route with the same Gateway authentication you
use for operator APIs.
## Metrics exported
| Metric | Type | Labels |
| --------------------------------------------- | --------- | ----------------------------------------------------------------------------------------- |
| `openclaw_run_completed_total` | counter | `channel`, `model`, `outcome`, `provider`, `trigger` |
| `openclaw_run_duration_seconds` | histogram | `channel`, `model`, `outcome`, `provider`, `trigger` |
| `openclaw_model_call_total` | counter | `api`, `error_category`, `model`, `outcome`, `provider`, `transport` |
| `openclaw_model_call_duration_seconds` | histogram | `api`, `error_category`, `model`, `outcome`, `provider`, `transport` |
| `openclaw_model_tokens_total` | counter | `agent`, `channel`, `model`, `provider`, `token_type` |
| `openclaw_gen_ai_client_token_usage` | histogram | `model`, `provider`, `token_type` |
| `openclaw_model_cost_usd_total` | counter | `agent`, `channel`, `model`, `provider` |
| `openclaw_tool_execution_total` | counter | `error_category`, `outcome`, `params_kind`, `tool` |
| `openclaw_tool_execution_duration_seconds` | histogram | `error_category`, `outcome`, `params_kind`, `tool` |
| `openclaw_harness_run_total` | counter | `channel`, `error_category`, `harness`, `model`, `outcome`, `phase`, `plugin`, `provider` |
| `openclaw_harness_run_duration_seconds` | histogram | `channel`, `error_category`, `harness`, `model`, `outcome`, `phase`, `plugin`, `provider` |
| `openclaw_message_processed_total` | counter | `channel`, `outcome`, `reason` |
| `openclaw_message_processed_duration_seconds` | histogram | `channel`, `outcome`, `reason` |
| `openclaw_message_delivery_total` | counter | `channel`, `delivery_kind`, `error_category`, `outcome` |
| `openclaw_message_delivery_duration_seconds` | histogram | `channel`, `delivery_kind`, `error_category`, `outcome` |
| `openclaw_queue_lane_size` | gauge | `lane` |
| `openclaw_queue_lane_wait_seconds` | histogram | `lane` |
| `openclaw_session_state_total` | counter | `reason`, `state` |
| `openclaw_session_queue_depth` | gauge | `state` |
| `openclaw_memory_bytes` | gauge | `kind` |
| `openclaw_memory_rss_bytes` | histogram | none |
| `openclaw_memory_pressure_total` | counter | `level`, `reason` |
| `openclaw_telemetry_exporter_total` | counter | `exporter`, `reason`, `signal`, `status` |
| `openclaw_prometheus_series_dropped_total` | counter | none |
## Label policy
Prometheus labels stay bounded and low-cardinality. The exporter does not emit
raw diagnostic identifiers such as `runId`, `sessionKey`, `sessionId`, `callId`,
`toolCallId`, message IDs, chat IDs, or provider request IDs.
Label values are redacted and must match OpenClaw's low-cardinality character
policy. Values that fail the policy are replaced with `unknown`, `other`, or
`none`, depending on the metric.
The exporter caps retained time series in memory. If the cap is reached, new
series are dropped and `openclaw_prometheus_series_dropped_total` increments.
For full traces, logs, OTLP export, and OpenTelemetry GenAI semantic attributes,
use [OpenTelemetry export](/gateway/opentelemetry).

View File

@@ -420,8 +420,9 @@ The same rule applies to other bundled-helper families such as:
`plugin-sdk/nextcloud-talk`, `plugin-sdk/nostr`, `plugin-sdk/tlon`,
`plugin-sdk/twitch`,
`plugin-sdk/github-copilot-login`, `plugin-sdk/github-copilot-token`,
`plugin-sdk/diagnostics-otel`, `plugin-sdk/diffs`, `plugin-sdk/llm-task`,
`plugin-sdk/thread-ownership`, and `plugin-sdk/voice-call`
`plugin-sdk/diagnostics-otel`, `plugin-sdk/diagnostics-prometheus`,
`plugin-sdk/diffs`, `plugin-sdk/llm-task`, `plugin-sdk/thread-ownership`,
and `plugin-sdk/voice-call`
`plugin-sdk/github-copilot-token` currently exposes the narrow token-helper
surface `DEFAULT_COPILOT_API_BASE_URL`,

View File

@@ -271,7 +271,7 @@ For the plugin authoring guide, see [Plugin SDK overview](/plugins/sdk-overview)
| Line | `plugin-sdk/line`, `plugin-sdk/line-core`, `plugin-sdk/line-runtime`, `plugin-sdk/line-surface` | Bundled LINE helper/runtime surface |
| IRC | `plugin-sdk/irc`, `plugin-sdk/irc-surface` | Bundled IRC helper surface |
| Channel-specific helpers | `plugin-sdk/googlechat`, `plugin-sdk/zalouser`, `plugin-sdk/bluebubbles`, `plugin-sdk/bluebubbles-policy`, `plugin-sdk/mattermost`, `plugin-sdk/mattermost-policy`, `plugin-sdk/feishu-conversation`, `plugin-sdk/msteams`, `plugin-sdk/nextcloud-talk`, `plugin-sdk/nostr`, `plugin-sdk/tlon`, `plugin-sdk/twitch` | Bundled channel compatibility/helper seams |
| Auth/plugin-specific helpers | `plugin-sdk/github-copilot-login`, `plugin-sdk/github-copilot-token`, `plugin-sdk/diagnostics-otel`, `plugin-sdk/diffs`, `plugin-sdk/llm-task`, `plugin-sdk/thread-ownership`, `plugin-sdk/voice-call` | Bundled feature/plugin helper seams; `plugin-sdk/github-copilot-token` currently exports `DEFAULT_COPILOT_API_BASE_URL`, `deriveCopilotApiBaseUrlFromToken`, and `resolveCopilotApiToken` |
| Auth/plugin-specific helpers | `plugin-sdk/github-copilot-login`, `plugin-sdk/github-copilot-token`, `plugin-sdk/diagnostics-otel`, `plugin-sdk/diagnostics-prometheus`, `plugin-sdk/diffs`, `plugin-sdk/llm-task`, `plugin-sdk/thread-ownership`, `plugin-sdk/voice-call` | Bundled feature/plugin helper seams; `plugin-sdk/github-copilot-token` currently exports `DEFAULT_COPILOT_API_BASE_URL`, `deriveCopilotApiBaseUrlFromToken`, and `resolveCopilotApiToken` |
</Accordion>
</AccordionGroup>