Commit Graph

78 Commits

Author SHA1 Message Date
Mert Başar
029ca8c268 feat(agents): implement state-aware failover and lane suspension
Summary:
- Persist quota-suspension state transitions and reload fresh suspension state before failover handoff injection.
- Restore suspended lanes to configured concurrency and share failover-to-suspension reason mapping across fallback and embedded runner paths.
- Export model.failover diagnostics via OTLP and cover queueing/resume behavior with regressions.

Verification:
- pnpm test src/config/sessions/store.pruning.integration.test.ts src/process/command-queue.test.ts src/agents/session-suspension.test.ts src/agents/model-fallback.test.ts extensions/diagnostics-otel/src/service.test.ts
- git diff --check
- pnpm exec oxfmt --check --threads=1 on changed TypeScript files
- GitHub checks: 92 successful, 0 pending, 0 failed on head 962146be88
- Review threads: none unresolved
2026-05-07 18:34:05 -05:00
Peter Steinberger
1ef85c7d4c test: make suites safe without isolation (#78834)
* test: make suites safe without isolation

* fix: narrow auth profile credential types

* test: inject channel module loader factory locally
2026-05-07 08:43:29 +01:00
Jesse Merhi
1c42c77433 feat: add user input blocking lifecycle gates (#75035)
Summary:
- The PR adds a `before_agent_run` plugin hook with pass/block decisions, redacted blocked-turn persistence, diagnostics/docs/changelog updates, and focused runner, gateway, session, and plugin tests.
- Reproducibility: not applicable. as a feature PR rather than a current-main bug report. Current main lacks ` ... un`, while the PR head adds source coverage and copied live Gateway/WebChat log proof for the new behavior.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix: trim before agent hook PR scope
- PR branch already contained follow-up commit before automerge: fix: keep before-agent blocks redacted
- PR branch already contained follow-up commit before automerge: fix: keep runtime context out of model prompt
- PR branch already contained follow-up commit before automerge: docs: refresh config baseline after rebase
- PR branch already contained follow-up commit before automerge: fix: align blocked turn clients with redacted content
- PR branch already contained follow-up commit before automerge: fix: remove out-of-scope client block UI changes

Validation:
- ClawSweeper review passed for head 767e46fde8.
- Required merge gates passed before the squash merge.

Prepared head SHA: 767e46fde8
Review: https://github.com/openclaw/openclaw/pull/75035#issuecomment-4351843275

Co-authored-by: Jesse Merhi <jessejmerhi@gmail.com>
Co-authored-by: jesse-merhi <79823012+jesse-merhi@users.noreply.github.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
2026-05-06 11:41:04 +00:00
Vincent Koc
e2501b2d6d fix(diagnostics): export Talk metrics after SDK refactor
Adds bounded Talk lifecycle/audio diagnostics and session recovery metrics for OTEL, Prometheus, and stability snapshots after the Talk SDK/session refactor. Includes changelog/docs updates and Testbox/live proof.
2026-05-06 02:01:52 -07:00
Peter Steinberger
b43efd3793 fix: clean up post-land CI guards 2026-05-06 02:51:53 +01:00
Peter Steinberger
ada560ece4 feat: adapt voice surfaces to talk events 2026-05-06 02:39:15 +01:00
Peter Steinberger
05eda57b3c refactor: migrate bundled plugins to message lifecycle 2026-05-06 01:46:42 +01:00
Vincent Koc
e03fe1e289 fix(telegram): reuse preview for long text finals (#77658)
* fix(telegram): reuse preview for long text finals

* test(qa): cover long telegram finals

* fix(qa): satisfy extension lint

* fix(qa): keep telegram long final fixture to two chunks

* test(telegram): cover three chunk finals

* fix(telegram): force long final preview boundary
2026-05-04 21:19:44 -07:00
Peter Steinberger
d522a18971 fix: sync Codex app-server protocol (#77578)
* fix: sync codex app-server protocol

* docs: add codex protocol changelog

* fix: refresh codex protocol schemas
2026-05-05 00:43:07 +01:00
Vincent Koc
ce8bc1a3e3 fix(lint): cover diagnostic phase events 2026-05-04 15:40:00 -07:00
Vincent Koc
50da306c0a fix(telemetry): bound message diagnostics labels 2026-05-03 19:02:58 -07:00
Peter Steinberger
32db81ca5c fix: classify session liveness diagnostics 2026-05-02 00:13:58 +01:00
Peter Steinberger
8a48994802 fix(otel): record liveness warnings 2026-04-28 05:41:30 +01:00
Peter Steinberger
5826774076 fix(diagnostics-otel): handle liveness warnings 2026-04-28 05:32:40 +01:00
Peter Steinberger
cb1bca1a16 fix(diagnostics): export liveness warning telemetry 2026-04-28 05:28:04 +01:00
EVA
1adaa28dc8 [plugin sdk] Add generic plugin host-hook contracts (#72287)
Merged via squash.

Prepared head SHA: 68e5f2ce19
Co-authored-by: 100yenadmin <239388517+100yenadmin@users.noreply.github.com>
Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com>
Reviewed-by: @jalehman
2026-04-27 17:07:02 -07:00
Peter Steinberger
74e62c32c3 test: route extension tests through sdk subpaths 2026-04-27 21:58:48 +01:00
Peter Steinberger
a95da5b52d fix(models): enrich local transport failure diagnostics 2026-04-27 09:25:38 +01:00
Vincent Koc
c6e9849351 feat(diagnostics): capture model call size timing 2026-04-26 13:43:22 -07:00
Sally O'Malley
637bd33e69 fix(diagnostics): defer OTEL run span finalization (#72260) 2026-04-26 11:29:05 -07:00
Vincent Koc
82ddcf24f5 feat(diagnostics): add harness lifecycle telemetry 2026-04-25 23:34:34 -07:00
Vincent Koc
44da034516 fix(otel): add agent label to token metrics 2026-04-25 20:57:47 -07:00
Vincent Koc
2495585a32 feat(diagnostics-otel): add exporter health diagnostics
Adds diagnostics-otel exporter health events and signal-specific endpoint wiring, with docs and config schema coverage.
2026-04-25 18:34:44 -07:00
Vincent Koc
84f183b7ad test(diagnostics-otel): cover trace self-parent guard 2026-04-25 12:42:07 -07:00
Vincent Koc
5fe06f3cdc test(diagnostics-otel): cover untrusted trace parent rejection 2026-04-25 12:34:16 -07:00
Vincent Koc
44648440a5 fix(diagnostics-otel): stabilize genai token metric model attr 2026-04-25 12:22:55 -07:00
Vincent Koc
5815ca93d9 fix(diagnostics-otel): honor genai usage semconv opt-in 2026-04-25 12:13:50 -07:00
Vincent Koc
5671fdca87 feat(diagnostics-otel): add genai usage span identity 2026-04-25 12:03:10 -07:00
Vincent Koc
cd7a8f870b feat(diagnostics-otel): add genai usage span attrs 2026-04-25 11:56:13 -07:00
Vincent Koc
dc19069d71 feat(diagnostics-otel): add genai operation duration metric 2026-04-25 11:48:10 -07:00
Vincent Koc
7bbd47349e feat(diagnostics-otel): add genai token usage metric 2026-04-25 11:31:45 -07:00
Vincent Koc
b8a41739d5 feat(diagnostics-otel): export memory diagnostics 2026-04-25 11:22:19 -07:00
Vincent Koc
d6ef1fcf24 feat(diagnostics-otel): export tool loop events 2026-04-25 11:11:56 -07:00
Vincent Koc
ff172f46a5 feat(diagnostics-otel): add context assembly spans 2026-04-25 11:03:46 -07:00
Vincent Koc
44114328b4 feat(diagnostics): surface provider request id hashes 2026-04-25 10:46:10 -07:00
Vincent Koc
a7de722f4f fix(diagnostics-otel): align GenAI semconv attrs 2026-04-25 10:33:13 -07:00
Vincent Koc
dcdf97685b fix(diagnostics): trust internal trace parents (#71574)
* fix(diagnostics): trust internal trace parents

* fix(diagnostics): harden trusted trace metadata

* fix(tooling): honor explicit oxlint threads

* fix(agents): use stable nonmutating sort helpers

* chore(plugin-sdk): refresh api baseline

* fix(diagnostics): gate internal event subscriptions

* fix(diagnostics): isolate listener event copies

* chore(plugin-sdk): refresh internal diagnostics baseline

* chore(plugin-sdk): refresh diagnostics event baseline

* fix(diagnostics): keep event state module local

* fix(diagnostics): harden internal subscription capability

* fix(diagnostics): freeze listener metadata
2026-04-25 10:18:52 -07:00
Vincent Koc
bd32b1a906 feat(diagnostics): add outbound delivery lifecycle events
Add bounded outbound message delivery lifecycle diagnostics and OTEL export without message body, recipient, room, media path, or raw channel result data.
2026-04-25 01:26:34 -07:00
Vincent Koc
3e3bba4f30 feat(diagnostics): emit exec process telemetry (#71451) 2026-04-25 00:12:58 -07:00
Vincent Koc
56eb1ffabf fix(diagnostics-otel): support preloaded sdk mode (#71450) 2026-04-24 23:55:34 -07:00
Vincent Koc
718dffd2f2 fix(diagnostics): harden capture redaction and discord metadata fetch (#71303) 2026-04-24 17:51:12 -07:00
Vincent Koc
d4d4a8c14e feat(diagnostics-otel): add content capture controls
Add opt-in diagnostics OTEL content capture controls, keep raw content export default-off, and guard the content-capture tests against magic truncation bounds.
2026-04-24 16:41:28 -07:00
Vincent Koc
139dfd97bb fix(diagnostics-otel): export logs from diagnostic events
Export diagnostics OTEL logs through bounded diagnostic log events while keeping core log records off the public plugin diagnostic stream.\n\nIncludes security hardening for log payload redaction, bounded attributes, prototype-pollution keys, OTEL export failure reporting, and extension SDK seam usage.
2026-04-24 14:51:45 -07:00
Vincent Koc
f8573fe9c2 feat(diagnostics): export lifecycle OTEL spans
Export diagnostics OTEL lifecycle spans for runs, model calls, and tool executions while avoiding retained live span state and high-cardinality/sensitive exported attributes.
2026-04-24 09:36:35 -07:00
Vincent Koc
0e7250f37b feat(diagnostics): emit model call events
Emit structured diagnostic events for embedded run and model-call lifecycle with trace context, duration, and safe error categories.
2026-04-24 02:17:07 -07:00
Vincent Koc
cead2ea4b1 feat(diagnostics): emit tool execution events
Emit structured diagnostic events for tool execution lifecycle, with trace context, safe parameter summaries, and non-message error metadata.
2026-04-24 01:16:13 -07:00
Vincent Koc
4630ce3d9e fix(diagnostics): make otel service restart safe
Make diagnostics-otel startup restart-safe by tearing down stale SDK, log transport, and diagnostic-event listener handles before reinitializing or disabling the service. Adds regression coverage for repeated start and disabled restart paths.\n\nThanks @vincentkoc.
2026-04-24 00:53:04 -07:00
Vincent Koc
bcdacfa1b3 feat(diagnostics): carry trace context through hooks
Pass immutable diagnostic trace contexts through agent and tool hook surfaces, emit model usage with the run trace, and parent OTEL spans/logs from validated trace context without retained global state.\n\nThanks @vincentkoc.
2026-04-24 00:24:32 -07:00
Vincent Koc
8fade9df27 feat(diagnostics): attach trace context to otel logs (#70961)
* feat(diagnostics): attach trace context to otel logs

* fix(diagnostics): satisfy trace flags lint
2026-04-23 23:40:42 -07:00
Peter Steinberger
e3caacd530 lint: enforce exhaustive switches 2026-04-23 06:02:12 +01:00