Commit Graph

16 Commits

Author SHA1 Message Date
Peter Steinberger
c397a02c9a fix(queue): harden drain/abort/timeout race handling
- reject new lane enqueues once gateway drain begins
- always reset lane draining state and isolate onWait callback failures
- persist per-session abort cutoff and skip stale queued messages
- avoid false 600s agentTurn timeout in isolated cron jobs

Fixes #27407
Fixes #27332
Fixes #27427

Co-authored-by: Kevin Shenghui <shenghuikevin@github.com>
Co-authored-by: zjmy <zhangjunmengyang@gmail.com>
Co-authored-by: suko <miha.sukic@gmail.com>
2026-02-26 13:43:39 +01:00
Peter Steinberger
185fba1d22 refactor(agents): dedupe plugin hooks and test helpers 2026-02-22 07:44:57 +00:00
Peter Steinberger
a011361784 perf(test): remove timer callbacks in command queue tests 2026-02-18 21:53:57 +00:00
Peter Steinberger
8f079afb38 perf(test): remove timer usage in command queue ordering test 2026-02-18 17:46:39 +00:00
Peter Steinberger
c7bc94436b perf(test): fake queue timers and merge telegram reply-mode checks 2026-02-18 16:01:20 +00:00
Peter Steinberger
46278e22cf perf(test): trim telegram duplicates and queue wait delays 2026-02-18 04:22:59 +00:00
cpojer
03e6acd051 chore: Fix types in tests 28/N. 2026-02-17 14:32:18 +09:00
Peter Steinberger
5caf829d28 perf(test): trim duplicate gateway and auto-reply test overhead 2026-02-13 23:40:38 +00:00
Joseph Krug
4e9f933e88 fix: reset stale execution state after SIGUSR1 in-process restart (#15195)
Merged via /review-pr -> /prepare-pr -> /merge-pr.

Prepared head SHA: 676f9ec451
Co-authored-by: joeykrug <5925937+joeykrug@users.noreply.github.com>
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Reviewed-by: @gumadeiras
2026-02-13 15:30:09 -05:00
Yi LIU
a5ccfa57a8 refactor(process): use dedicated CommandLaneClearedError in clearCommandLane
Replace bare `new Error("Command lane cleared")` with a dedicated
`CommandLaneClearedError` class so callers that fire-and-forget
enqueued tasks can catch this specific type and avoid surfacing
unhandled rejection warnings.
2026-02-13 19:43:20 +01:00
Yi LIU
a49dd83b14 fix(process): reject pending promises when clearing command lane
clearCommandLane() was truncating the queue array without calling
resolve/reject on pending entries, causing never-settling promises
and memory leaks when upstream callers await enqueueCommandInLane().

Splice entries and reject each before clearing so callers can handle
the cancellation gracefully.
2026-02-13 19:43:20 +01:00
0xRain
acb9cbb898 fix(gateway): drain active turns before restart to prevent message loss (#13931)
* fix(gateway): drain active turns before restart to prevent message loss

On SIGUSR1 restart, the gateway now waits up to 30s for in-flight agent
turns to complete before tearing down the server. This prevents buffered
messages from being dropped when config.patch or update triggers a restart
while agents are mid-turn.

Changes:
- command-queue.ts: add getActiveTaskCount() and waitForActiveTasks()
  helpers to track and wait on active lane tasks
- run-loop.ts: on restart signal, drain active tasks before server.close()
  with a 30s timeout; extend force-exit timer accordingly
- command-queue.test.ts: update imports for new exports

Fixes #13883

* fix(queue): snapshot active tasks for restart drain

---------

Co-authored-by: Elonito <0xRaini@users.noreply.github.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-02-12 07:55:19 -06:00
Peter Steinberger
6734f2d71c fix: wire OTLP logs for diagnostics 2026-01-20 22:51:47 +00:00
Peter Steinberger
e5f677803f chore: format to 2-space and bump changelog 2025-11-26 00:53:53 +01:00
Peter Steinberger
38659f5d3e test: sync updated specs 2025-11-25 12:12:29 +01:00
Peter Steinberger
13be898c07 feat: serialize command auto-replies with queue 2025-11-25 04:40:49 +01:00