Commit Graph

19 Commits

Author SHA1 Message Date
Peter Steinberger
fb6e415d0c fix(agents): align session lock hold budget with run timeouts 2026-02-17 03:10:36 +01:00
Peter Steinberger
7147cd9cc0 refactor: dedupe process-scoped lock maps 2026-02-17 00:45:02 +00:00
Vishal Doshi
e91a5b0216 fix: release stale session locks and add watchdog for hung API calls (#18060)
When a model API call hangs indefinitely (e.g. Anthropic quota exceeded
mid-call), the gateway acquires a session .jsonl.lock but the promise
never resolves, so the try/finally block never reaches release(). Since
the owning PID is the gateway itself, stale detection cannot help —
isPidAlive() always returns true.

This commit adds four layers of defense:

1. **In-process lock watchdog** (session-write-lock.ts)
   - Track acquiredAt timestamp on each held lock
   - 60-second interval timer checks all held locks
   - Auto-releases any lock held longer than maxHoldMs (default 5 min)
   - Catches the hung-API-call case that try/finally cannot

2. **Gateway startup cleanup** (server-startup.ts)
   - On boot, scan all agent session directories for *.jsonl.lock files
   - Remove locks with dead PIDs or older than staleMs (30 min)
   - Log each cleaned lock for diagnostics

3. **openclaw doctor stale lock detection** (doctor-session-locks.ts)
   - New health check scans for .jsonl.lock files
   - Reports PID status and age of each lock found
   - In --fix mode, removes stale locks automatically

4. **Transcript error entry on API failure** (attempt.ts)
   - When promptError is set, write an error marker to the session
     transcript before releasing the lock
   - Preserves conversation history even on model API failures

Closes #18060
2026-02-16 23:59:22 +01:00
Peter Steinberger
5248b759fe refactor(shared): reuse isPidAlive 2026-02-15 19:06:54 +00:00
Peter Steinberger
f4782e1e73 refactor(agents): dedupe session write lock release 2026-02-15 16:30:01 +00:00
Peter Steinberger
fdfc34fa1f perf(test): stabilize e2e harness and reduce flaky gateway coverage 2026-02-13 17:32:14 +00:00
Peter Steinberger
1eccfa8934 perf(test): trim duplicate e2e suites and harden signal hooks 2026-02-13 16:46:43 +00:00
Coy Geek
717129f7f9 fix: silence unused hook token url param (#9436)
* fix: Gateway authentication token exposed in URL query parameters

* fix: silence unused hook token url param

* fix: remove gateway auth tokens from URLs (#9436) (thanks @coygeek)

* test: fix Windows path separators in audit test (#9436)

---------

Co-authored-by: George Pickett <gpickett00@gmail.com>
2026-02-05 18:08:29 -08:00
cpojer
f16e32b73d fix: Do not process.exit(0) in the middle of a test. 2026-02-06 09:57:51 +09:00
Sash Zats
ec0728b357 fix: release session locks on process termination (#1962)
Adds cleanup handlers to release held file locks when the process
terminates via SIGTERM, SIGINT, or normal exit. This prevents orphaned
lock files that would block future sessions.

Fixes #1951
2026-02-05 16:35:34 -08:00
cpojer
5ceff756e1 chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
Josh Palmer
c41ea252b0 fix flaky web-fetch tests + lock cleanup
What:
- stub resolvePinnedHostname in web-fetch tests to avoid DNS flake
- close lock file handles via FileHandle.close during cleanup to avoid EBADF

Why:
- make CI deterministic without network/DNS dependence
- prevent double-close errors from GC

Tests:
- pnpm vitest run --config vitest.unit.config.ts src/agents/tools/web-tools.fetch.test.ts src/agents/session-write-lock.test.ts (failed: missing @aws-sdk/client-bedrock)
2026-01-29 11:05:11 +01:00
Gustavo Madeira Santana
66a5b324a1 fix: harden session lock cleanup (#2483) (thanks @janeexai) 2026-01-26 21:16:05 -05:00
Shadow
d8e5dd91ba fix: clean up session locks on exit (#2483) (thanks @janeexai) 2026-01-26 19:48:46 -06:00
Jane
14f8acdecb fix(agents): release session locks on process termination
Adds process exit handlers to release all held session locks on:
- Normal process.exit() calls
- SIGTERM / SIGINT signals

This ensures locks are cleaned up even when the process terminates
unexpectedly, preventing the 'session file locked' error.
2026-01-26 19:46:04 -06:00
Peter Steinberger
fdc50a0feb fix: normalize session lock path 2026-01-23 18:34:33 +00:00
Peter Steinberger
cf8b3ed988 fix: harden memory indexing and embedded session locks 2026-01-18 05:41:45 +00:00
Peter Steinberger
c379191f80 chore: migrate to oxlint and oxfmt
Co-authored-by: Christoph Nakazawa <christoph.pojer@gmail.com>
2026-01-14 15:02:19 +00:00
Peter Steinberger
6668805aca fix(agents): enforce single-writer session files 2026-01-11 02:25:45 +00:00