Commit Graph

17664 Commits

Author SHA1 Message Date
Taylor Asplund
874ff7089c fix: ensure CLI exits after command completion (#12906)
* fix: ensure CLI exits after command completion

The CLI process would hang indefinitely after commands like
`openclaw gateway restart` completed successfully.  Two root causes:

1. `runCli()` returned without calling `process.exit()` after
   `program.parseAsync()` resolved, and Commander.js does not
   force-exit the process.

2. `daemon-cli/register.ts` eagerly called `createDefaultDeps()`
   which imported all messaging-provider modules, creating persistent
   event-loop handles that prevented natural Node exit.

Changes:
- Add `flushAndExit()` helper that drains stdout/stderr before calling
  `process.exit()`, preventing truncated piped output in CI/scripts.
- Call `flushAndExit()` after both `tryRouteCli()` and
  `program.parseAsync()` resolve.
- Remove unnecessary `void createDefaultDeps()` from daemon-cli
  registration — daemon lifecycle commands never use messaging deps.
- Make `serveAcpGateway()` return a promise that resolves on
  intentional shutdown (SIGINT/SIGTERM), so `openclaw acp` blocks
  `parseAsync` for the bridge lifetime and exits cleanly on signal.
- Handle the returned promise in the standalone main-module entry
  point to avoid unhandled rejections.

Fixes #12904

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: refactor CLI lifecycle and lazy outbound deps (#12906) (thanks @DrCrinkle)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
2026-02-14 00:34:33 +01:00
Peter Steinberger
2378d770d1 perf(test): speed gateway suite resets with unique config roots 2026-02-13 23:33:08 +00:00
Peter Steinberger
e794ef0478 perf(test): reduce hot-suite setup and duplicate test work 2026-02-13 23:30:41 +00:00
Bridgerz
ab4a08a82a fix: defer gateway restart until all replies are sent (#12970)
* fix: defer gateway restart until all replies are sent

Fixes a race condition where gateway config changes (e.g., enabling
plugins via iMessage) trigger an immediate SIGUSR1 restart, killing the
iMessage RPC connection before replies are delivered.

Both restart paths (config watcher and RPC-triggered) now defer until
all queued operations, pending replies, and embedded agent runs complete
(polling every 500ms, 30s timeout). A shared emitGatewayRestart() guard
prevents double SIGUSR1 when both paths fire simultaneously.

Key changes:
- Dispatcher registry tracks active reply dispatchers globally
- markComplete() called in finally block for guaranteed cleanup
- Pre-restart deferral hook registered at gateway startup
- Centralized extractDeliveryInfo() for session key parsing
- Post-restart sentinel messages delivered directly (not via agent)
- config-patch distinguished from config-apply in sentinel kind

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: single-source gateway restart authorization

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
2026-02-14 00:29:29 +01:00
Peter Steinberger
dc507f3dec perf(test): reduce memory and port probe overhead 2026-02-13 23:22:30 +00:00
Peter Steinberger
1aa746f042 perf(test): lower synthetic payload in embedding batch split case 2026-02-13 23:16:42 +00:00
Peter Steinberger
faeac955b5 perf(test): trim retry-loop work in embedding batch tests 2026-02-13 23:16:42 +00:00
Peter Steinberger
e324cb5b94 perf(test): reduce fixture churn in hot suites 2026-02-13 23:16:41 +00:00
Peter Steinberger
dac8f5ba3f perf(test): trim fixture and import overhead in hot suites 2026-02-13 23:16:41 +00:00
Peter Steinberger
b8703546e9 docs(changelog): note cron delivered-relay regression coverage (#15737) (thanks @brandonwise) 2026-02-14 00:08:56 +01:00
Brandon Wise
b0728e605d fix(cron): skip relay only for explicit delivery config, not legacy payload
Fixes #15692

The previous fix was too broad — it removed the relay for ALL isolated jobs.
This broke backwards compatibility for jobs without explicit delivery config.

The correct behavior is:
- If job.delivery exists → isolated runner handles it via runSubagentAnnounceFlow
- If only legacy payload.deliver fields → relay to main if requested (original behavior)

This addresses Greptile's review feedback about runIsolatedAgentJob being an
injected dependency that might not call runSubagentAnnounceFlow.

Uses resolveCronDeliveryPlan().source to distinguish between explicit delivery
config and legacy payload-only jobs.
2026-02-14 00:08:56 +01:00
Peter Steinberger
45a2cd55cc fix: harden isolated cron announce delivery fallback (#15739) (thanks @widingmarcus-cyber) 2026-02-13 23:49:10 +01:00
Marcus Widing
ea95e88dd6 fix(cron): prevent duplicate delivery for isolated jobs with announce mode
When an isolated cron job delivers its output via deliverOutboundPayloads
or the subagent announce flow, the finish handler in executeJobCore
unconditionally posts a summary to the main agent session and wakes it
via requestHeartbeatNow. The main agent then generates a second response
that is also delivered to the target channel, resulting in duplicate
messages with different content.

Add a `delivered` flag to RunCronAgentTurnResult that is set to true
when the isolated run successfully delivers its output. In executeJobCore,
skip the enqueueSystemEvent + requestHeartbeatNow call when the flag is
set, preventing the main agent from waking up and double-posting.

Fixes #15692
2026-02-13 23:49:10 +01:00
nabbilkhan
207e2c5aff fix: add outbound delivery crash recovery (#15636) (thanks @nabbilkhan) (#15636)
Co-authored-by: Shadow <hi@shadowing.dev>
2026-02-13 15:54:07 -06:00
Peter Steinberger
caebe70e9a perf(test): cut setup/import overhead in hot suites 2026-02-13 21:23:50 +00:00
Peter Steinberger
93dd51bce0 perf(matrix): lazy-load music-metadata parsing 2026-02-13 21:23:50 +00:00
Joseph Krug
4e9f933e88 fix: reset stale execution state after SIGUSR1 in-process restart (#15195)
Merged via /review-pr -> /prepare-pr -> /merge-pr.

Prepared head SHA: 676f9ec451
Co-authored-by: joeykrug <5925937+joeykrug@users.noreply.github.com>
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Reviewed-by: @gumadeiras
2026-02-13 15:30:09 -05:00
Peter Steinberger
2086cdfb9b perf(test): reduce hot-suite import and setup overhead 2026-02-13 20:26:39 +00:00
Peter Steinberger
1655df7ac0 fix(config): log config overwrite audits 2026-02-13 20:12:41 +00:00
Gustavo Madeira Santana
42eaee8b7e chore: fix root_dir resolution/stale scripts during PR review 2026-02-13 15:09:39 -05:00
Peter Steinberger
6442512954 perf: reduce hotspot test startup and timeout costs 2026-02-13 20:03:01 +00:00
Marcus Castro
31537c669a fix: archive old transcript files on /new and /reset (#14949)
Merged via /review-pr -> /prepare-pr -> /merge-pr.

Prepared head SHA: 4724df7dea
Co-authored-by: mcaxtr <7562095+mcaxtr@users.noreply.github.com>
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Reviewed-by: @gumadeiras
2026-02-13 14:55:16 -05:00
Peter Steinberger
c8b198ab51 perf: speed up gateway missing-tick e2e watchdog 2026-02-13 19:52:45 +00:00
Peter Steinberger
e746a67cc3 perf: speed up telegram media e2e flush timing 2026-02-13 19:52:45 +00:00
Gustavo Madeira Santana
bbca3b191a changelog: add missing attribution 2026-02-13 14:47:51 -05:00
Shadow
8c1e8bb2ff fix: note clawdock zsh compatibility (#15501) (thanks @nkelner) 2026-02-13 13:47:16 -06:00
Nathaniel Kelner
66f6d71ffa Update clawdock-helpers.sh compatibility with Zsh
Unlike Bash, Zsh has several "special" readonly variables (status, pipestatus, etc.) that the shell manages automatically. Shadowing them with local declarations triggers an error.
2026-02-13 13:47:16 -06:00
大猫子
f24d70ec8e fix(providers): switch MiniMax API-key provider to anthropic-messages (#15297)
Merged via /review-pr -> /prepare-pr -> /merge-pr.

Prepared head SHA: 0e7f84a2a1
Co-authored-by: lailoo <20536249+lailoo@users.noreply.github.com>
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Reviewed-by: @gumadeiras
2026-02-13 14:44:36 -05:00
Marcus Castro
4225206f0c fix(gateway): normalize session key casing to prevent ghost sessions (#12846)
* fix(gateway): normalize session key casing to prevent ghost sessions on Linux

On case-sensitive filesystems (Linux), mixed-case session keys like
agent:ops:MySession and agent:ops:mysession resolve to different store
entries, creating ghost duplicates that never converge.

Core changes in session-utils.ts:
- resolveSessionStoreKey: lowercase all session key components
- canonicalizeSpawnedByForAgent: accept cfg, resolve main-alias references
  via canonicalizeMainSessionAlias after lowercasing
- loadSessionEntry: return legacyKey only when it differs from canonicalKey
- resolveGatewaySessionStoreTarget: scan store for case-insensitive matches;
  add optional scanLegacyKeys param to skip disk reads for read-only callers
- Export findStoreKeysIgnoreCase for use by write-path consumers
- Compare global/unknown sentinels case-insensitively in all canonicalization
  functions

sessions-resolve.ts:
- Make resolveSessionKeyFromResolveParams async for inline migration
- Check canonical key first (fast path), then fall back to legacy scan
- Delete ALL legacy case-variant keys in a single updateSessionStore pass

Fixes #12603

* fix(gateway): propagate canonical keys and clean up all case variants on write paths

- agent.ts: use canonicalizeSpawnedByForAgent (with cfg) instead of raw
  toLowerCase; use findStoreKeysIgnoreCase to delete all legacy variants
  on store write; pass canonicalKey to addChatRun, registerAgentRunContext,
  resolveSendPolicy, and agentCommand
- sessions.ts: replace single-key migration with full case-variant cleanup
  via findStoreKeysIgnoreCase in patch/reset/delete/compact handlers; add
  case-insensitive fallback in preview (store already loaded); make
  sessions.resolve handler async; pass scanLegacyKeys: false in preview
- server-node-events.ts: use findStoreKeysIgnoreCase to clean all legacy
  variants on voice.transcript and agent.request write paths; pass
  canonicalKey to addChatRun and agentCommand

* test(gateway): add session key case-normalization tests

Cover the case-insensitive session key canonicalization logic:
- resolveSessionStoreKey normalizes mixed-case bare and prefixed keys
- resolveSessionStoreKey resolves mixed-case main aliases (MAIN, Main)
- resolveGatewaySessionStoreTarget includes legacy mixed-case store keys
- resolveGatewaySessionStoreTarget collects all case-variant duplicates
- resolveGatewaySessionStoreTarget finds legacy main alias keys with
  customized mainKey configuration

All 5 tests fail before the production changes, pass after.

* fix: clean legacy session alias cleanup gaps (openclaw#12846) thanks @mcaxtr

---------

Co-authored-by: Peter Steinberger <steipete@gmail.com>
2026-02-13 20:42:24 +01:00
Shadow
f6232bc2b4 CI: close invalid items without response 2026-02-13 13:41:13 -06:00
Peter Steinberger
2b685b08c2 fix: harden matrix multi-account routing (#7286) (thanks @emonty) 2026-02-13 20:39:58 +01:00
Monty Taylor
a76ac1344e fix: resolveAllowFrom uses cfg+accountId params, not account 2026-02-13 20:39:58 +01:00
Monty Taylor
1a17466a60 fix: use account-aware config paths in resolveDmPolicy and resolveAllowFrom 2026-02-13 20:39:58 +01:00
Monty Taylor
3985ef7b37 fix: merge top-level config into per-account config so inherited settings apply 2026-02-13 20:39:58 +01:00
Monty Taylor
ed5a8dff8a chore: fix CHANGELOG.md formatting 2026-02-13 20:39:58 +01:00
Monty Taylor
da00f6cf8e fix: deep-merge nested config, prefer default account in send fallback, simplify credential filenames 2026-02-13 20:39:58 +01:00
Monty Taylor
1a72902991 refactor: read accounts from cfg.channels.matrix.accounts directly for clarity 2026-02-13 20:39:58 +01:00
Monty Taylor
bf4e348440 fix: de-duplicate normalized account IDs and add case-insensitive config lookup to send/client 2026-02-13 20:39:58 +01:00
Monty Taylor
a6dd50fede fix: normalize account config keys for case-insensitive matching 2026-02-13 20:39:58 +01:00
Monty Taylor
c89b8d99fc fix: normalize accountId in active-client and send/client for consistent keying 2026-02-13 20:39:58 +01:00
Monty Taylor
caf5d2dd7c feat(matrix): Add multi-account support to Matrix channel
The Matrix channel previously hardcoded `listMatrixAccountIds` to always
return only `DEFAULT_ACCOUNT_ID`, ignoring any accounts configured in
`channels.matrix.accounts`. This prevented running multiple Matrix bot
accounts simultaneously.

Changes:
- Update `listMatrixAccountIds` to read from `channels.matrix.accounts`
  config, falling back to `DEFAULT_ACCOUNT_ID` for legacy single-account
  configurations
- Add `resolveMatrixConfigForAccount` to resolve config for a specific
  account ID, merging account-specific values with top-level defaults
- Update `resolveMatrixAccount` to use account-specific config when
  available
- The multi-account config structure (channels.matrix.accounts) was not
  defined in the MatrixConfig type, causing TypeScript to not recognize
  the field. Added the accounts field to properly type the multi-account
  configuration.
- Add stopSharedClientForAccount() to stop only the specific account's
  client instead of all clients when an account shuts down
- Wrap dynamic import in try/finally to prevent startup mutex deadlock
  if the import fails
- Pass accountId to resolveSharedMatrixClient(), resolveMatrixAuth(),
  and createMatrixClient() to ensure the correct account's credentials
  are used for outbound messages
- Add accountId parameter to resolveMediaMaxBytes to check account-specific
  config before falling back to top-level config
- Maintain backward compatibility with existing single-account setups

This follows the same pattern already used by the WhatsApp channel for
multi-account support.

Fixes #3165
Fixes #3085

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-13 20:39:58 +01:00
Shadow
607b625aab Docs: update PR commit guidance 2026-02-13 13:39:35 -06:00
Peter Steinberger
e0c04c62c9 docs(signal): improve setup, verification, and troubleshooting guidance 2026-02-13 20:38:56 +01:00
Peter Steinberger
f02247b6c5 fix(ci): fix discord proxy websocket binding and bluebubbles timeout status 2026-02-13 19:35:55 +00:00
rodbland2021
d3b2135f86 fix(agents): wait for agent idle before flushing pending tool results (#13746)
* fix(agents): wait for agent idle before flushing pending tool results

When pi-agent-core's auto-retry mechanism handles overloaded/rate-limit
errors, it resolves waitForRetry() on assistant message receipt — before
tool execution completes in the retried agent loop. This causes the
attempt's finally block to call flushPendingToolResults() while tools
are still executing, inserting synthetic 'missing tool result' errors
and causing silent agent failures.

The fix adds a waitForIdle() call before the flush to ensure the agent's
retry loop (including tool execution) has fully completed.

Evidence from real session: tool call and synthetic error were only 53ms
apart — the tool never had a chance to execute before being flushed.

Root cause is in pi-agent-core's _resolveRetry() firing on message_end
instead of agent_end, but this workaround in OpenClaw prevents the
symptom without requiring an upstream fix.

Fixes #8643
Fixes #13351
Refs #6682, #12595

* test: add tests for tool result flush race condition

Validates that:
- Real tool results are not replaced by synthetic errors when they arrive in time
- Flush correctly inserts synthetic errors for genuinely orphaned tool calls
- Flush is a no-op after real tool results have already been received

Refs #8643, #13748

* fix(agents): add waitForIdle to all flushPendingToolResults call sites

The original fix only covered the main run finally block, but there are
two additional call sites that can trigger flushPendingToolResults while
tools are still executing:

1. The catch block in attempt.ts (session setup error handler)
2. The finally block in compact.ts (compaction teardown)

Both now await agent.waitForIdle() with a 30s timeout before flushing,
matching the pattern already applied to the main finally block.

Production testing on VPS with debug logging confirmed these additional
paths can fire during sub-agent runs, producing spurious synthetic
'missing tool result' errors.

* fix(agents): centralize idle-wait flush and clear timeout handle

---------

Co-authored-by: Renue Development <dev@renuebyscience.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
2026-02-13 20:35:43 +01:00
Shadow
4b3c87b82d fix: finalize discord presence config (#10855) (thanks @h0tp-ftw) 2026-02-13 13:34:19 -06:00
Shadow
c82cd9e5d1 Docs: add discord presence config notes (#10855) 2026-02-13 13:34:19 -06:00
Shadow
6acea69b20 Discord: refine presence config defaults (#10855) (thanks @h0tp-ftw) 2026-02-13 13:34:19 -06:00
h0tp
770e904c21 fix(discord): restrict activity types and statuses to valid enum values
- Removed 'offline' from valid config statuses (use 'invisible').
- Restricted activityType to 0, 1, 2, 3, 5 (excluding custom/4).
- Added logic to only send 'url' when activityType is 1 (Streaming).
- Updated Typescript definitions and Zod schemas to match.
2026-02-13 13:34:19 -06:00
h0tp
5d8c6ef91c feat(discord): add configurable presence (activity/status/type)
- Adds `activity`, `status`, `activityType`, and `activityUrl` to Discord provider config schema.
- Implements a `ReadyListener` in `DiscordProvider` to apply these settings on connection.
- Solves the issue where `@buape/carbon` ignores initial presence options in constructor.
- Validated manually and via existing test suite.
2026-02-13 13:34:19 -06:00