* fix: canonicalize session keys at write time to prevent orphaned sessions (#29683)
resolveSessionKey() uses hardcoded DEFAULT_AGENT_ID="main", but all read
paths canonicalize via cfg. When the configured default agent differs
(e.g. "ops" with mainKey "work"), writes produce "agent:main:main" while
reads look up "agent:ops:work", orphaning transcripts on every restart.
Fix all three write-path call sites by wrapping with
canonicalizeMainSessionAlias:
- initSessionState (auto-reply/reply/session.ts)
- runWebHeartbeatOnce (web/auto-reply/heartbeat-runner.ts)
- resolveCronAgentSessionKey (cron/isolated-agent/session-key.ts)
Add startup migration (migrateOrphanedSessionKeys) to rename existing
orphaned keys to canonical form, merging by most-recent updatedAt.
* fix: address review — track agent IDs in migration map, align snapshot key
P1: migrateOrphanedSessionKeys now tracks agentId alongside each store
path in a Map instead of inferring from the filesystem path. This
correctly handles custom session.store templates outside the default
agents/<id>/ layout.
P2: Pass the already-canonicalized sessionKey to getSessionSnapshot so
the heartbeat snapshot reads/restores use the same key as the write path.
* fix: log migration results at all early return points
migrateOrphanedSessionKeys runs before detectLegacyStateMigrations, so
it can canonicalize legacy keys (e.g. "main" → "agent:main:main") before
the legacy detector sees them. This caused the early return path to skip
logging, breaking doctor-state-migrations tests that assert log.info was
called.
Extract logMigrationResults helper and call it at every return point.
* fix: handle shared stores and ~ expansion in migration
P1: When session.store has no {agentId}, all agents resolve to the same
file. Track all agentIds per store path (Map<path, Set<id>>) and run
canonicalization once per agent. Skip cross-agent "agent:main:*"
remapping when "main" is a legitimate configured agent sharing the store,
to avoid merging its data into another agent's namespace.
P2: Use expandHomePrefix (environment-aware ~ resolution) instead of
os.homedir() in resolveStorePathFromTemplate, matching the runtime
resolveStorePath behavior for OPENCLAW_HOME/HOME overrides.
* fix: narrow cross-agent remap to provable orphan aliases only
Only remap agent:main:* keys where the suffix is a main session alias
("main" or the configured mainKey). Other agent:main:* keys — hooks,
subagents, cron sessions, per-sender keys — may be intentional
cross-agent references and must not be silently moved into another
agent's namespace.
* fix: run orphan-key session migration at gateway startup (#29683)
* fix: canonicalize cross-agent legacy main aliases in session keys (#29683)
* fix: guard shared-store migration against cross-agent legacy alias remap (#29683)
* refactor: split session-key migration out of pr 30654
---------
Co-authored-by: Your Name <your_email@example.com>
Co-authored-by: Ayaan Zaidi <hi@obviy.us>
* fix(agents): preserve original task prompt on model fallback for new sessions
* fix(agents): use dynamic transcript check for sessionHasHistory on fallback retry
Address Greptile review feedback: replace the static !isNewSession flag
with a dynamic sessionFileHasContent() check that reads the on-disk
transcript before each fallback retry. This correctly handles the edge
case where the primary model completes at least one assistant-response
turn (flushing the user message to disk) before failing - the fallback
now sends the recovery prompt instead of duplicating the original body.
The !isNewSession short-circuit is kept as a fast path so existing
sessions skip the file read entirely.
* fix(agents): address security vulnerabilities in session fallback logic
Fixes three medium-severity security issues identified by Aisle Security Analysis on PR #55632:
- CWE-400: Unbounded session transcript read in sessionFileHasContent()
- CWE-400: Symlink-following in sessionFileHasContent()
- CWE-201: Sensitive prompt replay to a different fallback provider
* fix(agents): use JSONL parsing for session history detection (CWE-703)
Replace bounded byte-prefix substring matching in sessionFileHasContent()
with line-by-line JSONL record parsing. The previous approach could miss
an assistant message when the preceding user content exceeded the 256KB
read limit, causing a false negative that blocks cross-provider fallback
entirely.
* fix(agents): preserve fallback prompt across providers
---------
Co-authored-by: Ayaan Zaidi <hi@obviy.us>
Remove memory_search and memory_get from SUBAGENT_TOOL_DENY_ALWAYS.
These are read-only tools with no side effects that are essential for
multi-agent setups relying on shared memory for context retrieval.
Rationale:
- Read-only tools (memory_search, memory_get) have no side effects and
cannot modify state, send messages, or affect external systems
- Other read-only tools (read, web_search, web_fetch) are already
available to sub-agents by default
- Multi-agent deployments with shared knowledge depend on memory tools
for context retrieval
- The workaround (tools.subagents.tools.alsoAllow) works but requires
manual configuration that contradicts memorySearch.enabled: true
Fixes#55385
* gateway: prefer transcript model in sessions list
* gateway: keep live subagent model in session rows
* gateway: prefer selected model until runtime refresh
* gateway: simplify session model identity selection
* gateway: avoid transcript model fallback on cost-only reads
loadChannelOutboundAdapter (via createChannelRegistryLoader) was reading
from getActivePluginRegistry() — the unpinned active registry that gets
replaced whenever loadOpenClawPlugins() runs (config schema reads, plugin
status queries, tool listings, etc.).
After replacement, the active registry may omit channel entries or carry
them in setup mode without outbound adapters, causing:
Outbound not configured for channel: telegram
The channel inbound path already uses the pinned registry
(getActivePluginChannelRegistry) which is frozen at gateway startup and
survives all subsequent registry replacements. This commit aligns the
outbound path to use the same pinned surface.
Adds a regression test that pins a registry with a telegram outbound
adapter, replaces the active registry with an empty one, then asserts
loadChannelOutboundAdapter still resolves the adapter.
Fixes#54745Fixes#54013
When approvals.exec.targets routes to a Telegram DM, the recipient
receives inline approval buttons but may not have explicit
channels.telegram.execApprovals configured. This adds a fallback
isTelegramExecApprovalTargetRecipient check so those DM recipients
can act on the buttons they were sent.
Includes accountId scoping for multi-bot deployments and 9 new tests.
* fix(plugins): reuse active registry for sub-agent tool resolution
* test(plugins): harden resolveRuntimePluginRegistry with per-field, caller-shape, and cold-start tests
Add 11 regression tests covering:
- R1: Per-field isolation (coreGatewayHandlers, includeSetupOnlyChannelPlugins,
preferSetupRuntimeForChannelPlugins each independently prevent fallback;
empty onlyPluginIds[] treated as non-gateway-scoped)
- R2: Caller-shape regression (tools.ts, memory-runtime.ts,
channel-resolution.ts shapes fall back; web-search-providers.runtime.ts
with onlyPluginIds does not)
- R3: Cold-start path (null active registry falls through to loadOpenClawPlugins)
Add debug logging to resolveRuntimePluginRegistry recording which exit path
was taken (no-options, cache-key-match, non-gateway-scoped fallback, fresh load).
* refactor: simplify plugin registry resolution tests and trim happy-path debug logs
* fix(plugins): address review comments on registry fallback
- Fix cold-start test assertion: loadOpenClawPlugins always activates
the registry (shouldActivate defaults to true), so getActivePluginRegistry()
is not null after the call. Updated assertion to match actual behavior.
- Add safety comment documenting why the non-gateway-scoped fallback is
safe despite cache-key mismatch: single-gateway-per-process model means
sub-agents share workspaceDir, config, and env with the gateway.
* test(plugins): restructure per-field isolation tests to avoid load timeouts
Test isGatewayScopedLoad directly instead of going through the full
resolveRuntimePluginRegistry path which triggers expensive plugin
discovery. This fixes the includeSetupOnlyChannelPlugins test timing
out in CI while providing more precise coverage of the predicate.
* fix(plugins): expand safety comment to address startup-scoped registry concern
* fix(plugins): scope subagent registry reuse to tool loading
---------
Co-authored-by: Ayaan Zaidi <hi@obviy.us>
Fixes#46185.
Verified:
- pnpm install --frozen-lockfile
- pnpm build
- pnpm test -- extensions/line/src/markdown-to-line.test.ts src/tts/prepare-text.test.ts
Note: `pnpm check` currently fails on unchanged `extensions/microsoft/speech-provider.test.ts` lines 108 and 139 on the rebased base, outside this PR diff.