* fix: canonicalize session keys at write time to prevent orphaned sessions (#29683)
resolveSessionKey() uses hardcoded DEFAULT_AGENT_ID="main", but all read
paths canonicalize via cfg. When the configured default agent differs
(e.g. "ops" with mainKey "work"), writes produce "agent:main:main" while
reads look up "agent:ops:work", orphaning transcripts on every restart.
Fix all three write-path call sites by wrapping with
canonicalizeMainSessionAlias:
- initSessionState (auto-reply/reply/session.ts)
- runWebHeartbeatOnce (web/auto-reply/heartbeat-runner.ts)
- resolveCronAgentSessionKey (cron/isolated-agent/session-key.ts)
Add startup migration (migrateOrphanedSessionKeys) to rename existing
orphaned keys to canonical form, merging by most-recent updatedAt.
* fix: address review — track agent IDs in migration map, align snapshot key
P1: migrateOrphanedSessionKeys now tracks agentId alongside each store
path in a Map instead of inferring from the filesystem path. This
correctly handles custom session.store templates outside the default
agents/<id>/ layout.
P2: Pass the already-canonicalized sessionKey to getSessionSnapshot so
the heartbeat snapshot reads/restores use the same key as the write path.
* fix: log migration results at all early return points
migrateOrphanedSessionKeys runs before detectLegacyStateMigrations, so
it can canonicalize legacy keys (e.g. "main" → "agent:main:main") before
the legacy detector sees them. This caused the early return path to skip
logging, breaking doctor-state-migrations tests that assert log.info was
called.
Extract logMigrationResults helper and call it at every return point.
* fix: handle shared stores and ~ expansion in migration
P1: When session.store has no {agentId}, all agents resolve to the same
file. Track all agentIds per store path (Map<path, Set<id>>) and run
canonicalization once per agent. Skip cross-agent "agent:main:*"
remapping when "main" is a legitimate configured agent sharing the store,
to avoid merging its data into another agent's namespace.
P2: Use expandHomePrefix (environment-aware ~ resolution) instead of
os.homedir() in resolveStorePathFromTemplate, matching the runtime
resolveStorePath behavior for OPENCLAW_HOME/HOME overrides.
* fix: narrow cross-agent remap to provable orphan aliases only
Only remap agent:main:* keys where the suffix is a main session alias
("main" or the configured mainKey). Other agent:main:* keys — hooks,
subagents, cron sessions, per-sender keys — may be intentional
cross-agent references and must not be silently moved into another
agent's namespace.
* fix: run orphan-key session migration at gateway startup (#29683)
* fix: canonicalize cross-agent legacy main aliases in session keys (#29683)
* fix: guard shared-store migration against cross-agent legacy alias remap (#29683)
* refactor: split session-key migration out of pr 30654
---------
Co-authored-by: Your Name <your_email@example.com>
Co-authored-by: Ayaan Zaidi <hi@obviy.us>
* fix(agents): preserve original task prompt on model fallback for new sessions
* fix(agents): use dynamic transcript check for sessionHasHistory on fallback retry
Address Greptile review feedback: replace the static !isNewSession flag
with a dynamic sessionFileHasContent() check that reads the on-disk
transcript before each fallback retry. This correctly handles the edge
case where the primary model completes at least one assistant-response
turn (flushing the user message to disk) before failing - the fallback
now sends the recovery prompt instead of duplicating the original body.
The !isNewSession short-circuit is kept as a fast path so existing
sessions skip the file read entirely.
* fix(agents): address security vulnerabilities in session fallback logic
Fixes three medium-severity security issues identified by Aisle Security Analysis on PR #55632:
- CWE-400: Unbounded session transcript read in sessionFileHasContent()
- CWE-400: Symlink-following in sessionFileHasContent()
- CWE-201: Sensitive prompt replay to a different fallback provider
* fix(agents): use JSONL parsing for session history detection (CWE-703)
Replace bounded byte-prefix substring matching in sessionFileHasContent()
with line-by-line JSONL record parsing. The previous approach could miss
an assistant message when the preceding user content exceeded the 256KB
read limit, causing a false negative that blocks cross-provider fallback
entirely.
* fix(agents): preserve fallback prompt across providers
---------
Co-authored-by: Ayaan Zaidi <hi@obviy.us>
* fix(acpx): read ACPX_PINNED_VERSION from package.json instead of hardcoding
The hardcoded ACPX_PINNED_VERSION ("0.1.16") falls out of sync with the bundled acpx version in package.json every release, causing ACP runtime to be marked unavailable due to version mismatch (see #43997).
* Validate and sanitize ACPX version retrieval
Add validation for acpx version from package.json
Remove memory_search and memory_get from SUBAGENT_TOOL_DENY_ALWAYS.
These are read-only tools with no side effects that are essential for
multi-agent setups relying on shared memory for context retrieval.
Rationale:
- Read-only tools (memory_search, memory_get) have no side effects and
cannot modify state, send messages, or affect external systems
- Other read-only tools (read, web_search, web_fetch) are already
available to sub-agents by default
- Multi-agent deployments with shared knowledge depend on memory tools
for context retrieval
- The workaround (tools.subagents.tools.alsoAllow) works but requires
manual configuration that contradicts memorySearch.enabled: true
Fixes#55385
* gateway: prefer transcript model in sessions list
* gateway: keep live subagent model in session rows
* gateway: prefer selected model until runtime refresh
* gateway: simplify session model identity selection
* gateway: avoid transcript model fallback on cost-only reads