* fix(agents): preserve native Anthropic tool IDs for hybrid providers
Fixes#66892
MiniMax and other hybrid providers use api.minimaxi.com/anthropic
(modelApi: anthropic-messages), which generates and expects native
Anthropic tool_call_ids in toolu_* format. The hybrid replay policy
(buildHybridAnthropicOrOpenAIReplayPolicy) applied strict
sanitization that stripped underscores from these IDs, causing
MiniMax to reject them with error 2013.
The native Anthropic provider already preserved these IDs via
preserveNativeAnthropicToolUseIds (added in 4613f121ad). This
commit enables the same flag for the hybrid anthropic-messages
branch, so toolu_* IDs pass through unsanitized while other
synthetic IDs still get strict cleanup.
* fix(agents): repair sanitized replay tool results before send
* fix: repair sanitized replay tool results before send (#67620) (thanks @stainlu)
* fix: preserve aborted-span tool results during replay sanitize (#67620) (thanks @stainlu)
---------
Co-authored-by: Ayaan Zaidi <hi@obviy.us>
* fix(agents): classify Cloudflare/CDN HTML error pages as transport failures
Fixes#67517
When a provider endpoint returns an HTML error page (e.g. Cloudflare
502/503/520-524), the pattern-based message classifiers would scan
the HTML body and misinterpret embedded text like "Rate limit
exceeded" as a structured rate_limit API error. This caused
incorrect failover behavior (profile rotation instead of clean
retry/fallback) and left the TUI stuck.
Two fixes:
1. classifyFailoverSignal now short-circuits on HTML responses
before running pattern matchers, returning "timeout" (transport
failure) so retry/fallback handles them correctly.
2. classifyProviderRuntimeFailureKind now detects HTML errors at
any status (not just 403), returning "upstream_html" for
non-403 statuses with a clear user-facing message about
CDN/gateway errors.
Adds regression tests covering Cloudflare 502/503 HTML with
embedded rate-limit text, 403 HTML (still classified as auth),
and JSON rate-limit responses (still classified correctly).
* fix: preserve auth and proxy HTML classification
* fix: classify HTML provider error pages correctly (#67642) (thanks @stainlu)
---------
Co-authored-by: Ayaan Zaidi <hi@obviy.us>
* fix(microsoft,elevenlabs): add enabledByDefault so speech providers register at runtime
* fix(tts): route generic directive tokens to the explicitly declared provider
Addresses the P2 Codex review on #62846 that flagged auto-enabling
ElevenLabs as a product regression for MiniMax users. Both providers
claim the generic `speed` token, and parseTtsDirectives walked
providers in autoSelectOrder with first-match-wins, so inputs like
`[[tts:provider=minimax speed=1.2]]` silently routed speed to
providerOverrides.elevenlabs once elevenlabs participated in every
parse pass.
The parser now pre-scans for `provider=` (honoring legacy last-wins
semantics) and routes generic tokens with the declared provider tried
first, falling back to autoSelectOrder when it doesn't handle the key.
Token order inside the directive no longer matters: `speed=1.2` before
or after `provider=minimax` both resolve to MiniMax.
Adds a regression test suite covering the exact ElevenLabs/MiniMax
speed collision plus fallback, mixed-token, last-wins, and
allowProvider-disabled cases. parseTtsDirectives had no prior test
coverage.
* fix(tts): prefer active provider for generic directives
* fix: register bundled TTS providers safely (#62846) (thanks @stainlu)
* fix: use exported TTS SDK seam (#62846) (thanks @stainlu)
---------
Co-authored-by: Ayaan Zaidi <hi@obviy.us>
* fix(tools): expand tilde in host edit/write paths (non-workspace mode)
* test: use it.runIf for visible skip when tmpdir is not under home
* fix(tools): address Codex P2 review on tilde host edit/write
Responds to two P2 findings from chatgpt-codex-connector on #62804:
1. Tests never ran in CI. The it.runIf(tmpdirUnderHome) guard always
skipped on Linux runners where os.tmpdir() is /tmp, outside $HOME, so
the regression tests reported green without executing. Tmpdirs now use
the test-isolated HOME (process.env.HOME from test/test-env.ts) so
tests run in every environment and match what expandHomePrefix
resolves, keeping them hermetic.
2. Edit recovery path resolution was inconsistent. resolveEditPath
inlined os.homedir() for tilde expansion, bypassing OPENCLAW_HOME,
while the write/edit operations use expandHomePrefix. Under a custom
OPENCLAW_HOME, wrapEditToolWithRecovery's readback targeted a
different file than the edit actually touched, so successful edits
could be reported as failures. resolveEditPath now uses the same
expandHomePrefix helper.
* test(tools): verify tilde expansion honors OPENCLAW_HOME override
The prior tests covered tilde expansion but only under the default test
home, which matches os.homedir(). That passed whether the production code
used expandHomePrefix() or inlined os.homedir() — the behaviors only
diverge when OPENCLAW_HOME is set to a path outside $HOME.
Adds four tests that set OPENCLAW_HOME to a temp dir explicitly outside
$HOME and verify that write/mkdir/read/access tilde operations resolve
against OPENCLAW_HOME, not os.homedir(). These would fail if
pi-tools.read.ts or pi-tools.host-edit.ts reverted to os.homedir(),
directly covering the Codex P2 feedback about OPENCLAW_HOME consistency.
Uses the same env snapshot/restore pattern as test/helpers/temp-home.ts.
* Agents: resolve host tilde paths against OS home
* fix: align host tilde paths with OS home (#62804) (thanks @stainlu)
* fix: keep the changelog entry in the active block (#62804) (thanks @stainlu)
---------
Co-authored-by: Ayaan Zaidi <hi@obviy.us>
* fix(ollama): strip provider prefix from model ID in chat requests
buildOllamaChatRequest passed params.modelId directly to the Ollama API
without stripping the "ollama/" provider prefix. The embedding provider
already handles this (normalizeEmbeddingModel at line 100), but the chat
stream path did not. When setup writes the primary model as
"ollama/<model>" or the model ID flows through without normalization,
the Ollama API rejects it with a 404.
Closes#67435
* ollama: guard chat fetch and streamline tests
* fix: restore Ollama chat model IDs (#67457) (thanks @suboss87)
* fix: preserve Ollama default chat fallback (#67457) (thanks @suboss87)
---------
Co-authored-by: Ayaan Zaidi <hi@obviy.us>
* fix: strip standalone <function> tool call tags from visible text (#67093)
Models like Gemma emit tool calls as standalone <function> blocks with
nested <parameter> XML instead of wrapping them in <tool_call>. The
existing stripToolCallXmlTags only recognized tool_call, tool_result,
function_call, function_calls, and tool_calls — so bare <function> and
</function> tags leaked through to the user as raw syntax on Discord
and other channels.
Add "function" to TOOL_CALL_TAG_NAMES and extend the payload detection
for <function> tags to check XML payloads (not just JSON), matching the
same behavior already applied to <tool_call>. Other tag types keep the
more conservative JSON-only check to avoid stripping prose examples.
Made-with: Cursor
* Text: harden standalone <function> stripping
* fix: strip standalone <function> tool call tags from visible text (#67318) (thanks @joelnishanth)
---------
Co-authored-by: Ayaan Zaidi <hi@obviy.us>
Fix false-positive "missing" alerts on the Model Auth status card:
- Normalize provider ids before expectsOAuth membership check (alias mismatch)
- Apply env-backed escape hatch to auth.profiles loop (not just models.providers)
- Check actual env var resolution for SecretRef apiKeys
Co-authored-by: Omar Shahine <10343873+omarshahine@users.noreply.github.com>
* docs: add async exec duplicate completion investigation
Add an internal refactor note tracing the node exec completion to system event to heartbeat to transcript path for duplicate async exec injections. Document the most likely gateway-side gap as missing idempotency for replayed exec.finished events, and note why plain outbound delivery retry is a weaker fit for duplicate user turns.
Regeneration-Prompt: |
Investigate a live duplicate async exec completion that appeared as two identical user turns in an OpenClaw session. Trace the completion path from exec producers into enqueueSystemEvent, heartbeat wake scheduling, prompt assembly, and embedded transcript persistence. Decide whether duplicate wake handling, outbound delivery retry, or duplicate completion event ingestion is the more likely cause, cite the exact code locations, and capture the smallest plausible fix seam without making runtime changes.
* fix: dedupe replayed exec finished node events
Add a narrow idempotency guard in the gateway node-event handler for repeated exec.finished events with the same canonical session key and runId. This blocks replayed async exec completions from being enqueued and heartbeated twice into the parent session. Also only request a heartbeat when the system event was actually queued, and add a regression test for duplicate runId injection.
Regeneration-Prompt: |
Prevent duplicate async exec completion events from being injected twice into the parent session. Keep the scope tight around the highest-confidence path: node exec.finished events entering gateway server-node-events and becoming system-event-driven heartbeat prompts. Add a small idempotency guard keyed by canonical session plus exec runId, avoid broader delivery or retry changes unless needed, and add regression coverage that fails if the same exec.finished replay is enqueued and woken twice.
* fix: note exec finished replay dedupe