Commit Graph

5715 Commits

Author SHA1 Message Date
Gustavo Madeira Santana
4dfecfbca8 Matrix: preserve subagent thread delivery 2026-04-17 02:15:54 -04:00
Peter Steinberger
92859357bb test: isolate agent auth and spawn hotspots 2026-04-17 07:15:27 +01:00
Peter Steinberger
dd9d2ebd01 test: stabilize MCP startup disposal race 2026-04-17 07:15:27 +01:00
Peter Steinberger
a0d9598425 test: narrow ollama provider discovery setup 2026-04-17 07:15:27 +01:00
Peter Steinberger
daaebb8558 test: split browser snapshot target helper 2026-04-17 07:15:27 +01:00
Peter Steinberger
f57ce21d73 test: trim process-backed agent assertions 2026-04-17 07:15:27 +01:00
Peter Steinberger
b71c91022b test: collapse exec preflight parser cases 2026-04-17 07:15:27 +01:00
Peter Steinberger
d07c921ae3 test: trim provider extra-param imports 2026-04-17 07:15:27 +01:00
Peter Steinberger
132d3c76a0 test: reuse browser server harness imports 2026-04-17 07:15:27 +01:00
Peter Steinberger
35dcd06764 test: trim agent test hotspots 2026-04-17 07:15:27 +01:00
Gustavo Madeira Santana
7ae670e501 Tests: fast-path Slack message tool discovery 2026-04-17 02:00:26 -04:00
Gustavo Madeira Santana
878f2122e5 Tests: fast-path Matrix ACP thread binding 2026-04-17 02:00:26 -04:00
Gustavo Madeira Santana
178c36532d Tests: isolate perf-sensitive env state 2026-04-17 02:00:26 -04:00
Rubén Cuevas
7e18c07e41 fix: dedupe repeated bootstrap truncation warnings (#67906) (thanks @rubencu)
* Agents: dedupe bootstrap truncation warnings

* Agents: normalize bootstrap warning cache bookkeeping

* fix(agents): scope bootstrap warning dedupe by workspace

* refactor(agents): simplify bootstrap warning wrapper

---------

Co-authored-by: Ayaan Zaidi <hi@obviy.us>
2026-04-17 10:54:15 +05:30
Ayaan Zaidi
65645ec54f fix(agents): refresh bundle command discovery 2026-04-17 09:59:03 +05:30
chaoliang yan
35fb3f7e1c fix: preserve models.json baseUrls on regen (#67893) (thanks @lawrence3699)
* models-config: preserve existing models.json baseUrls

* test: distill models-config baseUrl regression test

* fix: preserve models.json baseUrls on regen (#67893) (thanks @lawrence3699)

---------

Co-authored-by: lawrence3699 <lawrence3699@users.noreply.github.com>
Co-authored-by: Ayaan Zaidi <hi@obviy.us>
2026-04-17 08:32:35 +05:30
Ayaan Zaidi
685f9903ec fix: unify responses api capability detection (#67918)
* fix: unify responses api capability detection

* fix: unify responses api capability detection (#67918)
2026-04-17 08:03:19 +05:30
Peter Steinberger
ee856ab31f test: speed up safe-bins exec harness 2026-04-17 02:57:18 +01:00
Peter Steinberger
acd86a06cd test: preserve tool helpers in embedded runner mocks 2026-04-17 02:57:18 +01:00
Peter Steinberger
77e6e4cf87 refactor: move memory embeddings into provider plugins 2026-04-17 02:57:18 +01:00
Peter Steinberger
a2753e2d9f fix: keep Opus 4.7 effort separate from adaptive thinking 2026-04-17 01:26:11 +01:00
Peter Steinberger
c73a6d2f68 feat: support xhigh for Claude Opus 4.7 2026-04-17 01:26:11 +01:00
Tak Hoffman
81818df1b4 fix(startup): prioritize bootstrap on fresh sessions 2026-04-16 17:53:07 -05:00
Peter Steinberger
a98754d504 refactor(agents): clarify prompt cache compatibility gates 2026-04-16 14:59:20 -07:00
Onur
3ae5d95bfd CI: fix live Docker auth mounts (#67812)
* CI: fix live Docker auth mounts

* CI: harden live Docker auth mounts
2026-04-16 23:00:11 +02:00
Devin Robison
8b7d76bfbb fix(compaction): stop retaining credential-like values (#67801) 2026-04-16 14:04:45 -06:00
Chris Yau
36dd58ac2a Prevent Codex HTML challenge pages from looking like DNS failures
Cloudflare challenge pages from chatgpt.com/backend-api can arrive as raw HTML without an HTTP status prefix. The transport sanitizer scanned for generic "dns" substrings before HTML detection, so these pages could surface as DNS lookup failures instead of the existing HTML/CDN block message.

Constraint: Must preserve DNS transport classification for real ENOTFOUND/getaddrinfo failures
Rejected: Treat every bare HTML document as an upstream HTML error | too broad for arbitrary model text/errors
Confidence: high
Scope-risk: narrow
Directive: Keep standalone HTML challenge detection ahead of generic transport keyword matching so CDN block pages do not regress into DNS copy
Tested: oxfmt --check on changed files; targeted node --import tsx verification for standalone Cloudflare HTML classification and DNS control case
Not-tested: Full Vitest shard run in this environment
2026-04-16 12:47:12 -07:00
Josh Lehman
a327b6750d fix: stabilize context engine prompt cache touches (#67767)
* fix: stabilize context engine prompt cache touches

* fix(changelog): document context-engine prompt cache touch stabilization
2026-04-16 11:53:42 -07:00
Daniel Salmerón Amselem
687ede50a5 fix(agents): add prompt cache compatibility opt-out
Add compat.supportsPromptCacheKey for OpenAI Responses prompt_cache_key handling, update generated config baseline, changelog, and A2UI dependency-layout test compatibility.
2026-04-16 10:48:51 -07:00
Bartok
c4488d5ef5 fix: pin localeCompare to 'en' locale for cross-environment stability
Addresses review feedback: localeCompare without a fixed locale uses the
runtime default, which varies across servers. Pinning 'en' ensures
byte-identical prompts for cache stability. Applied at all three sort
points in workspace.ts.
2026-04-16 10:28:22 -07:00
Bartok Moltbot
a4b94f77b9 fix(skills): sort available_skills alphabetically for prompt cache stability
Sort the merged skill entries by name before rendering into the
available_skills prompt block. Previously the order depended on
Map insertion order which varies with skills.load.extraDirs config,
causing identical deployments to produce different prompts and bypass
LLM prompt caching.

Two sort points added:
1. loadSkillEntries — canonical ordering at the source
2. resolveWorkspaceSkillPromptState — ensures prompt stability even
   when callers pass pre-built entry arrays

Fixes #64167
2026-04-16 10:28:22 -07:00
Peter Steinberger
1183832d4f fix: pin codex resume sandbox override 2026-04-16 17:31:41 +01:00
Peter Steinberger
86f108401b fix: share agent harness runtime activation (#67474) 2026-04-16 09:06:45 -07:00
Ayaan Zaidi
16c608e393 fix: harden cron announce NO_REPLY suppression (#65016) (thanks @BKF-Gitty) 2026-04-16 21:36:43 +05:30
Peter Steinberger
892baf2e81 test: align PDF tool expectations with Opus 4.7 2026-04-16 08:56:56 -07:00
Peter Steinberger
461d0050d9 fix: keep codex resume runs non-interactive (#67666) (thanks @plgonzalezrx8) 2026-04-16 08:41:57 -07:00
Pedro Gonzalez
4c66978591 security(codex): restore sandbox protections for resumed CLI sessions 2026-04-16 08:41:57 -07:00
Peter Steinberger
628b454eff feat: default Anthropic to Opus 4.7 2026-04-16 16:12:06 +01:00
stain lu
c3c7a9953f fix: repair sanitized replay tool results before send (#67620) (thanks @stainlu)
* fix(agents): preserve native Anthropic tool IDs for hybrid providers

Fixes #66892

MiniMax and other hybrid providers use api.minimaxi.com/anthropic
(modelApi: anthropic-messages), which generates and expects native
Anthropic tool_call_ids in toolu_* format. The hybrid replay policy
(buildHybridAnthropicOrOpenAIReplayPolicy) applied strict
sanitization that stripped underscores from these IDs, causing
MiniMax to reject them with error 2013.

The native Anthropic provider already preserved these IDs via
preserveNativeAnthropicToolUseIds (added in 4613f121ad). This
commit enables the same flag for the hybrid anthropic-messages
branch, so toolu_* IDs pass through unsanitized while other
synthetic IDs still get strict cleanup.

* fix(agents): repair sanitized replay tool results before send

* fix: repair sanitized replay tool results before send (#67620) (thanks @stainlu)

* fix: preserve aborted-span tool results during replay sanitize (#67620) (thanks @stainlu)

---------

Co-authored-by: Ayaan Zaidi <hi@obviy.us>
2026-04-16 18:38:57 +05:30
Ayaan Zaidi
de129a6530 fix: restrict HTML timeout short-circuit to transient statuses 2026-04-16 18:33:35 +05:30
Xan Torres
36ed36768c Agents/tool-loop: enable unknown-tool stream guard by default 2026-04-16 18:31:56 +05:30
Xan Torres
b23d59a522 Gateway/skills: invalidate session skills snapshot on config write 2026-04-16 18:31:56 +05:30
stain lu
e588e904a7 fix: classify HTML provider error pages correctly (#67642) (thanks @stainlu)
* fix(agents): classify Cloudflare/CDN HTML error pages as transport failures

Fixes #67517

When a provider endpoint returns an HTML error page (e.g. Cloudflare
502/503/520-524), the pattern-based message classifiers would scan
the HTML body and misinterpret embedded text like "Rate limit
exceeded" as a structured rate_limit API error. This caused
incorrect failover behavior (profile rotation instead of clean
retry/fallback) and left the TUI stuck.

Two fixes:
1. classifyFailoverSignal now short-circuits on HTML responses
   before running pattern matchers, returning "timeout" (transport
   failure) so retry/fallback handles them correctly.
2. classifyProviderRuntimeFailureKind now detects HTML errors at
   any status (not just 403), returning "upstream_html" for
   non-403 statuses with a clear user-facing message about
   CDN/gateway errors.

Adds regression tests covering Cloudflare 502/503 HTML with
embedded rate-limit text, 403 HTML (still classified as auth),
and JSON rate-limit responses (still classified correctly).

* fix: preserve auth and proxy HTML classification

* fix: classify HTML provider error pages correctly (#67642) (thanks @stainlu)

---------

Co-authored-by: Ayaan Zaidi <hi@obviy.us>
2026-04-16 18:19:53 +05:30
Nimrod Gutman
90801ba400 fix(openai-codex): normalize stale transport metadata in resolution and discovery (#67635)
Merged via squash.

Supersedes:
- #66969 by @saamuelng601-pixel
- #67159 by @hclsys

Co-authored-by: saamuelng601-pixel <274746699+saamuelng601-pixel@users.noreply.github.com>
Co-authored-by: hclsys <7755017+hclsys@users.noreply.github.com>
2026-04-16 14:30:05 +03:00
stain lu
ecfaf64526 fix: align host tilde paths with OS home (#62804) (thanks @stainlu)
* fix(tools): expand tilde in host edit/write paths (non-workspace mode)

* test: use it.runIf for visible skip when tmpdir is not under home

* fix(tools): address Codex P2 review on tilde host edit/write

Responds to two P2 findings from chatgpt-codex-connector on #62804:

1. Tests never ran in CI. The it.runIf(tmpdirUnderHome) guard always
   skipped on Linux runners where os.tmpdir() is /tmp, outside $HOME, so
   the regression tests reported green without executing. Tmpdirs now use
   the test-isolated HOME (process.env.HOME from test/test-env.ts) so
   tests run in every environment and match what expandHomePrefix
   resolves, keeping them hermetic.

2. Edit recovery path resolution was inconsistent. resolveEditPath
   inlined os.homedir() for tilde expansion, bypassing OPENCLAW_HOME,
   while the write/edit operations use expandHomePrefix. Under a custom
   OPENCLAW_HOME, wrapEditToolWithRecovery's readback targeted a
   different file than the edit actually touched, so successful edits
   could be reported as failures. resolveEditPath now uses the same
   expandHomePrefix helper.

* test(tools): verify tilde expansion honors OPENCLAW_HOME override

The prior tests covered tilde expansion but only under the default test
home, which matches os.homedir(). That passed whether the production code
used expandHomePrefix() or inlined os.homedir() — the behaviors only
diverge when OPENCLAW_HOME is set to a path outside $HOME.

Adds four tests that set OPENCLAW_HOME to a temp dir explicitly outside
$HOME and verify that write/mkdir/read/access tilde operations resolve
against OPENCLAW_HOME, not os.homedir(). These would fail if
pi-tools.read.ts or pi-tools.host-edit.ts reverted to os.homedir(),
directly covering the Codex P2 feedback about OPENCLAW_HOME consistency.

Uses the same env snapshot/restore pattern as test/helpers/temp-home.ts.

* Agents: resolve host tilde paths against OS home

* fix: align host tilde paths with OS home (#62804) (thanks @stainlu)

* fix: keep the changelog entry in the active block (#62804) (thanks @stainlu)

---------

Co-authored-by: Ayaan Zaidi <hi@obviy.us>
2026-04-16 14:37:55 +05:30
Ayaan Zaidi
898fd0482a fix(agents): preserve cli session metadata before transcript persist (#67490) 2026-04-16 09:30:31 +05:30
Ayaan Zaidi
3a3fae0eac fix(agents): normalize cli transcript api field 2026-04-16 09:30:31 +05:30
Ayaan Zaidi
b8ef507cc0 fix(agents): persist cli transcript turns 2026-04-16 09:30:31 +05:30
dallylee
bd7418d4e9 fix(agents): classify connection-mismatch replay errors as replay-invalid (#66475)
Merged via squash.

Prepared head SHA: 97738583de
Co-authored-by: dallylee <132358482+dallylee@users.noreply.github.com>
Co-authored-by: altaywtf <9790196+altaywtf@users.noreply.github.com>
Reviewed-by: @altaywtf
2026-04-15 22:49:11 +03:00
Devin Robison
52ef42302e fix: tighten trusted tool media passthrough (#67303)
* fix: tighten trusted tool media passthrough

* changelog: tighten trusted tool media passthrough (#67303)

* address review: thread rawToolName into emitToolResultOutput and keep plugin-tool media passthrough

- Pass rawToolName through emitToolResultOutput params so the emit and
  collect calls no longer reference an out-of-scope identifier
  (ReferenceError on any verbose tool-output path).
- Widen builtinToolNames to all effective tool raw names for this run
  (core + bundled/trusted plugin tools), so plugin tools on the trusted
  media list still receive local MEDIA: passthrough. Admission-time
  client-tool conflict check keeps using the core-only set so unrelated
  plugin names do not spuriously reject client definitions; MEDIA
  passthrough is still gated by the raw-name set, so a client tool that
  normalize-collides with a plugin name cannot inherit its media trust.
- Add unit coverage for bundled-plugin raw-name passthrough and for
  case-variant plugin-name collisions.

* drop redundant String() casts flagged by oxlint no-useless-cast

The names from effectiveTools, client tool function names, and the
existingToolNames iterable are already typed as string, so wrapping them
in String(...) adds nothing and trips oxlint's no-useless-cast rule.
2026-04-15 13:12:44 -06:00