Commit Graph

20561 Commits

Author SHA1 Message Date
Peter Steinberger
9b42cd8728 test: fix cost snapshot PR checks 2026-04-21 03:44:25 +01:00
Dexter (Miaigi)
47bb5ddece fix(cost): snapshot estimatedCostUsd instead of accumulating (#69347)
The bug: three persist sites accumulated cost instead of snapshotting
it like tokens. This caused cost to be inflated 1x-72x on multi-persist
sessions because the same cumulative usage was added repeatedly.

Root cause: persistSessionUsageUpdate, updateSessionStoreAfterAgentRun,
and the cron isolated-agent run path all used:
  estimatedCostUsd = existingCost + runCost

But runCost was already computed from cumulative run usage, so this
added the same cost repeatedly on redundant persists.

Fix: snapshot cost directly like tokens already do:
  estimatedCostUsd = runCost

Files affected:
- src/auto-reply/reply/session-usage.ts
- src/agents/command/session-store.ts
- src/cron/isolated-agent/run.ts

Tests added:
- session-store.test.ts: verify cost is snapshotted, not accumulated
- session.test.ts: updated existing test to verify snapshot behavior

Fixes #69347
2026-04-21 03:44:25 +01:00
Sk7n4k3d
0eb6f5d8bc session: clear auto-sourced model/auth overrides on /new and /reset 2026-04-21 03:36:16 +01:00
Peter Steinberger
215d5fb320 fix: clear auto-failover model overrides (#69365) (thanks @Chevron7Locked) 2026-04-21 03:35:16 +01:00
Kevin O'Neill
dc0e966ed2 fix(model-selection): address Codex review P2 feedback on auto-heal override clearing
Three corrections to the auto-failover self-healing introduced in the prior commit:

1. Reset in-memory provider/model to configured primary after clearing auto override.
   get-reply-directives.ts preloads provider/model from the stored override before
   calling createModelSelectionState, so clearing only session state still ran the
   current turn on the fallback. Now provider/model are reset to defaultProvider/
   defaultModel so this turn retries the primary immediately, not on the next turn.

2. Remove resetModelOverride = true from the auto-heal path. That flag triggers a
   "Model override not allowed for this agent" system event in
   applyInlineDirectiveOverrides, which is incorrect: the override was valid and set
   by the fallback loop — it just expired once the primary recovered. Auto-heal is
   not an allowlist violation.

3. Add a test case that verifies the in-memory reset when the caller pre-loads the
   fallback provider/model (simulating the get-reply-directives.ts preload path).

Known limitation (noted in comment): channel model overrides (channels.modelByChannel)
are skipped on the recovery turn because hasSessionModelOverride was true when they
were evaluated at preload time. They resume on the following turn once session state
is clear. Fixing this cleanly requires changes to the get-reply-directives preload
flow and is out of scope for this PR.
2026-04-21 03:35:16 +01:00
Kevin O'Neill
f2abe28d40 fix(model-selection): clear auto-failover overrides so primary is retried on each turn
When runWithModelFallback falls back to a secondary provider it writes
providerOverride/modelOverride/modelOverrideSource:"auto" to the session.
On subsequent turns createModelSelectionState read this stored override and
passed the fallback provider directly to runWithModelFallback, so the
configured primary was never retried — the session was permanently pinned to
the fallback even after the primary recovered.

Fix: at model-selection ingress, when the direct session override has
modelOverrideSource "auto" (set by a previous automatic fallback, not a user
/model command), clear the override and retry the configured primary. If the
primary is still down runWithModelFallback will fall back and re-set the auto
override for that turn. Once the primary recovers the override stays clear.

User-selected overrides (modelOverrideSource "user" or legacy undefined+model)
are preserved unchanged.

Covered by four new unit tests in model-selection.test.ts:
- auto-failover override cleared and primary retried
- user-selected override preserved
- legacy override without source field preserved
- parent-session auto-override applied to child (not cleared by child logic)
2026-04-21 03:35:16 +01:00
Peter Steinberger
8150c363b5 fix: stabilize memory dreaming QA 2026-04-21 03:12:42 +01:00
Peter Steinberger
a9bef83a0c refactor: delegate bluebubbles conversation helpers 2026-04-21 03:12:42 +01:00
SARAMALI15792
fb1a5a2c26 test(gateway): assert cli_container_local precedence over loopback fallback (#69397) 2026-04-21 03:10:34 +01:00
SARAMALI15792
8ef356d5c3 fix(gateway): classify loopback shared-secret clients as local for pairing (#69397) 2026-04-21 03:10:34 +01:00
Sliverp
b938e6398b feat: add tiered model pricing support (#67605)
Adds tiered model pricing support for cost tracking, keeps configured pricing ahead of cached catalog values, and includes latest Moonshot Kimi K2.6/K2.5 cost estimates.\n\nThanks @sliverp.
2026-04-21 03:02:57 +01:00
Peter Steinberger
8d747d20b8 test: split contract vitest shards 2026-04-21 03:01:08 +01:00
Andrii Furmanets
b6a8759b29 fix(web-search): restore SecretRef runtime compatibility for bundled providers (#68424)
Adds missing compatibility runtime path metadata for bundled SecretRef-capable web-search providers and keeps the manifest registry covered by a regression test.\n\nThanks @afurm!
2026-04-21 02:34:24 +01:00
Peter Steinberger
f04185cc70 test: stabilize live media and gateway probes 2026-04-21 02:10:19 +01:00
aniaan
c8e5150fd4 feat(moonshot): default to Kimi K2.6 with K2.6-only thinking.keep support (#68816)
Merged via squash.

Prepared head SHA: ed54e02842
Co-authored-by: aniaan <40813941+aniaan@users.noreply.github.com>
Co-authored-by: odysseus0 <8635094+odysseus0@users.noreply.github.com>
Reviewed-by: @odysseus0
2026-04-20 18:04:49 -07:00
Peter Steinberger
a112903802 test: use synthetic auto reply fixtures 2026-04-21 01:49:30 +01:00
Peter Steinberger
60a1f01a3e test: use synthetic agent infra fixtures 2026-04-21 01:46:33 +01:00
Peter Steinberger
442da01db4 test: use synthetic program status fixtures 2026-04-21 01:44:03 +01:00
Peter Steinberger
969ca8511d test: use synthetic cli provider fixtures 2026-04-21 01:42:29 +01:00
Peter Steinberger
66665eea6d test: use synthetic status session fixtures 2026-04-21 01:40:29 +01:00
Peter Steinberger
6bb6cfc68e test: use synthetic plugin channel fixtures 2026-04-21 01:32:27 +01:00
Peter Steinberger
97e528ed54 test: use synthetic agent session fixtures 2026-04-21 01:24:34 +01:00
Peter Steinberger
9f2f89320e test: use synthetic infra channel fixtures 2026-04-21 01:21:05 +01:00
Peter Steinberger
14ceec27fa test: use synthetic config cron channel fixtures 2026-04-21 01:19:35 +01:00
Peter Steinberger
f50202ee95 test: use synthetic auto-reply channel fixtures 2026-04-21 01:18:05 +01:00
Peter Steinberger
e8898bb6c1 test: use synthetic agent channel fixtures 2026-04-21 01:15:11 +01:00
Peter Steinberger
3f274006cd refactor: share oauth callback flow 2026-04-21 01:07:09 +01:00
Peter Steinberger
7b1f7b179f refactor: share thread binding lifecycle 2026-04-21 01:07:09 +01:00
Peter Steinberger
4ea8063203 refactor: reuse operator approval gateway lifecycle 2026-04-21 01:07:09 +01:00
Amine Harch el korane
8c05043eca fix(telegram): tune polling stall threshold
Raise the Telegram polling watchdog default from 90s to 120s and add bounded channels.telegram.pollingStallThresholdMs overrides, including per-account config.\n\nThanks @Vitalcheffe.
2026-04-21 01:03:04 +01:00
Peter Steinberger
660e4257a7 refactor: share codex auth bridge 2026-04-21 00:54:08 +01:00
Peter Steinberger
0647481c7c refactor: share ssrf policy merging 2026-04-21 00:54:08 +01:00
Peter Steinberger
7e28caa637 refactor: share fast mode normalization 2026-04-21 00:54:08 +01:00
Peter Steinberger
44ca47b2eb refactor: share allow-from store file reads 2026-04-21 00:54:08 +01:00
Peter Steinberger
3caf9faef5 test: use synthetic metadata channel labels 2026-04-21 00:52:52 +01:00
Peter Steinberger
71154bf3bf test: use synthetic approval pairing fixtures 2026-04-21 00:51:37 +01:00
Peter Steinberger
05835dd2d4 test: use synthetic heartbeat wake fixtures 2026-04-21 00:50:32 +01:00
Peter Steinberger
874306a2ac test: use synthetic task delivery fixtures 2026-04-21 00:49:06 +01:00
Peter Steinberger
aecd709dfe test: use synthetic task channel fixtures 2026-04-21 00:47:12 +01:00
Watchtower
dcb525de50 fix(pi-embedded-runner): gate silent-error retry on replay safety
Per @steipete review on #68310: the silent-error retry must not fire when the
failed attempt already recorded potential side effects (messaging tool sent,
cron add, or a mutating tool call that wasn't round-tripped as replay-safe).
Otherwise resubmission can duplicate those actions.

Adds `!attempt.replayMetadata.hadPotentialSideEffects` to the retry condition,
mirroring the gate used by resolveEmptyResponseRetryInstruction and the
planning-only / reasoning-only retry resolvers in run/incomplete-turn.ts.

Adds a new negative regression test:
  "does not retry when the failed attempt recorded side effects"
which reproduces the reviewer's repro — stopReason=error + output=0 + empty
content, but replayMetadata={hadPotentialSideEffects: true, replaySafe: false}.
Expected: no retry, surfaces incomplete-turn error. Confirmed locally.
2026-04-21 00:43:50 +01:00
Watchtower
5fb302ebf1 fix(pi-embedded-runner): retry silent stopReason=error turns (non-frontier models)
ollama/glm-5.1:cloud (and occasionally other models) can end a turn with
stopReason="error", usage.output=0, and empty content[] after a successful
tool-call sequence. The existing empty-response retry path in
src/agents/pi-embedded-runner/run/incomplete-turn.ts is gated on
isStrictAgenticSupportedProviderModel (gpt-5 family only), so non-frontier
models fall through to "incomplete turn detected" with payloads=0 and no
recovery. The user sees no reply and has to nudge.

Add a narrow, model-agnostic resubmission inside the attempt loop, placed
before the incompleteTurnText surface-to-user return:

  - stopReason === "error"
  - usage.output === 0
  - content.length === 0   (excludes reasoning-only error turns)
  - bounded by MAX_EMPTY_ERROR_RETRIES = 3

No instruction injection, no model gating; same prompt, same session
transcript (tool results already captured), just let the loop try again.

New test file run.empty-error-retry.test.ts covers:
  1. Retries for ollama/glm-5.1:cloud → succeeds on 2nd attempt.
  2. Caps at 3 retries → 4 total attempts → surfaces incomplete-turn error.
  3. Does NOT retry when output > 0 (preserve produced text).
  4. Does NOT retry when stopReason=stop + output=0 (NO_REPLY path).
  5. Retries for anthropic/claude-opus-4-7 too — model-agnostic.

Relates to #68281.
2026-04-21 00:43:50 +01:00
Peter Steinberger
982b1c9464 test(ci): reduce channel contract import cost 2026-04-21 00:40:07 +01:00
Peter Steinberger
503af7afa6 refactor: dedupe install scan skill spec 2026-04-21 00:32:42 +01:00
Peter Steinberger
1a834a0ff6 test: reuse runtime sidecar uniqueness helper 2026-04-21 00:32:42 +01:00
Dale Yarborough
7b5527a74e fix(gateway): prevent 1006 errors from race condition in WebSocket upgrade (#43392)
Merged via squash.

Prepared head SHA: 0bca6d3512
Co-authored-by: dalefrieswthat <176454532+dalefrieswthat@users.noreply.github.com>
Co-authored-by: grp06 <1573959+grp06@users.noreply.github.com>
Reviewed-by: @grp06
2026-04-20 16:29:14 -07:00
Peter Steinberger
da3f47ddd0 test: share generation live env helper 2026-04-21 00:24:18 +01:00
Peter Steinberger
a95b61560a test: dedupe reconnect drain fixtures 2026-04-21 00:24:17 +01:00
scoootscooob
f700ad32a8 providers: default Moonshot to Kimi 2.6 (#69477)
Merged via squash.

Prepared head SHA: 4d778d09d1
Co-authored-by: scoootscooob <167050519+scoootscooob@users.noreply.github.com>
Co-authored-by: scoootscooob <167050519+scoootscooob@users.noreply.github.com>
Reviewed-by: @scoootscooob
2026-04-20 16:15:29 -07:00
Peter Steinberger
a732b916f4 test: use synthetic media channel fixtures 2026-04-20 23:59:39 +01:00
Garry Tan
c8086b731a tasks: add detached task recovery hook before markLost (#69313)
Merged via squash.

Prepared head SHA: 24322af4f7
Co-authored-by: garrytan <19957+garrytan@users.noreply.github.com>
Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com>
Reviewed-by: @mbelinky
2026-04-21 00:58:20 +02:00