Commit Graph

159 Commits

Author SHA1 Message Date
Dallin Romney
450060d7a2 test(qa): expand smoke-ci and release categories and coverage (#93175)
* test(qa): add smoke ci primary coverage evidence

* test(qa): remove overstated primary coverage claims

* test(qa): make release profile include smoke ci

* test(qa): trim taxonomy formatting churn

* test(qa): avoid hardcoded profile names in coverage test

* test(qa): make release profile cover taxonomy

* test(qa): type profile fixture all category flag

* test(qa): include channel delivery in smoke ci profile
2026-06-15 18:05:52 -07:00
Dallin Romney
fef8394079 Convert QA scenarios to YAML files (#92915)
* refactor: load QA scenarios from YAML

* docs: update personal QA scenario docs

* test: keep QA scenarios YAML-only
2026-06-14 17:31:18 -07:00
NVIDIAN
ecaebfc51b fix(agents): retry thinking-only errored turns (#92191)
Retry replay-safe reasoning-only provider errors before assistant failover while preserving classified fallback and terminal-output ownership. Adds deterministic Anthropic gateway fault-injection coverage and focused regression tests.\n\nCo-authored-by: ai-hpc <mail.speedy.hpc@hotmail.com>
2026-06-14 09:39:27 -07:00
Dallin Romney
a3e9dfee0e Simplify QA scorecard mapping shape (#92558)
* test(qa): simplify scorecard mapping shape

* test(qa): use typed scorecard evidence refs

* test(qa): map scorecard categories by coverage id

* feat: align qa coverage with taxonomy features

* refactor: keep qa coverage ids canonical

* refactor: minimize qa coverage id churn

* test: align qa coverage id assertions

* test: update qa evidence coverage expectations

* refactor qa taxonomy coverage ids

* style qa taxonomy coverage ids

* test qa coverage lint fix

* test qa coverage type fix
2026-06-14 00:16:33 -07:00
Dallin Romney
561b293c7a Run Vitest and Playwright scenarios from qa suite (#92606)
* test(qa): run vitest and playwright scenarios from qa suite

* fix(qa): harden scenario suite dispatch

* refactor(qa): share scenario path utilities

* refactor(qa): share test file scenario runner

* refactor(qa): route test file scenarios through suite runtime

* refactor(qa): use explicit suite runtime result kind

* test(qa): write suite evidence artifact

* refactor(qa): clarify suite execution dispatch

* fix(qa): keep test-file scenarios out of flow-only runners

* refactor(qa): export mixed scenario suite runner
2026-06-13 01:06:10 -07:00
Vincent Koc
7d3e8dc963 test(qa): restore memory fallback config safely 2026-06-10 18:03:15 +09:00
Vincent Koc
2c146261a2 fix(qa): accept completed mock image turns 2026-06-10 17:48:33 +09:00
Vincent Koc
87abb8defb fix(qa): preserve Matrix recovery state in sqlite 2026-06-10 17:35:41 +09:00
Vincent Koc
3b180d5d99 fix(qa): wait for restart wake before capability check 2026-06-10 17:09:18 +09:00
brokemac79
de4b8d8ebf feat(plugins): allow installed trusted policy contracts
Allow explicitly enabled installed plugins to register declared trusted tool policies and agent tool result middleware, with trusted policy ids scoped by plugin owner.\n\nVerification covered targeted plugin/agent tests, typecheck, build, lint, local autoreview, and a Blacksmith Testbox runtime proof (tbx_01ktr1nq0rhq47fjkwrepm7fd3).
2026-06-10 16:18:23 +10:00
Vincent Koc
c350c35fad fix(release): allow QA capability restore patch
(cherry picked from commit db711701d2)
2026-06-10 08:27:59 +09:00
Vincent Koc
50130d32a9 test(release): align qa tool coverage gate 2026-06-09 01:02:24 +02:00
Vincent Koc
c7b01cf201 test(release): stabilize qa runtime parity gate 2026-06-09 01:02:24 +02:00
Peter Steinberger
b8d08f0cfd docs: document repository scripts 2026-06-04 20:52:50 -04:00
Vincent Koc
a9f099d279 test(qa): require channel scenario markers 2026-06-03 14:27:25 +02:00
Bek
bce3d5bf92 trace: Correlate channel diagnostics into one trace
Correlates channel receive, agent lifecycle, model attempt diagnostics, and outbound delivery diagnostics into one trace waterfall so channel message runs can be inspected end-to-end.

Maintainer follow-up removed the internal `AgentHarnessV2` adapter surface and kept the harness path canonical through `src/agents/harness/lifecycle.ts`.

Proof:
- PR checks passed on `04e9189c15480d53663d533a04c9883164b4dd54`.
- `node scripts/run-vitest.mjs src/agents/harness/lifecycle.test.ts src/agents/harness/selection.test.ts src/channels/turn/kernel.test.ts`
- `pnpm check:changed` Testbox `tbx_01kt3xtrm70qc7nb90cqv5rah1`

Thanks @bek91.

Co-authored-by: Bek <bek.akhmedov@gmail.com>
2026-06-02 06:38:00 -04:00
Vincent Koc
4550cfa6a7 fix(qa): run plugin MCP probes from repo root 2026-06-01 07:13:24 +02:00
Peter Steinberger
b653d94918 chore(lint): enable no-useless-assignment 2026-05-31 22:40:48 +01:00
yaoyi1222
75e0053cf9 fix(auto-reply): warn on substantive private message-tool finals
Warn operators when message_tool_only produces unusually substantive private final text without a delivered source reply. Keeps short/NO_REPLY silence quiet, avoids logging response bodies, and distinguishes unrelated side effects from source-reply delivery.
2026-05-31 14:35:58 +01:00
Peter Steinberger
4c33aaa86c refactor: unify OpenAI provider identity (#88451)
* refactor: unify OpenAI provider identity

* refactor: move legacy oauth sidecar doctor helpers

* test: align OpenAI fixtures after rebase

* test: clean OpenAI provider unification

* fix: finish OpenAI provider cleanup

* fix: finish OpenAI cleanup follow-through

* fix: finish OpenAI CI cleanup
2026-05-31 00:29:44 +01:00
Shakker
308fdbe7fb refactor: remove skill workshop plugin package 2026-05-30 20:04:52 +01:00
Peter Steinberger
1188aa3b81 feat: add Claude Opus 4.8 support (#87890)
* feat: add Claude Opus 4.8 support

* fix: omit Vertex Opus sampling overrides

* fix: preserve Opus adaptive thinking levels

* fix: clamp Anthropic max effort support

* fix: use sha256 for QA mock call ids

* fix: type Anthropic transport test model metadata

* test: update PDF model default for Opus 4.8
2026-05-29 06:10:42 +01:00
Ramrajprabu
f3cfd752d3 feat(copilot): add GitHub Copilot agent runtime
Adds the opt-in bundled GitHub Copilot agent runtime, pinned SDK install path, docs/inventory, SDK/tool/sandbox/auth wiring, and replay/tool-safety fixes.

Verification:
- Local: git diff --check; fnm exec --using 24.15.0 pnpm tsgo:extensions; fnm exec --using 24.15.0 pnpm check:test-types; fnm exec --using 24.15.0 pnpm build.
- Autoreview local: clean for the replay-safety fix; branch autoreview engine returned empty output twice, so local autoreview plus local/Crabbox/CI proof was used.
- Crabbox focused Copilot: run_2c0db9f48a4a, 19 files / 485 tests passed.
- Crabbox additional boundary shard: run_26a246a1aa24, prompt snapshots and plugin SDK boundary/export checks passed.
- Crabbox live Copilot: run_d128e4048b4e, real gpt-4.1 turn with live_echo phase-1-green and clean session-file check.
- GitHub checks: green on head 7cc8657e0d, including Dependency Guard after exact-head approval.

Co-authored-by: Ramraj Balasubramanian <ramrajba@microsoft.com>
2026-05-29 05:15:22 +01:00
Peter Steinberger
bb46b79d3c refactor: internalize OpenClaw agent runtime (#85341)
* refactor: extract agent core package

Introduce packages/agent-core as the OpenClaw-owned home for reusable agent loop, harness, session, prompt, and runtime dependency contracts.

* refactor: extract shared llm runtime

Move provider model registries, stream wrappers, OAuth helpers, and LLM utilities into src/llm with plugin-sdk barrels instead of depending on the old embedded runtime layout.

* refactor: remove pi runtime internals

Rename remaining Pi-shaped agent surfaces to OpenClaw agent runtime names, delete obsolete Pi docs and package graph checks, and add the third-party notice for incorporated code.

* refactor: tighten agent session runtime

Make agent-core/runtime dependencies explicit, consolidate compaction and session transcript helpers, and move model/session helpers behind OpenClaw-owned contracts.

* refactor: remove static model and pi auth paths

Drop static model catalogs and Pi auth bridges, move model/provider facts to manifest-owned runtime contracts, and harden internal embedded-agent utilities.

* refactor: remove legacy provider compat paths

* docs: remove agent parity notes

* fix: skip provider wildcard metadata parsing

* refactor: share session extension sdk loading

* refactor: inline acpx proxy error formatter

* refactor: fold edit recovery into edit tool

* fix: accept extension batch separator

* test: align startup provider plugin expectations

* fix: restore provider-scoped release discovery

* test: align static asset packaging expectations

* fix: run static provider catalogs during scoped discovery

* fix: add provider entry catalogs for scoped live discovery

* fix: load lightweight provider catalog entries

* fix: refresh provider-scoped plugin metadata

* fix: keep provider catalog entries on release live path

* fix: keep static manifest models in release live checks

* fix: harden release model discovery

* fix: reduce OpenAI live cache probe reasoning

* fix: disable OpenAI cache probe reasoning

* ci: extend OpenAI gateway live timeout

* fix: extend live gateway model budget

* fix: stabilize release validation regressions

* fix: honor provider aliases in model rows

* fix: stabilize release validation lanes

* fix: stabilize release memory qa

* ci: stabilize release validation lanes

* ci: prefer ipv4 for live docker node calls

* fix: restore shared tool-call stream wrapper

* ci: remove legacy pi test shard alias

* fix: clean up embedded agent test drift

* fix: stabilize runtime alias status

* fix: clean up embedded agent ci drift

* fix: restore release ci invariants

* fix: clean up post-rebase runtime drift

* fix: restore release ci checks

* fix: restore release ci after rebase

* fix: remove stale pi runtime path

* test: align compaction runtime expectations

* test: update plugin prerelease expectations

* fix: handle claude live tool approvals

* fix: stabilize release validation gates

* fix: finish agent runtime import

* test: finish post-rebase agent runtime mocks

* fix: keep codex compaction native

* fix: stabilize codex app-server hook tests

* test: isolate codex diagnostic active run

* test: remove codex diagnostic completion race

# Conflicts:
#	extensions/codex/src/app-server/run-attempt.test.ts

* ci: fix full release manifest performance run id

* refactor: narrow llm plugin sdk boundary

* chore: drop generated google boundary stamps

* fix: repair rebase fallout

* fix: clean up rebased runtime references

* fix: decode codex jwt payloads as base64url

* fix: preserve shipped pi runtime alias

* fix: add scoped sdk virtual modules

* fix: decode llm codex oauth jwt as base64url

* fix: avoid stale vertex adc negative cache

* fix: harden tool arg decoding and codeql path

* fix: keep vertex adc negative checks live

* refactor: consolidate codex jwt and edit helpers

* fix: await codex oauth node runtime imports

* fix: preserve sdk tool and notice contracts

* fix: preserve shipped compat config boundaries

* fix: align codex oauth callback host

* fix: terminate agent-core loop streams on failure

* fix: keep codex oauth callback alive during fallback

* ci: include session tools in critical codeql scans

* fix: keep Cloudflare Anthropic provider auth header

* docs: redirect legacy pi runtime pages

* fix: honor bundled web provider compat discovery

* fix: protect session output spill files

* fix: keep legacy agent dir env blocked

* fix: contain auto-discovered skill symlinks

* fix: harden agent core sdk proxy surfaces

* fix: restore approval reaction sdk compat

* fix: keep live docker runs bounded

* fix: keep codex oauth redirect host aligned

* fix: resolve post-rebase agent runtime drift

* fix: redact anthropic oauth parse failures

* fix: preserve responses strict tool shaping

* fix: repair agent runtime rebase cleanup

* docs: redirect retired parity pages

* fix: bound auto-discovered resources to roots

* fix: repair post-rebase agent test drift

* fix: preserve bundled provider allowlist migration

* fix: preserve manifest-owned provider aliases

* fix: declare photon image dependency

* fix: keep provider headers out of proxy body

* fix: preserve shipped env aliases

* fix: refresh control ui i18n generated state

* fix: quote read fallback paths

* fix: preview edits through configured backend

* test: satisfy core test typecheck

* fix: preserve ZAI usage auth fallback

* test: repair codex diagnostic test

* fix: repair agent runtime rebase drift

* test: finish embedded runner import rename

* fix: repair agent runtime rebase integrations

* test: align compaction oauth fallback expectations

* fix: allow sdk-auth session models

* fix: update doctor tool schema import

* fix: preserve bedrock plugin region

* fix: stream harmony-like prose immediately

* ci: include session runtime in codeql shards

* fix: repair latest rebase integrations

* fix: honor explicit codex websocket transport

* fix: keep openai-compatible credentials provider-scoped

* fix: refresh sdk api baseline after rebase

* fix: route cli runtime aliases through openclaw harness

* test: rename stale harness mock expectation

* test: rename embedded agent overflow calls

* test: clean embedded auth test wording

* test: use openclaw stream types in deepinfra cache test

* fix: refresh sdk api baseline on latest main

* fix: honor bundled discovery compat allowlists

* fix: refresh sdk api baseline after latest rebase

* fix: remove stale rebase imports

* test: rename stale model catalog mock

* test: mock renamed doctor runtime modules

* fix: map canonical kimi env auth

* fix: use internal model registry in bench script

* fix: migrate deepinfra provider catalog entry

* fix: enforce builtin tool suppression

* fix: route compaction auth and proxy payloads safely

* refactor: prune unused llm registry leftovers

* test: update codex hooks session import

* test: fix model picker ci coverage

* test: align model picker auth mock types
2026-05-27 19:24:04 +01:00
Vincent Koc
14198a1c66 fix(qa): close remaining mock qa e2e regressions 2026-05-27 10:06:08 +02:00
Vincent Koc
81c1892c9a fix(qa): stabilize mock QA scenario contracts 2026-05-27 10:06:08 +02:00
Peter Steinberger
3c6fd49d74 ci: stop waiting for nonexistent capability restart wake 2026-05-26 18:15:16 +01:00
Peter Steinberger
27359ec417 ci: stabilize release live QA gates 2026-05-26 17:41:30 +01:00
Vincent Koc
a122d804dd fix(gateway): abort stale agent runs on restart 2026-05-25 23:26:10 +02:00
Peter Steinberger
9f7485e182 test: port release validation stabilizers 2026-05-25 21:50:49 +01:00
Vincent Koc
3eb06e305e fix(qa): harden restart inflight Windows scenario 2026-05-25 18:49:04 +02:00
Vincent Koc
ba2b820c5c fix(qa): extend memory fallback Windows budget 2026-05-25 14:43:25 +02:00
Vincent Koc
9afbfc1b63 fix(qa): capture Windows gateway metrics 2026-05-25 11:24:16 +02:00
Peter Steinberger
8473e8933a test(qa): remove brittle capability flip setup turn 2026-05-24 14:30:59 +01:00
Peter Steinberger
c91c3c6e5a test(qa): extend capability flip setup budget 2026-05-24 14:02:22 +01:00
Peter Steinberger
c7d4e9e1c2 test(qa): widen capability flip restart budget 2026-05-24 13:38:54 +01:00
Vincent Koc
7f05be041e fix(diagnostics): harden observability exports and smokes (#85371)
* test(diagnostics): widen observability smokes

* fix(diagnostics): sanitize observability exports

* docs(diagnostics): format otel export docs
2026-05-23 15:27:43 +08:00
Vincent Koc
304ff68c79 fix(qa-lab): stabilize codex runtime parity fixtures 2026-05-23 10:16:22 +08:00
Peter Steinberger
cc91ff04cc fix(release): stabilize config restart QA 2026-05-22 15:53:50 +01:00
Vincent Koc
2b396131e4 test(qa-lab): add bus tool trace scenario 2026-05-22 20:12:49 +08:00
Vincent Koc
beccdde5bf fix(qa): isolate patched suite scenarios 2026-05-22 10:59:23 +02:00
Vincent Koc
9bd97d2c60 test(qa-lab): remove generic evidence wording 2026-05-22 16:54:04 +08:00
Vincent Koc
f015c3ff52 test(qa-lab): tag live-only runtime sentinels 2026-05-22 07:42:09 +08:00
Vincent Koc
fad1c8a071 test(qa-lab): add long-context watchdog scenario 2026-05-22 07:16:35 +08:00
Peter Steinberger
e2c92be90b chore(release): bump version to 2026.5.21 2026-05-22 00:09:45 +01:00
Dallin Romney
ebd8b00cc3 fix(qa-lab): rename codex lifecycle fixtures to match knip ignore pattern (#85066)
knip's deadcode-unused-files check ignores fixtures matching **/*.fixture.ts
(dot before "fixture"). The codex lifecycle fixtures landed in bbf3eec786
as auth-profile-fixture.ts and codex-plugin-fixture.ts (hyphen), so knip
flagged them as unexpected unused files and CI's check-dependencies job
has been failing on main since then. Rename to auth-profile.fixture.ts
and codex-plugin.fixture.ts and update the lifecycle test, the fixture
cross-import, and the six qa/scenarios markdown files that reference
them by path and qaImport specifier.
2026-05-21 11:56:59 -07:00
Vincent Koc
bbf3eec786 test(qa-lab): cover codex plugin lifecycle fixtures 2026-05-22 01:42:25 +08:00
Vincent Koc
46c8864048 revert(qa-lab): remove scenario github traceability metadata 2026-05-22 01:27:29 +08:00
Vincent Koc
efb7e4742f test(qa-lab): trace scenario issue evidence 2026-05-22 00:51:32 +08:00
Vincent Koc
9f2c0a80b4 fix(qa): keep searchable tool coverage report-only 2026-05-21 23:55:35 +08:00