Vincent Koc
a9024741c2
test(qa): pin live artifact scenario contracts
2026-06-23 05:13:35 +02:00
Vincent Koc
d1b268f7f7
fix(qa): normalize completed wait envelopes
2026-06-23 05:13:35 +02:00
Vincent Koc
c48dd3cdd1
fix(ci): align maturity score source with taxonomy
2026-06-23 10:46:07 +08:00
Dallin Romney
b71ddbf1b4
ci: simplify maturity scorecard QA evidence inputs (#95898)
...
* ci: simplify maturity scorecard evidence inputs
* ci: keep maturity renderer defaults runnable
* ci: validate maturity evidence source
* ci: split maturity scorecard codex agent
* ci: remove codex copy from maturity evidence workflow
* ci: narrow maturity evidence workflow secrets
2026-06-22 19:24:43 -07:00
Vincent Koc
d716dfd532
test(qa): wait for live history replies in flow scenarios
2026-06-23 04:01:11 +02:00
Vincent Koc
def4b51485
fix(qa): gate smoke profile scenarios by channel driver
2026-06-23 09:34:52 +08:00
Vincent Koc
43f2b61f3b
test(qa): keep image generation fixture on mock lane
2026-06-23 02:35:02 +02:00
Vincent Koc
086c629556
test(qa): scope provider-sensitive flow fixtures
2026-06-23 02:17:20 +02:00
Vincent Koc
befe04f465
test(qa): accept Sonnet max thinking support
2026-06-23 01:57:43 +02:00
Vincent Koc
264b37e9d2
test(qa): avoid redacted config cleanup patch
2026-06-23 01:39:39 +02:00
Dallin Romney
63b13ea837
feat(qa): crabline channel driver (#91502)
...
* feat(qa): add crabline channel driver seam
* feat: run crabline channel driver smoke
* chore: keep crabline qa dependency dev-only
* refactor(qa): keep crabline driver details opaque
* chore(qa): pin crabline to merged driver API
* feat(qa): drive channel driver from profiles
* fix(qa): declare crabline runtime peer
* feat(qa): resolve crabline channel from scenarios
* feat(qa): treat unsupported profile channels as coverage gaps
* Revert "feat(qa): treat unsupported profile channels as coverage gaps"
This reverts commit 65a9701655.
* fix(qa): adapt crabline driver to chat sdk cli
* refactor(qa): pass channel driver metadata directly
* chore(qa): update crabline provider pin
* chore(qa): default channel scenarios to driver
* chore: repair qa dependency lockfile
* chore: allow native qa dependency builds
* fix(qa): satisfy crabline driver lint
* fix(qa): satisfy crabline ci gates
* Use crabline transport for smoke QA profile
* fix(qa): keep crabline driver opt-in
* fix(qa): reuse crabline telegram driver token
* fix(qa): route smoke profile through crabline
* fix(qa): run full smoke profile lane
* fix(qa): remove smoke scenario workflow filter
* fix: stabilize crabline smoke qa profile
* fix: pin crabline qa dependency
* test: keep crabline smoke credential-free
* fix: skip visible reasoning lane for crabline smoke
* fix: unblock crabline qa ci
* Update crabline dependency
* Pin crabline to merged main
* Use Crabline fake provider servers
2026-06-22 15:24:59 -07:00
Vincent Koc
7cc21ef59d
fix(qa): stabilize smoke-ci scenarios
2026-06-22 15:41:53 +08:00
Vincent Koc
7d4d8a7f3d
test(agents): add large prompt cache coverage (#95653)
...
Adds large OpenAI and Anthropic prompt-cache live coverage plus a QA Lab long-context tool-result scenario.
Co-authored-by: vincentkoc <vincentkoc@users.noreply.github.com>
Reviewed-by: @vincentkoc
2026-06-22 11:39:37 +08:00
Dallin Romney
5dd30c3995
test: fold HTTP API script proof into QA Lab (#94700)
...
* test: fold HTTP API script proof into QA Lab
* test: remove folded HTTP API script tests
* test: relax QA native scenario catalog inventory
* test: trim folded QA Lab script cruft
* test: align folded QA coverage ids
* test: keep native QA evidence out of parity tiers
* test: update mirrored QA routing expectation
* test: preserve chat tools profile build guard
* test: avoid overclaiming gateway tool API coverage
* test: pin folded QA coverage ids
2026-06-20 23:10:35 -07:00
Jesse Merhi
5db2f6c1fc
Add stdout diagnostics OTEL log exporter
...
Adds stdout and both-mode diagnostics OTEL log export, with focused QA Lab smoke coverage and docs/config updates.
Prepared head SHA: efa2ef07ab
Verification: CI 27808480969 passed for the prepared head.
Reviewed-by: @jesse-merhi
2026-06-19 16:06:37 +10:00
Dallin Romney
e12cf72b17
Standardize QA coverage IDs on dotted names (#94702)
...
* fix: standardize qa coverage ids
* test: avoid qa coverage id assertion spread
2026-06-18 17:25:26 -07:00
Dallin Romney
4ca0e52d0e
test: fold channel message flows into qa e2e (#93174)
...
* test: fold channel flows into qa e2e
* test: keep channel flow skill pointed at qa
* test: move channel flow proof under telegram
2026-06-18 15:45:33 -07:00
Dallin Romney
c4ae2be947
fix: taxonomy coverage id cleanup (#94304)
...
* fix: split taxonomy coverage id features
* fix: clean taxonomy feature row names
* docs: clarify taxonomy coverage id semantics
* docs: tighten coverage id guidance
* fix: keep taxonomy features product shaped
* fix: narrow sdk artifact coverage bundle
* fix: name taxonomy coverage ids clearly
* fix: polish taxonomy feature descriptions
2026-06-18 15:16:58 -07:00
Colin Johnson
d5a27b0b96
test: add QA Lab UX Matrix evidence scenario (#94306)
...
* test: add qa lab ux matrix script scenario
* fix(qa-lab): annotate UX Matrix producer catch callback as unknown for oxlint
---------
Co-authored-by: Dallin Romney <dallinromney@gmail.com>
2026-06-18 14:10:06 -07:00
Dallin Romney
e17d111990
fix: require all taxonomy coverage ids (#94296)
2026-06-17 16:38:14 -07:00
Colin Johnson
591313e80a
qa-lab: support script-backed evidence scenarios (#94276)
...
* qa: add script scenario execution kind
* fix(qa-lab): carry suite profile into script producer evidence and simplify artifact path resolution
* fix(qa-lab): keep out-of-repo producer artifacts absolute to avoid ../ traversal refs
---------
Co-authored-by: Dallin Romney <dallinromney@gmail.com>
2026-06-17 15:09:25 -07:00
Vincent Koc
8288b4d4c9
fix(qa-lab): stabilize web search parity fixture
2026-06-18 00:07:36 +02:00
Dallin Romney
f7e5132ffd
test: fold gateway smoke into qa e2e (#93178)
2026-06-17 14:55:28 -07:00
Dallin Romney
fae4a01d0d
test: fold otel smoke into qa e2e (#93181)
...
* test: fold otel smoke into qa e2e
* test: eliminate otel smoke script
2026-06-17 14:54:58 -07:00
Dallin Romney
0a6736af09
test: fold lifecycle and package proof into QA Lab (#93114)
...
* test: fold script coverage into qa scenarios
* test: migrate script checks into qa e2e
* test: point qa code refs at migrated e2e
* test: fold plugin lifecycle probe into qa e2e
* test: use shared temp dirs in plugin lifecycle probe
* test: fold plugin lifecycle sweep into qa lab
* test: trim lifecycle docker text assertions
* test: keep followup script conversions split
* test: make lifecycle docker runner script-safe
* test: update changed helper routing expectation
2026-06-17 14:22:04 -07:00
Dallin Romney
450060d7a2
test(qa): expand smoke-ci and release categories and coverage (#93175)
...
* test(qa): add smoke ci primary coverage evidence
* test(qa): remove overstated primary coverage claims
* test(qa): make release profile include smoke ci
* test(qa): trim taxonomy formatting churn
* test(qa): avoid hardcoded profile names in coverage test
* test(qa): make release profile cover taxonomy
* test(qa): type profile fixture all category flag
* test(qa): include channel delivery in smoke ci profile
2026-06-15 18:05:52 -07:00
Dallin Romney
fef8394079
Convert QA scenarios to YAML files (#92915)
...
* refactor: load QA scenarios from YAML
* docs: update personal QA scenario docs
* test: keep QA scenarios YAML-only
2026-06-14 17:31:18 -07:00
NVIDIAN
ecaebfc51b
fix(agents): retry thinking-only errored turns (#92191)
...
Retry replay-safe reasoning-only provider errors before assistant failover while preserving classified fallback and terminal-output ownership. Adds deterministic Anthropic gateway fault-injection coverage and focused regression tests.
Co-authored-by: ai-hpc <mail.speedy.hpc@hotmail.com>
2026-06-14 09:39:27 -07:00
Dallin Romney
a3e9dfee0e
Simplify QA scorecard mapping shape (#92558)
...
* test(qa): simplify scorecard mapping shape
* test(qa): use typed scorecard evidence refs
* test(qa): map scorecard categories by coverage id
* feat: align qa coverage with taxonomy features
* refactor: keep qa coverage ids canonical
* refactor: minimize qa coverage id churn
* test: align qa coverage id assertions
* test: update qa evidence coverage expectations
* refactor qa taxonomy coverage ids
* style qa taxonomy coverage ids
* test qa coverage lint fix
* test qa coverage type fix
2026-06-14 00:16:33 -07:00
Dallin Romney
561b293c7a
Run Vitest and Playwright scenarios from qa suite (#92606)
...
* test(qa): run vitest and playwright scenarios from qa suite
* fix(qa): harden scenario suite dispatch
* refactor(qa): share scenario path utilities
* refactor(qa): share test file scenario runner
* refactor(qa): route test file scenarios through suite runtime
* refactor(qa): use explicit suite runtime result kind
* test(qa): write suite evidence artifact
* refactor(qa): clarify suite execution dispatch
* fix(qa): keep test-file scenarios out of flow-only runners
* refactor(qa): export mixed scenario suite runner
2026-06-13 01:06:10 -07:00
Vincent Koc
7d3e8dc963
test(qa): restore memory fallback config safely
2026-06-10 18:03:15 +09:00
Vincent Koc
2c146261a2
fix(qa): accept completed mock image turns
2026-06-10 17:48:33 +09:00
Vincent Koc
87abb8defb
fix(qa): preserve Matrix recovery state in sqlite
2026-06-10 17:35:41 +09:00
Vincent Koc
3b180d5d99
fix(qa): wait for restart wake before capability check
2026-06-10 17:09:18 +09:00
brokemac79
de4b8d8ebf
feat(plugins): allow installed trusted policy contracts
...
Allow explicitly enabled installed plugins to register declared trusted tool policies and agent tool result middleware, with trusted policy ids scoped by plugin owner.
Verification covered targeted plugin/agent tests, typecheck, build, lint, local autoreview, and a Blacksmith Testbox runtime proof (tbx_01ktr1nq0rhq47fjkwrepm7fd3).
2026-06-10 16:18:23 +10:00
Vincent Koc
c350c35fad
fix(release): allow QA capability restore patch
...
(cherry picked from commit db711701d2)
2026-06-10 08:27:59 +09:00
Vincent Koc
50130d32a9
test(release): align qa tool coverage gate
2026-06-09 01:02:24 +02:00
Vincent Koc
c7b01cf201
test(release): stabilize qa runtime parity gate
2026-06-09 01:02:24 +02:00
Peter Steinberger
b8d08f0cfd
docs: document repository scripts
2026-06-04 20:52:50 -04:00
Vincent Koc
a9f099d279
test(qa): require channel scenario markers
2026-06-03 14:27:25 +02:00
Bek
bce3d5bf92
trace: Correlate channel diagnostics into one trace
...
Correlates channel receive, agent lifecycle, model attempt diagnostics, and outbound delivery diagnostics into one trace waterfall so channel message runs can be inspected end-to-end.
Maintainer follow-up removed the internal `AgentHarnessV2` adapter surface and kept the harness path canonical through `src/agents/harness/lifecycle.ts`.
Proof:
- PR checks passed on `04e9189c15480d53663d533a04c9883164b4dd54`.
- `node scripts/run-vitest.mjs src/agents/harness/lifecycle.test.ts src/agents/harness/selection.test.ts src/channels/turn/kernel.test.ts`
- `pnpm check:changed` Testbox `tbx_01kt3xtrm70qc7nb90cqv5rah1`
Thanks @bek91.
Co-authored-by: Bek <bek.akhmedov@gmail.com>
2026-06-02 06:38:00 -04:00
Vincent Koc
4550cfa6a7
fix(qa): run plugin MCP probes from repo root
2026-06-01 07:13:24 +02:00
Peter Steinberger
b653d94918
chore(lint): enable no-useless-assignment
2026-05-31 22:40:48 +01:00
yaoyi1222
75e0053cf9
fix(auto-reply): warn on substantive private message-tool finals
...
Warn operators when message_tool_only produces unusually substantive private final text without a delivered source reply. Keeps short/NO_REPLY silence quiet, avoids logging response bodies, and distinguishes unrelated side effects from source-reply delivery.
2026-05-31 14:35:58 +01:00
Peter Steinberger
4c33aaa86c
refactor: unify OpenAI provider identity (#88451)
...
* refactor: unify OpenAI provider identity
* refactor: move legacy oauth sidecar doctor helpers
* test: align OpenAI fixtures after rebase
* test: clean OpenAI provider unification
* fix: finish OpenAI provider cleanup
* fix: finish OpenAI cleanup follow-through
* fix: finish OpenAI CI cleanup
2026-05-31 00:29:44 +01:00
Shakker
308fdbe7fb
refactor: remove skill workshop plugin package
2026-05-30 20:04:52 +01:00
Peter Steinberger
1188aa3b81
feat: add Claude Opus 4.8 support (#87890)
...
* feat: add Claude Opus 4.8 support
* fix: omit Vertex Opus sampling overrides
* fix: preserve Opus adaptive thinking levels
* fix: clamp Anthropic max effort support
* fix: use sha256 for QA mock call ids
* fix: type Anthropic transport test model metadata
* test: update PDF model default for Opus 4.8
2026-05-29 06:10:42 +01:00
Ramrajprabu
f3cfd752d3
feat(copilot): add GitHub Copilot agent runtime
...
Adds the opt-in bundled GitHub Copilot agent runtime, pinned SDK install path, docs/inventory, SDK/tool/sandbox/auth wiring, and replay/tool-safety fixes.
Verification:
- Local: git diff --check; fnm exec --using 24.15.0 pnpm tsgo:extensions; fnm exec --using 24.15.0 pnpm check:test-types; fnm exec --using 24.15.0 pnpm build.
- Autoreview local: clean for the replay-safety fix; branch autoreview engine returned empty output twice, so local autoreview plus local/Crabbox/CI proof was used.
- Crabbox focused Copilot: run_2c0db9f48a4a, 19 files / 485 tests passed.
- Crabbox additional boundary shard: run_26a246a1aa24, prompt snapshots and plugin SDK boundary/export checks passed.
- Crabbox live Copilot: run_d128e4048b4e, real gpt-4.1 turn with live_echo phase-1-green and clean session-file check.
- GitHub checks: green on head 7cc8657e0d, including Dependency Guard after exact-head approval.
Co-authored-by: Ramraj Balasubramanian <ramrajba@microsoft.com>
2026-05-29 05:15:22 +01:00
Peter Steinberger
bb46b79d3c
refactor: internalize OpenClaw agent runtime (#85341)
...
* refactor: extract agent core package
Introduce packages/agent-core as the OpenClaw-owned home for reusable agent loop, harness, session, prompt, and runtime dependency contracts.
* refactor: extract shared llm runtime
Move provider model registries, stream wrappers, OAuth helpers, and LLM utilities into src/llm with plugin-sdk barrels instead of depending on the old embedded runtime layout.
* refactor: remove pi runtime internals
Rename remaining Pi-shaped agent surfaces to OpenClaw agent runtime names, delete obsolete Pi docs and package graph checks, and add the third-party notice for incorporated code.
* refactor: tighten agent session runtime
Make agent-core/runtime dependencies explicit, consolidate compaction and session transcript helpers, and move model/session helpers behind OpenClaw-owned contracts.
* refactor: remove static model and pi auth paths
Drop static model catalogs and Pi auth bridges, move model/provider facts to manifest-owned runtime contracts, and harden internal embedded-agent utilities.
* refactor: remove legacy provider compat paths
* docs: remove agent parity notes
* fix: skip provider wildcard metadata parsing
* refactor: share session extension sdk loading
* refactor: inline acpx proxy error formatter
* refactor: fold edit recovery into edit tool
* fix: accept extension batch separator
* test: align startup provider plugin expectations
* fix: restore provider-scoped release discovery
* test: align static asset packaging expectations
* fix: run static provider catalogs during scoped discovery
* fix: add provider entry catalogs for scoped live discovery
* fix: load lightweight provider catalog entries
* fix: refresh provider-scoped plugin metadata
* fix: keep provider catalog entries on release live path
* fix: keep static manifest models in release live checks
* fix: harden release model discovery
* fix: reduce OpenAI live cache probe reasoning
* fix: disable OpenAI cache probe reasoning
* ci: extend OpenAI gateway live timeout
* fix: extend live gateway model budget
* fix: stabilize release validation regressions
* fix: honor provider aliases in model rows
* fix: stabilize release validation lanes
* fix: stabilize release memory qa
* ci: stabilize release validation lanes
* ci: prefer ipv4 for live docker node calls
* fix: restore shared tool-call stream wrapper
* ci: remove legacy pi test shard alias
* fix: clean up embedded agent test drift
* fix: stabilize runtime alias status
* fix: clean up embedded agent ci drift
* fix: restore release ci invariants
* fix: clean up post-rebase runtime drift
* fix: restore release ci checks
* fix: restore release ci after rebase
* fix: remove stale pi runtime path
* test: align compaction runtime expectations
* test: update plugin prerelease expectations
* fix: handle claude live tool approvals
* fix: stabilize release validation gates
* fix: finish agent runtime import
* test: finish post-rebase agent runtime mocks
* fix: keep codex compaction native
* fix: stabilize codex app-server hook tests
* test: isolate codex diagnostic active run
* test: remove codex diagnostic completion race
# Conflicts:
# extensions/codex/src/app-server/run-attempt.test.ts
* ci: fix full release manifest performance run id
* refactor: narrow llm plugin sdk boundary
* chore: drop generated google boundary stamps
* fix: repair rebase fallout
* fix: clean up rebased runtime references
* fix: decode codex jwt payloads as base64url
* fix: preserve shipped pi runtime alias
* fix: add scoped sdk virtual modules
* fix: decode llm codex oauth jwt as base64url
* fix: avoid stale vertex adc negative cache
* fix: harden tool arg decoding and codeql path
* fix: keep vertex adc negative checks live
* refactor: consolidate codex jwt and edit helpers
* fix: await codex oauth node runtime imports
* fix: preserve sdk tool and notice contracts
* fix: preserve shipped compat config boundaries
* fix: align codex oauth callback host
* fix: terminate agent-core loop streams on failure
* fix: keep codex oauth callback alive during fallback
* ci: include session tools in critical codeql scans
* fix: keep Cloudflare Anthropic provider auth header
* docs: redirect legacy pi runtime pages
* fix: honor bundled web provider compat discovery
* fix: protect session output spill files
* fix: keep legacy agent dir env blocked
* fix: contain auto-discovered skill symlinks
* fix: harden agent core sdk proxy surfaces
* fix: restore approval reaction sdk compat
* fix: keep live docker runs bounded
* fix: keep codex oauth redirect host aligned
* fix: resolve post-rebase agent runtime drift
* fix: redact anthropic oauth parse failures
* fix: preserve responses strict tool shaping
* fix: repair agent runtime rebase cleanup
* docs: redirect retired parity pages
* fix: bound auto-discovered resources to roots
* fix: repair post-rebase agent test drift
* fix: preserve bundled provider allowlist migration
* fix: preserve manifest-owned provider aliases
* fix: declare photon image dependency
* fix: keep provider headers out of proxy body
* fix: preserve shipped env aliases
* fix: refresh control ui i18n generated state
* fix: quote read fallback paths
* fix: preview edits through configured backend
* test: satisfy core test typecheck
* fix: preserve ZAI usage auth fallback
* test: repair codex diagnostic test
* fix: repair agent runtime rebase drift
* test: finish embedded runner import rename
* fix: repair agent runtime rebase integrations
* test: align compaction oauth fallback expectations
* fix: allow sdk-auth session models
* fix: update doctor tool schema import
* fix: preserve bedrock plugin region
* fix: stream harmony-like prose immediately
* ci: include session runtime in codeql shards
* fix: repair latest rebase integrations
* fix: honor explicit codex websocket transport
* fix: keep openai-compatible credentials provider-scoped
* fix: refresh sdk api baseline after rebase
* fix: route cli runtime aliases through openclaw harness
* test: rename stale harness mock expectation
* test: rename embedded agent overflow calls
* test: clean embedded auth test wording
* test: use openclaw stream types in deepinfra cache test
* fix: refresh sdk api baseline on latest main
* fix: honor bundled discovery compat allowlists
* fix: refresh sdk api baseline after latest rebase
* fix: remove stale rebase imports
* test: rename stale model catalog mock
* test: mock renamed doctor runtime modules
* fix: map canonical kimi env auth
* fix: use internal model registry in bench script
* fix: migrate deepinfra provider catalog entry
* fix: enforce builtin tool suppression
* fix: route compaction auth and proxy payloads safely
* refactor: prune unused llm registry leftovers
* test: update codex hooks session import
* test: fix model picker ci coverage
* test: align model picker auth mock types
2026-05-27 19:24:04 +01:00
Vincent Koc
14198a1c66
fix(qa): close remaining mock qa e2e regressions
2026-05-27 10:06:08 +02:00