Commit Graph

794 Commits

Author SHA1 Message Date
Vincent Koc
c6aa8e7423 refactor(qa): remove unused parity helpers 2026-06-18 14:55:46 +08:00
Vincent Koc
d5d6576e06 fix(docs): refresh qa lab plugin inventory 2026-06-18 07:57:49 +02:00
Dallin Romney
e17d111990 fix: require all taxonomy coverage ids (#94296) 2026-06-17 16:38:14 -07:00
Colin Johnson
591313e80a qa-lab: support script-backed evidence scenarios (#94276)
* qa: add script scenario execution kind

* fix(qa-lab): carry suite profile into script producer evidence and simplify artifact path resolution

* fix(qa-lab): keep out-of-repo producer artifacts absolute to avoid ../ traversal refs

---------

Co-authored-by: Dallin Romney <dallinromney@gmail.com>
2026-06-17 15:09:25 -07:00
Vincent Koc
8288b4d4c9 fix(qa-lab): stabilize web search parity fixture 2026-06-18 00:07:36 +02:00
Dallin Romney
f7e5132ffd test: fold gateway smoke into qa e2e (#93178) 2026-06-17 14:55:28 -07:00
Dallin Romney
fae4a01d0d test: fold otel smoke into qa e2e (#93181)
* test: fold otel smoke into qa e2e

* test: eliminate otel smoke script
2026-06-17 14:54:58 -07:00
Dallin Romney
0a6736af09 test: fold lifecycle and package proof into QA Lab (#93114)
* test: fold script coverage into qa scenarios

* test: migrate script checks into qa e2e

* test: point qa code refs at migrated e2e

* test: fold plugin lifecycle probe into qa e2e

* test: use shared temp dirs in plugin lifecycle probe

* test: fold plugin lifecycle sweep into qa lab

* test: trim lifecycle docker text assertions

* test: keep followup script conversions split

* test: make lifecycle docker runner script-safe

* test: update changed helper routing expectation
2026-06-17 14:22:04 -07:00
Vincent Koc
5061a7a741 fix(qa-lab): kill qa cli process groups 2026-06-17 21:59:53 +02:00
Vincent Koc
aa3ed8f7ac test(qa-lab): use temp dir harness in model catalog test 2026-06-17 21:57:00 +02:00
Vincent Koc
d988851fe0 fix(qa-lab): wait for model catalog process groups 2026-06-17 21:42:00 +02:00
Vincent Koc
aa498cfe11 fix(qa-lab): wait for gateway child process groups 2026-06-17 18:51:49 +02:00
Vincent Koc
1ee2733b2f fix(qa-lab): harden live cleanup and readiness 2026-06-17 18:40:21 +02:00
Vincent Koc
2195b446d4 test(qa): align Docker inspect expectations 2026-06-17 15:03:49 +02:00
Vincent Koc
c05acc7a14 fix(testing): bind QA docker port probes to loopback 2026-06-17 14:18:03 +02:00
Vincent Koc
e2b6753b87 fix(qa-lab): bound credential payload reads 2026-06-17 11:59:55 +02:00
Vincent Koc
6774e7f259 chore(release): sync main to 2026.6.8 2026-06-17 07:25:30 +08:00
Shakker
920e6a8eec chore: set version 2026.6.9 2026-06-16 19:54:07 +01:00
Vincent Koc
0c657190ec fix(qa): fail runtime parity on cell failures 2026-06-16 05:53:19 +02:00
Vincent Koc
ffb67d2d2e fix(qa): suppress empty WhatsApp debug artifacts
Suppress empty WhatsApp gateway-debug artifact publication and keep the public QA run view redacted and consistent across report/evidence output.

Verification:
- Testbox focused WhatsApp QA runtime format/lint/test run passed: https://github.com/openclaw/openclaw/actions/runs/27589031659
- Testbox changed gate passed: https://github.com/openclaw/openclaw/actions/runs/27589128132
- PR CI passed on final head: https://github.com/openclaw/openclaw/actions/runs/27589903708
- git diff --check passed locally
2026-06-16 10:32:53 +08:00
Vincent Koc
3c65127827 fix(qa): preserve WhatsApp live failure diagnostics 2026-06-16 03:42:26 +02:00
Dallin Romney
450060d7a2 test(qa): expand smoke-ci and release categories and coverage (#93175)
* test(qa): add smoke ci primary coverage evidence

* test(qa): remove overstated primary coverage claims

* test(qa): make release profile include smoke ci

* test(qa): trim taxonomy formatting churn

* test(qa): avoid hardcoded profile names in coverage test

* test(qa): make release profile cover taxonomy

* test(qa): type profile fixture all category flag

* test(qa): include channel delivery in smoke ci profile
2026-06-15 18:05:52 -07:00
Vincent Koc
c32ba171db fix(qa): fail unsuccessful self-checks 2026-06-16 01:32:47 +02:00
Dallin Romney
e32929e12c Add slim evidence mode for QA profile evidence (#93179)
* test(qa): compact profile evidence execution metadata

* docs(qa): document compact profile evidence

* test(qa): support compact evidence mode

* test(qa): rename compact evidence mode to slim

* docs(qa): trim slim evidence wording

* fix(qa): avoid commander runtime import
2026-06-15 14:50:40 -07:00
Dallin Romney
1f8c4d3958 simplify QA evidence profile and mappings/coverage shape (#93153)
* test(qa): simplify evidence coverage shape

* test(qa): collapse evidence scorecard metadata

* test(qa): document evidence schema version
2026-06-14 22:26:58 -07:00
Dallin Romney
3d38c9a633 test(qa): embed profile scorecard evidence (#93109)
* test(qa): embed profile scorecard evidence

* test(qa): fix profile runner return lint

* test(qa): satisfy suite command lint return
2026-06-14 20:51:38 -07:00
Dallin Romney
e8db9c3bc0 test(qa): add qa run --profile and unified output summary/evidence (#91587)
* test(qa): add mapped qa run profiles

* test(qa): document mapped profile runner

* test(qa): validate run profiles from mapping

* test(qa): preserve root profile parsing

* test(qa): simplify taxonomy profile dispatch

* test(qa): align tool coverage CLI expectation

* test(qa): fix profile dispatch fixture type

* test(qa): share profile runner option types

* test(qa): split shared cli runner options

* test(qa): unify profile suite artifacts

* fix(qa): filter profile scenarios by provider lane

* test(qa): drop native scenario subreports

* fix(qa): keep native log refs repo-relative

* fix(cli): preserve qa run root profile parsing

* fix(qa): avoid qa profile flag collision

* fix(qa): reject profile flags without qa profile
2026-06-14 18:08:42 -07:00
Dallin Romney
fef8394079 Convert QA scenarios to YAML files (#92915)
* refactor: load QA scenarios from YAML

* docs: update personal QA scenario docs

* test: keep QA scenarios YAML-only
2026-06-14 17:31:18 -07:00
NVIDIAN
ecaebfc51b fix(agents): retry thinking-only errored turns (#92191)
Retry replay-safe reasoning-only provider errors before assistant failover while preserving classified fallback and terminal-output ownership. Adds deterministic Anthropic gateway fault-injection coverage and focused regression tests.\n\nCo-authored-by: ai-hpc <mail.speedy.hpc@hotmail.com>
2026-06-14 09:39:27 -07:00
Dallin Romney
1affe4fcdf Fold Telegram RTT sampling into live QA evidence (#92550)
* refactor(qa): fold telegram rtt into live evidence

* test: default package telegram rtt samples

* refactor(qa-lab): fold telegram rtt into live evidence

* fix(qa-lab): keep package telegram rtt optional for focused runs

* fix(qa-lab): avoid stale rtt evidence on failed samples

* fix(qa-lab): pass telegram live env into credential leasing

* fix(qa-lab): update telegram canary remediation artifacts

* docs(qa): remove stale telegram observed artifact guidance

* fix(qa-lab): clarify telegram empty-reply remediation

* fix(qa-lab): honor telegram rtt timeout

* ci(qa): drop stale telegram capture env

* refactor: align telegram evidence coverage fields

* fix: ignore stale telegram observed artifacts

* fix: preserve telegram rtt coverage mapping

* fix: omit unused telegram rtt catch binding

* docs: document telegram rtt check selector
2026-06-14 17:02:33 +08:00
Vincent Koc
b1caba5906 test(qa): align tool coverage CLI expectation 2026-06-14 16:11:00 +08:00
Dallin Romney
a3e9dfee0e Simplify QA scorecard mapping shape (#92558)
* test(qa): simplify scorecard mapping shape

* test(qa): use typed scorecard evidence refs

* test(qa): map scorecard categories by coverage id

* feat: align qa coverage with taxonomy features

* refactor: keep qa coverage ids canonical

* refactor: minimize qa coverage id churn

* test: align qa coverage id assertions

* test: update qa evidence coverage expectations

* refactor qa taxonomy coverage ids

* style qa taxonomy coverage ids

* test qa coverage lint fix

* test qa coverage type fix
2026-06-14 00:16:33 -07:00
Vincent Koc
47759c3506 fix(qa): accept rich Telegram canary presence
(cherry picked from commit e86eb7567a)
2026-06-14 07:20:16 +08:00
Vincent Koc
924f4c1964 fix(qa): read rich Telegram replies in live checks
(cherry picked from commit 9c8b880353)
2026-06-14 06:35:59 +08:00
Vincent Koc
4208c89ec4 test(qa-lab): align bootstrap selection assertion 2026-06-13 18:11:46 +08:00
Dallin Romney
561b293c7a Run Vitest and Playwright scenarios from qa suite (#92606)
* test(qa): run vitest and playwright scenarios from qa suite

* fix(qa): harden scenario suite dispatch

* refactor(qa): share scenario path utilities

* refactor(qa): share test file scenario runner

* refactor(qa): route test file scenarios through suite runtime

* refactor(qa): use explicit suite runtime result kind

* test(qa): write suite evidence artifact

* refactor(qa): clarify suite execution dispatch

* fix(qa): keep test-file scenarios out of flow-only runners

* refactor(qa): export mixed scenario suite runner
2026-06-13 01:06:10 -07:00
Dallin Romney
d8b3e523ff Add QA scorecard taxonomy validation (#91500)
Merged via squash.

Prepared head SHA: a9aec907d4
Co-authored-by: RomneyDa <6581799+RomneyDa@users.noreply.github.com>
Co-authored-by: RomneyDa <6581799+RomneyDa@users.noreply.github.com>
Reviewed-by: @RomneyDa
2026-06-12 17:07:51 -07:00
Dallin Romney
4809ac70fa Add QA evidence artifact output (#91484)
* feat: add qa evidence summary normalization

* chore: rename qa evidence target environment

* chore: align qa evidence profile terminology

* chore: align qa evidence summary fields

* chore: add qa evidence taxonomy ref

* test: remove stale multipass evidence example

* test(qa): normalize vitest and playwright evidence

* test(qa): slim evidence summary metadata

* test(qa): clarify evidence summary inputs

* test(qa): rename scenario specs in evidence flow

* test(qa): treat evidence profiles as mapping strings

* test(qa): use neutral evidence test identity

* test(qa): nest evidence summary joins

* refactor(qa): normalize live evidence summaries

* fix(qa): accept normalized telegram rtt summaries

* fix(qa): normalize evidence lane summaries

* fix(qa): align evidence summaries with requirements

* refactor(qa): tighten evidence summary builders

* refactor(qa): restore standard evidence ids

* fix(qa): keep legacy summaries out of rtt evidence

* refactor(qa): make package evidence provenance explicit

* test(qa): keep script tests out of qa lab internals

* refactor(qa): rename scenario evidence definitions

* refactor(qa): clean evidence summary wording

* test(qa): fix evidence summary test inputs

* refactor(qa): simplify evidence identity fields

* refactor(qa): tighten evidence summary inputs

* refactor(qa): rename evidence artifact
2026-06-12 16:12:58 -07:00
Jesse Merhi
6223a538bc fix(docker): bundle QA Lab runtime in the image (#92087)
* fix(docker): split qa lab runtime fixes

* fix(docker): remove store platform selector

* test(docker): assert qa lab ui copy is gated
2026-06-12 14:02:32 +10:00
Vincent Koc
8042ec4cb8 fix(qa): scope runtime parity mock requests 2026-06-11 05:19:10 +09:00
Vincent Koc
7d3e8dc963 test(qa): restore memory fallback config safely 2026-06-10 18:03:15 +09:00
Vincent Koc
7f1d82ab25 revert(sessions): defer session metadata sqlite
Reverts 538d36eaaa while preserving subsequent main changes. The beta-only SQLite downgrade rescue and reverse migration remain excluded.
2026-06-10 16:34:06 +09:00
brokemac79
de4b8d8ebf feat(plugins): allow installed trusted policy contracts
Allow explicitly enabled installed plugins to register declared trusted tool policies and agent tool result middleware, with trusted policy ids scoped by plugin owner.\n\nVerification covered targeted plugin/agent tests, typecheck, build, lint, local autoreview, and a Blacksmith Testbox runtime proof (tbx_01ktr1nq0rhq47fjkwrepm7fd3).
2026-06-10 16:18:23 +10:00
Josh Avant
9f48254f09 Fix config.patch explicit array replacement (#91551)
* fix config patch explicit array replacement

* fix generated config patch protocol model

* fix config patch test helper typing

* fix shared auth patch replacement tests

* update config patch prompt snapshots

* harden qa lab config patch replace paths
2026-06-08 21:48:46 -05:00
Vincent Koc
50130d32a9 test(release): align qa tool coverage gate 2026-06-09 01:02:24 +02:00
Vincent Koc
c7b01cf201 test(release): stabilize qa runtime parity gate 2026-06-09 01:02:24 +02:00
Vincent Koc
1019b591d5 test(release): stabilize qa gateway restart readiness 2026-06-09 01:02:24 +02:00
Vincent Koc
505b23a137 fix(release): clear beta validation blockers 2026-06-09 01:02:22 +02:00
Peter Steinberger
538d36eaaa refactor: move session metadata to SQLite (#91322)
* refactor: move session metadata to sqlite

* test: seed session stores with sqlite fixtures

* test: seed remaining session stores with sqlite fixtures

* fix: stabilize sqlite session cache freshness

* test: seed cli transcript metadata in sqlite
2026-06-07 23:17:35 -07:00
Marcus Castro
181238fb53 feat(whatsapp): expand live QA coverage (#90480)
* feat(whatsapp): expand qa driver message support

* feat(qa-lab): add deterministic whatsapp mock replies

* feat(qa-lab): expand whatsapp live qa scenarios

* docs(qa): document whatsapp live qa coverage
2026-06-08 00:03:23 -03:00