| Author | Commit | Message | Date |
|---|---|---|---|
| Peter Steinberger | 0c278bb93c | refactor: break runtime import cycles | 2026-04-09 03:56:22 +01:00 |
| Peter Steinberger | be46d0ddc6 | test: update character eval public panel | 2026-04-09 01:25:59 +01:00 |
| Peter Steinberger | 39cc6b7dc7 | fix: stabilize character eval and Qwen model routing | 2026-04-09 01:04:09 +01:00 |
| Shakker | d66e2d5b33 | test: cover curated qa missing-key reply classification | 2026-04-08 21:55:39 +01:00 |
| Shakker | c63d25bd9b | fix: classify curated qa missing-key replies | 2026-04-08 21:55:39 +01:00 |
| Shakker | 9cfa152962 | test: cover mixed-traffic qa wait cursors | 2026-04-08 21:55:39 +01:00 |
| Shakker | 204d766b27 | fix: align qa wait cursor semantics | 2026-04-08 21:55:39 +01:00 |
| Shakker | a6d76df4f0 | test: cover qa scenario wait failure replies | 2026-04-08 21:55:39 +01:00 |
| Shakker | b3f3cfd598 | fix: fail fast across qa scenario wait paths | 2026-04-08 21:55:39 +01:00 |
| Shakker | 491e216c45 | fix: fail fast on qa live auth errors | 2026-04-08 21:55:39 +01:00 |
| Peter Steinberger | 21ef1bf8de | feat: parallelize character eval runs | 2026-04-08 20:05:55 +01:00 |
| Peter Steinberger | f1e75d3259 | fix: load QA live provider overrides | 2026-04-08 20:05:55 +01:00 |
| Peter Steinberger | a3d21539ef | test: stabilize full-suite execution | 2026-04-08 19:40:57 +01:00 |
| Peter Steinberger | 4a51a1031d | feat: add character eval model options | 2026-04-08 17:05:30 +01:00 |
| Peter Steinberger | 4bbf78e566 | test: make character eval scenario natural | 2026-04-08 17:05:30 +01:00 |
| Peter Steinberger | 3101d81053 | feat: add QA character eval reports | 2026-04-08 15:52:49 +01:00 |
| Peter Steinberger | aa3b1357cb | fix: support Codex CLI QA auth | 2026-04-08 15:52:01 +01:00 |
| Peter Steinberger | 97dfbe0fe1 | feat: add qa character vibes eval | 2026-04-08 12:05:24 +01:00 |
| Peter Steinberger | 9bf3482470 | refactor: finish markdown-only qa runner | 2026-04-08 11:56:02 +01:00 |
| Peter Steinberger | 95e397a266 | refactor: dedupe repeated test helpers | 2026-04-08 09:58:22 +01:00 |
| Peter Steinberger | 492e98a88a | refactor: move qa suite logic into scenario markdown | 2026-04-08 09:13:57 +01:00 |
| Vincent Koc | be530f085d | refactor(plugin-sdk): share tool payload extraction | 2026-04-08 09:07:28 +01:00 |
| Vincent Koc | 4260ac4cf6 | perf(plugins): narrow boundary compile sdk imports | 2026-04-08 08:52:51 +01:00 |
| Peter Steinberger | 21d9bac5ec | fix: stabilize live qa scenario suite | 2026-04-08 08:17:59 +01:00 |
| Peter Steinberger | b73d8ef7d7 | refactor: split qa scenarios into per-file markdown defs | 2026-04-08 05:37:17 +01:00 |
| Peter Steinberger | c0aed59fca | refactor: move qa suite definitions into markdown | 2026-04-07 23:39:50 +01:00 |
| Peter Steinberger | e0ad3e79e6 | refactor: dedupe normalization lowercase helpers | 2026-04-07 22:57:52 +01:00 |
| Peter Steinberger | 83bdba2bae | fix: resolve rebase regressions for ci landing | 2026-04-07 21:02:04 +01:00 |
| Peter Steinberger | 5b090561fb | refactor: dedupe browser whatsapp qa lowercase helpers | 2026-04-07 20:58:01 +01:00 |
| Peter Steinberger | a00b01f5ed | fix: harden complex qa suite scenarios | 2026-04-07 20:35:39 +01:00 |
| Peter Steinberger | b5d2bd6f41 | fix(qa): tighten frontier scope evals | 2026-04-07 20:32:42 +01:00 |
| Peter Steinberger | 4e69a9b329 | fix(qa): restore safe no-fork gateway runtime | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | cde12e63e7 | perf(qa): lazy-load runner catalog for lab ui | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | f312d6c106 | fix(qa): preserve gateway cli auth in no-fork rpc path | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | e7538b4499 | perf(qa): drop per-rpc gateway cli forks | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 02bd9e8c10 | perf(qa): trim frontier direct-agent waits | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 35eb70f1f5 | test(qa): retry flaky local fetches in lab server tests | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 986536ff6b | fix(qa): keep direct self-check outputs under repo root | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | f6544a0a3b | fix(qa): anchor runner artifacts to repo root | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | daeff2fa89 | fix(qa): default docker artifacts from repo root | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 76bde3d42b | fix(qa): support neutral-cwd docker commands | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 816a3eae8a | chore(qa): align qa cli provider input types | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 5aa4fd3216 | fix(qa): normalize qa cli lane inputs | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 7d18b145f8 | fix(qa): keep manual alternate model aligned | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | cdf18c16b4 | fix(qa): default manual lanes by provider mode | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 3182588ad4 | fix(qa): allow random qa-lab control-ui origins | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 82535771cd | fix(qa): pin gateway child control ui root | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | f9f38a48e6 | fix(qa): align mock model-switch continuity | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | 9a106f7e3c | fix(qa): support neutral-cwd suite runs | 2026-04-07 20:32:42 +01:00 |
| Vincent Koc | f93b217834 | feat(qa): add manual harness lane | 2026-04-07 20:32:42 +01:00 |