From 006ebe692d7bef03579c4fca38f4615e600edcdb Mon Sep 17 00:00:00 2001 From: Peter Steinberger Date: Sun, 17 May 2026 13:15:30 +0100 Subject: [PATCH] chore: rename codex review skill to autoreview --- .agents/skills/autoreview/SKILL.md | 128 +++++ .agents/skills/autoreview/scripts/autoreview | 533 +++++++++++++++++++ AGENTS.md | 2 +- CHANGELOG.md | 1 + 4 files changed, 663 insertions(+), 1 deletion(-) create mode 100644 .agents/skills/autoreview/SKILL.md create mode 100755 .agents/skills/autoreview/scripts/autoreview diff --git a/.agents/skills/autoreview/SKILL.md b/.agents/skills/autoreview/SKILL.md new file mode 100644 index 00000000000..43b61ac3cc9 --- /dev/null +++ b/.agents/skills/autoreview/SKILL.md @@ -0,0 +1,128 @@ +--- +name: autoreview +description: "Autoreview closeout: local dirty changes, PR branch vs main, parallel tests." +--- + +# Autoreview + +Run Codex's built-in code review as a closeout check. This is code review (`codex review`), not Guardian `auto_review` approval routing. + +Use when: +- user asks for Codex review / autoreview / second-model review +- after non-trivial code edits, before final/commit/ship +- reviewing a local branch or PR branch after fixes + +## Contract + +- Treat review output as advisory. Never blindly apply it. +- Verify every finding by reading the real code path and adjacent files. +- Read dependency docs/source/types when the finding depends on external behavior. +- Reject unrealistic edge cases, speculative risks, broad rewrites, and fixes that over-complicate the codebase. +- Prefer small fixes at the right ownership boundary; no refactor unless it clearly improves the bug class. +- Keep going until the selected review path returns no accepted/actionable findings. +- If a review-triggered fix changes code, rerun focused tests and rerun the review helper. +- Default to Codex review. If Codex is unavailable or exits with an error, the helper may fall back to `claude -p`; `pi -p` and `opencode run` are explicit reviewer/fallback options. The helper runs nested Codex review in yolo/full-access mode by default; use `--no-yolo` only when intentionally testing sandbox behavior. +- Stop as soon as the review command/helper exits 0 with no accepted/actionable findings. Do not run an extra direct `codex review` just to get a nicer "clean" line, a second opinion, or clearer closeout wording. +- Treat the helper's successful exit plus absence of actionable findings as the clean review result, even if the underlying Codex CLI output is terse. +- If rejecting a finding as intentional/not worth fixing, add a brief inline code comment only when it explains a real invariant or ownership decision that future reviewers should know. +- Do not push just to review. Push only when the user requested push/ship/PR update. + +## Pick Target + +Dirty local work: + +```bash +codex review --uncommitted +``` + +Use this only when the patch is actually unstaged/staged/untracked in the +current checkout. For committed, pushed, or PR work, point Codex at the commit +or branch diff instead; do not force `--mode local` / `--uncommitted` just +because the helper docs mention dirty work first. A clean `--uncommitted` review +only proves there is no local patch. + +Branch/PR work: + +```bash +git fetch origin +codex review --base origin/main +``` + +Do not pass an inline prompt with `--base`; current CLI rejects `--base` + `[PROMPT]` even though help text is ambiguous. If custom instructions are needed, run the plain base review first, then do a local/manual follow-up pass. + +If an open PR exists, use its actual base: + +```bash +base=$(gh pr view --json baseRefName --jq .baseRefName) +codex review --base "origin/$base" +``` + +Committed single change: + +```bash +codex review --commit HEAD +``` + +or with the helper: + +```bash +.agents/skills/autoreview/scripts/autoreview --mode commit --commit HEAD +``` + +Use commit review for already-landed or already-pushed work on `main`. Reviewing +clean `main` against `origin/main` is usually an empty diff after push. For a +small stack, review each commit explicitly or review the branch before merging +with `--base`. + +## Parallel Closeout + +Format first if formatting can change line locations. Then it is OK to run tests and review in parallel: + +```bash +.agents/skills/autoreview/scripts/autoreview --parallel-tests "" +``` + +Tradeoff: tests may force code changes that stale the review. If tests or review lead to code edits, rerun the affected tests and rerun review until no accepted/actionable findings remain. Once that rerun exits cleanly, stop; do not spend another long review cycle on redundant confirmation. + +## Context Efficiency + +Codex review is usually noisy. Default to a subagent filter when subagents are available. Ask it to run the review and return only: +- actionable findings it accepts +- findings it rejects, with one-line reason +- exact files/tests to rerun + +Run inline only for tiny changes or when subagents are unavailable. + +## Helper + +Bundled helper: + +```bash +.agents/skills/autoreview/scripts/autoreview --help +``` + +The helper: +- chooses dirty `--uncommitted` first +- otherwise uses current PR base if `gh pr view` works +- otherwise uses `origin/main` for non-main branches +- use `--mode commit --commit ` for already-committed work, especially clean `main` after landing +- should be left in `--mode auto` or forced to `--mode branch` for PR/branch work; do not force `--mode local` after committing +- supports `--reviewer codex|claude|pi|opencode|auto`; `auto` runs Codex first +- supports `--fallback-reviewer claude|pi|opencode|none`; default is `claude` +- falls back only when Codex is unavailable or exits nonzero, not when Codex reports findings +- writes only to stdout unless `--output` or `AUTOREVIEW_OUTPUT` is set +- supports `--dry-run`, `--parallel-tests`, and commit refs +- runs nested review with `--dangerously-bypass-approvals-and-sandbox` by default +- keeps accepting `--full-access`; use `--no-yolo` or `AUTOREVIEW_YOLO=0` to opt out +- still accepts legacy `CODEX_REVIEW_*` env vars when the matching `AUTOREVIEW_*` var is unset +- prints `autoreview clean: no accepted/actionable findings reported` when the selected review command exits 0 + +## Final Report + +Include: +- review command used +- tests/proof run +- findings accepted/rejected, briefly why +- the clean review result from the final helper/review run, or why a remaining finding was consciously rejected + +Do not run another Codex review solely to improve the final report wording. If the final helper run exited 0 and produced no accepted/actionable findings, report that exact run as clean. diff --git a/.agents/skills/autoreview/scripts/autoreview b/.agents/skills/autoreview/scripts/autoreview new file mode 100755 index 00000000000..c1e5ce603df --- /dev/null +++ b/.agents/skills/autoreview/scripts/autoreview @@ -0,0 +1,533 @@ +#!/usr/bin/env bash +set -euo pipefail + +usage() { + cat <<'EOF' +Usage: autoreview [options] + +Options: + --mode auto|local|branch|commit + Target selection. Default: auto. + --base REF Base ref for branch review. Default: PR base or origin/main. + --commit REF Commit ref for commit review. Default: HEAD. + --reviewer codex|claude|pi|opencode|auto + Review engine. Default: auto (Codex, fallback reviewer on error). + --fallback-reviewer claude|pi|opencode|none + Fallback when Codex is unavailable or exits nonzero. Default: claude. + --codex-bin PATH Codex binary. Default: codex. + --claude-bin PATH Claude binary. Default: claude. + --pi-bin PATH Pi binary. Default: pi. + --opencode-bin PATH OpenCode binary. Default: opencode. + --full-access Keep yolo/full-access mode enabled. Default. + --no-yolo Run nested Codex review with normal sandbox/approval prompts. + --output FILE Also save output to file. + --parallel-tests CMD Run review and test command concurrently. + --dry-run Print selected commands, do not run. + -h, --help Show help. + +Modes: + local codex review --uncommitted + branch codex review --base + commit codex review --commit + auto dirty tree -> local, else PR/current branch -> branch +EOF +} + +mode=auto +base_ref= +commit_ref=HEAD +reviewer=${AUTOREVIEW_REVIEWER:-${CODEX_REVIEW_REVIEWER:-auto}} +fallback_reviewer=${AUTOREVIEW_FALLBACK_REVIEWER:-${CODEX_REVIEW_FALLBACK_REVIEWER:-claude}} +codex_bin=${CODEX_BIN:-codex} +claude_bin=${CLAUDE_BIN:-claude} +pi_bin=${PI_BIN:-pi} +opencode_bin=${OPENCODE_BIN:-opencode} +codex_args=() +yolo=${AUTOREVIEW_YOLO:-${CODEX_REVIEW_YOLO:-1}} +output=${AUTOREVIEW_OUTPUT:-${CODEX_REVIEW_OUTPUT:-}} +parallel_tests= +dry_run=false + +while [[ $# -gt 0 ]]; do + case "$1" in + --mode) + mode=${2:-} + shift 2 + ;; + --base) + base_ref=${2:-} + shift 2 + ;; + --commit) + commit_ref=${2:-} + shift 2 + ;; + --reviewer) + reviewer=${2:-} + shift 2 + ;; + --fallback-reviewer) + fallback_reviewer=${2:-} + shift 2 + ;; + --codex-bin) + codex_bin=${2:-} + shift 2 + ;; + --claude-bin) + claude_bin=${2:-} + shift 2 + ;; + --pi-bin) + pi_bin=${2:-} + shift 2 + ;; + --opencode-bin) + opencode_bin=${2:-} + shift 2 + ;; + --full-access) + yolo=1 + shift + ;; + --no-yolo) + yolo=0 + shift + ;; + --output) + output=${2:-} + shift 2 + ;; + --parallel-tests) + parallel_tests=${2:-} + shift 2 + ;; + --dry-run) + dry_run=true + shift + ;; + -h|--help) + usage + exit 0 + ;; + *) + usage >&2 + exit 2 + ;; + esac +done + +case "$yolo" in + 0|false|False|FALSE|no|No|NO|off|Off|OFF) ;; + *) codex_args+=(--dangerously-bypass-approvals-and-sandbox) ;; +esac + +case "$mode" in + auto|local|branch|commit) ;; + *) + echo "invalid --mode: $mode" >&2 + exit 2 + ;; +esac + +case "$reviewer" in + auto|codex|claude|pi|opencode) ;; + *) + echo "invalid --reviewer: $reviewer" >&2 + exit 2 + ;; +esac + +case "$fallback_reviewer" in + claude|pi|opencode|none) ;; + *) + echo "invalid --fallback-reviewer: $fallback_reviewer" >&2 + exit 2 + ;; +esac + +repo_root=$(git rev-parse --show-toplevel) + +current_branch=$(git branch --show-current 2>/dev/null || true) +dirty=false +if [[ -n "$(git status --porcelain)" ]]; then + dirty=true +fi + +pr_url= +if [[ -z "$base_ref" && "$mode" != local ]] && command -v gh >/dev/null 2>&1; then + if pr_lines=$(gh pr view --json baseRefName,url --jq '[.baseRefName, .url] | @tsv' 2>/dev/null); then + base_name=${pr_lines%%$'\t'*} + pr_url=${pr_lines#*$'\t'} + if [[ -n "$base_name" ]]; then + base_ref="origin/$base_name" + fi + fi +fi + +if [[ -z "$base_ref" ]]; then + base_ref=origin/main +fi + +review_kind= +if [[ "$mode" == local || ( "$mode" == auto && "$dirty" == true ) ]]; then + review_kind=local +elif [[ "$mode" == commit ]]; then + review_kind=commit +elif [[ "$mode" == branch || ( "$mode" == auto && -n "$current_branch" && "$current_branch" != "main" ) ]]; then + review_kind=branch +else + echo "no review target: clean main checkout and no forced mode" >&2 + exit 1 +fi + +if [[ "$review_kind" == local ]]; then + review_cmd=("$codex_bin" "${codex_args[@]}" review --uncommitted) +elif [[ "$review_kind" == commit ]]; then + review_cmd=("$codex_bin" "${codex_args[@]}" review --commit "$commit_ref") +else + review_cmd=("$codex_bin" "${codex_args[@]}" review --base "$base_ref") +fi + +printf 'autoreview target: %s\n' "$review_kind" +printf 'branch: %s\n' "${current_branch:-detached}" +if [[ -n "$pr_url" ]]; then + printf 'pr: %s\n' "$pr_url" +fi +printf 'reviewer: %s\n' "$reviewer" +if [[ "$reviewer" == auto ]]; then + printf 'fallback-reviewer: %s\n' "$fallback_reviewer" +fi +if [[ "$reviewer" == auto || "$reviewer" == codex ]]; then + printf 'review:' + printf ' %q' "${review_cmd[@]}" + printf '\n' +else + printf 'review: %s prompt review\n' "$reviewer" +fi +if [[ -n "$parallel_tests" ]]; then + printf 'tests: %s\n' "$parallel_tests" +fi +if [[ "$review_kind" == branch ]]; then + printf 'fetch: git fetch origin --quiet\n' +fi +if [[ -n "$output" ]]; then + printf 'output: %s\n' "$output" +fi + +if [[ "$dry_run" == true ]]; then + exit 0 +fi + +if [[ "$review_kind" == branch ]]; then + git fetch origin --quiet || { + echo "warning: git fetch origin failed; reviewing with existing refs" >&2 + } +fi + +review_output=$output +review_output_is_temp=false +if [[ -z "$review_output" ]]; then + review_output=$(mktemp) + review_output_is_temp=true +fi +mkdir -p "$(dirname "$review_output")" +: > "$review_output" + +cleanup() { + if [[ "${review_output_is_temp:-false}" == true && -n "${review_output:-}" ]]; then + rm -f "$review_output" + fi + if [[ -n "${prompt_file:-}" ]]; then + rm -f "$prompt_file" + fi +} +trap cleanup EXIT + +run_review() { + mkdir -p "$(dirname "$review_output")" + "${review_cmd[@]}" 2>&1 | tee "$review_output" +} + +diff_for_review() { + case "$review_kind" in + local) + git -C "$repo_root" diff --stat + git -C "$repo_root" diff --cached --stat + git -C "$repo_root" diff --find-renames + git -C "$repo_root" diff --cached --find-renames + while IFS= read -r untracked_file; do + [[ -n "$untracked_file" ]] || continue + git -C "$repo_root" diff --no-index -- /dev/null "$untracked_file" || true + done < <(git -C "$repo_root" ls-files --others --exclude-standard) + ;; + commit) + git -C "$repo_root" show --find-renames --stat --format=fuller "$commit_ref" + git -C "$repo_root" show --find-renames --format=medium "$commit_ref" + ;; + branch) + git -C "$repo_root" diff --find-renames --stat "$base_ref"...HEAD + git -C "$repo_root" diff --find-renames "$base_ref"...HEAD + ;; + esac +} + +build_prompt_file() { + prompt_file=$(mktemp) + { + cat <] Short title + File: path:line + Why: one sentence + Fix: one sentence +- If no accepted/actionable findings, output exactly: + autoreview clean: no accepted/actionable findings reported + +Diff: +EOF + diff_for_review + } > "$prompt_file" || return +} + +run_prompt_reviewer() { + local selected=$1 + local reviewer_output + local status=0 + + if ! build_prompt_file; then + rm -f "${prompt_file:-}" + prompt_file= + return 1 + fi + reviewer_output=$(mktemp) + mkdir -p "$(dirname "$review_output")" + + case "$selected" in + claude) + if ! command -v "$claude_bin" >/dev/null 2>&1; then + echo "fallback reviewer unavailable: $claude_bin" >&2 + status=127 + elif printf 'fallback: claude -p\n' | tee -a "$review_output"; then + "$claude_bin" --tools "" --no-session-persistence -p < "$prompt_file" 2>&1 | tee -a "$review_output" "$reviewer_output" + status=$? + else + status=$? + fi + ;; + pi) + if ! command -v "$pi_bin" >/dev/null 2>&1; then + echo "fallback reviewer unavailable: $pi_bin" >&2 + status=127 + elif printf 'fallback: pi -p\n' | tee -a "$review_output"; then + "$pi_bin" --no-tools --no-session -p < "$prompt_file" 2>&1 | tee -a "$review_output" "$reviewer_output" + status=$? + else + status=$? + fi + ;; + opencode) + if ! command -v "$opencode_bin" >/dev/null 2>&1; then + echo "fallback reviewer unavailable: $opencode_bin" >&2 + status=127 + elif printf 'fallback: opencode run\n' | tee -a "$review_output"; then + "$opencode_bin" run --pure --dir "$(dirname "$prompt_file")" --file "$prompt_file" \ + "Review the attached prompt file. Do not modify files." 2>&1 | tee -a "$review_output" "$reviewer_output" + status=$? + else + status=$? + fi + ;; + *) + echo "unsupported prompt reviewer: $selected" >&2 + status=2 + ;; + esac + if [[ "$status" == 0 ]]; then + if grep -Eq '\[P[0-3]\]' "$reviewer_output"; then + status=1 + elif ! grep -q '[^[:space:]]' "$reviewer_output"; then + status=1 + elif ! grep -Fxq 'autoreview clean: no accepted/actionable findings reported' "$reviewer_output"; then + status=1 + fi + fi + rm -f "$reviewer_output" + rm -f "$prompt_file" + prompt_file= + return "$status" +} + +run_selected_review() { + local selected=$1 + case "$selected" in + codex) + if ! command -v "$codex_bin" >/dev/null 2>&1; then + echo "codex reviewer unavailable: $codex_bin" >&2 + return 127 + fi + run_review + ;; + claude|pi|opencode) + run_prompt_reviewer "$selected" + ;; + *) + echo "unsupported reviewer: $selected" >&2 + return 2 + ;; + esac +} + +run_auto_review() { + run_selected_review codex + local status=$? + if [[ "$status" == 0 ]]; then + return 0 + fi + if (( status > 128 && status < 192 )); then + return "$status" + fi + if review_output_has_findings; then + return "$status" + fi + if [[ "$fallback_reviewer" == none ]]; then + return "$status" + fi + printf 'autoreview warning: codex exited %s; falling back to %s\n' "$status" "$fallback_reviewer" >&2 + run_selected_review "$fallback_reviewer" +} + +elapsed_since() { + local started_at=$1 + local finished_at + finished_at=$(date +%s) + printf '%s\n' "$((finished_at - started_at))" +} + +format_elapsed() { + local seconds=$1 + if (( seconds < 60 )); then + printf '%ss\n' "$seconds" + else + printf '%sm%ss\n' "$((seconds / 60))" "$((seconds % 60))" + fi +} + +review_output_empty() { + [[ ! -s "$review_output" ]] || ! grep -q '[^[:space:]]' "$review_output" +} + +review_findings_text() { + if grep -Fxq 'codex' "$review_output"; then + awk ' + $0 == "codex" { + capture = 1 + output = $0 ORS + next + } + capture { + output = output $0 ORS + } + END { + printf "%s", output + } + ' "$review_output" + return + fi + cat "$review_output" +} + +review_output_has_findings() { + review_findings_text | grep -Eq '\[P[0-3]\]' +} + +report_clean_review_or_fail() { + local elapsed_text + elapsed_text=$(format_elapsed "${review_elapsed_seconds:-0}") + + if review_output_has_findings; then + printf 'autoreview complete after %s\n' "$elapsed_text" + printf 'autoreview findings: accepted/actionable findings reported\n' + return 1 + fi + if review_output_empty; then + printf 'autoreview complete after %s; no output\n' "$elapsed_text" + return 1 + fi + printf 'autoreview complete after %s\n' "$elapsed_text" + printf 'autoreview clean: no accepted/actionable findings reported\n' +} + +if [[ -z "$parallel_tests" ]]; then + review_started_at=$(date +%s) + set +e + if [[ "$reviewer" == auto ]]; then + run_auto_review + else + run_selected_review "$reviewer" + fi + review_status=$? + review_elapsed_seconds=$(elapsed_since "$review_started_at") + set -e + if [[ "$review_status" == 0 ]]; then + report_clean_review_or_fail + exit $? + fi + exit "$review_status" +fi + +review_status_file=$(mktemp) +review_elapsed_file=$(mktemp) +tests_status_file=$(mktemp) + +( + set +e + review_started_at=$(date +%s) + if [[ "$reviewer" == auto ]]; then + run_auto_review + else + run_selected_review "$reviewer" + fi + status=$? + elapsed=$(elapsed_since "$review_started_at") + printf '%s\n' "$status" > "$review_status_file" + printf '%s\n' "$elapsed" > "$review_elapsed_file" +) & +review_pid=$! + +( + set +e + bash -lc "$parallel_tests" + status=$? + printf '%s\n' "$status" > "$tests_status_file" +) & +tests_pid=$! + +wait "$review_pid" || true +wait "$tests_pid" || true + +review_status=$(cat "$review_status_file") +review_elapsed_seconds=$(cat "$review_elapsed_file") +tests_status=$(cat "$tests_status_file") +rm -f "$review_status_file" "$review_elapsed_file" "$tests_status_file" + +printf 'autoreview exit: %s\n' "$review_status" +printf 'tests exit: %s\n' "$tests_status" + +if [[ "$review_status" != 0 || "$tests_status" != 0 ]]; then + exit 1 +fi + +report_clean_review_or_fail diff --git a/AGENTS.md b/AGENTS.md index 9719174a33c..9b22be90f43 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -73,7 +73,7 @@ Skills own workflows; root owns hard policy and routing. - Full suites, broad changed gates, Docker/package/E2E/live/cross-OS proof, or anything that bogs down the Mac: Crabbox/Testbox. - One/few files local. If a local command fans out, stop and move broad proof to Crabbox/Testbox. - Before handoff/push: prove touched surface. Before landing to `main`: issue proof plus appropriate full/broad proof unless scope is clearly narrow. -- Pre-land/pre-commit code changes: use `$codex-review` until no accepted/actionable findings remain, unless equivalent manual review already done, trivial/docs-only, or user opts out. +- Pre-land/pre-commit code changes: use `$autoreview` until no accepted/actionable findings remain, unless equivalent manual review already done, trivial/docs-only, or user opts out. - If proof is blocked, say exactly what is missing and why. - Do not land related failing format/lint/type/build/tests. If unrelated on latest `origin/main`, say so with scoped proof. - Docs/changelog-only and CI/workflow metadata-only: `git diff --check` plus relevant docs/workflow sanity; escalate only if scripts/config/generated/package/runtime behavior changed. diff --git a/CHANGELOG.md b/CHANGELOG.md index 9858cf0beee..94e6fc1bb7d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,7 @@ Docs: https://docs.openclaw.ai ### Changes - Mac app: redesign Settings pages with consistent card layouts, cached navigation, cleaner permissions/voice/skills/cron/exec/debug panes, and steadier spacing around the native sidebar. +- Skills: rename the repo-local Codex closeout review skill and helper to `autoreview` while preserving the Codex-first fallback behavior. - Skills: add a meme-maker skill for curated template search, local SVG/PNG rendering, Imgflip hosted rendering, and Know Your Meme provenance links. - Agents/tools: shorten built-in tool descriptions and schema hints across media, messaging, sessions, cron, Gateway, web, image/PDF, TTS, nodes, and plan tools while preserving routing guardrails. - Skills: add node inspector debugging, fused diagram generation, and throwaway spike workflow skills.