mirror of https://github.com/openclaw/openclaw.git synced 2026-05-06 07:30:43 +00:00


Jesse Merhi 6de9d71bfb feat(security): add GHSA detector-review pipeline and OpenGrep CI workflows (#69483)

* feat(security): add GHSA detector-review pipeline and OpenGrep CI workflows [AI-assisted]

Stand up an end-to-end pipeline that turns every published openclaw GitHub
Security Advisory into a reusable OpenGrep rule, and wire the compiled rules
into manual-dispatch GitHub Actions workflows that publish SARIF to GitHub
Code Scanning.

The pipeline is harness-agnostic: any coding-agent CLI (Rovo Dev, Claude
Code, Codex, OpenCode, or anything you can shell out to) can drive it via
the runner script's --harness flag. Built-in adapters cover the four common
harnesses; --harness-cmd '<template>' supports anything else with shell-style
{prompt}/{model}/{output_file} substitution.

Pipeline pieces:

- scripts/run-ghsa-detector-review-batch.mjs runs your chosen coding harness
  in parallel against every advisory using the agent-agnostic detector-review
  spec at security/detector-review/detector-review-spec.md. Each case
  produces an opengrep general-rule.yml (precise) and broad-rule.yml
  (review-aid), plus a coverage-validated report against the vulnerable
  commit's changed files.
- scripts/compile-opengrep-rules.mjs walks a run directory, rewrites each
  rule's id to ghsa-detector.<ghsa>.<orig-id>, injects ghsa/advisory-url/
  detector-bucket/source-rule-id metadata, and uses opengrep itself to drop
  rules with InvalidRuleSchemaError so the published super-configs load
  cleanly.

Compiled outputs:

- security/opengrep/precise.yml     (336 rules)
- security/opengrep/broad.yml       (459 rules)
- security/opengrep/compile-manifest.json    (per-rule provenance map)

CI workflows (manual workflow_dispatch only):

- .github/workflows/opengrep-precise.yml
- .github/workflows/opengrep-broad.yml

Both install a pinned opengrep, run opengrep scan against src/, upload SARIF
to Code Scanning under categories opengrep-precise / opengrep-broad, and use
continue-on-error: true so findings never block the workflow.

Detector-review spec and assets:

- security/detector-review/detector-review-spec.md   the agent-agnostic spec
  the runner injects into each per-case prompt
- security/detector-review/references/{detector-rubric,report-template}.md
- security/detector-review/scripts/init_case.py
- security/prompt-suffix-coverage-first.md   mandatory prompt addendum that
  enforces coverage-first validation (rule must catch the OG vuln, not just
  pass synthetic fixtures)

Docs:

- security/README.md          end-to-end flow, supported harnesses, regen recipe
- security/opengrep/README.md compiled-config details + recompile recipe

* security: tighten GHSA OpenGrep detector workflow

* chore: refine precise opengrep workflow

* chore: remove stale opengrep metadata

* fix: harden GHSA OpenGrep workflow

* ci: split OpenGrep diff and full scans

* chore: remove performance-only opengrep rule

* ci: use OpenGrep installer path

* chore: enforce opengrep rule metadata provenance

* chore: generalize opengrep rule compilation

* docs: align opengrep rulepack guidance

* chore: support generic opengrep rule sources

* fix: validate opengrep rulepack-only changes

---------

Co-authored-by: Jesse Merhi <security-engineering@atlassian.com>

2026-04-30 02:42:20 +10:00


Compiled OpenGrep super-configs

precise.yml is OpenClaw's shipped precise OpenGrep rulepack. Each rule is tied to a source advisory, vulnerability report, or review identifier through metadata and is intended to have concrete coverage of the original vulnerable behavior or a verified variant.

Rule provenance lives in each compiled rule's metadata; no separate manifest is committed or generated by default.

Noisy exploratory rules are intentionally kept out of the tracked repo. Anything appended to precise.yml must be low-noise enough to run as a blocking PR-diff check and as a manual full-repository audit.

Editing rules

precise.yml is the checked-in compiled rulepack. Prefer changing source rule YAML and rerunning security/opengrep/compile-rules.mjs instead of hand-editing compiled rules. The compiler appends new rule IDs by default; use --replace-precise only when intentionally rebuilding the rulepack from a complete source folder. Direct edits are discouraged because they can bypass ID, metadata, duplicate, and OpenGrep validation.

Rule naming and metadata

Every rule's id is rewritten to <source-id>.<original-id>. Every rule's metadata block is augmented with source fields enforced by pnpm check:opengrep-rule-metadata:

| Key | Value |
| --- | --- |
| `ghsa` | `GHSA-xxxx-xxxx-xxxx` for GHSA-backed rules |
| `advisory-id` | non-GHSA source identifier, or the GHSA ID normalized by the compiler |
| `advisory-url` | durable URL to the advisory, report, review record, or source context |
| `detector-bucket` | `precise` |
| `source-rule-id` | the original source rule id |
| `source-file` | optional source YAML file used during compilation |
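
The renaming and metadata injection described above can be sketched as follows. This is a minimal illustration, not the code in compile-rules.mjs: the function name `compileRule`, the example rule, and the choice to use the source identifier without any normalization are all assumptions.

```javascript
// Hypothetical sketch of id rewriting and metadata injection.
// The real compiler may normalize identifiers differently.
function compileRule(rule, sourceFile) {
  const metadata = { ...(rule.metadata ?? {}) };

  // A rule must name its source via metadata.ghsa or metadata.advisory-id.
  const sourceId = metadata.ghsa ?? metadata["advisory-id"];
  if (!sourceId) {
    throw new Error(`rule ${rule.id}: missing metadata.ghsa or metadata.advisory-id`);
  }

  return {
    ...rule,
    // <source-id>.<original-id>
    id: `${sourceId}.${rule.id}`,
    metadata: {
      ...metadata,
      // Assumption: for GHSA-backed rules, advisory-id mirrors the GHSA ID as-is.
      "advisory-id": metadata["advisory-id"] ?? sourceId,
      "detector-bucket": "precise",
      "source-rule-id": rule.id,
      ...(sourceFile ? { "source-file": sourceFile } : {}),
    },
  };
}

// Example with a placeholder advisory ID and a made-up rule name.
const compiledExample = compileRule(
  { id: "unsafe-path-join", metadata: { ghsa: "GHSA-xxxx-xxxx-xxxx" } },
  "rules/example.yml",
);
console.log(compiledExample.id);
```

Keeping the original id in `source-rule-id` lets a finding be traced back to the source rule even after the id has been prefixed.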

Recompiling

# from the openclaw repo root
node security/opengrep/compile-rules.mjs \
  --rules-dir <folder-with-source-rule-yaml>

The script:

- Recursively walks every .yml / .yaml file under --rules-dir
- Reads top-level rules arrays from those source files
- Requires each source rule to provide metadata.ghsa or metadata.advisory-id
- Requires metadata.advisory-url for non-GHSA source identifiers
- Rewrites ids and injects metadata as above
- Appends only new precise rule ids to the existing precise.yml by default; pass --replace-precise to rebuild it from just the supplied source folder
- Runs opengrep scan --no-strict against an empty target to identify schema-invalid or parser-invalid rules and drops mapped bad rules so the published super-config loads cleanly
- Writes precise.yml

Skipped, duplicate, or invalid rules are summarized on stdout/stderr for follow-up.
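
The append-by-default behavior can be pictured with a short sketch. It assumes rules are plain objects keyed by `id`; the function name `mergePrecise` and the `replacePrecise` option are hypothetical stand-ins for whatever compile-rules.mjs does internally.

```javascript
// Hypothetical sketch of append-only merging with duplicate reporting.
function mergePrecise(existingRules, compiledRules, { replacePrecise = false } = {}) {
  // With --replace-precise semantics, start from an empty rulepack.
  const rules = replacePrecise ? [] : [...existingRules];
  const seen = new Set(rules.map((rule) => rule.id));
  const skipped = [];

  for (const rule of compiledRules) {
    if (seen.has(rule.id)) {
      // An id that is already present is never overwritten, only reported.
      skipped.push(rule.id);
      continue;
    }
    seen.add(rule.id);
    rules.push(rule);
  }
  return { rules, skipped };
}

const existingPack = [{ id: "src-1.rule-a" }, { id: "src-1.rule-c" }];
const incomingPack = [{ id: "src-1.rule-a" }, { id: "src-2.rule-b" }];

const appended = mergePrecise(existingPack, incomingPack);
const rebuilt = mergePrecise(existingPack, incomingPack, { replacePrecise: true });
console.log(appended.rules.map((r) => r.id), appended.skipped);
console.log(rebuilt.rules.map((r) => r.id));
```

In append mode the existing `src-1.rule-a` stays untouched and is listed as skipped. In replace mode, `src-1.rule-c` disappears because it is absent from the supplied source set, which is why the replace flag only makes sense with a complete source folder.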

Validating locally

pnpm check:opengrep-rule-metadata
opengrep validate security/opengrep/precise.yml

The metadata check must pass before rules are committed, and OpenGrep validation must exit zero. Warnings about unknown fields are acceptable only when OpenGrep still reports "Configuration is valid" together with a non-zero rule count. The compile script drops mapped schema-invalid or parser-invalid rules and fails closed when OpenGrep validation itself cannot be completed.
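
As a rough picture of what a metadata check over compiled rules might verify, here is a sketch based only on the fields listed in this README. The actual rules enforced by pnpm check:opengrep-rule-metadata are not shown here, so the function name `metadataProblems`, the messages, and the GHSA pattern are assumptions.

```javascript
// Hypothetical sketch of per-rule metadata checks.
// Assumes GHSA IDs have the shape GHSA-xxxx-xxxx-xxxx.
const GHSA_PATTERN = /^GHSA(-[0-9a-z]{4}){3}$/i;

function metadataProblems(rule) {
  const meta = rule.metadata ?? {};
  const problems = [];
  const hasGhsa = typeof meta.ghsa === "string";

  if (!hasGhsa && !meta["advisory-id"]) {
    problems.push("missing ghsa or advisory-id");
  }
  if (hasGhsa && !GHSA_PATTERN.test(meta.ghsa)) {
    problems.push("malformed ghsa");
  }
  if (!hasGhsa && !meta["advisory-url"]) {
    problems.push("non-GHSA rule needs advisory-url");
  }
  if (meta["detector-bucket"] !== "precise") {
    problems.push("detector-bucket must be precise");
  }
  if (!meta["source-rule-id"]) {
    problems.push("missing source-rule-id");
  }
  return problems;
}

const okRule = {
  id: "GHSA-xxxx-xxxx-xxxx.example",
  metadata: {
    ghsa: "GHSA-xxxx-xxxx-xxxx",
    "detector-bucket": "precise",
    "source-rule-id": "example",
  },
};
const badRule = {
  id: "REVIEW-1.example",
  metadata: {
    "advisory-id": "REVIEW-1",
    "detector-bucket": "precise",
    "source-rule-id": "example",
  },
};
console.log(metadataProblems(okRule), metadataProblems(badRule));
```

Returning a list of problems rather than throwing on the first one makes it easier to report every failing rule in a single run.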

Running locally

scripts/run-opengrep.sh

For SARIF output matching the PR workflow's diff-scoped scan:

scripts/run-opengrep.sh --changed --sarif

For SARIF output matching the manual full-repository workflow:

scripts/run-opengrep.sh --sarif

Why `--no-strict`?

Some generated rules trigger non-fatal opengrep warnings (for example, unknown-field warnings on compatibility-only keys). --no-strict keeps opengrep's exit code clean for those warnings. Parser-invalid rules are still dropped during compilation so the checked-in super-config validates before CI uses it.

Why `--no-git-ignore`?

Some OpenClaw paths are excluded by .gitignore for build reasons even though they contain meaningful source code we want scanned. --no-git-ignore keeps opengrep from skipping them.
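
To show how the two flags might sit together in one scan invocation, here is a small sketch that assembles an argument list. The exact command line used by scripts/run-opengrep.sh is not reproduced in this README, so the helper `buildScanArgs`, the use of `--config`, and the `src/` target are assumptions for illustration.

```javascript
// Hypothetical sketch: assemble arguments for an opengrep scan.
function buildScanArgs({ config, targets }) {
  return [
    "scan",
    "--config", config,
    "--no-strict",     // tolerate non-fatal rule warnings
    "--no-git-ignore", // scan paths even if .gitignore excludes them
    ...targets,
  ];
}

const scanArgs = buildScanArgs({
  config: "security/opengrep/precise.yml",
  targets: ["src/"],
});
console.log(["opengrep", ...scanArgs].join(" "));
```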
