mirror of https://github.com/openclaw/openclaw.git synced 2026-05-06 05:50:43 +00:00

Jesse Merhi 6de9d71bfb feat(security): add GHSA detector-review pipeline and OpenGrep CI workflows (#69483)

* feat(security): add GHSA detector-review pipeline and OpenGrep CI workflows [AI-assisted]

Stand up an end-to-end pipeline that turns every published openclaw GitHub
Security Advisory into a reusable OpenGrep rule, and wire the compiled rules
into manual-dispatch GitHub Actions workflows that publish SARIF to GitHub
Code Scanning.

The pipeline is harness-agnostic: any coding-agent CLI (Rovo Dev, Claude
Code, Codex, OpenCode, or anything you can shell out to) can drive it via
the runner script's --harness flag. Built-in adapters cover the four common
harnesses; --harness-cmd '<template>' supports anything else with shell-style
{prompt}/{model}/{output_file} substitution.

Pipeline pieces:

- scripts/run-ghsa-detector-review-batch.mjs runs your chosen coding harness
  in parallel against every advisory using the agent-agnostic detector-review
  spec at security/detector-review/detector-review-spec.md. Each case
  produces an opengrep general-rule.yml (precise) and broad-rule.yml
  (review-aid), plus a coverage-validated report against the vulnerable
  commit's changed files.
- scripts/compile-opengrep-rules.mjs walks a run directory, rewrites each
  rule's id to ghsa-detector.<ghsa>.<orig-id>, injects ghsa/advisory-url/
  detector-bucket/source-rule-id metadata, and uses opengrep itself to drop
  rules with InvalidRuleSchemaError so the published super-configs load
  cleanly.

Compiled outputs:

- security/opengrep/precise.yml     (336 rules)
- security/opengrep/broad.yml       (459 rules)
- security/opengrep/compile-manifest.json    (per-rule provenance map)

CI workflows (manual workflow_dispatch only):

- .github/workflows/opengrep-precise.yml
- .github/workflows/opengrep-broad.yml

Both install a pinned opengrep, run opengrep scan against src/, upload SARIF
to Code Scanning under categories opengrep-precise / opengrep-broad, and use
continue-on-error: true so findings never block the workflow.

Detector-review spec and assets:

- security/detector-review/detector-review-spec.md   the agent-agnostic spec
  the runner injects into each per-case prompt
- security/detector-review/references/{detector-rubric,report-template}.md
- security/detector-review/scripts/init_case.py
- security/prompt-suffix-coverage-first.md   mandatory prompt addendum that
  enforces coverage-first validation (rule must catch the OG vuln, not just
  pass synthetic fixtures)

Docs:

- security/README.md          end-to-end flow, supported harnesses, regen recipe
- security/opengrep/README.md compiled-config details + recompile recipe

* security: tighten GHSA OpenGrep detector workflow

* chore: refine precise opengrep workflow

* chore: remove stale opengrep metadata

* fix: harden GHSA OpenGrep workflow

* ci: split OpenGrep diff and full scans

* chore: remove performance-only opengrep rule

* ci: use OpenGrep installer path

* chore: enforce opengrep rule metadata provenance

* chore: generalize opengrep rule compilation

* docs: align opengrep rulepack guidance

* chore: support generic opengrep rule sources

* fix: validate opengrep rulepack-only changes

---------

Co-authored-by: Jesse Merhi <security-engineering@atlassian.com>

2026-04-30 02:42:20 +10:00

README.md

Security tooling

This directory holds OpenClaw's shipped OpenGrep security rulepack and the supporting tooling that validates and runs it. Maintainer-only advisory triage and detector-generation prompts live outside the public repo; this repo keeps the durable artifacts needed to block regressions in PRs and support local rule validation.

Layout

security/
├── README.md                              <- this file
└── opengrep/
    ├── README.md                          <- precise rulepack details + compile recipe
    └── precise.yml                        <- compiled super-config: precise rules

The related scripts are:

- security/opengrep/compile-rules.mjs: gathers source OpenGrep rule YAMLs from a folder and appends new compiled rule IDs to security/opengrep/precise.yml.
- security/opengrep/check-rule-metadata.mjs: enforces that every committed rule carries durable source/provenance metadata.
- scripts/run-opengrep.sh: runs the compiled precise rulepack locally or in CI with consistent paths and exclusions.

Rule lifecycle

Maintainers investigate advisories and generate candidate rules outside the public repo. Once a candidate rule has been validated and reviewed, put the shippable source rule YAML in any local folder and compile it into this repo:

node security/opengrep/compile-rules.mjs \
  --rules-dir <folder-with-source-rule-yaml>

Commit the resulting security/opengrep/precise.yml diff. Durable rule provenance lives in each compiled rule's metadata and is checked by pnpm check:opengrep-rule-metadata.

Rule quality contract: precise rules must catch the vulnerable behavior they were written for, should be silent on corresponding fixed behavior when a fix exists, and should keep current findings limited to verified regressions or variants.

Writing precise OpenGrep rules

A rule is appropriate for security/opengrep/precise.yml only when the dangerous shape is stable enough to block PRs. Prefer, in order:

1. Variant detector: source-to-sink or missing-guard detection across the same bug family.
2. Scoped behavioral regression: a narrow subsystem-specific rule anchored on the affected API or trust boundary.
3. Exact regression canary: a labelled canary for the original vulnerable shape when broader variants would be noisy.
4. No OpenGrep rule: if runtime state, product policy, or external data is required to distinguish vulnerable and safe behavior.

Before compiling a rule, validate it against vulnerable/fixed/current code when those surfaces exist. Every current finding must be classified as a true original issue or true variant, or the rule must be tightened/dropped before it ships.
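The validation gate described above can be sketched as a small check. This is an illustrative sketch only, not code from the repo: the function name `checkQualityContract`, the input shape, and the classification labels `original-issue` and `variant` are all hypothetical. It also treats a finding on fixed code as a hard failure, which is stricter than the "should be silent" wording of the contract.

```javascript
// Hypothetical labels for findings that a reviewer has verified.
const ACCEPTED_CLASSIFICATIONS = new Set(["original-issue", "variant"]);

// vulnerableFindings: number of findings on the vulnerable code.
// fixedFindings: number of findings on the fixed code, or null if no fix exists.
// currentFindings: findings on current code, each with a reviewer classification.
function checkQualityContract({
  vulnerableFindings,
  fixedFindings = null,
  currentFindings = [],
}) {
  const failures = [];
  if (vulnerableFindings < 1) {
    failures.push("does not fire on the vulnerable code");
  }
  if (fixedFindings !== null && fixedFindings > 0) {
    failures.push("still fires on the fixed code");
  }
  const unverified = currentFindings.filter(
    (finding) => !ACCEPTED_CLASSIFICATIONS.has(finding.classification),
  );
  if (unverified.length > 0) {
    failures.push(
      `${unverified.length} current finding(s) not classified as original issue or variant`,
    );
  }
  return { shippable: failures.length === 0, failures };
}
```

A rule with any entry in `failures` would be tightened or dropped rather than compiled.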

Running the rules locally

The wrapper script handles paths, exclusions, and output formatting so local scans match CI exactly.

scripts/run-opengrep.sh                 # precise rules, human output
scripts/run-opengrep.sh --json          # write .opengrep-out/precise.json
scripts/run-opengrep.sh --sarif         # write .opengrep-out/precise.sarif
scripts/run-opengrep.sh --changed       # scan changed first-party paths
scripts/run-opengrep.sh -- src/agents/  # scan a single dir

If you'd rather invoke opengrep directly, the equivalent is:

opengrep scan --no-strict --no-git-ignore \
  --config security/opengrep/precise.yml \
  src/ extensions/ apps/ packages/ scripts/

Both forms read .semgrepignore at the repo root automatically — that's the single source of truth for which paths are skipped (test files, fixtures, mocks, QA-tooling extensions, test-orchestration scripts, …). Add a glob there if a new test naming convention shows up.

Running the rules in CI

There are two OpenGrep workflows:

OpenGrep — PR Diff (.github/workflows/opengrep-precise.yml) runs on pull requests and executes scripts/run-opengrep.sh --changed --sarif --error so findings stay scoped to changed first-party paths.
OpenGrep — Full (.github/workflows/opengrep-precise-full.yml) is manual dispatch only and executes scripts/run-opengrep.sh --sarif --error across the full first-party source set for maintainers who want a repository-wide audit.

Both workflows:

- Inherit the same .semgrepignore exclusions used by the local wrapper
- Upload SARIF to GitHub Code Scanning under stable OpenGrep categories
- Fail on precise findings so the rulepack acts as a regression firewall
- Enforce committed rule provenance with pnpm check:opengrep-rule-metadata

Editing, silencing, or removing rules

precise.yml is the checked-in compiled rulepack. Prefer editing source rule YAML and recompiling instead of hand-editing compiled rules, because the compiler normalizes rule IDs, metadata, duplicates, and OpenGrep validation. The compiler appends new rule IDs by default; use --replace-precise only when intentionally rebuilding the rulepack from a complete source folder.
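The difference between the default append mode and --replace-precise can be pictured with a minimal sketch. It is not the actual logic of compile-rules.mjs: the function name `mergeRulepack` and the `replace` option are hypothetical, and it assumes that "appends new rule IDs" means a compiled rule whose ID is already present is skipped rather than overwritten.

```javascript
// existingRules: rules already in the compiled rulepack.
// compiledRules: freshly compiled rules from the source folder.
// replace: hypothetical stand-in for the --replace-precise behaviour.
function mergeRulepack(existingRules, compiledRules, { replace = false } = {}) {
  // In replace mode the rulepack is rebuilt from the compiled rules alone.
  const merged = replace ? [] : [...existingRules];
  const seen = new Set(merged.map((rule) => rule.id));
  for (const rule of compiledRules) {
    if (seen.has(rule.id)) continue; // assumed: first rule with a given ID wins
    seen.add(rule.id);
    merged.push(rule);
  }
  return merged;
}
```

Under these assumptions, append mode never changes or removes a rule that is already committed, while replace mode yields exactly the deduplicated contents of the source folder.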

To drop a noisy rule:

1. Delete the offending source rule from the local source-rule folder.
2. Re-run node security/opengrep/compile-rules.mjs --rules-dir <folder-with-source-rule-yaml>.
3. Commit the resulting security/opengrep/precise.yml diff.

To narrow a rule's path scope, edit the source rule's paths.include / paths.exclude fields in the same local artifact location and recompile.

Tracing a finding back to its source

Every compiled rule's id is <source-id>.<original-id>. For GHSA-backed rules, <source-id> is the lower-case GHSA ID. For other source-backed rules, use a stable source identifier without dots such as a CVE, OSV ID, internal advisory ID, or other review identifier. Rule metadata must include advisory-url, detector-bucket, and source-rule-id, plus either ghsa or advisory-id. New compilations also add source-file when available. pnpm check:opengrep-rule-metadata enforces these durable source fields so each committed rule is traceable without a separate committed manifest.
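As an illustration of the ID and metadata conventions above, the sketch below parses a compiled rule ID and lists missing provenance fields on an already-parsed rule object. It is not the repo's check-rule-metadata.mjs: the helper names `parseCompiledRuleId` and `findMetadataProblems` are hypothetical, and only the field names come from the text above.

```javascript
// Fields the text above lists as always required.
const REQUIRED_FIELDS = ["advisory-url", "detector-bucket", "source-rule-id"];

// Split "<source-id>.<original-id>" on the first dot. The source ID contains
// no dots, so everything after the first dot is the original rule ID.
function parseCompiledRuleId(id) {
  const dot = id.indexOf(".");
  if (dot <= 0 || dot === id.length - 1) {
    return null; // no dot, empty source ID, or empty original ID
  }
  return { sourceId: id.slice(0, dot), originalId: id.slice(dot + 1) };
}

// Return a list of human-readable problems; an empty list means the rule passes.
function findMetadataProblems(rule) {
  const problems = [];
  const metadata = rule.metadata ?? {};
  if (parseCompiledRuleId(rule.id ?? "") === null) {
    problems.push(`id "${rule.id}" is not <source-id>.<original-id>`);
  }
  for (const field of REQUIRED_FIELDS) {
    if (!metadata[field]) problems.push(`missing metadata.${field}`);
  }
  if (!metadata.ghsa && !metadata["advisory-id"]) {
    problems.push("missing metadata.ghsa or metadata.advisory-id");
  }
  return problems;
}
```

Given a finding's rule ID, `parseCompiledRuleId` recovers the source identifier to look up, and the original ID may itself contain dots without breaking the split.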