# Compiled OpenGrep super-configs

`precise.yml` is OpenClaw's shipped precise OpenGrep rulepack. Each rule is tied,
through its metadata, to a source advisory, vulnerability report, or review
identifier, and is intended to concretely cover the original vulnerable behavior
or a verified variant of it.

Rule provenance lives in each compiled rule's metadata; no separate manifest is
committed or generated by default.

Noisy exploratory rules are intentionally kept out of the tracked repo. Anything
appended to `precise.yml` must be low-noise enough to run as a blocking PR-diff
check and as a manual full-repository audit.

## Editing rules

`precise.yml` is the checked-in compiled rulepack. Prefer editing the source rule
YAML and rerunning `security/opengrep/compile-rules.mjs` over hand-editing
compiled rules. By default the compiler appends only rules with new IDs; use
`--replace-precise` only when intentionally rebuilding the rulepack from a
complete source folder. Hand edits are discouraged because they can bypass the
compiler's ID, metadata, duplicate, and OpenGrep validation checks.

## Rule naming and metadata

Every rule's id is rewritten to `<source-id>.<original-id>`. Every rule's
`metadata` block is augmented with source fields enforced by
`pnpm check:opengrep-rule-metadata`:

| Key               | Value                                                                 |
| ----------------- | --------------------------------------------------------------------- |
| `ghsa`            | `GHSA-xxxx-xxxx-xxxx` for GHSA-backed rules                           |
| `advisory-id`     | non-GHSA source identifier, or the GHSA ID normalized by the compiler |
| `advisory-url`    | durable URL to the advisory, report, review record, or source context |
| `detector-bucket` | `precise`                                                             |
| `source-rule-id`  | the original source rule id                                           |
| `source-file`     | optional source YAML file used during compilation                     |
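The ID rewrite and the source-metadata requirements (also listed as steps 3 and 4 under Recompiling) can be sketched in shell. This sketch assumes `<source-id>` is `metadata.ghsa` when present and `metadata.advisory-id` otherwise, which this README does not state explicitly. The function names and the loose GHSA shape check are likewise illustrative assumptions, not the actual behavior of `pnpm check:opengrep-rule-metadata`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: compiled rule IDs and source-metadata checks.
# ASSUMPTION: <source-id> is metadata.ghsa if set, else metadata.advisory-id.

source_id() {
  # $1 = metadata.ghsa, $2 = metadata.advisory-id
  if [ -n "$1" ]; then printf '%s\n' "$1"; else printf '%s\n' "$2"; fi
}

compiled_rule_id() {
  # $1 = source id, $2 = original rule id -> <source-id>.<original-id>
  printf '%s.%s\n' "$1" "$2"
}

check_source_metadata() {
  # $1 = metadata.ghsa, $2 = metadata.advisory-id, $3 = metadata.advisory-url
  local ghsa="$1" advisory_id="$2" advisory_url="$3"
  # ASSUMPTION: loose shape check for GHSA-xxxx-xxxx-xxxx.
  local ghsa_re='^GHSA-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{4}$'

  if [ -z "$ghsa" ] && [ -z "$advisory_id" ]; then
    echo "rule needs metadata.ghsa or metadata.advisory-id" >&2
    return 1
  fi
  if [ -n "$ghsa" ] && ! printf '%s\n' "$ghsa" | grep -Eq "$ghsa_re"; then
    echo "metadata.ghsa is not shaped like GHSA-xxxx-xxxx-xxxx: $ghsa" >&2
    return 1
  fi
  if [ -z "$ghsa" ] && [ -z "$advisory_url" ]; then
    echo "non-GHSA rule $advisory_id needs metadata.advisory-url" >&2
    return 1
  fi
  return 0
}

# Usage example:
compiled_rule_id "$(source_id GHSA-abcd-efgh-ijkm '')" path-traversal
# -> GHSA-abcd-efgh-ijkm.path-traversal
```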

## Recompiling

```bash
# from the openclaw repo root
node security/opengrep/compile-rules.mjs \
  --rules-dir <folder-with-source-rule-yaml>
```

The script:

1. Recursively walks every `.yml` / `.yaml` file under `--rules-dir`
2. Reads top-level `rules` arrays from those source files
3. Requires each source rule to provide `metadata.ghsa` or `metadata.advisory-id`
4. Requires `metadata.advisory-url` for non-GHSA source identifiers
5. Rewrites ids and injects metadata as described above
6. Appends only rules with new IDs to the existing `precise.yml` by default; pass `--replace-precise` to rebuild it from the supplied source folder alone
7. Runs `opengrep scan --no-strict` against an empty target to surface schema-invalid or parser-invalid rules, and drops the rules those errors map back to so the published super-config loads cleanly
8. Writes `precise.yml`

Skipped, duplicate, or invalid rules are summarized on stdout/stderr for follow-up.
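Steps 1 and 7 can be approximated in shell as a minimal sketch. `list_source_rule_files` and `drop_invalid_rule_ids` are hypothetical helpers: `compile-rules.mjs` implements these steps in Node, and in the real step 7 the invalid IDs come from mapping OpenGrep's errors back to rules, which is not modelled here.

```bash
#!/usr/bin/env bash
# Hypothetical sketches of two compiler steps; not the real implementation.

list_source_rule_files() {
  # Step 1: every .yml / .yaml file under the rules directory, recursively,
  # sorted bytewise so repeated runs see files in the same order.
  find "$1" -type f \( -name '*.yml' -o -name '*.yaml' \) | LC_ALL=C sort
}

drop_invalid_rule_ids() {
  # Step 7 (filtering half only): print the IDs from file $1 that do not
  # appear, as a whole line, in the invalid-ID file $2.
  grep -vxF -f "$2" "$1" || true
}
```

Sorting the file list is a choice made for this sketch so that output is reproducible; the README does not say whether the compiler orders files.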

## Validating locally

```bash
pnpm check:opengrep-rule-metadata
opengrep validate security/opengrep/precise.yml
```

The metadata check must pass before rules are committed, and `opengrep validate`
must exit zero. Warnings about unknown fields are acceptable only when OpenGrep
still reports `Configuration is valid` and a non-zero rule count. The compile
script drops schema- or parser-invalid rules that it can map back to a specific
rule, and fails closed when OpenGrep validation itself cannot be completed.
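The acceptance rule above can be expressed as a small predicate. This is a minimal sketch: `validation_passes` is a hypothetical helper, not a script in the repo, and the rule count is passed in by the caller because the sketch does not assume the exact format in which OpenGrep prints it.

```bash
#!/usr/bin/env bash
# Hypothetical acceptance check for `opengrep validate` output.

validation_passes() {
  # $1 = exit status, $2 = captured output, $3 = rule count
  local status="$1" output="$2" rule_count="$3"
  # Any non-zero exit status rejects the config.
  [ "$status" -eq 0 ] || return 1
  # Unknown-field warnings may appear, but the success line must be present.
  printf '%s\n' "$output" | grep -qF 'Configuration is valid' || return 1
  # A zero or non-numeric count also rejects (non-numeric makes `[` error out).
  [ "$rule_count" -gt 0 ] 2>/dev/null
}
```

For example, `validation_passes 0 'Configuration is valid' 12` succeeds, while a non-zero exit status, a missing `Configuration is valid` line, or a rule count of 0 each make it return non-zero.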

## Running locally

```bash
scripts/run-opengrep.sh
```

For SARIF output matching the PR workflow's diff-scoped scan:

```bash
scripts/run-opengrep.sh --changed --sarif
```

For SARIF output matching the manual full-repository workflow:

```bash
scripts/run-opengrep.sh --sarif
```

## Why `--no-strict`?

Some generated rules trigger non-fatal OpenGrep warnings (for example,
unknown-field warnings on compatibility-only keys). `--no-strict` keeps those
warnings from producing a non-zero exit code. Parser-invalid rules are still
dropped during compilation, so the checked-in super-config validates before CI
uses it.

## Why `--no-git-ignore`?

Some OpenClaw paths are excluded by `.gitignore` for build reasons even though
they contain source code we want scanned. `--no-git-ignore` stops OpenGrep from
skipping them.