mirror of
https://github.com/openclaw/openclaw.git
synced 2026-03-12 07:20:45 +00:00
docs: strengthen prompt injection warning for weaker models
This commit is contained in:
@@ -516,7 +516,7 @@ Even with strong system prompts, **prompt injection is not solved**. System prom
|
||||
- Run sensitive tool execution in a sandbox; keep secrets out of the agent’s reachable filesystem.
|
||||
- Note: sandboxing is opt-in. If sandbox mode is off, exec runs on the gateway host even though tools.exec.host defaults to sandbox, and host exec does not require approvals unless you set host=gateway and configure exec approvals.
|
||||
- Limit high-risk tools (`exec`, `browser`, `web_fetch`, `web_search`) to trusted agents or explicit allowlists.
|
||||
- **Model choice matters:** older/legacy models can be less robust against prompt injection and tool misuse. Prefer the strongest latest-generation, instruction-hardened model available for any bot with tools.
|
||||
- **Model choice matters:** older/smaller/legacy models are significantly less robust against prompt injection and tool misuse. For tool-enabled agents, use the strongest latest-generation, instruction-hardened model available.
|
||||
|
||||
Red flags to treat as untrusted:
|
||||
|
||||
@@ -567,10 +567,14 @@ tool calls. Reduce the blast radius by:
|
||||
|
||||
Prompt injection resistance is **not** uniform across model tiers. Smaller/cheaper models are generally more susceptible to tool misuse and instruction hijacking, especially under adversarial prompts.
|
||||
|
||||
<Warning>
|
||||
For tool-enabled agents or agents that read untrusted content, prompt-injection risk with older/smaller models is often too high. Do not run those workloads on weak model tiers.
|
||||
</Warning>
|
||||
|
||||
Recommendations:
|
||||
|
||||
- **Use the latest generation, best-tier model** for any bot that can run tools or touch files/networks.
|
||||
- **Avoid older/weaker tiers** for tool-enabled agents or untrusted inboxes.
|
||||
- **Do not use older/weaker/smaller tiers** for tool-enabled agents or untrusted inboxes; the prompt-injection risk is too high.
|
||||
- If you must use a smaller model, **reduce blast radius** (read-only tools, strong sandboxing, minimal filesystem access, strict allowlists).
|
||||
- When running small models, **enable sandboxing for all sessions** and **disable web_search/web_fetch/browser** unless inputs are tightly controlled.
|
||||
- For chat-only personal assistants with trusted input and no tools, smaller models are usually fine.
|
||||
|
||||
Reference in New Issue
Block a user