docs: strengthen prompt injection warning for weaker models

This commit is contained in:
Peter Steinberger
2026-03-03 00:06:36 +00:00
parent 4bfbf2dfff
commit 11c397ef46

View File

@@ -516,7 +516,7 @@ Even with strong system prompts, **prompt injection is not solved**. System prom
- Run sensitive tool execution in a sandbox; keep secrets out of the agents reachable filesystem.
- Note: sandboxing is opt-in. If sandbox mode is off, exec runs on the gateway host even though tools.exec.host defaults to sandbox, and host exec does not require approvals unless you set host=gateway and configure exec approvals.
- Limit high-risk tools (`exec`, `browser`, `web_fetch`, `web_search`) to trusted agents or explicit allowlists.
- **Model choice matters:** older/legacy models can be less robust against prompt injection and tool misuse. Prefer the strongest latest-generation, instruction-hardened model available for any bot with tools.
- **Model choice matters:** older/smaller/legacy models are significantly less robust against prompt injection and tool misuse. For tool-enabled agents, use the strongest latest-generation, instruction-hardened model available.
Red flags to treat as untrusted:
@@ -567,10 +567,14 @@ tool calls. Reduce the blast radius by:
Prompt injection resistance is **not** uniform across model tiers. Smaller/cheaper models are generally more susceptible to tool misuse and instruction hijacking, especially under adversarial prompts.
<Warning>
For tool-enabled agents or agents that read untrusted content, prompt-injection risk with older/smaller models is often too high. Do not run those workloads on weak model tiers.
</Warning>
Recommendations:
- **Use the latest generation, best-tier model** for any bot that can run tools or touch files/networks.
- **Avoid older/weaker tiers** for tool-enabled agents or untrusted inboxes.
- **Do not use older/weaker/smaller tiers** for tool-enabled agents or untrusted inboxes; the prompt-injection risk is too high.
- If you must use a smaller model, **reduce blast radius** (read-only tools, strong sandboxing, minimal filesystem access, strict allowlists).
- When running small models, **enable sandboxing for all sessions** and **disable web_search/web_fetch/browser** unless inputs are tightly controlled.
- For chat-only personal assistants with trusted input and no tools, smaller models are usually fine.