docs: strengthen prompt injection warning for weaker models

2026-05-06 16:50:43 +00:00 · 2026-03-03 00:06:36 +00:00
parent 4bfbf2dfff
commit 11c397ef46
1 changed file with 6 additions and 2 deletions
--- a/docs/gateway/security/index.md
+++ b/docs/gateway/security/index.md
@@ -516,7 +516,7 @@ Even with strong system prompts, **prompt injection is not solved**. System prom
 - Run sensitive tool execution in a sandbox; keep secrets out of the agent’s reachable filesystem.
 - Note: sandboxing is opt-in. If sandbox mode is off, exec runs on the gateway host even though tools.exec.host defaults to sandbox, and host exec does not require approvals unless you set host=gateway and configure exec approvals.
 - Limit high-risk tools (`exec`, `browser`, `web_fetch`, `web_search`) to trusted agents or explicit allowlists.
- **Model choice matters:** older/legacy models can be less robust against prompt injection and tool misuse. Prefer the strongest latest-generation, instruction-hardened model available for any bot with tools.
+- **Model choice matters:** older/smaller/legacy models are significantly less robust against prompt injection and tool misuse. For tool-enabled agents, use the strongest latest-generation, instruction-hardened model available.

 Red flags to treat as untrusted:

@@ -567,10 +567,14 @@ tool calls. Reduce the blast radius by:

 Prompt injection resistance is **not** uniform across model tiers. Smaller/cheaper models are generally more susceptible to tool misuse and instruction hijacking, especially under adversarial prompts.

+<Warning>
+For tool-enabled agents or agents that read untrusted content, prompt-injection risk with older/smaller models is often too high. Do not run those workloads on weak model tiers.
+</Warning>
+
 Recommendations:

 - **Use the latest generation, best-tier model** for any bot that can run tools or touch files/networks.
- **Avoid older/weaker tiers** for tool-enabled agents or untrusted inboxes.
+- **Do not use older/weaker/smaller tiers** for tool-enabled agents or untrusted inboxes; the prompt-injection risk is too high.
 - If you must use a smaller model, **reduce blast radius** (read-only tools, strong sandboxing, minimal filesystem access, strict allowlists).
 - When running small models, **enable sandboxing for all sessions** and **disable web_search/web_fetch/browser** unless inputs are tightly controlled.
 - For chat-only personal assistants with trusted input and no tools, smaller models are usually fine.
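
The recommendations in this hunk can be read as a small policy function. The following is a minimal sketch under stated assumptions, not the gateway's actual configuration API: the tier labels (`"small"`, `"legacy"`), `AgentPolicy`, and `plan_policy` are hypothetical names introduced only for illustration. Only the four high-risk tool names come from the documentation text.

```python
from dataclasses import dataclass

# Tool names taken from the doc's "high-risk tools" bullet.
HIGH_RISK_TOOLS = frozenset({"exec", "browser", "web_fetch", "web_search"})
# Tools that pull in untrusted web content.
WEB_TOOLS = frozenset({"browser", "web_fetch", "web_search"})
# Hypothetical tier labels; the doc does not define a tier taxonomy.
WEAK_TIERS = frozenset({"small", "legacy"})


@dataclass(frozen=True)
class AgentPolicy:
    """Hypothetical result type: which tools stay enabled, and whether to sandbox."""
    allowed_tools: tuple
    sandbox_required: bool


def plan_policy(model_tier, requested_tools, inputs_trusted=False):
    """Reduce blast radius for weak model tiers (illustrative only)."""
    requested = list(dict.fromkeys(requested_tools))  # de-duplicate, keep order

    if model_tier not in WEAK_TIERS:
        # Assumption: stronger tiers keep their tools, but exec is still sandboxed.
        return AgentPolicy(tuple(requested), sandbox_required="exec" in requested)

    # Weak tier: drop web-facing tools unless inputs are tightly controlled,
    # and require sandboxing for every session.
    kept = [t for t in requested if inputs_trusted or t not in WEB_TOOLS]
    return AgentPolicy(tuple(kept), sandbox_required=True)


if __name__ == "__main__":
    print(plan_policy("small", ["read_file", "web_fetch", "exec"]))
```

Here a small-tier agent that asks for `web_fetch` loses it unless `inputs_trusted=True`, and sandboxing is always required for weak tiers. Whether `exec` should also be removed for weak tiers is a judgment call the documentation leaves to allowlists, so this sketch keeps it behind the sandbox.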