diff --git a/docs/gateway/security/index.md b/docs/gateway/security/index.md
index e2e2523065d..e4b0b209fa1 100644
--- a/docs/gateway/security/index.md
+++ b/docs/gateway/security/index.md
@@ -516,7 +516,7 @@ Even with strong system prompts, **prompt injection is not solved**. System prom
 - Run sensitive tool execution in a sandbox; keep secrets out of the agent’s reachable filesystem.
 - Note: sandboxing is opt-in. If sandbox mode is off, exec runs on the gateway host even though tools.exec.host defaults to sandbox, and host exec does not require approvals unless you set host=gateway and configure exec approvals.
 - Limit high-risk tools (`exec`, `browser`, `web_fetch`, `web_search`) to trusted agents or explicit allowlists.
-- **Model choice matters:** older/legacy models can be less robust against prompt injection and tool misuse. Prefer the strongest latest-generation, instruction-hardened model available for any bot with tools.
+- **Model choice matters:** older/smaller/legacy models are significantly less robust against prompt injection and tool misuse. For tool-enabled agents, use the strongest latest-generation, instruction-hardened model available.
 
 Red flags to treat as untrusted:
 
@@ -567,10 +567,14 @@ tool calls. Reduce the blast radius by:
 
 Prompt injection resistance is **not** uniform across model tiers. Smaller/cheaper models are generally more susceptible to tool misuse and instruction hijacking, especially under adversarial prompts.
 
+
+For tool-enabled agents or agents that read untrusted content, prompt-injection risk with older/smaller models is often too high. Do not run those workloads on weak model tiers.
+
+
 Recommendations:
 
 - **Use the latest generation, best-tier model** for any bot that can run tools or touch files/networks.
-- **Avoid older/weaker tiers** for tool-enabled agents or untrusted inboxes.
+- **Do not use older/weaker/smaller tiers** for tool-enabled agents or untrusted inboxes; the prompt-injection risk is too high.
 - If you must use a smaller model, **reduce blast radius** (read-only tools, strong sandboxing, minimal filesystem access, strict allowlists).
 - When running small models, **enable sandboxing for all sessions** and **disable web_search/web_fetch/browser** unless inputs are tightly controlled.
 - For chat-only personal assistants with trusted input and no tools, smaller models are usually fine.
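
Reviewer sketch (not part of the patch): the recommendations in the second hunk could be checked mechanically. The Python below is a minimal illustration under stated assumptions. `AgentProfile`, `audit`, and the tier labels `"top"` and `"small"` are hypothetical names invented for this example and do not mirror the gateway's real config schema. Only the tool names (`exec`, `browser`, `web_fetch`, `web_search`) come from the documentation text.

```python
from dataclasses import dataclass, field

# Tool names as listed in the docs; which of them count as "network" tools
# is an assumption made for this sketch.
NETWORK_TOOLS = {"browser", "web_fetch", "web_search"}


@dataclass
class AgentProfile:
    """Hypothetical stand-in for an agent's settings (not a real gateway API)."""
    name: str
    model_tier: str                      # assumed labels: "top" or "small"
    tools: set = field(default_factory=set)
    sandboxed: bool = False
    reads_untrusted_input: bool = False


def audit(agent: AgentProfile) -> list:
    """Return findings for agents that conflict with the recommendations."""
    findings = []
    if agent.model_tier != "small":
        return findings  # the guidance in the diff targets weaker tiers only

    # Small tier is discouraged for tool-enabled agents or untrusted inboxes.
    if agent.tools or agent.reads_untrusted_input:
        findings.append("small model tier on a tool-enabled or untrusted-input agent")
        # If a small model is used anyway, sandboxing should be on.
        if not agent.sandboxed:
            findings.append("sandboxing is off")

    # Network-facing tools should be disabled for small models.
    network = agent.tools & NETWORK_TOOLS
    if network:
        findings.append("disable network tools: " + ", ".join(sorted(network)))
    return findings
```

A chat-only profile such as `AgentProfile(name="chat", model_tier="small")` produces no findings, which matches the last bullet of the hunk. A small-tier agent with `web_fetch` and no sandbox would be flagged on all three checks.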