docs(gateway/security): list system-reminder and previous_response in outbound stripping

For c2d31a5e59: the "External content special-token sanitization" section of docs/gateway/security/index.md already mentions the outbound sanitizer, with `<tool_call>` and `<function_calls>` as examples, but it predates the internal-runtime-scaffolding stripping that targets `<system-reminder>` and `<previous_response>` tags. This change adds those two tags as explicit examples and notes the final channel delivery boundary, so operators reading the security page see the same coverage that the c2d31a5e59 sanitizer provides.
2026-05-06 15:50:46 +00:00 · 2026-04-28 12:38:55 -07:00
parent c500e8704f
commit 98f5fd12df
1 changed file with 1 addition and 1 deletion
--- a/docs/gateway/security/index.md
+++ b/docs/gateway/security/index.md
@@ -608,7 +608,7 @@ Why:

 - OpenAI-compatible backends that front self-hosted models sometimes preserve special tokens that appear in user text, instead of masking them. An attacker who can write into inbound external content (a fetched page, an email body, a file contents tool output) could otherwise inject a synthetic `assistant` or `system` role boundary and escape the wrapped-content guardrails.
 - Sanitization happens at the external-content wrapping layer, so it applies uniformly across fetch/read tools and inbound channel content rather than being per-provider.
- Outbound model responses already have a separate sanitizer that strips leaked `<tool_call>`, `<function_calls>`, and similar scaffolding from user-visible replies. The external-content sanitizer is the inbound counterpart.
+- Outbound model responses already have a separate sanitizer that strips leaked `<tool_call>`, `<function_calls>`, `<system-reminder>`, `<previous_response>`, and similar internal runtime scaffolding from user-visible replies at the final channel delivery boundary. The external-content sanitizer is the inbound counterpart.

 This does not replace the other hardening on this page — `dmPolicy`, allowlists, exec approvals, sandboxing, and `contextVisibility` still do the primary work. It closes one specific tokenizer-layer bypass against self-hosted stacks that forward user text with special tokens intact.
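
For readers who want a concrete picture of the outbound stripping described in the updated bullet, here is a minimal TypeScript sketch. It is not the sanitizer from c2d31a5e59: the function name `stripRuntimeScaffolding`, the `SCAFFOLDING_TAGS` constant, and the regex-based approach are all assumptions made for illustration. Only the four tag names come from the doc change above.

```typescript
// Hypothetical tag list. Only these four names appear in the doc change;
// the real sanitizer may cover more ("and similar internal runtime scaffolding").
const SCAFFOLDING_TAGS: readonly string[] = [
  "tool_call",
  "function_calls",
  "system-reminder",
  "previous_response",
];

// Hypothetical helper: strips leaked scaffolding from a reply just before
// it would be handed to a channel for delivery.
function stripRuntimeScaffolding(reply: string): string {
  let out = reply;
  for (const tag of SCAFFOLDING_TAGS) {
    // Remove a paired block together with everything inside it.
    const paired = new RegExp(`<${tag}\\b[^>]*>[\\s\\S]*?</${tag}>`, "gi");
    // Remove any unmatched opening or closing tag that is left over.
    const stray = new RegExp(`</?${tag}\\b[^>]*>`, "gi");
    out = out.replace(paired, "").replace(stray, "");
  }
  // Collapse the blank-line runs that removed blocks can leave behind.
  return out.replace(/\n{3,}/g, "\n\n").trim();
}

// Usage: the leaked block is dropped, the visible text survives.
const cleaned = stripRuntimeScaffolding(
  "Hello <system-reminder>internal note</system-reminder>world",
);
console.log(cleaned);
```

In this sketch the stripping runs once over the final reply string, which mirrors the "final channel delivery boundary" wording: whatever earlier stage leaked the tag, the last step before delivery removes it.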