docs(gateway/security): list system-reminder and previous_response in outbound stripping

For c2d31a5e59: docs/gateway/security/index.md "External content
special-token sanitization" section already mentions the outbound
sanitizer with `<tool_call>` and `<function_calls>` examples, but it
predates the new internal-runtime-scaffolding stripping that targets
`<system-reminder>` and `<previous_response>` tags. Adds those two tags
as explicit examples and notes the final channel delivery boundary so
operators reading the security page see the same coverage exposed by
the c2d31a5e59 sanitizer.
This commit is contained in:
Vincent Koc
2026-04-28 12:38:55 -07:00
parent c500e8704f
commit 98f5fd12df

View File

@@ -608,7 +608,7 @@ Why:
- OpenAI-compatible backends that front self-hosted models sometimes preserve special tokens that appear in user text, instead of masking them. An attacker who can write into inbound external content (a fetched page, an email body, a file contents tool output) could otherwise inject a synthetic `assistant` or `system` role boundary and escape the wrapped-content guardrails.
- Sanitization happens at the external-content wrapping layer, so it applies uniformly across fetch/read tools and inbound channel content rather than being per-provider.
- Outbound model responses already have a separate sanitizer that strips leaked `<tool_call>`, `<function_calls>`, and similar scaffolding from user-visible replies. The external-content sanitizer is the inbound counterpart.
- Outbound model responses already have a separate sanitizer that strips leaked `<tool_call>`, `<function_calls>`, `<system-reminder>`, `<previous_response>`, and similar internal runtime scaffolding from user-visible replies at the final channel delivery boundary. The external-content sanitizer is the inbound counterpart.
This does not replace the other hardening on this page — `dmPolicy`, allowlists, exec approvals, sandboxing, and `contextVisibility` still do the primary work. It closes one specific tokenizer-layer bypass against self-hosted stacks that forward user text with special tokens intact.