From 98f5fd12dfccab9510f2e768824f50ad0613c654 Mon Sep 17 00:00:00 2001
From: Vincent Koc
Date: Tue, 28 Apr 2026 12:38:55 -0700
Subject: [PATCH] docs(gateway/security): list system-reminder and previous_response in outbound stripping

For c2d31a5e59: docs/gateway/security/index.md "External content special-token sanitization" section already mentions the outbound sanitizer with `` and `` examples, but it predates the new internal-runtime-scaffolding stripping that targets `` and `` tags. Adds those two tags as explicit examples and notes the final channel delivery boundary so operators reading the security page see the same coverage exposed by the c2d31a5e59 sanitizer.
---
 docs/gateway/security/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/gateway/security/index.md b/docs/gateway/security/index.md
index 8d50041a2fc..add1748f252 100644
--- a/docs/gateway/security/index.md
+++ b/docs/gateway/security/index.md
@@ -608,7 +608,7 @@ Why:
 - OpenAI-compatible backends that front self-hosted models sometimes preserve special tokens that appear in user text, instead of masking them. An attacker who can write into inbound external content (a fetched page, an email body, a file contents tool output) could otherwise inject a synthetic `assistant` or `system` role boundary and escape the wrapped-content guardrails.
 - Sanitization happens at the external-content wrapping layer, so it applies uniformly across fetch/read tools and inbound channel content rather than being per-provider.
-- Outbound model responses already have a separate sanitizer that strips leaked ``, ``, and similar scaffolding from user-visible replies. The external-content sanitizer is the inbound counterpart.
+- Outbound model responses already have a separate sanitizer that strips leaked ``, ``, ``, ``, and similar internal runtime scaffolding from user-visible replies at the final channel delivery boundary. The external-content sanitizer is the inbound counterpart.
 This does not replace the other hardening on this page — `dmPolicy`, allowlists, exec approvals, sandboxing, and `contextVisibility` still do the primary work. It closes one specific tokenizer-layer bypass against self-hosted stacks that forward user text with special tokens intact.
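
Reviewer note: as a rough illustration of the outbound stripping the changed bullet describes, here is a minimal sketch, not the c2d31a5e59 implementation. The function name `strip_scaffolding`, the regex approach, and the tag list are assumptions. Only `system-reminder` and `previous_response` are named in this patch's subject line, and the actual sanitizer may match differently.

```python
import re

# Hypothetical tag list: only these two names appear in the commit subject.
SCAFFOLDING_TAGS = ("system-reminder", "previous_response")


def strip_scaffolding(reply: str, tags=SCAFFOLDING_TAGS) -> str:
    """Remove leaked scaffolding tags from a user-visible reply.

    Illustrative only: the real sanitizer's matching rules are not shown
    in this patch.
    """
    for tag in tags:
        name = re.escape(tag)
        # Drop well-formed <tag>...</tag> blocks together with their content.
        reply = re.sub(
            rf"<{name}\b[^>]*>.*?</{name}\s*>",
            "",
            reply,
            flags=re.DOTALL | re.IGNORECASE,
        )
        # Drop any unpaired opening or closing tag that is left over.
        reply = re.sub(rf"</?{name}\b[^>]*>", "", reply, flags=re.IGNORECASE)
    return reply
```

In this sketch the function would run once, immediately before a reply is handed to the channel, which corresponds to the "final channel delivery boundary" wording in the new bullet.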