TextDecoder("utf-16le"/"utf-16be") accepts almost any byte sequence:
every byte pair is a valid UTF-16 code unit, and only unpaired
surrogates are malformed. An attacker can therefore prepend a UTF-16
BOM (0xFF 0xFE) to binary garbage, give the file a .csv/.md extension,
and pass getTextStats with printableRatio≈1.0, bypassing the host-read
security boundary.
Remove resolveUtf16Charset and the UTF-16 branches from decodeHostReadText.
The Latin-1 fallback (gated by hasSingleByteTextShape) already covers
the most common non-UTF-8 real-world case: Excel CSV exports with
accented characters like é, ñ. UTF-16 CSVs are extremely rare and
users can trivially re-save as UTF-8.
Adds two regression tests:
- NUL-padded (0x00/0xFF) must be rejected
- BOM-prefixed (0xFF 0xFE + 0xFF garbage) must be rejected
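The BOM-prefixed case can be reproduced in a few lines. This is a minimal sketch, assuming a runtime that provides the WHATWG TextDecoder (Node or a browser); `buildBomPrefixedGarbage` and `decodeUtf16Le` are hypothetical helper names, not functions from this change:

```typescript
// Hypothetical helper: UTF-16LE BOM (0xFF 0xFE) followed by 0xFF filler bytes.
function buildBomPrefixedGarbage(payloadBytes: number): Uint8Array {
  const bytes = new Uint8Array(2 + payloadBytes).fill(0xff);
  bytes[1] = 0xfe; // bytes[0] is already 0xFF, so the buffer starts FF FE
  return bytes;
}

// Decode in the default (non-fatal) mode.
function decodeUtf16Le(bytes: Uint8Array): string {
  return new TextDecoder("utf-16le").decode(bytes);
}

const garbage = buildBomPrefixedGarbage(8);
// No exception is thrown: the BOM is consumed and each remaining byte pair
// becomes one UTF-16 code unit.
const decoded = decodeUtf16Le(garbage);
console.log(decoded.length);
```

Because decoding succeeds, any later check that only inspects the decoded string has no signal that the input was opaque binary.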
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The NUL-heavy heuristic in resolveUtf16Charset was unsafe as a security
gate: TextDecoder("utf-16le") accepts nearly any input, so the byte pairs
of an opaque binary (e.g. repeating 0x00/0xFF) decode to printable code
points and pass the text-stats check, allowing the upload.
Remove the heuristic; only a leading BOM (0xFF 0xFE / 0xFE 0xFF) now
triggers UTF-16 decoding. Without a BOM the strict UTF-8 path runs
first, and NUL-padded binaries are then rejected by hasSingleByteTextShape
(0x00 bytes are control bytes).
Adds a regression test: 9000-byte alternating-NUL/0xFF buffer must be
rejected as path-not-allowed.
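The BOM-only rule described above can be sketched as follows. `detectUtf16Bom` is a hypothetical name used for illustration; the real decodeHostReadText logic may be structured differently:

```typescript
type Utf16Charset = "utf-16le" | "utf-16be";

// Hypothetical helper: select a UTF-16 charset only when a BOM is present.
// No content heuristics (such as NUL density) are consulted.
function detectUtf16Bom(bytes: Uint8Array): Utf16Charset | undefined {
  if (bytes.length < 2) return undefined;
  if (bytes[0] === 0xff && bytes[1] === 0xfe) return "utf-16le";
  if (bytes[0] === 0xfe && bytes[1] === 0xff) return "utf-16be";
  return undefined;
}

// 9000 bytes alternating 0x00 / 0xFF, mirroring the regression test input.
const nulPadded = new Uint8Array(9000);
for (let i = 1; i < nulPadded.length; i += 2) nulPadded[i] = 0xff;

// No BOM, so the buffer is never routed to a UTF-16 decoder.
console.log(detectUtf16Bom(nulPadded));
```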
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Raise the ASCII floor to 70% and add an explicit 30% high-byte cap.
The previous 50% threshold accepted alternating 0x41/0xFF buffers
(50% ASCII, 0 control bytes), which decoded through Latin-1 and passed
the printable-ratio gate — allowing opaque binary data to slip through
as a CSV or Markdown document.
Real single-byte text exports (e.g. Excel Latin-1 CSVs with accented
chars like é, ñ) rarely exceed 20-25% high bytes, so the tighter
thresholds do not regress legitimate input.
Adds a regression test: 9000 bytes alternating 'A'/0xFF must be
rejected as path-not-allowed.
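A rough sketch of the tightened shape check, assuming "ASCII" means printable ASCII plus tab, LF, and CR, and that any other control byte disqualifies the buffer. Only the 70% and 30% thresholds come from this message; the function body and the names `hasSingleByteTextShapeSketch` and `latin1Bytes` are illustrative:

```typescript
const MIN_ASCII_RATIO = 0.7; // ASCII floor from this change
const MAX_HIGH_BYTE_RATIO = 0.3; // high-byte cap from this change

// Illustrative only: the real hasSingleByteTextShape may differ in detail.
function hasSingleByteTextShapeSketch(bytes: Uint8Array): boolean {
  if (bytes.length === 0) return false;
  let ascii = 0;
  let high = 0;
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    if (b === 0x09 || b === 0x0a || b === 0x0d || (b >= 0x20 && b <= 0x7e)) {
      ascii++;
    } else if (b >= 0x80) {
      high++;
    } else {
      return false; // assumed rule: other control bytes are never text
    }
  }
  return (
    ascii / bytes.length >= MIN_ASCII_RATIO &&
    high / bytes.length <= MAX_HIGH_BYTE_RATIO
  );
}

// Hypothetical helper: encode a string whose characters all fit in Latin-1.
function latin1Bytes(text: string): Uint8Array {
  const out = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) out[i] = text.charCodeAt(i) & 0xff;
  return out;
}

// Regression input: 9000 bytes alternating 'A' (0x41) and 0xFF.
const alternating = new Uint8Array(9000);
for (let i = 0; i < alternating.length; i++) alternating[i] = i % 2 === 0 ? 0x41 : 0xff;

console.log(hasSingleByteTextShapeSketch(alternating)); // 50% ASCII: rejected
console.log(hasSingleByteTextShapeSketch(latin1Bytes("name,city\nJosé,Málaga\n")));
```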
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous null-byte check was too narrow: binary payloads with no 0x00
bytes (e.g. short/unsupported formats) could still pass. Replace it with
looksLikeText(), which rejects any C0 control byte other than tab, LF, VT,
FF, and CR (i.e. 0x00–0x08 and 0x0E–0x1F) as well as DEL (0x7F), in the
spirit of the heuristics git and the file command use to distinguish text
from binary. Bytes ≥ 0x80 are kept so UTF-8, Latin-1, and Windows-1252
encoded files continue to pass.
The previous check only scanned the first 8 KiB, leaving a window where a
file with a null-free prefix followed by binary content could pass the guard.
Scan the entire buffer to close that gap.
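A minimal sketch of the full-buffer scan, using the byte ranges listed above. The name `looksLikeTextSketch` marks this as an illustration, not the actual looksLikeText() implementation:

```typescript
// Returns false as soon as a disallowed control byte is found anywhere in
// the buffer. Tab, LF, VT, FF, CR (0x09–0x0D) and bytes >= 0x80 are allowed.
function looksLikeTextSketch(bytes: Uint8Array): boolean {
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    const isDisallowedControl =
      b <= 0x08 || (b >= 0x0e && b <= 0x1f) || b === 0x7f;
    if (isDisallowedControl) return false;
  }
  return true;
}

// A clean 8 KiB prefix followed by one control byte: a prefix-only scan
// would accept this buffer, a full scan rejects it.
const prefixThenBinary = new Uint8Array(8 * 1024 + 1).fill(0x61); // 'a'
prefixThenBinary[prefixThenBinary.length - 1] = 0x01;

console.log(looksLikeTextSketch(prefixThenBinary));
console.log(looksLikeTextSketch(new TextEncoder().encode("col\tvalue\r\ncafé\n")));
```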
Two issues flagged by Greptile:
1. CSV/Markdown exception was dead code.
file-type v22 returns undefined (not "text/plain") for plain-text buffers
that have no binary magic bytes. The guard `sniffedMime === "text/plain"`
was therefore always false, so the early-return never fired and CSV uploads
continued to be rejected.
Fix: check `!sniffedMime` (no binary signature) and add a null-byte scan
of the first 8 KiB to rule out binary data that happens to have no known
magic bytes. Pass buffer into assertHostReadMediaAllowed to enable this.
2. "rejects binary disguised as CSV" test used PNG bytes.
assertHostReadMediaAllowed allows all image kinds unconditionally
(sniffedKind === "image" → early return), so the promise resolved instead
of rejecting. The test would have failed with "Received promise resolved".
Fix: use ZIP magic bytes (PK\x03\x04). file-type detects application/zip,
which is not image/audio/video, so it falls through to the final throw.
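The corrected guard from item 1 can be sketched roughly as below. The sniffed MIME is passed in as a plain value so the example stays self-contained; `isAllowedTextDocumentSketch`, `TEXT_LIKE_EXTENSION_MIMES`, and `NUL_SCAN_LIMIT` are hypothetical names, and the real assertHostReadMediaAllowed throws rather than returning a boolean:

```typescript
// Assumed set of extension-derived MIME types that get the text exception.
const TEXT_LIKE_EXTENSION_MIMES = new Set(["text/csv", "text/markdown"]);
const NUL_SCAN_LIMIT = 8 * 1024; // first 8 KiB, as described above

function isAllowedTextDocumentSketch(
  sniffedMime: string | undefined,
  extensionMime: string | undefined,
  buffer: Uint8Array,
): boolean {
  // A defined sniffed MIME means a binary signature was recognized.
  if (sniffedMime !== undefined) return false;
  if (extensionMime === undefined) return false;
  if (!TEXT_LIKE_EXTENSION_MIMES.has(extensionMime)) return false;
  // Rule out binary data that has no known magic bytes.
  const limit = Math.min(buffer.length, NUL_SCAN_LIMIT);
  for (let i = 0; i < limit; i++) {
    if (buffer[i] === 0x00) return false;
  }
  return true;
}

const csvBytes = new TextEncoder().encode("id,name\n1,Ada\n");
const zipBytes = new Uint8Array([0x50, 0x4b, 0x03, 0x04]); // "PK\x03\x04"

console.log(isAllowedTextDocumentSketch(undefined, "text/csv", csvBytes));
console.log(isAllowedTextDocumentSketch("application/zip", "text/csv", zipBytes));
```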
Extend the text/plain sniff exception to cover text/markdown in addition
to text/csv. Both formats are structurally indistinguishable from plain
text at the byte level, so the same pattern applies.
CSV files (text/csv) were rejected by assertHostReadMediaAllowed because
content sniffers report text/plain for CSV — CSV is structurally
indistinguishable from plain text at the byte level.
Fix:
- Add text/csv to HOST_READ_ALLOWED_DOCUMENT_MIMES
- Add a targeted exception: when sniffed MIME is text/plain AND the
extension-derived MIME is text/csv, allow the upload. The text/plain
sniff already confirms the content is valid UTF-8 text (not binary),
so the .csv extension is sufficient to confirm operator intent.
Binary data disguised as .csv is still rejected because its sniffed MIME
will not be text/plain (e.g. a PNG file sniffs as image/png).
Fixes #63604
* fix(cron): preserve all fields in announce delivery by removing summarization instruction
The delivery instruction appended to the cron agent prompt contained the word
'summary', causing LLMs to condense structured output non-deterministically and
drop fields on delivery. Replace with 'response' and add explicit instruction
to reproduce all fields exactly.
Fixes #58535
* chore(changelog): add cron announce entry
---------
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
* feat(memory-lancedb): add cloud storage support to memory-lancedb
- Pass storageOptions to LanceDB connection
* support env var
* make storageOptions sensitive