openclaw/packages/memory-host-sdk
AaronLuo00 f8547fcae4 fix: guard fine-split against breaking UTF-16 surrogate pairs
When re-splitting CJK-heavy segments at chunking.tokens, check whether the
last code unit before the slice boundary is a high surrogate
(0xD800–0xDBFF) and, if so, extend the slice by one code unit to keep the
surrogate pair intact. This prevents emitting lone surrogate halves for
CJK Extension B+ characters (U+20000 and above).
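A minimal TypeScript sketch of the boundary check. It assumes the fine-split budget is counted in UTF-16 code units (string indices). `fineSplit`, `isHighSurrogate` and `isLowSurrogate` are hypothetical names used for illustration, not the package's actual functions.

```typescript
// Hypothetical helpers: classify a UTF-16 code unit.
function isHighSurrogate(code: number): boolean {
  return code >= 0xd800 && code <= 0xdbff;
}
function isLowSurrogate(code: number): boolean {
  return code >= 0xdc00 && code <= 0xdfff;
}

/**
 * Hypothetical fine-split: cut `text` into slices of at most `maxUnits`
 * UTF-16 code units. If a slice would end on a high surrogate whose low
 * surrogate partner follows, the slice is extended by one code unit so the
 * pair is never split. (Assumption: budget measured in code units.)
 */
function fineSplit(text: string, maxUnits: number): string[] {
  if (maxUnits < 1) throw new RangeError("maxUnits must be >= 1");
  const slices: string[] = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + maxUnits, text.length);
    // Boundary sits between a high and a low surrogate: pull the low one in.
    if (
      end < text.length &&
      isHighSurrogate(text.charCodeAt(end - 1)) &&
      isLowSurrogate(text.charCodeAt(end))
    ) {
      end += 1;
    }
    slices.push(text.slice(start, end));
    start = end;
  }
  return slices;
}

// Three copies of U+20000 (6 code units) with an odd budget of 3.
const parts = fineSplit("\u{20000}".repeat(3), 3);
console.log(parts.map((p) => p.length)); // [ 4, 2 ]
```

Extending the slice, rather than shrinking it, guarantees forward progress even with a budget of 1, at the cost of a slice occasionally exceeding the budget by one code unit.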

Add test verifying no lone surrogates appear when splitting lines of
surrogate-pair characters with an odd token budget.
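A sketch of the kind of check such a test could make. `hasLoneSurrogate` is a hypothetical helper, not the repository's actual test code. It scans a string for any surrogate code unit that is not part of a valid high+low pair. On runtimes that support ES2024, `String.prototype.isWellFormed()` performs an equivalent check.

```typescript
// Hypothetical checker: true if `s` contains an unpaired UTF-16 surrogate.
function hasLoneSurrogate(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : -1;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // valid pair: skip the low surrogate
        continue;
      }
      return true; // high surrogate with no following low surrogate
    }
    if (c >= 0xdc00 && c <= 0xdfff) return true; // low surrogate with no preceding high
  }
  return false;
}

// A naive slice at an odd code-unit offset cuts U+20000 in half.
const line = "\u{20000}".repeat(4); // 8 UTF-16 code units
console.log(hasLoneSurrogate(line.slice(0, 3))); // true
console.log(hasLoneSurrogate(line.slice(0, 4))); // false
```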

Addresses third-round Codex P2 review comment.
2026-03-29 10:22:43 +09:00