Mirror of https://github.com/openclaw/openclaw.git (synced 2026-03-31 03:41:51 +00:00)
When re-splitting CJK-heavy segments at the `chunking.tokens` budget, check whether the slice boundary falls on a high surrogate (0xD800–0xDBFF), i.e. whether the last code unit of the slice is the first half of a surrogate pair. If so, extend the slice by one code unit to keep the pair intact. This prevents emitting broken surrogate halves for CJK Extension B and later characters (U+20000 and above). Add a test verifying that no lone surrogates appear when splitting lines of surrogate-pair characters with an odd token budget. Addresses the third-round Codex P2 review comment.
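A minimal TypeScript sketch of the boundary check and the lone-surrogate test, assuming the re-split budget is counted in UTF-16 code units (the commit does not say exactly how `chunking.tokens` maps to code units). The names `splitKeepingSurrogatePairs` and `hasLoneSurrogate` are hypothetical illustrations, not functions from the openclaw codebase.

```typescript
// Hypothetical helpers illustrating the fix; not the openclaw implementation.
// Assumption: the budget is measured in UTF-16 code units.

function isHighSurrogate(code: number): boolean {
  return code >= 0xd800 && code <= 0xdbff;
}

function isLowSurrogate(code: number): boolean {
  return code >= 0xdc00 && code <= 0xdfff;
}

// Split `text` into slices of at most `budget` code units, except that a slice
// ending on a high surrogate is extended by one code unit so the pair stays intact.
function splitKeepingSurrogatePairs(text: string, budget: number): string[] {
  if (!Number.isInteger(budget) || budget < 1) {
    throw new RangeError("budget must be a positive integer");
  }
  const slices: string[] = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + budget, text.length);
    // The boundary would separate a high surrogate from its low half:
    // pull the low half into this slice instead.
    if (end < text.length && isHighSurrogate(text.charCodeAt(end - 1))) {
      end += 1;
    }
    slices.push(text.slice(start, end));
    start = end;
  }
  return slices;
}

// True if `s` contains a high surrogate not followed by a low surrogate,
// or a low surrogate not preceded by a high surrogate.
function hasLoneSurrogate(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    const code = s.charCodeAt(i);
    if (isHighSurrogate(code)) {
      if (i + 1 >= s.length || !isLowSurrogate(s.charCodeAt(i + 1))) {
        return true;
      }
      i++; // skip the low half of a valid pair
    } else if (isLowSurrogate(code)) {
      return true;
    }
  }
  return false;
}

// Four Extension B characters = 8 code units; an odd budget of 3 would
// split a pair with a naive slice.
const line = "\u{20000}".repeat(4);
const chunks = splitKeepingSurrogatePairs(line, 3);
// → 2 chunks of 4 code units each, none containing a lone surrogate
console.log(chunks.map((c) => c.length), chunks.some(hasLoneSurrogate));
```

Extending the slice rather than shrinking it means a chunk may exceed the budget by one code unit. Shrinking by one instead would keep every chunk within budget, but with a budget of 1 it would produce an empty slice and the loop would never advance.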