Mirror of https://github.com/openclaw/openclaw.git (synced 2026-03-31 03:41:51 +00:00)
- Two-pass line splitting: first slice at maxChars (unchanged for Latin), then re-split only CJK-heavy segments at chunking.tokens. This preserves the original ~800-char segments for ASCII lines while keeping CJK chunks within the token budget.
- Narrow the surrogate-pair adjustment to the CJK Extension B+ high-surrogate range (D840–D87E) only, so emoji surrogate pairs are not affected. Mixed CJK+emoji text is now handled consistently regardless of composition.
- Add tests: emoji handling (2), Latin backward-compat long-line (1).

Addresses Codex P1 (oversized CJK segments) and P2s (Latin over-splitting, emoji surrogate inconsistency).
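
The two-pass split described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this change: the function names, the BMP ranges treated as CJK, the 0.5 "CJK-heavy" threshold, and the way the D840–D87E check feeds into the CJK count are all hypothetical. Only the two-pass idea and the surrogate range come from the message itself.

```typescript
// Sketch only: names, ranges and the 0.5 threshold are assumptions,
// not the project's actual implementation.

/** True for high surrogates that lead CJK Extension B+ pairs (D840..D87E). */
function isCjkHighSurrogate(unit: number): boolean {
  return unit >= 0xd840 && unit <= 0xd87e;
}

/** True for BMP code units assumed here to be CJK (kana, Ext A, unified ideographs, Hangul). */
function isBmpCjk(unit: number): boolean {
  return (
    (unit >= 0x3040 && unit <= 0x30ff) ||
    (unit >= 0x3400 && unit <= 0x4dbf) ||
    (unit >= 0x4e00 && unit <= 0x9fff) ||
    (unit >= 0xac00 && unit <= 0xd7af)
  );
}

/** Assumed heuristic: a segment is CJK-heavy when more than `threshold` of its characters are CJK. */
function isCjkHeavy(segment: string, threshold = 0.5): boolean {
  let cjk = 0;
  let total = 0;
  for (let i = 0; i < segment.length; i++) {
    const unit = segment.charCodeAt(i);
    total++;
    if (isBmpCjk(unit)) {
      cjk++;
    } else if (isCjkHighSurrogate(unit)) {
      cjk++;
      i++; // skip the low surrogate: the pair counts as one CJK character
    }
    // Emoji high surrogates (e.g. D83D) fall through and are not counted as CJK.
  }
  return total > 0 && cjk / total > threshold;
}

/** Plain fixed-size slicing by UTF-16 code units (may cut a surrogate pair; kept simple for the sketch). */
function sliceEvery(text: string, size: number): string[] {
  const step = Math.max(1, size);
  const out: string[] = [];
  for (let i = 0; i < text.length; i += step) {
    out.push(text.slice(i, i + step));
  }
  return out;
}

/** Pass 1: slice at maxChars. Pass 2: re-split only CJK-heavy segments at the token budget. */
function splitLine(line: string, maxChars: number, tokens: number): string[] {
  const result: string[] = [];
  for (const segment of sliceEvery(line, maxChars)) {
    if (isCjkHeavy(segment) && segment.length > tokens) {
      result.push(...sliceEvery(segment, tokens));
    } else {
      result.push(segment);
    }
  }
  return result;
}
```

With `maxChars = 800` and `tokens = 200`, a 1000-character ASCII line in this sketch still splits into segments of 800 and 200 characters, while a 1000-character line of BMP ideographs ends up as five 200-character segments.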