openclaw

mirror of https://github.com/openclaw/openclaw.git synced 2026-05-18 09:24:45 +00:00

Files

AaronLuo00 971ecabe80 fix(memory): account for CJK characters in QMD memory chunking

The QMD memory system uses a fixed 4:1 chars-to-tokens ratio for chunk
sizing, which underestimates the token count of CJK (Chinese/Japanese/Korean)
text by roughly a factor of four, because each CJK character is roughly 1
token. This produces oversized chunks for CJK users, degrading vector search
quality and wasting context-window space.
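As a rough illustration of the mismatch, here is a minimal sketch that contrasts a flat 4:1 estimate with a CJK-aware one. The function names `naiveTokens` and `cjkAwareTokens` and the Unicode ranges in `CJK_RE` are hypothetical choices made for illustration. They are not taken from src/utils/cjk-chars.ts, whose actual API and coverage may differ.

```typescript
// Hypothetical helpers for illustration only; not the real
// src/utils/cjk-chars.ts API.

// Flat heuristic: roughly 4 characters per token, regardless of script.
function naiveTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Assumed CJK test: CJK punctuation, Hiragana/Katakana, CJK Unified
// Ideographs (plus Extension A), Hangul syllables, and compatibility
// ideographs. Real coverage may differ.
const CJK_RE =
  /[\u3000-\u303f\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af\uf900-\ufaff]/;

// CJK-aware heuristic: count each CJK character as ~1 token and
// everything else at ~4 UTF-16 code units per token.
function cjkAwareTokens(text: string): number {
  let cjk = 0;
  let other = 0;
  for (const ch of text) {
    if (CJK_RE.test(ch)) cjk++;
    else other += ch.length;
  }
  return cjk + Math.ceil(other / 4);
}
```

For example, the ten-character Chinese string `记忆系统使用固定比例` gets 3 from `naiveTokens` but 10 from `cjkAwareTokens`, while `"hello world!"` yields 3 from both. That is the fourfold gap described above, with ASCII behavior unchanged.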

Changes:
- Add shared src/utils/cjk-chars.ts module with CJK-aware character
  counting (estimateStringChars) and token estimation helpers
- Update chunkMarkdown() in src/memory/internal.ts to use weighted
  character lengths for chunk boundary decisions and overlap calculation
- Replace hardcoded estimateTokensFromChars in the context report
  command with the shared utility
- Add 13 unit tests for the CJK estimation module and 5 new tests for
  CJK-aware memory chunking behavior
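The chunking change can be pictured as a line-based splitter that measures each line by a weighted length instead of raw `.length`, both when deciding where a chunk ends and when carrying trailing lines over as overlap. This is a hypothetical simplification. The names `weightedLength` and `chunkLines`, the weight of 4 per CJK character (so one CJK character costs about one token's worth of ASCII), and the line-level granularity are all assumptions. The real `chunkMarkdown()` in src/memory/internal.ts may work differently.

```typescript
// Hypothetical sketch; names and constants are assumptions, not the
// actual chunkMarkdown() implementation.

const CHARS_PER_TOKEN = 4; // the flat ratio the original heuristic used
const CJK_CHAR =
  /[\u3000-\u303f\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af\uf900-\ufaff]/;

// Weighted length: each CJK character counts as CHARS_PER_TOKEN "chars",
// so weightedLength(s) / CHARS_PER_TOKEN approximates a token count.
function weightedLength(text: string): number {
  let total = 0;
  for (const ch of text) {
    total += CJK_CHAR.test(ch) ? CHARS_PER_TOKEN : ch.length;
  }
  return total;
}

// Group lines into chunks whose weighted length stays within
// maxTokens * CHARS_PER_TOKEN, carrying trailing lines worth up to
// overlapTokens * CHARS_PER_TOKEN into the next chunk. A single line
// longer than the budget still becomes its own (oversized) chunk.
function chunkLines(text: string, maxTokens: number, overlapTokens: number): string[] {
  const maxChars = maxTokens * CHARS_PER_TOKEN;
  const overlapChars = overlapTokens * CHARS_PER_TOKEN;
  const chunks: string[] = [];
  let current: string[] = [];
  let currentLen = 0;

  for (const line of text.split("\n")) {
    const len = weightedLength(line) + 1; // +1 for the newline
    if (current.length > 0 && currentLen + len > maxChars) {
      chunks.push(current.join("\n"));
      // Carry trailing lines as overlap, measured by weighted length.
      const carried: string[] = [];
      let carriedLen = 0;
      for (let i = current.length - 1; i >= 0; i--) {
        const l = weightedLength(current[i]) + 1;
        if (carriedLen + l > overlapChars) break;
        carried.unshift(current[i]);
        carriedLen += l;
      }
      current = carried;
      currentLen = carriedLen;
      // If overlap plus this line would still overflow, drop the overlap.
      if (currentLen + len > maxChars) {
        current = [];
        currentLen = 0;
      }
    }
    current.push(line);
    currentLen += len;
  }
  if (current.length > 0) chunks.push(current.join("\n"));
  return chunks;
}
```

With ten Han characters per line and `maxTokens = 25`, each line weighs 41, so only two lines fit per chunk. Measuring raw `.length` (11 per line) instead would pack five such lines into a single chunk. Pure-ASCII lines weigh the same as their `.length`, which matches the backward-compatibility note.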

Backward compatible: pure ASCII/Latin text behavior is unchanged.

Closes #39965
Related: #40216

2026-03-29 10:22:43 +09:00

clawdbot

chore: bump version to 2026.2.12

2026-02-12 18:20:46 +01:00

memory-host-sdk

fix(memory): account for CJK characters in QMD memory chunking

2026-03-29 10:22:43 +09:00

moltbot

chore: bump version to 2026.2.12

2026-02-12 18:20:46 +01:00