openclaw

mirror of https://github.com/openclaw/openclaw.git synced 2026-07-04 11:53:30 +00:00

Files

buyitsydney 4b69c6d3f1 fix(memory): add CJK/Kana/Hangul support to MMR tokenize() for diversity detection

The tokenize() function only matched [a-z0-9_]+ patterns, returning an
empty set for CJK-only text. This made Jaccard similarity always 0 (or
always 1 for two empty sets) for CJK content, effectively disabling MMR
diversity detection.
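
The failure mode described above can be reproduced with a small sketch. This is not the repository's code: `tokenizeAsciiOnly` and `jaccard` are hypothetical stand-ins, and treating two empty sets as identical (similarity 1) is an assumption about one of the two conventions the message mentions.

```typescript
// Hypothetical reconstruction of an ASCII-only tokenizer; the name and exact
// shape are assumptions, only the [a-z0-9_]+ pattern comes from the commit message.
function tokenizeAsciiOnly(text: string): Set<string> {
  const matches = text.toLowerCase().match(/[a-z0-9_]+/g) ?? [];
  return new Set<string>(matches);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|.
// Assumption: two empty sets are treated as identical (similarity 1).
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  if (a.size === 0 || b.size === 0) return 0;
  let intersection = 0;
  a.forEach((token) => {
    if (b.has(token)) intersection++;
  });
  const union = a.size + b.size - intersection;
  return intersection / union;
}

// Two unrelated Japanese sentences both tokenize to the empty set,
// so they are scored as identical under the convention assumed here.
const first = tokenizeAsciiOnly("今日は天気がいい");
const second = tokenizeAsciiOnly("明日は雨が降る");
console.log(first.size, second.size, jaccard(first, second));
```

Under either convention the score carries no information about the text, which is why a diversity re-ranker built on it cannot distinguish CJK candidates.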

Add support for:
- CJK Unified Ideographs (U+4E00–U+9FFF, U+3400–U+4DBF)
- Hiragana (U+3040–U+309F) and Katakana (U+30A0–U+30FF)
- Hangul Syllables (U+AC00–U+D7AF) and Jamo (U+1100–U+11FF)

Characters are extracted as unigrams, and bigrams are generated only
from characters that are adjacent in the original text (no spurious
bigrams across ASCII boundaries).

Fixes #28000

2026-03-28 09:19:52 +05:30

src

fix(memory): add CJK/Kana/Hangul support to MMR tokenize() for diversity detection

2026-03-28 09:19:52 +05:30

api.ts

refactor: move bundled plugin policy into manifests

2026-03-27 16:40:27 +00:00

index.test.ts

refactor: move memory engine behind plugin adapters

2026-03-27 00:47:01 +00:00

index.ts

refactor: move memory engine behind plugin adapters