Files
openclaw/extensions/memory-core
ly-wang19 bea3d292c7 fix(memory-core): keep short protected-glossary terms past the min-length gate (#96304)
PROTECTED_GLOSSARY exists to preserve short technical terms that generic
filtering would discard, but every glossary match still flowed through
normalizeConceptToken's per-script minimum-length gate. The 2-char latin
entries "kv" and "s3" were therefore never emitted as concept tags despite
being on the protect-list. Thread a fromGlossary flag so glossary matches
bypass only that length check; all other gates still apply.

Because that bypass lets short entries through, a bare substring match would
also surface them from inside longer words ("kv" in "mkv", "s3" in "css3").
Match ONLY the short entries (those below their script's min length) as
delimiter-bounded whole tokens; longer entries keep substring containment, so
the shipped behavior of "backup" tagging inside "backups" is preserved. CJK
entries (no word delimiters) always use substring matching. Positive
(standalone kv/s3) and negative (mkv/css3 substrings) regression tests cover
both directions, and the short-term-promotion stable-tags assertion gains "s3".

Co-authored-by: ly-wang19 <ly-wang19@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 21:32:58 +08:00
..
2026-06-04 21:40:44 -04:00