PROTECTED_GLOSSARY exists to preserve short technical terms that generic
filtering would discard, but every glossary match still flowed through
normalizeConceptToken's per-script minimum-length gate. The two-character
Latin entries "kv" and "s3" were therefore never emitted as concept tags,
even though the glossary lists them. Thread a fromGlossary flag through so
that glossary matches bypass only that length check; all other gates still
apply.

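A minimal sketch of the length-gate bypass described above, not the code in
this commit. Only normalizeConceptToken and fromGlossary are names taken from
this message; the options-object signature, MIN_LENGTH values, STOP_WORDS list,
and detectScript helper are assumptions made up for illustration.

```typescript
// Hypothetical sketch: everything except the names normalizeConceptToken
// and fromGlossary is an assumption, including the thresholds below.
type Script = "latin" | "cjk";

// Assumed per-script minimum lengths (the real values are not stated here).
const MIN_LENGTH: Record<Script, number> = { latin: 3, cjk: 2 };

// Assumed stand-in for the "other gates" that must still apply.
const STOP_WORDS: readonly string[] = ["the", "and", "for"];

// Crude script detection: kana or CJK ideographs count as "cjk".
function detectScript(token: string): Script {
  return /[\u3040-\u30ff\u3400-\u9fff]/.test(token) ? "cjk" : "latin";
}

function normalizeConceptToken(
  raw: string,
  opts: { fromGlossary?: boolean } = {},
): string | null {
  const token = raw.trim().toLowerCase();
  if (token.length === 0) return null;

  // Gates other than length apply to every token, glossary or not.
  if (STOP_WORDS.indexOf(token) !== -1) return null;

  // Only the minimum-length gate is skipped for glossary matches.
  const minLength = MIN_LENGTH[detectScript(token)];
  if (!opts.fromGlossary && token.length < minLength) return null;

  return token;
}
```

Under these assumptions, normalizeConceptToken("kv") returns null, while
normalizeConceptToken("kv", { fromGlossary: true }) returns "kv". A stop word
is still rejected even when fromGlossary is set.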
Because that bypass lets short entries through, a bare substring match would
also surface them from inside longer words ("kv" in "mkv", "s3" in "css3").
Match ONLY the short entries (those below their script's min length) as
delimiter-bounded whole tokens; longer entries keep substring containment, so
the shipped behavior of "backup" tagging inside "backups" is preserved. CJK
entries (no word delimiters) always use substring matching. Positive
(standalone kv/s3) and negative (mkv/css3 substrings) regression tests cover
both directions, and the short-term-promotion stable-tags assertion gains "s3".

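The matching rule above can be sketched as follows. This is an illustration
under stated assumptions, not the shipped implementation: glossaryEntryMatches,
detectScript, escapeRegExp, and the MIN_LENGTH values are hypothetical, and
"delimiter-bounded" is approximated here as "not adjacent to a Unicode letter
or digit".

```typescript
// Hypothetical sketch: all names and thresholds here are assumptions.
type Script = "latin" | "cjk";

// Assumed per-script minimum lengths (the real values are not stated here).
const MIN_LENGTH: Record<Script, number> = { latin: 3, cjk: 2 };

// Crude script detection: kana or CJK ideographs count as "cjk".
function detectScript(token: string): Script {
  return /[\u3040-\u30ff\u3400-\u9fff]/.test(token) ? "cjk" : "latin";
}

// Escape regex metacharacters so an entry is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function glossaryEntryMatches(text: string, entry: string): boolean {
  const haystack = text.toLowerCase();
  const needle = entry.toLowerCase();
  const script = detectScript(needle);

  // CJK entries and entries at or above the minimum length keep
  // plain substring containment ("backup" still matches "backups").
  if (script === "cjk" || needle.length >= MIN_LENGTH[script]) {
    return haystack.indexOf(needle) !== -1;
  }

  // Short entries must stand alone: no letter or digit on either side.
  const bounded = new RegExp(
    `(?<![\\p{L}\\p{N}])${escapeRegExp(needle)}(?![\\p{L}\\p{N}])`,
    "u",
  );
  return bounded.test(haystack);
}
```

With these assumptions, "kv" matches in "a kv store" and "kv-store" but not in
"mkv", "s3" does not match in "css3", and "backup" still matches in "backups".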
Co-authored-by: ly-wang19 <ly-wang19@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>