Description
Context / Scenario
When tokenizing a Chinese Markdown file, the GetTokens function returns incorrect results.
CL100KTokenizer cL100KTokenizer = new CL100KTokenizer();
var result = cL100KTokenizer.GetTokens("交通运输部关于发布《公路桥涵设计通用规范》的公告\r\n现发布《公路桥涵设计通用规范》(JTG D60-2015),作为公路工程行业标准,自 2015 年 12 月 1 日起施行,原《公路桥涵设计通用规范》(JTG D60-2004)同时废止。");
What happened?
I expected GetTokens to return the correct tokens. The incorrect result affects how MarkDownChunker splits the data.
Importance
a fix would make my life easier
Platform, Language, Versions
C#
Microsoft.KernelMemory.Core
V0.98.250324.1