```rust
fn add_token(&mut self, doc_ref: &str, token: &str, term_freq: f64) {
    let mut iter = token.chars();
    if let Some(character) = iter.next() {
        // ...
```
During index building, elasticlunr-rs iterates over the token `&str`'s contents as Unicode Scalar Values, via `str::chars()`. The JS library, on the other hand, does it like this:
```js
elasticlunr.InvertedIndex.prototype.addToken = function (token, tokenInfo, root) {
  var root = root || this.root,
      idx = 0;
  while (idx <= token.length - 1) {
    var key = token[idx];
    // ...
```
Here the JS string is actually iterated in UTF-16 code units (`token.length` and `token[idx]` both operate on code units). A single code unit is an entire character for English, most alphabetic text, and common Chinese characters, but not for emoji and rare Chinese characters, which are encoded as surrogate pairs of two code units.
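To make the mismatch concrete, here is a small, self-contained Rust sketch (not part of either library; the token values are made up for illustration) that compares the two views of the same token:

```rust
fn main() {
    // "ab" is plain ASCII; "a🦀" contains an emoji outside the Basic Multilingual Plane.
    for token in ["ab", "a🦀"] {
        let scalar_values = token.chars().count();      // what elasticlunr-rs walks
        let utf16_units = token.encode_utf16().count(); // what the JS `token[idx]` loop walks
        println!("{:?}: {} scalar values, {} UTF-16 code units", token, scalar_values, utf16_units);
    }
    // Prints:
    // "ab": 2 scalar values, 2 UTF-16 code units
    // "a🦀": 2 scalar values, 3 UTF-16 code units
}
```

For the second token the JS loop above sees three keys, the last two being the surrogate halves of the emoji, while the Rust loop sees only two characters, so the two implementations can index such a token under different key sequences.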
There is a related issue reported against mdBook.