Incorrect step size when string is longer than the number of n-grams desired #18

@dblock

In https://github.com/artsy/mongoid_fulltext/blob/master/lib/mongoid_fulltext.rb:

# Figure out how many ngrams to extract from the string. If we can't afford to extract all ngrams,
# step over the string in evenly spaced strides to extract ngrams. For example, to extract 3 3-letter
# ngrams from 'abcdefghijk', we'd want to extract 'abc', 'efg', and 'ijk'.
if bound_number_returned
   step_size = [((filtered_str.length - config[:ngram_width]).to_f / config[:max_ngrams_to_search]).ceil, 1].max
else
   step_size = 1
end

To get the 3 n-grams abc, efg and ijk from abcdefghijk (11 characters), we need a step of 4, not 3. The current formula gives:

(11.to_f - 3) / 3 ≈ 2.67, which ceils to 3

I think the calculation should not subtract config[:ngram_width]: 11.to_f / 3 ≈ 3.67, which ceils to 4.
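The arithmetic can be checked with a small standalone sketch. The helper names (`step_size_current`, `step_size_proposed`, `ngrams_at`) are hypothetical and not part of mongoid_fulltext, and the way `ngrams_at` walks the string (start at index 0, advance by the step) is an assumption about how the step is applied, not a copy of the library's loop.

```ruby
# Hypothetical helpers for illustration only; not the library's API.

# Step size as currently computed (subtracts the n-gram width first).
def step_size_current(length, ngram_width, max_ngrams)
  [((length - ngram_width).to_f / max_ngrams).ceil, 1].max
end

# Step size without subtracting the n-gram width, as suggested above.
def step_size_proposed(length, ngram_width, max_ngrams)
  [(length.to_f / max_ngrams).ceil, 1].max
end

# Assumed extraction: take an n-gram at every `step`-th valid start index.
def ngrams_at(str, ngram_width, step)
  (0..str.length - ngram_width).step(step).map { |i| str[i, ngram_width] }
end

str = 'abcdefghijk'
current  = step_size_current(str.length, 3, 3)
proposed = step_size_proposed(str.length, 3, 3)

puts "current step:  #{current} -> #{ngrams_at(str, 3, current).inspect}"
puts "proposed step: #{proposed} -> #{ngrams_at(str, 3, proposed).inspect}"
```

Under these assumptions, the current formula steps by 3 and yields abc, def, ghi, while dropping the subtraction steps by 4 and yields the abc, efg, ijk that the comment describes. This only checks the single example from the comment; other lengths and widths would need their own checks before changing the formula.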

However, I wonder whether the comment is incorrect and we want the first 3 n-grams instead of skipping characters.

cc: @aaw
