Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimize null lookup for StaticStringMap #22967

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

cpizano
Copy link

@cpizano cpizano commented Feb 21, 2025

Currently the initLenIndexes function populates the len_indexes array with values (which are indexes into the kvs array) of the form:
{0, 0, 0, 0, .. 1, 1, 1, .. 2, 2, 2 ....}

where the repeated indexes are the "missing sizes", for example if the keys are {"ab", "abcdefghi"} the len_indexes array looks like:
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1}

When you lookup say "none", it will compare first with kvs.keys[0] which is "ab" then in the second loop iteration with kvs.keys[1] which does return null.

The proposed change replaces these repeated indexes for the u32 max-value sentinel which indicates that for that length there are no entries in the kv array. In the case above then it does an early without having to fetch kvs.keys[i].

The speedup is expected to be minimal, unless there are lots of null lookups or the kvs array is in a contested cache line.

An alternative is to use 0 as sentinel, which means the empty string could not be used as a key, which is a breaking ABI change.

A test is added to make explicit that the "" key behavior is not accidentally broken.

Currently the initLenIndexes function populates the len_indexes
array with values (which are indexes into the kvs array) of the
form:
  {0, 0, 0, 0, .. 1, 1, 1, .. 2, 2, 2 ....}

where the repeated indexes are the "missing sizes", for example
if the keys are {"ab", "abcdefghi"} the len_indes array looks
like:
  {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1}

When you lookup say "none", will compare first with kvs.keys[0]
which is "ab" then in the second loop iteration with kvs.keys[1]
which does return null.

The proposed change replaces these repeated indexes for the
u32 max-value sentinel which indicates that for that length
there are no entries in the kv array. In the case above then
it does an early without having to fetch kvs.keys[i].

The speedup is expected to be minimal, unless there are lots
of null lookups or the kvs array is in a contested cache line.

An alternative is to use 0 as sentinel, which means the empty
string could not be used as a key, which is a breaking ABI
change.

A test is added to make explicit that the "" key behavior
is not accidentally broken.
Copy link
Member

@andrewrk andrewrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch. If you want this merged I have 2 requirements:

  • provide perf data points
  • use integer type safety. if max int is special then use an enum, not u32

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants