[wip] Bitmap-based frequency aggregations #37

Open: wants to merge 12 commits into base branch bw_branch_7_7_2


@mkavanagh commented Jul 28, 2020

Following on from Tim's work on collecting bitmaps of ordinal values, this is a prototype of two new aggregates, also based on bitmaps of ordinals:

  • bitmapfreq: calculates the frequency of ordinal values (returned as a list of Roaring Bitmaps)
  • bitmapfreqfreq: calculates the frequency of frequencies of ordinal values, i.e. the number of values that appeared x times (returned as a list of ints)
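A minimal sketch of how the two aggregates could relate, assuming each ordinal is moved from one bitmap to the next every time it is seen, so that bitmap k holds the ordinals seen exactly k + 1 times. `java.util.BitSet` stands in for Roaring Bitmaps so the example has no dependencies, and the names `BitmapFreqSketch`, `bitmapFreq` and `bitmapFreqFreq` are hypothetical, not the PR's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch: java.util.BitSet stands in for Roaring Bitmaps, and these
// names are illustrative, not the PR's actual classes.
final class BitmapFreqSketch {

    // bitmapfreq: levels.get(k) holds the ordinals seen exactly k + 1 times.
    // Each ordinal lives in exactly one level and moves up one level per sighting.
    static List<BitSet> bitmapFreq(int[] ordinals) {
        List<BitSet> levels = new ArrayList<>();
        for (int ord : ordinals) {
            int current = -1; // level currently holding ord, or -1 if unseen
            for (int k = 0; k < levels.size(); k++) {
                if (levels.get(k).get(ord)) {
                    current = k;
                    break;
                }
            }
            if (current >= 0) {
                levels.get(current).clear(ord);
            }
            int next = current + 1;
            if (next == levels.size()) {
                levels.add(new BitSet()); // first ordinal to reach this frequency
            }
            levels.get(next).set(ord);
        }
        return levels;
    }

    // bitmapfreqfreq: counts[k] is the number of distinct ordinals seen exactly
    // k + 1 times, i.e. the cardinality of each frequency bitmap.
    static int[] bitmapFreqFreq(int[] ordinals) {
        List<BitSet> levels = bitmapFreq(ordinals);
        int[] counts = new int[levels.size()];
        for (int k = 0; k < counts.length; k++) {
            counts[k] = levels.get(k).cardinality();
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] ordinals = {1, 1, 2, 2, 3};
        System.out.println(bitmapFreq(ordinals));                      // [{3}, {1, 2}]
        System.out.println(Arrays.toString(bitmapFreqFreq(ordinals))); // [1, 2]
    }
}
```

In this sketch the frequency-of-frequency result is simply the cardinality of each frequency bitmap, so bitmapfreqfreq can be derived from bitmapfreq without keeping per-value counters.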

Both support an optional "maxFrequency" param, which caps the frequencies returned and accumulates values that hit the cap into an overflow bucket (which is also returned).
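A minimal sketch of how the "maxFrequency" cap could work. The description does not say whether a value seen exactly maxFrequency times counts as having hit the cap; this sketch assumes it does not, keeping exact bitmaps for frequencies 1 to maxFrequency and moving an ordinal into the overflow bitmap only once its count exceeds the cap. `CappedBitmapFreq`, `collect` and `freqFreq` are hypothetical names, and `java.util.BitSet` again stands in for Roaring Bitmaps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch of a capped frequency aggregation; java.util.BitSet stands
// in for Roaring Bitmaps. Assumption: exact levels cover frequencies
// 1..maxFrequency, and an ordinal moves to the overflow bucket once its count
// exceeds maxFrequency.
final class CappedBitmapFreq {
    final List<BitSet> levels = new ArrayList<>(); // levels.get(k): seen exactly k + 1 times
    final BitSet overflow = new BitSet();          // seen more than maxFrequency times
    private final int maxFrequency;

    CappedBitmapFreq(int maxFrequency) {
        if (maxFrequency < 1) {
            throw new IllegalArgumentException("maxFrequency must be at least 1");
        }
        this.maxFrequency = maxFrequency;
        for (int k = 0; k < maxFrequency; k++) {
            levels.add(new BitSet());
        }
    }

    void collect(int ord) {
        if (overflow.get(ord)) {
            return; // already past the cap: no further counting needed
        }
        for (int k = 0; k < maxFrequency; k++) {
            BitSet level = levels.get(k);
            if (level.get(ord)) {
                level.clear(ord);
                if (k + 1 < maxFrequency) {
                    levels.get(k + 1).set(ord); // promote to the next frequency
                } else {
                    overflow.set(ord);          // count now exceeds the cap
                }
                return;
            }
        }
        levels.get(0).set(ord); // first sighting
    }

    // Frequency-of-frequency view: counts[k] for frequency k + 1,
    // with the overflow bucket's size in the last slot.
    int[] freqFreq() {
        int[] counts = new int[maxFrequency + 1];
        for (int k = 0; k < maxFrequency; k++) {
            counts[k] = levels.get(k).cardinality();
        }
        counts[maxFrequency] = overflow.cardinality();
        return counts;
    }

    public static void main(String[] args) {
        CappedBitmapFreq agg = new CappedBitmapFreq(2);
        for (int ord : new int[] {4, 4, 4, 6, 6, 8, 9}) {
            agg.collect(ord);
        }
        System.out.println(Arrays.toString(agg.freqFreq())); // [2, 1, 1]
        System.out.println(agg.overflow);                    // {4}
    }
}
```

With maxFrequency = 2, ordinals 8 and 9 (seen once) and 6 (seen twice) keep exact frequencies, while 4 (seen three times) lands in the overflow bucket. The cap also bounds the number of bitmaps at maxFrequency + 1, however skewed the input is.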

This needs testing for both functionality and performance. The hope is that these aggregates will be a good alternative to the memory-hungry terms facet that the mention-service uses to estimate author cardinality in sampled queries.

…aringBitmap object.

The response is a byte array from which you can construct a bitmap using:

new ImmutableRoaringBitmap(ByteBuffer.wrap(bitmapBytes))
@mkavanagh force-pushed the bitmapfrequency branch 3 times, most recently from f0dcd32 to d9f9739 (July 29, 2020 17:18)
@mkavanagh force-pushed the bitmapfrequency branch 2 times, most recently from 5a1b172 to 2e49c19 (September 14, 2020 13:41)