Add proper sampling for estimating counts #3

@ramou

Description

We know how to do it. We've diced up the problem, done a bunch of the proofs, and even written out how we'll do it. Let's do it and squeeze out that extra boost in performance when someone's sorting data that isn't uniformly distributed. It ain't that hard, and we're already paying the cost for start/end counts because we knew we'd do this eventually.
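For anyone reading along, here is a rough sketch of what "sampling to estimate counts" could look like for one radix pass. It is not the written-out approach mentioned above: `estimate_bucket_ranges`, the 8-bit digit, the sample size, and the random-sampling strategy are all assumptions made for illustration. It draws a random sample, counts which bucket each sampled value falls into, and scales the cumulative counts up to the full input size to get estimated start/end offsets per bucket.

```python
import random


def estimate_bucket_ranges(data, shift=0, bits=8, sample_size=256, seed=0):
    """Estimate [start, end) offsets per bucket from a random sample.

    Hypothetical helper: the name, parameters, and sampling strategy are
    illustrative assumptions, not the project's actual method.
    """
    n = len(data)
    buckets = 1 << bits
    mask = buckets - 1
    starts = [0] * buckets
    ends = [0] * buckets
    if n == 0:
        return starts, ends

    # Sample without replacement; never ask for more items than exist.
    k = min(sample_size, n)
    sample = random.Random(seed).sample(data, k)

    # Count how many sampled values land in each bucket for this digit.
    sampled = [0] * buckets
    for value in sample:
        sampled[(value >> shift) & mask] += 1

    # Scale the cumulative sample counts to the full input size.
    # Scaling the running total (not each count) keeps ranges contiguous
    # and makes the last end offset exactly n.
    running = 0
    for b in range(buckets):
        starts[b] = running * n // k
        running += sampled[b]
        ends[b] = running * n // k
    return starts, ends


# Skewed input: most values fall in bucket 0, a few in bucket 255.
data = [0] * 900 + [255] * 100
starts, ends = estimate_bucket_ranges(data, sample_size=100)
print(ends[0] - starts[0], ends[255] - starts[255])
```

These are estimates, so a real sort built on them would still need some way to handle a bucket that receives more elements than its estimated range allows.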

That said, when I do this, is it worth keeping an explicit uniform-distribution version that skips the start/end counts, to squeeze out that tiny improvement from replacing a memory lookup with some basic arithmetic? I'll decide when I actually do this ticket.
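To make the trade-off concrete, here is a minimal sketch of the two ways a bucket's range could be obtained. The function names and the 256-bucket default are assumptions for illustration only. Under a uniform-distribution assumption the offsets follow directly from the bucket index, so no table is needed; otherwise they are read from precomputed start/end arrays.

```python
def uniform_bucket_range(bucket, n, buckets=256):
    """Range for a bucket assuming values are spread evenly (pure arithmetic)."""
    start = bucket * n // buckets
    end = (bucket + 1) * n // buckets
    return start, end


def table_bucket_range(bucket, starts, ends):
    """Range for a bucket read from precomputed start/end tables (memory lookup)."""
    return starts[bucket], ends[bucket]


n = 1000
# Build the tables once so both paths can be compared on the same input.
starts = [b * n // 256 for b in range(256)]
ends = [(b + 1) * n // 256 for b in range(256)]

print(uniform_bucket_range(128, n))            # (500, 503)
print(table_bucket_range(128, starts, ends))   # (500, 503)
```

Python only shows the logic here; whether the arithmetic version is actually faster than the lookup would have to be measured in the real implementation.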
