@laraabastoss

Added code and the corresponding documentation for the Space Saving, HyperLogLog, and Hierarchical Heavy Hitters algorithms in the sketch module.

@smastelini
Member

Hi @laraabastoss, thanks for your contribution! Recently some errors were fixed in the automated tests, so I am re-running them for this PR. Let's see how that goes and if you need to change something in your code. Perhaps you will need to pull the latest changes from the main branch.

Aside from that, I wanted to discuss a scope question. River already has a Heavy Hitters algorithm that is meant to provide the same functionality as Space Saving. I noticed that the current implementation in River supports a fading factor. I do not know the pros and cons of Space Saving vs. Lossy Counting with Forgetting Factor (the core of River's version), but I think we could do some renaming to keep both versions.

The idea is to follow the convention we have used so far in river.sketch:

  • We use names that reflect functionality, rather than the actual algorithm name. For example, Counter, Set, and so on. The algorithm name and related info go in the documentation. So, in your case:
    • Space Saving -> Heavy Hitters (in this case, we will need to find a new name for the current implementation in sketch.HeavyHitters, like FadingHeavyHitters or something else -- suggestions are welcome)
    • Hierarchical Heavy Hitters is already conforming to the implicit convention
    • HyperLogLog -> Cardinality? Suggestions are welcome!
  • We try as much as possible to draw inspiration from Python's collections module for API usage. This makes the API familiar to users and gives us naming choices that have stood the test of time :D. You can check the existing methods in the sketch module for inspiration.
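To make the two conventions above concrete, here is a minimal sketch of Space Saving behind a functionality-oriented name and a collections.Counter-like interface. Everything in it is an assumption for illustration: the class name `HeavyHitters`, the parameter `k`, and the methods `update` and `most_common` are hypothetical and are neither the code in this PR nor River's current API.

```python
from __future__ import annotations

import typing


class HeavyHitters:
    """Hypothetical Space Saving sketch with a Counter-like API (illustration only)."""

    def __init__(self, k: int):
        if k < 1:
            raise ValueError("k must be a positive integer")
        self.k = k  # maximum number of items tracked at once
        self._counts: dict[typing.Hashable, int] = {}

    def update(self, x: typing.Hashable, w: int = 1) -> None:
        if x in self._counts:
            # Already tracked: just increase its counter.
            self._counts[x] += w
        elif len(self._counts) < self.k:
            # Free slot available: start tracking the new item.
            self._counts[x] = w
        else:
            # Space Saving step: evict the item with the smallest counter and
            # let the newcomer inherit that count, so counts may overestimate
            # the true frequency but never underestimate it.
            victim = min(self._counts, key=self._counts.get)
            inherited = self._counts.pop(victim)
            self._counts[x] = inherited + w

    def __getitem__(self, x: typing.Hashable) -> int:
        # Like collections.Counter, untracked items report a count of 0.
        return self._counts.get(x, 0)

    def __len__(self) -> int:
        return len(self._counts)

    def most_common(self, n: int | None = None) -> list[tuple[typing.Hashable, int]]:
        ranked = sorted(self._counts.items(), key=lambda kv: kv[1], reverse=True)
        return ranked if n is None else ranked[:n]


hh = HeavyHitters(k=2)
for token in "aaabac":
    hh.update(token)
print(hh.most_common(1))
```

The linear scan for the minimum keeps the sketch short; a real implementation would likely use a structure with cheaper minimum lookups.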
