| 1 | +HyperLogLog (HLL) |
| 2 | +----------------- |
| 3 | +This is a high-performance implementation of Philippe Flajolet's HLL sketch, with significantly improved error behavior. |
| 4 | + |
| 5 | +If the only use case for sketching is counting unique items and merging, the HLL sketch is a reasonable choice, although the CPC (Compressed Probabilistic Counting) sketch gives the best accuracy for the storage space consumed. For large enough counts, this HLL version (with HLL_4) can be 2 to 16 times smaller than the Theta sketch family for the same accuracy. |
| 6 | + |
| 7 | +This implementation offers three different types of HLL sketch, each with different trade-offs among accuracy, space, and performance. |
| 8 | +These types are specified with the ``target_hll_type`` parameter. |
| 9 | + |
| 10 | +In terms of accuracy, all three types, for the same ``lg_config_k``, have the same error distribution as a function of ``n``, the number of unique values fed to the sketch. |
| 11 | +The configuration parameter ``lg_config_k`` is the log-base-2 of ``k``, where ``k`` is the number of buckets or slots for the sketch. |
| 12 | + |
| 13 | +During warmup, when the sketch has only received a small number of unique items (up to about 10% of ``k``), this implementation leverages a new class of estimator algorithms with significantly better accuracy. |
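To make the relationship between ``lg_config_k``, ``k``, and the warmup range concrete, the following back-of-the-envelope calculation uses only the numbers stated on this page: ``k = 2**lg_config_k``, warmup up to about 10% of ``k``, and 4, 6, or 8 bits per entry for the three target types. The helper names are hypothetical and do not call the library. The byte counts cover the raw slot array only and ignore headers and any auxiliary data a real serialized sketch may carry.

```python
# Bits per entry as listed for the three target types on this page.
BITS_PER_ENTRY = {"HLL_4": 4, "HLL_6": 6, "HLL_8": 8}


def num_slots(lg_config_k: int) -> int:
    """k is 2 raised to lg_config_k."""
    return 1 << lg_config_k


def approx_warmup_limit(lg_config_k: int) -> int:
    """Roughly 10% of k unique items; an approximation, not an exact threshold."""
    return num_slots(lg_config_k) // 10


def raw_array_bytes(lg_config_k: int, hll_type: str) -> int:
    """Size of the slot array alone, excluding headers and auxiliary data."""
    return num_slots(lg_config_k) * BITS_PER_ENTRY[hll_type] // 8


if __name__ == "__main__":
    lg_k = 12
    print(f"lg_config_k={lg_k}: k={num_slots(lg_k)}, "
          f"warmup up to ~{approx_warmup_limit(lg_k)} unique items")
    for name in BITS_PER_ENTRY:
        print(f"  {name}: {raw_array_bytes(lg_k, name)} bytes for the slot array")
```

Each increment of ``lg_config_k`` doubles ``k``, and with it the slot array size for every target type.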
| 14 | + |
| 15 | + |
| 16 | +.. autoclass:: _datasketches.tgt_hll_type |
| 17 | + |
| 18 | + .. autoattribute:: HLL_4 |
| 19 | + :annotation: : 4 bits per entry |
| 20 | + |
| 21 | + .. autoattribute:: HLL_6 |
| 22 | + :annotation: : 6 bits per entry |
| 23 | + |
| 24 | + .. autoattribute:: HLL_8 |
| 25 | + :annotation: : 8 bits per entry |
| 26 | + |
| 27 | + |
| 28 | +.. autoclass:: _datasketches.hll_sketch |
| 29 | + :members: |
| 30 | + :undoc-members: |
| 31 | + :exclude-members: deserialize, get_max_updatable_serialization_bytes, get_rel_err |
| 32 | + |
| 33 | + .. rubric:: Static Methods: |
| 34 | + |
| 35 | + .. automethod:: deserialize |
| 36 | + .. automethod:: get_max_updatable_serialization_bytes |
| 37 | + .. automethod:: get_rel_err |
| 38 | + |
| 39 | + .. rubric:: Non-static Methods: |
| 40 | + |
| 41 | + .. automethod:: __init__ |
| 42 | + |
| 43 | +.. autoclass:: _datasketches.hll_union |
| 44 | + :members: |
| 45 | + :undoc-members: |
| 46 | + :exclude-members: get_rel_err |
| 47 | + |
| 48 | + .. rubric:: Static Methods: |
| 49 | + |
| 50 | + .. automethod:: get_rel_err |
| 51 | + |
| 52 | + .. rubric:: Non-static Methods: |
| 53 | + |
| 54 | + .. automethod:: __init__ |
| 55 | + |