apache
diff --git a/docs/5.2.0/.buildinfo b/docs/5.2.0/.buildinfo
Lines changed: 4 additions & 0 deletions
diff --git a/docs/5.2.0/_sources/distinct_counting/cpc.rst.txt b/docs/5.2.0/_sources/distinct_counting/cpc.rst.txt
Lines changed: 32 additions & 0 deletions
diff --git a/docs/5.2.0/_sources/distinct_counting/hyper_log_log.rst.txt b/docs/5.2.0/_sources/distinct_counting/hyper_log_log.rst.txt
Lines changed: 55 additions & 0 deletions
diff --git a/docs/5.2.0/_sources/distinct_counting/index.rst.txt b/docs/5.2.0/_sources/distinct_counting/index.rst.txt
Lines changed: 24 additions & 0 deletions
diff --git a/docs/5.2.0/_sources/distinct_counting/theta.rst.txt b/docs/5.2.0/_sources/distinct_counting/theta.rst.txt
Lines changed: 65 additions & 0 deletions
diff --git a/docs/5.2.0/_sources/distinct_counting/tuple.rst.txt b/docs/5.2.0/_sources/distinct_counting/tuple.rst.txt
Lines changed: 64 additions & 0 deletions
diff --git a/docs/5.2.0/_sources/frequency/count_min_sketch.rst.txt b/docs/5.2.0/_sources/frequency/count_min_sketch.rst.txt
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,4 @@
+# Sphinx build info version 1
+# This file records the configuration used when building these files. When it is not found, a full rebuild will be done.
+config: 86d4b171ba47c51d1a3d7f924e02560f
+tags: 645f666f9bcd5a90fca523b33c5a78b7
@@ -0,0 +1,32 @@
+Compressed Probabilistic Counting (CPC)
+---------------------------------------
+High-performance C++ implementation of the Compressed Probabilistic Counting (CPC) sketch.
+This is a unique-counting sketch that implements the Compressed Probabilistic Counting (CPC, a.k.a. FM85) algorithms developed by Kevin Lang in his paper
+`Back to the Future: an Even More Nearly Optimal Cardinality Estimation Algorithm <https://arxiv.org/abs/1708.06839>`_.
+This sketch is extremely space-efficient when serialized.
+In an apples-to-apples empirical comparison against compressed HyperLogLog sketches, this new algorithm simultaneously wins on both dimensions of the space/accuracy tradeoff and produces sketches that are smaller than the entropy of HLL, so no possible implementation of compressed HLL can match its space efficiency for a given accuracy. As described in the paper, this sketch implements a newly developed ICON estimator algorithm that survives unioning operations; another well-known estimator, the Historical Inverse Probability (HIP) estimator, does not.
+The update speed of this sketch is comparable to that of HLL.
+The unioning (merging) capability of this sketch also allows merging of sketches with different configurations of K.
+For additional security, this sketch can be configured with a user-specified hash seed.
+
+
+.. autoclass:: _datasketches.cpc_sketch
+    :members:
+    :undoc-members:
+    :exclude-members: deserialize
+
+    .. rubric:: Static Methods:
+
+    .. automethod:: deserialize
+
+    .. rubric:: Non-static Methods:
+
+    .. automethod:: __init__
+
+
+.. autoclass:: _datasketches.cpc_union
+    :members:
+    :undoc-members:
+    :exclude-members: deserialize
+
+    .. automethod:: __init__
@@ -0,0 +1,55 @@
+HyperLogLog (HLL)
+-----------------
+This is a high-performance implementation of Philippe Flajolet's HLL sketch but with significantly improved error behavior.
+
+If the ONLY use case for sketching is counting uniques and merging, the HLL sketch is a reasonable choice, although the highest performing in terms of accuracy for storage space consumed is CPC (Compressed Probabilistic Counting). For large enough counts, this HLL version (with HLL_4) can be 2 to 16 times smaller than the Theta sketch family for the same accuracy.
+
+This implementation offers three different types of HLL sketch, each with different trade-offs among accuracy, space, and performance.
+These types are specified with the ``target_hll_type`` parameter.
+
+In terms of accuracy, all three types, for the same ``lg_config_k``, have the same error distribution as a function of ``n``, the number of unique values fed to the sketch.
+The configuration parameter ``lg_config_k`` is the log-base-2 of ``k``, where ``k`` is the number of buckets or slots for the sketch.
+
+During warmup, when the sketch has only received a small number of unique items (up to about 10% of ``k``), this implementation leverages a new class of estimator algorithms with significantly better accuracy.
+
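The relationship between ``lg_config_k``, ``k``, and the bits per entry of each target type can be made concrete with a little arithmetic. The snippet below is only a back-of-envelope sketch: ``hll_array_bytes`` is a hypothetical helper, not part of the library, and it assumes a dense array of ``k`` entries while ignoring any header or auxiliary storage, so real serialized sizes will differ.

```python
def hll_array_bytes(lg_config_k: int, bits_per_entry: int) -> int:
    """Approximate size in bytes of a dense array of k = 2**lg_config_k entries.

    Hypothetical helper for illustration only; it ignores headers and any
    auxiliary storage an actual sketch may carry.
    """
    k = 1 << lg_config_k                  # number of buckets (slots)
    return (k * bits_per_entry + 7) // 8  # round up to whole bytes


# Bits per entry as documented for each target type.
for name, bits in (("HLL_4", 4), ("HLL_6", 6), ("HLL_8", 8)):
    print(name, hll_array_bytes(12, bits))
```

With ``lg_config_k = 12`` (``k = 4096``), the raw arrays come to 2048, 3072, and 4096 bytes respectively, which shows why the smaller entry widths are attractive when storage matters.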
+
+.. autoclass:: _datasketches.tgt_hll_type
+
+    .. autoattribute:: HLL_4
+        :annotation: : 4 bits per entry
+
+    .. autoattribute:: HLL_6
+        :annotation: : 6 bits per entry
+
+    .. autoattribute:: HLL_8
+        :annotation: : 8 bits per entry
+
+
+.. autoclass:: _datasketches.hll_sketch
+    :members:
+    :undoc-members:
+    :exclude-members: deserialize, get_max_updatable_serialization_bytes, get_rel_err 
+
+    .. rubric:: Static Methods:
+
+    .. automethod:: deserialize
+    .. automethod:: get_max_updatable_serialization_bytes
+    .. automethod:: get_rel_err
+
+    .. rubric:: Non-static Methods:
+
+    .. automethod:: __init__
+
+.. autoclass:: _datasketches.hll_union
+    :members:
+    :undoc-members:
+    :exclude-members: get_rel_err 
+
+    .. rubric:: Static Methods:
+
+    .. automethod:: get_rel_err
+
+    .. rubric:: Non-static Methods:
+
+    .. automethod:: __init__
+    
@@ -0,0 +1,24 @@
+Distinct Counting 
+=================
+
+.. currentmodule:: datasketches
+
+Distinct counting is one of the earliest tasks to which sketches were applied. The concept is simple:
+provide an estimate of the number of unique elements in a set of data. One of the earliest solutions came
+from Flajolet and Martin in 1985 with their seminal work
+`Probabilistic Counting Algorithms for Data Base Applications <http://db.cs.berkeley.edu/cs286/papers/flajoletmartin-jcss1985.pdf>`_.
+
+The DataSketches library offers several types of distinct counting sketches, each with different properties.
+
+  * :class:`hll_sketch`: HyperLogLog, a well-known sketch for distinct counting but no longer state-of-the-art.
+  * :class:`cpc_sketch`: Provides a better accuracy-space trade-off than HLL, but with a somewhat larger footprint while in memory.
+  * :class:`theta_sketch`: Theta sketch, a type of k-minimum values sketch, which provides good performance with intersection and set difference operations.
+  * :class:`tuple_sketch`: Tuple sketch, which is similar to a theta sketch but supports additional data stored with each key.
+
+.. toctree::
+  :maxdepth: 1
+   
+  hyper_log_log
+  cpc
+  theta 
+  tuple
@@ -0,0 +1,65 @@
+Theta Sketch
+------------
+
+.. currentmodule:: datasketches
+
+Theta sketches are used for distinct counting.
+
+The theta package contains the basic sketch classes that are members of the `Theta Sketch Framework <https://datasketches.apache.org/docs/Theta/ThetaSketchFramework.html>`_.
+There is a separate Tuple package for many of the sketches that are derived from the same algorithms defined in the Theta Sketch Framework paper.
+
+The *Theta Sketch* is a space-efficient method for estimating cardinalities of sets.
+It can also easily handle set operations (such as union, intersection, difference) while maintaining good accuracy.
+Theta sketch is a practical variant of the K-Minimum Values sketch which avoids the need to sort the stored 
+hash values on every insertion to the sketch.
+It has better error properties than the HyperLogLog sketch for set operations beyond the simple union.
+
+Set operations (union, intersection, A-not-B) are performed through the use of dedicated objects.
+
+Several `Jaccard similarity <https://en.wikipedia.org/wiki/Jaccard_similarity>`_
+measures can be computed between theta sketches with the :class:`theta_jaccard_similarity` class.
+
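The K-Minimum Values idea that theta sketches build on can be illustrated in a few lines of plain Python. This is a toy sketch under stated assumptions, not the library's implementation: ``ToyKMV``, ``hash01``, and ``union`` are hypothetical names, SHA-256 stands in for the real hash function, and the code deliberately uses the naive approach of trimming the largest value on every update rather than the theta optimization described above.

```python
import hashlib


def hash01(item) -> float:
    """Map an item to a pseudo-uniform value in [0, 1) (illustrative hash)."""
    digest = hashlib.sha256(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2.0**64


class ToyKMV:
    """Toy K-Minimum Values sketch: retains the k smallest hash values seen."""

    def __init__(self, k: int):
        self.k = k
        self.values = set()  # retained hash values; duplicates collapse

    def update(self, item) -> None:
        self.values.add(hash01(item))
        if len(self.values) > self.k:
            # Naive trimming; a real theta sketch avoids this per-update work.
            self.values.discard(max(self.values))

    def estimate(self) -> float:
        if len(self.values) < self.k:
            return float(len(self.values))   # exact while under capacity
        return (self.k - 1) / max(self.values)  # classic KMV estimator


def union(a: ToyKMV, b: ToyKMV) -> ToyKMV:
    """Union of two toy sketches: keep the k smallest of the combined values."""
    merged = ToyKMV(min(a.k, b.k))
    merged.values = set(sorted(a.values | b.values)[: merged.k])
    return merged


a, b = ToyKMV(1024), ToyKMV(1024)
for i in range(6000):
    a.update(i)
for i in range(4000, 10000):
    b.update(i)
print(round(union(a, b).estimate()))  # roughly 10000 distinct items overall
```

Because each sketch retains a uniform sample of hash values, a union only needs the retained values from both sides, which is why set operations compose well for this family of sketches.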
+.. autoclass:: theta_sketch
+    :members:
+    :undoc-members:
+    
+.. autoclass:: update_theta_sketch
+    :members:
+    :undoc-members:
+
+    .. automethod:: __init__
+
+
+.. autoclass:: compact_theta_sketch
+    :members:
+    :undoc-members:
+    :exclude-members: deserialize
+
+    .. rubric:: Static Methods:
+        
+    .. automethod:: deserialize
+
+    .. rubric:: Non-static Methods:
+
+    .. automethod:: __init__
+
+
+.. autoclass:: theta_union
+    :members:
+    :undoc-members:
+
+    .. automethod:: __init__
+
+
+.. autoclass:: theta_intersection
+    :members:
+    :undoc-members:
+
+    .. automethod:: __init__
+
+
+.. autoclass:: theta_a_not_b
+    :members:
+    :undoc-members:
+
+    .. automethod:: __init__
@@ -0,0 +1,64 @@
+Tuple Sketch
+------------
+
+.. currentmodule:: datasketches
+
+Tuple sketches are an extension of Theta sketches: they provide estimates of distinct counts and
+also allow arbitrary summaries to be kept with each retained key
+(for example, a count for every key). The use of a :class:`tuple_sketch` requires a :class:`TuplePolicy` which
+defines how summaries are created, updated, merged, or intersected. The library provides a few basic
+examples of :class:`TuplePolicy` implementations, but the right custom summary and policy can allow very
+complicated analysis to be performed quite easily.
+
+Set operations (union, intersection, A-not-B) are performed through the use of dedicated objects.
+
+Several `Jaccard similarity <https://en.wikipedia.org/wiki/Jaccard_similarity>`_
+measures can be computed between tuple sketches with the :class:`tuple_jaccard_similarity` class.
+
+.. note::
+    Serializing and deserializing this sketch requires the use of a :class:`PyObjectSerDe`.
+
+.. autoclass:: tuple_sketch
+    :members:
+    :undoc-members:
+    
+.. autoclass:: update_tuple_sketch
+    :members:
+    :undoc-members:
+
+    .. automethod:: __init__
+
+
+.. autoclass:: compact_tuple_sketch
+    :members:
+    :undoc-members:
+    :exclude-members: deserialize
+
+    .. rubric:: Static Methods:
+        
+    .. automethod:: deserialize
+
+    .. rubric:: Non-static Methods:
+
+    .. automethod:: __init__
+
+
+.. autoclass:: tuple_union
+    :members:
+    :undoc-members:
+
+    .. automethod:: __init__
+
+
+.. autoclass:: tuple_intersection
+    :members:
+    :undoc-members:
+
+    .. automethod:: __init__
+
+
+.. autoclass:: tuple_a_not_b
+    :members:
+    :undoc-members:
+
+    .. automethod:: __init__
@@ -0,0 +1,27 @@
+CountMin Sketch
+---------------
+
+The CountMin sketch, as described by Cormode and Muthukrishnan in
+http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf,
+is used for approximate frequency estimation.
+For an item :math:`x` with frequency :math:`f_x`, the sketch provides an estimate, :math:`\hat{f_x}`,
+such that :math:`f_x \approx \hat{f_x}`.
+The sketch guarantees that :math:`f_x \le \hat{f_x}` and provides a probabilistic upper bound that depends on the size parameters.
+The sketch provides an estimate of the occurrence frequency for any queried item but, in contrast
+to the Frequent Items Sketch, this sketch does not provide a list of
+heavy hitters.
+
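The one-sided guarantee described above can be seen in a minimal pure-Python sketch of the CountMin idea. This is an illustration under stated assumptions, not the library's implementation: ``ToyCountMin`` and its parameter names are hypothetical, SHA-256 salted with the row index stands in for the real hash family, and only non-negative weights are considered.

```python
import hashlib


class ToyCountMin:
    """Toy CountMin sketch: num_hashes rows of num_buckets counters each."""

    def __init__(self, num_hashes: int, num_buckets: int):
        self.num_hashes = num_hashes
        self.num_buckets = num_buckets
        self.table = [[0] * num_buckets for _ in range(num_hashes)]

    def _bucket(self, row: int, item) -> int:
        # Salt the hash with the row index to get one hash function per row.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_buckets

    def update(self, item, weight: int = 1) -> None:
        # Every row records the item, so each counter is at least its true count.
        for row in range(self.num_hashes):
            self.table[row][self._bucket(row, item)] += weight

    def get_estimate(self, item) -> int:
        # Collisions can only inflate counters, so the minimum is the tightest.
        return min(
            self.table[row][self._bucket(row, item)]
            for row in range(self.num_hashes)
        )


sketch = ToyCountMin(num_hashes=4, num_buckets=64)
for word in ["a", "b", "a", "c", "a"]:
    sketch.update(word)
print(sketch.get_estimate("a"))  # never less than the true count of 3
```

More buckets make collisions rarer and more rows make it less likely that every row is inflated, which is the sense in which the upper bound depends on the size parameters.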
+.. currentmodule:: _datasketches
+
+.. autoclass:: count_min_sketch
+    :members:
+    :undoc-members:
+    :exclude-members: deserialize, suggest_num_buckets, suggest_num_hashes
+
+    .. rubric:: Static Methods:
+
+    .. automethod:: deserialize
+    .. automethod:: suggest_num_buckets
+    .. automethod:: suggest_num_hashes
+
+    .. rubric:: Non-static Methods: