Skip to content

Latest commit

 

History

History
185 lines (144 loc) · 5.42 KB

hyperloglog.md

File metadata and controls

185 lines (144 loc) · 5.42 KB

Hyperloglog

Description
Details
API

Description

TimescaleDB Toolkit provides an implementation of the Hyperloglog estimator for COUNT DISTINCT approximations of any type that has a hash function.

Details

Timescale's HyperLogLog is implemented as an aggregate function in PostgreSQL. They do not support moving-aggregate mode, and are not ordered-set aggregates. It is restricted to values that have an extended hash function. They are partializable and are good candidates for continuous aggregation.

Command List (A-Z)


hyperloglog

hyperloglog(
    size INTEGER,
    value AnyElement¹
) RETURNS Hyperloglog

¹The type must have an extended (64bit) hash function.

This will construct and return a Hyperloglog with at least the specified number of buckets over the given values.

Required Arguments

Name Type Description
buckets INTEGER Number of buckets in the digest. Will be rounded up to the next power of 2, must be between 16 and 2^18. Increasing this will usually provide more accurate at the expense of more storage.
value AnyElement Column to count the distinct elements of.

Returns

Column Type Description
hyperloglog Hyperloglog A hyperloglog object which may be passed to other hyperloglog APIs.

Sample Usages

For this examples assume we have a table 'samples' with a column 'weights' holding DOUBLE PRECISION values. The following will simply return a digest over that column

SELECT hyperloglog(64, weights) FROM samples;

It may be more useful to build a view from the aggregate that we can later pass to other tdigest functions.

CREATE VIEW digest AS SELECT hyperloglog(64, data) FROM samples;

rollup

rollup(
    log hyperloglog
) RETURNS Hyperloglog

Returns a Hyperloglog by aggregating over the union of the input elements.

Required Arguments

Name Type Description
log Hyperloglog Column of Hyperloglogs to be unioned.

Returns

Column Type Description
hyperloglog Hyperloglog A hyperloglog containing the count of the union of the input Hyperloglogs.

Sample Usages

SELECT distinct_count(rollup(logs))
FROM (
    (SELECT hyperloglog(32, v::text) logs FROM generate_series(1, 100) v)
    UNION ALL
    (SELECT hyperloglog(32, v::text) FROM generate_series(50, 150) v)
) hll;
 count
-------
   153

distinct_count

distinct_count(hyperloglog Hyperloglog) RETURNS BIGINT

Get the number of distinct values from a hyperloglog.

Required Arguments

Name Type Description
hyperloglog Hyperloglog The hyperloglog to extract the count from.

Returns

Column Type Description
distinct_count BIGINT The number of distinct elements counted by the hyperloglog.

Sample Usages

SELECT distinct_count(hyperloglog(64, data))
FROM generate_series(1, 100) data
 distinct_count
----------------
            104

stderror

stderror(hyperloglog Hyperloglog) RETURNS DOUBLE PRECISION

Returns an estimate of the relative stderror of the hyperloglog based on the hyperloglog error formula. Approximate result are:

 precision ┃ registers ┃  error ┃  bytes
━━━━━━━━━━━╋━━━━━━━━━━━╋━━━━━━━━╋━━━━━━━━
         4 ┃        16 ┃ 0.2600 ┃     12
         5 ┃        32 ┃ 0.1838 ┃     24
         6 ┃        64 ┃ 0.1300 ┃     48
         7 ┃       128 ┃ 0.0919 ┃     96
         8 ┃       256 ┃ 0.0650 ┃    192
         9 ┃       512 ┃ 0.0460 ┃    384
        10 ┃      1024 ┃ 0.0325 ┃    768
        11 ┃      2048 ┃ 0.0230 ┃   1536
        12 ┃      4096 ┃ 0.0163 ┃   3072
        13 ┃      8192 ┃ 0.0115 ┃   6144
        14 ┃     16384 ┃ 0.0081 ┃  12288
        15 ┃     32768 ┃ 0.0057 ┃  24576
        16 ┃     65536 ┃ 0.0041 ┃  49152
        17 ┃    131072 ┃ 0.0029 ┃  98304
        18 ┃    262144 ┃ 0.0020 ┃ 196608

Required Arguments

Name Type Description
hyperloglog Hyperloglog The hyperloglog to extract the count from.

Returns

Column Type Description
stderror BIGINT The number of distinct elements counted by the hyperloglog.

Sample Usages

SELECT stderror(hyperloglog(64, data))
FROM generate_series(1, 100) data
 stderror
----------
     0.13