KLL Streaming Quantile Sketch (Python)

Here’s a polished, stars-friendly README.md you can paste on your repo’s front page. I tailored it for your project and authorship.

KLL Streaming Quantile Sketch (Python)

Fast, mergeable KLL sketch for streaming quantiles — deterministic, zero deps, production-ready.

Author: Stamatis-Christos Saridakis

✨ Features

Accurate ε-quantiles under tight memory bounds (KLL algorithm)
Deterministic compaction (parity sampling) + weight conservation
Weighted ingestion via add(x, weight) for aggregated data
Mergeable sketches for distributed/parallel ingestion
Serializable (to_bytes / from_bytes)
Convenience helpers such as quantiles(m) and quantiles_at(qs) for evenly spaced or ad-hoc cuts
Zero dependencies, Python 3.9+ (self-hosted build backend)

🚀 Quick Start

from kll_sketch import KLL

sk = KLL(capacity=200)
sk.extend([1, 5, 2, 9, 3, 6, 4, 8, 7])

print("n =", sk.size())
print("median ≈", sk.median())
print("q(0.9) ≈", sk.quantile(0.9))
print("quartiles ≈", sk.quantiles(4))

Merge & Serialize

a, b = KLL(200), KLL(200)
a.extend(range(10_000))
b.extend(range(10_000, 20_000))

a.merge(b)            # in-place merge
blob = a.to_bytes()   # serialize
a2 = KLL.from_bytes(blob)
assert abs(a2.quantile(0.5) - a.quantile(0.5)) < 1e-12

🧰 API (minimal)

Method	Description
`add(x, weight=1)`	Ingest one value with optional weight.
`extend(xs)`	Ingest an iterable of values.
`size()`	Total number of ingested items `n`.
`quantile(q)`	Approximate `q`-quantile for `q∈[0,1]`.
`quantiles(m)`	Evenly spaced cut points.
`quantiles_at(qs)`	Batched quantiles for arbitrary `qs`.
`median()`	Convenience for `quantile(0.5)`.
`rank(x)`	Approximate rank of `x` in `[0, n]`.
`cdf(xs)`	CDF values for a sequence `xs`.
`merge(other)`	In-place merge with another sketch.
`to_bytes()` / `from_bytes()`	Serialize / deserialize.

📐 Theory (KLL in one minute)

This implementation follows Karnin–Lang–Liberty (2016): a space-optimal streaming algorithm for quantile approximation. Items are stored in levels; compaction randomly keeps one item from each pair and promotes it upward, doubling its weight. This achieves tight error bounds with O(k) space and amortized O(1) update cost.

Reference: Optimal Quantile Approximation in Streams, FOCS 2016.

📊 Accuracy & Performance

Typical error ≈ O(1/k) in rank space (increase capacity to tighten ε).
Updates amortized O(1) with occasional compactions.
Queries merge level buffers (k-way) and scan weights to the target rank. Use quantiles_at to answer multiple quantiles with a single scan.

Tip: For heavy query loads, cache materialized arrays between queries.

✅ Tests

Install the test dependencies and run the suite:

python -m pip install -r kll_sketch/requirements-test.txt
python -m pytest -q

🌐 Offline installation

The source distribution relies on setuptools and wheel as build-backend dependencies. When you install in an isolated or air-gapped environment you must provide those wheels yourself; otherwise the build frontend cannot bootstrap the backend.

On a machine with internet access, download the wheels you need:

python -m pip download --only-binary=:all: --dest ./wheelhouse \
    setuptools>=68 wheel
python -m pip download --only-binary=:all: --dest ./wheelhouse \
    kll-sketch

Transfer the wheelhouse/ directory to the offline environment.

Install using only the local wheels:

python -m pip install --no-index --find-links ./wheelhouse \
    setuptools wheel kll-sketch

If you build from a git checkout instead of a released wheel, vendor the backend wheels somewhere under version control (for example tools/vendor/) and point PIP_FIND_LINKS or your installer’s equivalent at that directory before running pip install ..

📈 Benchmarks

Get the optional tooling with extras:

python -m pip install -e .[bench,test]

Run the full synthetic sweep (matching the defaults in the docs):

python benchmarks/bench_kll.py \
  --module kll_sketch --class KLLSketch \
  --outdir bench_out \
  --Ns 1e5 1e6 \
  --capacities 200 400 800 \
  --distributions uniform normal exponential pareto bimodal \
  --qs 0.01 0.05 0.1 0.25 0.5 0.75 0.9 0.95 0.99 \
  --shards 8

Artifacts land in bench_out/ with the following schema:

accuracy.csv — distribution, N, capacity, mode (single/merged), quantile, estimate, exact, and absolute value error.
update_throughput.csv — distribution, N, capacity, update wall time (seconds), and computed inserts/sec.
query_latency.csv — distribution, N, capacity, quantile, and per-query latency in microseconds.
merge.csv — distribution, N, capacity, shard count, and merge wall time (seconds).

Visualise the outputs via benchmarks/bench_plots.ipynb, and read docs/benchmarks.md for a narrated walkthrough.

🛡️ Operations

For day-2 guidance—monitoring, alerting, capacity planning, and a step-by-step upgrade playbook—see the Operational Guide.

🗺️ Roadmap

Optional NumPy/C hot paths for sort/merge.

📝 License

Licensed under Apache-2.0.

🙌 Acknowledgments

Based on the KLL algorithm by Z. Karnin, E. Liberty, and L. Lang.

Name		Name	Last commit message	Last commit date
Latest commit History 60 Commits
.github/workflows		.github/workflows
benchmarks		benchmarks
docs		docs
kll_sketch		kll_sketch
tests/bench		tests/bench
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

KLL Streaming Quantile Sketch (Python)

✨ Features

🚀 Quick Start

Merge & Serialize

🧰 API (minimal)

📐 Theory (KLL in one minute)

📊 Accuracy & Performance

✅ Tests

🌐 Offline installation

📈 Benchmarks

🛡️ Operations

🗺️ Roadmap

📝 License

🙌 Acknowledgments

About

Uh oh!

Releases

Packages

Languages

License

SaridakisStamatisChristos/kll_sketch

Folders and files

Latest commit

History

Repository files navigation

KLL Streaming Quantile Sketch (Python)

✨ Features

🚀 Quick Start

Merge & Serialize

🧰 API (minimal)

📐 Theory (KLL in one minute)

📊 Accuracy & Performance

✅ Tests

🌐 Offline installation

📈 Benchmarks

🛡️ Operations

🗺️ Roadmap

📝 License

🙌 Acknowledgments

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages