SkipScan over compressed chunks #7983
base: main
Conversation
Have you done any benchmarking for this?

Not yet, but the existing SkipScan benchmark in tsbench should work, as it already tests the same queries on uncompressed vs. compressed data, so compressed-data runs should speed up dramatically.
Force-pushed from d429df9 to f615695.
Codecov Report

```
@@            Coverage Diff             @@
##             main    #7983      +/-   ##
==========================================
+ Coverage   80.06%   82.23%    +2.16%
==========================================
  Files         190      252       +62
  Lines       37181    46567     +9386
  Branches     9450    11708     +2258
==========================================
+ Hits        29770    38295     +8525
- Misses       2997     3633      +636
- Partials     4414     4639      +225
```
Force-pushed from 26a2325 to afd17ff.
```c
if (dcontext->unique_segmentby)
{
	batch_state->next_batch_row = batch_state->total_batch_rows - 1;
}
```
Let's use `compressed_batch_discard_tuples` at the caller level instead? I'd avoid special cases at this level.
To be specific, it should be called at the same moment as `skip_scan_rescan_index`, maybe inside it.
`compressed_batch_discard_tuples` doesn't work here, as we need to return a tuple which we've just obtained.
Basically, we treat the unique segmentby condition as a predicate which skips the rest of the batch after we get the first valid tuple.
After we return this tuple, we update skip_qual with its value, so that the next call will fetch a compressed index tuple with a new segmentby value.
If we have to use `compressed_batch_discard_tuples` at the caller level, we will need to save the first returned tuple before discarding tuples, I guess?
I imagine it goes like this:
1. `skip_scan_update_key`
2. Return the current tuple to the caller.
3. `skip_scan_rescan_index` -- this happens at our next `ExecProcNode` call; the caller has processed our previous tuple from step (2), so it is safe to destroy it. `compressed_batch_discard_tuples` is called here.

Would that work?
You'll probably have to save the `TupleTableSlot` at stage (2) for that, right.
It actually works differently for SkipScan execution, see https://github.com/timescale/timescaledb/blob/main/tsl/src/nodes/skip_scan/exec.c#L241.
The logic is doing this loop until there are no more tuples:
1. The SkipScan exec state fetches a tuple from the child exec state via an `ExecProcNode` call.
2. It uses this tuple result to `skip_scan_update_key`.

So `skip_scan_update_key` is called after the `ExecProcNode` call is done and its result is returned. If the result is empty, we are done.
In the case of compressed data, `ExecProcNode` calls `DecompressChunk`, which either calls `ExecProcNode` to fetch a new compressed batch or advances the current batch. As soon as `DecompressChunk` fetches a tuple, SkipScan can do `skip_scan_update_key`. So we can discard tuples at some point here, I guess?
I.e. so it works like this:
1. SkipScan calls `ExecProcNode`, which calls `DecompressChunk`.
2. `DecompressChunk` fetches a compressed batch tuple from the compressed index, and returns a decompressed tuple via `ExecProcNode` as soon as it gets a decompressed tuple passing the qual.
3. SkipScan sees the tuple returned from `DecompressChunk`. It uses the returned tuple to `skip_scan_update_key` and also returns this tuple upstream. After that it can call `compressed_batch_discard_tuples`, I think, as we need to stop processing the current compressed batch and fetch a new one via `DecompressChunk` with the updated SkipQual.
I mean something like this:

```c
static void
skip_scan_rescan_index(SkipScanState *state)
{
	/* If the scan in the child scan has not been
	 * set up yet, which is true before the first tuple
	 * has been retrieved from the child scan, we cannot
	 * trigger a rescan; but since the child scan
	 * has not been initialized, it will pick up
	 * any ScanKey changes we did. */
	if (*state->scan_desc)
	{
		index_rescan(*state->scan_desc,
					 *state->scan_keys,
					 *state->num_scan_keys,
					 NULL /*orderbys*/,
					 0 /*norderbys*/);

		void *child = linitial(state->cscan_state.custom_ps);
		if (IsA(child, CustomScanState))
		{
			DecompressChunkState *ds = (DecompressChunkState *) child;
			TupleTableSlot *slot = ds->batch_queue->funcs->top_tuple(ds->batch_queue);
			if (slot)
			{
				compressed_batch_discard_tuples((DecompressBatchState *) slot);
			}
		}
	}
	state->needs_rescan = false;
}
```
This passes all the tests without the `DecompressContext.unique_segmentby` flag. It is kind of ugly; maybe it'll be more elegant to save the last returned child tuple in the SkipScanState. You'll still need the check for whether the child plan is a DecompressChunk, because unfortunately there's no way to tell from a TupleTableSlot whether it's actually a DecompressBatchState or not.
I'll take a look at it in the debugger to fully understand how PG execution works here. I see that it makes more sense to do it this way, i.e. from the caller.
Yes, I see how it works now. I think it's OK to get the DecompressBatchState slot this way, as SkipScanState won't need to keep the slot in any scenario but over compressed chunks.
Force-pushed from afd17ff to 02f5b96.
SkipScan for compressed chunks, address PR comment, add code coverage
SkipScan for compressed chunks, fix code format
Force-pushed from 02f5b96 to 3a14699.
@@ -34,6 +34,22 @@ ANALYZE skip_scan_ht;

```sql
ALTER TABLE skip_scan_ht SET (timescaledb.compress, timescaledb.compress_orderby='time desc', timescaledb.compress_segmentby='dev');

-- create compressed hypertable with different physical layouts in the chunks
```
This way, only the layout of the uncompressed chunk tables differs. The compressed chunk table is re-created at the moment of compression, so if you compress after all the alters, they all have the same layout. It might be good to compress between the alters to affect the compressed chunk tables as well.
Not sure it's easy to change in this test, or that it's particularly important, so just for your information.
I will definitely try it to test compressed chunks properly in the same way as uncompressed chunks are tested.
Though I would keep compressed chunk attnos out of sync with uncompressed chunk attnos in at least some of the chunks, so compress one step later or something like that.
I will need to reformat the tests to run tests with one compression setup in skip_scan_load, and then run the rest of the tests with different compression setups in a separate file. Will try that.
There is another issue with compressing a chunk and then inserting data: we adjust statistics to num_distinct = 1 row. If I do compress_chunk as the data is inserted into the table and then collect and adjust statistics, num_distinct = 4 rows (as it remembers 4 chunks), which affects costs on some plans so that they don't choose SkipScan.
I will keep the current tests as is, and then add an extra test that compresses chunks as the data is inserted and runs a limited set of tests for which num_distinct = 4 won't flip the plan. I checked correctness with the current tests and it works OK.
I will also look closer into costing SkipScan over compressed chunks, as I think the current formula doesn't do it correctly if the child of SkipScan is a DecompressChunk rather than an Index Scan.
The PR implements https://github.com/timescale/eng-database/issues/693 via a "SkipScan <- DecompressChunk <- Compressed Index" plan.
SkipScan consumes DecompressChunk data to get the next distinct value, then uses it to update the SkipScan qual on the compressed index under the DecompressChunk node.
The DecompressChunk plan has a "UniqueSegBy" flag set when it is consumed by SkipScan. When this flag is set, DecompressChunk jumps to the end of the current batch after the first tuple from the batch is retrieved.
Current unit test setup: `skip_scan_load` creates an extra table `skip_scan_htc` with the same setup as `skip_scan_ht` but with compressed data. Then the same correctness and plan tests run on `skip_scan_ht` are also run on `skip_scan_htc`. Correctness checks pass; plan tests only apply to scenarios with an index on "dev".

TODO:
- Add a `skip_scan_compressed` unit test for tables with data compressed in various ways.
- Test `DISTINCT ON (dev), time ... ORDER BY dev, time` on compressed data with `segmentby=dev, orderby=time`, with ASC/DESC and NULLS LAST/FIRST for `time`.
- Handle the `segmentby` NULL direction if the distinct column is guaranteed to be NOT NULL, for example if it's a distinct aggregate input, or there is a NOT NULL constraint on this column, or a NOT NULL predicate is pushed down into the index. Address it as a separate issue: [Enhancement]: Enable SkipScan for indexes with NULL direction not matching guaranteed not-NULL distinct column direction #7996