
SkipScan over compressed chunks #7983


Open · natalya-aksman wants to merge 1 commit into main

Conversation

@natalya-aksman (Contributor) commented Apr 18, 2025

This PR implements https://github.com/timescale/eng-database/issues/693 via a "SkipScan <- DecompressChunk <- Compressed Index" plan.

SkipScan consumes DecompressChunk output to get the next distinct value, then uses it to update the SkipScan qual on the compressed index below the DecompressChunk node.
The DecompressChunk plan has its "UniqueSegBy" flag set when it is consumed by SkipScan. When this flag is set, DecompressChunk jumps to the end of the current batch after the first tuple from the batch is retrieved.
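
For orientation, here is a minimal hypothetical sketch of the qual update; this is not the PR's actual code. The function name and the skip_key_index field are invented for illustration, while scan_keys and needs_rescan are SkipScanState fields that appear in the review thread below:

static void
skip_scan_advance_key(SkipScanState *state, Datum last_distinct_value)
{
	/* The ScanKey holding the skip qual; "skip_key_index" is an assumed field. */
	ScanKey key = &(*state->scan_keys)[state->skip_key_index];

	/* The next index scan should start strictly after the value just seen
	 * (ascending scan assumed; a descending scan would use BTLessStrategyNumber). */
	key->sk_argument = last_distinct_value;
	key->sk_strategy = BTGreaterStrategyNumber;
	key->sk_flags &= ~SK_ISNULL;

	/* The compressed index is rescanned lazily on the next ExecProcNode call. */
	state->needs_rescan = true;
}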

Current unit test setup: skip_scan_load creates an extra table skip_scan_htc with the same setup as skip_scan_ht but with compressed data. The same correctness and plan tests run on skip_scan_ht are then also run on skip_scan_htc. Correctness checks pass; plan tests apply only to scenarios with an index on "dev".

TODO:

  • Create separate skip_scan_compressed unit test for tables with data compressed in various ways.
  • Create separate unit tests for distinct aggregates over compressed data.
  • Address SkipScan scenarios with DISTINCT ON (dev), time ... ORDER BY dev, time on compressed data with segmentby=dev, orderby=time, with ASC/DESC and NULLS LAST/FIRST for time.
  • Enable SkipScan for indexes whose NULL direction does not match the segmentby NULL direction when the distinct column is guaranteed to be NOT NULL, for example when it is a distinct aggregate input, when there is a NOT NULL constraint on the column, or when a NOT NULL predicate is pushed down into the index; see the sketch after this list. Addressed as a separate issue: [Enhancement]: Enable SkipScan for indexes with NULL direction not matching guaranteed not-NULL distinct column direction #7996
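
To make the NULL-direction condition in the last item concrete, here is a minimal hypothetical planner-side sketch; the function and its segmentby_nulls_first / distinct_col_not_null parameters are illustrative assumptions, not the extension's API:

static bool
skip_scan_nulls_compatible(IndexOptInfo *index, int keyno,
						   bool segmentby_nulls_first,
						   bool distinct_col_not_null)
{
	/* SkipScan is valid when the index NULLS direction matches the
	 * segmentby NULLS direction... */
	if (index->nulls_first[keyno] == segmentby_nulls_first)
		return true;

	/* ...and a mismatch is harmless if no NULLs can appear in the distinct
	 * column (aggregate input, NOT NULL constraint, or a pushed-down
	 * NOT NULL predicate). */
	return distinct_col_not_null;
}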

@svenklemm (Member) commented:

Have you done any benchmarking for this?

@natalya-aksman (Contributor Author) commented:

> Have you done any benchmarking for this?

Not yet, but the existing SkipScan benchmark in tsbench should work, as it already runs the same queries on uncompressed vs. compressed data; the compressed-data runs should speed up dramatically.

@natalya-aksman force-pushed the skipscan_for_compressed_data_i693 branch from d429df9 to f615695 on April 23, 2025
codecov bot commented Apr 23, 2025

Codecov Report

Attention: Patch coverage is 95.28302% with 5 lines in your changes missing coverage. Please review.

Project coverage is 82.23%. Comparing base (59f50f2) to head (3a14699).
Report is 917 commits behind head on main.

Files with missing lines              Patch %   Lines
tsl/src/nodes/skip_scan/planner.c     96.00%    1 Missing and 2 partials ⚠️
tsl/src/nodes/skip_scan/exec.c        91.30%    1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7983      +/-   ##
==========================================
+ Coverage   80.06%   82.23%   +2.16%     
==========================================
  Files         190      252      +62     
  Lines       37181    46567    +9386     
  Branches     9450    11708    +2258     
==========================================
+ Hits        29770    38295    +8525     
- Misses       2997     3633     +636     
- Partials     4414     4639     +225     


@natalya-aksman force-pushed the skipscan_for_compressed_data_i693 branch from 26a2325 to afd17ff on April 24, 2025
Comment on lines +1247 to +1324
/* Under SkipScan, each compressed batch holds a single segmentby value,
 * so after the first tuple is returned the rest of the batch can be
 * skipped by jumping to its last row. */
if (dcontext->unique_segmentby)
{
	batch_state->next_batch_row = batch_state->total_batch_rows - 1;
}
Member commented:

Let's use compressed_batch_discard_tuples at the caller level instead? I'd avoid special cases at this level.

Member commented:

To be specific, it should be called at the same moment as skip_scan_rescan_index, maybe inside.

@natalya-aksman (Contributor Author) commented:

compressed_batch_discard_tuples doesn't work here, as we need to return the tuple we've just obtained.
Basically, we treat the unique-segmentby condition as a predicate that skips the rest of the batch once we get the first valid tuple.
After we return this tuple, we update skip_qual with its value, so that the next call fetches a compressed index tuple with a new segmentby value.

If we have to use compressed_batch_discard_tuples at the caller level, we will need to save the first returned tuple before discarding tuples, I guess?

@akuzm (Member) commented Apr 24, 2025:

I imagine it goes like this:

  1. skip_scan_update_key
  2. return current tuple to caller
  3. skip_scan_rescan_index -- this happens at our next ExecProcNode call; the caller has processed our previous tuple from step (2), so it is safe to destroy it. compressed_batch_discard_tuples is called here.

Would that work?

Member commented:

You'll probably have to save the TupleTableSlot at stage (2) for that, right.

@natalya-aksman (Contributor Author) commented:

It actually works differently for SkipScan execution, see https://github.com/timescale/timescaledb/blob/main/tsl/src/nodes/skip_scan/exec.c#L241.
The logic does this loop until there are no more tuples:

  1. SkipScan exec state fetches a tuple from the child exec state via ExecProcNode call.
  2. It uses this tuple result to skip_scan_update_key.

So skip_scan_update_key is called after the ExecProcNode call is done and the result is returned. If the result is empty, we are done.

In the case of compressed data, ExecProcNode calls DecompressChunk, which either calls ExecProcNode to fetch a new compressed batch or advances the current batch. As soon as DecompressChunk fetches a tuple, SkipScan can do skip_scan_update_key. So we could discard tuples at some point here, I guess?
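
For reference, a minimal hypothetical sketch of this per-call flow; skip_scan_exec_sketch is an invented name, the exact signature of skip_scan_update_key is an assumption, and error handling is omitted:

static TupleTableSlot *
skip_scan_exec_sketch(SkipScanState *state)
{
	/* Apply any ScanKey update made on the previous call before fetching. */
	if (state->needs_rescan)
		skip_scan_rescan_index(state);

	/* Over compressed data the child is DecompressChunk, which in turn
	 * pulls compressed batches from the index scan below it. */
	PlanState *child = (PlanState *) linitial(state->cscan_state.custom_ps);
	TupleTableSlot *slot = ExecProcNode(child);

	if (TupIsNull(slot))
		return NULL; /* no more distinct values */

	/* Remember this distinct value so the next rescan skips past it. */
	skip_scan_update_key(state, slot);
	state->needs_rescan = true;

	return slot;
}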

@natalya-aksman (Contributor Author) commented Apr 24, 2025:

I.e., it works like this:

  1. SkipScan calls ExecProcNode which calls DecompressChunk.

  2. DecompressChunk fetches a compressed batch tuple from the compressed index, and returns a decompressed tuple via ExecProcNode as soon as it gets one that passes the qual.

  3. SkipScan sees the tuple returned from DecompressChunk. It uses that tuple for skip_scan_update_key and also returns it upstream. After that it can call compressed_batch_discard_tuples, I think, as we need to stop processing the current compressed batch and fetch a new one via DecompressChunk with the updated SkipQual.

Member commented:

I mean something like this:

static void
skip_scan_rescan_index(SkipScanState *state)
{
	/* if the scan in the child scan has not been
	 * setup yet which is true before the first tuple
	 * has been retrieved from child scan we cannot
	 * trigger rescan but since the child scan
	 * has not been initialized it will pick up
	 * any ScanKey changes we did */
	if (*state->scan_desc)
	{
		index_rescan(*state->scan_desc,
					 *state->scan_keys,
					 *state->num_scan_keys,
					 NULL /*orderbys*/,
					 0 /*norderbys*/);

		void *child = linitial(state->cscan_state.custom_ps);
		if (IsA(child, CustomScanState))
		{
			DecompressChunkState *ds = (DecompressChunkState *) child;
			TupleTableSlot *slot = ds->batch_queue->funcs->top_tuple(ds->batch_queue);
			if (slot)
			{
				compressed_batch_discard_tuples((DecompressBatchState *) slot);
			}
		}
	}

	state->needs_rescan = false;
}

This passes all the tests without the DecompressContext.unique_segmentby flag. It is kind of ugly; maybe it would be more elegant to save the last returned child tuple in the SkipScanState. You'll still need the check for whether the child plan is a DecompressChunk, because unfortunately there's no way to tell from a TupleTableSlot whether it's actually a DecompressBatchState or not.

@natalya-aksman (Contributor Author) commented:

I'll take a look at it in the debugger to fully understand how PG execution works here. I see that it makes more sense to do it this way, i.e. from the caller.

@natalya-aksman (Contributor Author) commented:

Yes, I see how it works now. I think it's OK to get the DecompressBatchState slot this way, as SkipScanState won't need to keep the slot in any scenario other than over compressed chunks.

@natalya-aksman force-pushed the skipscan_for_compressed_data_i693 branch from afd17ff to 02f5b96 on April 24, 2025
SkipScan for compressed chunks, address PR comment, add code coverage

SkipScan for compressed chunks, fix code format
@natalya-aksman force-pushed the skipscan_for_compressed_data_i693 branch from 02f5b96 to 3a14699 on April 24, 2025
@@ -34,6 +34,22 @@ ANALYZE skip_scan_ht;

ALTER TABLE skip_scan_ht SET (timescaledb.compress,timescaledb.compress_orderby='time desc', timescaledb.compress_segmentby='dev');

-- create compressed hypertable with different physical layouts in the chunks
Member commented:

This way, only the layout of the uncompressed chunk tables differs. The compressed chunk table is re-created at the moment of compression, so if you compress after all the alters, they all have the same layout. It might be good to compress between the alters to affect the compressed chunk tables as well.

Member commented:

Not sure it's easy to change in this test, or that it's particularly important, so just for your information.

@natalya-aksman (Contributor Author) commented:

I will definitely try it, to test compressed chunks properly in the same way uncompressed chunks are tested.

@natalya-aksman (Contributor Author) commented:

Though I would keep compressed-chunk attnos out of sync with uncompressed-chunk attnos in at least some of the chunks, so compress one step later or something like that.

@natalya-aksman (Contributor Author) commented:

I will need to reformat the tests to run with one compression setting in skip_scan_load, and then run the rest of the tests with different compression settings in a separate file. Will try that.

@natalya-aksman (Contributor Author) commented Apr 25, 2025:

There is another issue with compressing chunks and then inserting data: the test adjusts statistics to num_distinct = 1 row. If I do compress_chunk as the data is inserted into the table and then compute and adjust statistics, num_distinct = 4 rows (as it remembers 4 chunks), which affects costs on some plans so that they don't choose SkipScan.

I will keep the current tests as is, and will add an extra test that compresses chunks as the data is inserted, running a limited set of tests for which num_distinct = 4 won't flip the plan. I checked correctness with the current tests and it works OK.

I will also look closer into costing SkipScan over compressed chunks, as I think the current formula doesn't handle the case where the child of SkipScan is a DecompressChunk rather than an Index Scan.
