SkipScan over compressed chunks #7983
base: main
Conversation
Have you done any benchmarking for this?

Not yet, but the existing SkipScan benchmark in tsbench should work, as it already tests the same queries on uncompressed vs. compressed data, so compressed-data runs should speed up dramatically.
Force-pushed from d429df9 to f615695.
Codecov Report

```
@@            Coverage Diff             @@
##             main    #7983      +/-   ##
==========================================
+ Coverage   80.06%   82.23%    +2.16%
==========================================
  Files         190      252       +62
  Lines       37181    46567     +9386
  Branches     9450    11708     +2258
==========================================
+ Hits        29770    38295     +8525
- Misses       2997     3633      +636
- Partials     4414     4639      +225
```
Force-pushed from 26a2325 to afd17ff.
```c
if (dcontext->unique_segmentby)
{
	batch_state->next_batch_row = batch_state->total_batch_rows - 1;
}
```
Let's use `compressed_batch_discard_tuples` at the caller level instead? I'd avoid special cases at this level.
To be specific, it should be called at the same moment as `skip_scan_rescan_index`, maybe inside it.
`compressed_batch_discard_tuples` doesn't work here, as we need to return a tuple which we've just obtained.
Basically, we treat the unique segmentby condition as a predicate which skips the rest of the batch after we get the first valid tuple.
After we return this tuple, we update skip_qual with its value, so that the next call will fetch a compressed index tuple with a new segmentby value.
If we have to use `compressed_batch_discard_tuples` at the caller level, we will need to save the first returned tuple before discarding tuples, I guess?
I imagine it goes like this:
1. `skip_scan_update_key`
2. Return the current tuple to the caller.
3. `skip_scan_rescan_index` -- this happens at our next `ExecProcNode` call; the caller has processed our previous tuple from step (2), so it is safe to destroy it. `compressed_batch_discard_tuples` is called here.

Would that work?
You'll probably have to save the `TupleTableSlot` at stage (2) for that, right.
It actually works differently for SkipScan execution, see https://github.com/timescale/timescaledb/blob/main/tsl/src/nodes/skip_scan/exec.c#L241.
The logic is doing this loop until there are no more tuples:
1. The SkipScan exec state fetches a tuple from the child exec state via an `ExecProcNode` call.
2. It uses this tuple result to `skip_scan_update_key`.

So `skip_scan_update_key` is called after the `ExecProcNode` call is done and its result is returned. If the result is empty, we are done.
In the case of compressed data, `ExecProcNode` calls `DecompressChunk`, which either calls `ExecProcNode` to fetch a new compressed batch or advances the current batch. As soon as `DecompressChunk` fetches a tuple, SkipScan can do `skip_scan_update_key`. So we can discard tuples at some point here, I guess?
I.e. so it works like this:
1. SkipScan calls `ExecProcNode`, which calls `DecompressChunk`.
2. `DecompressChunk` fetches a compressed batch tuple from the compressed index, and returns a decompressed tuple via `ExecProcNode` as soon as it gets a decompressed tuple passing the qual.
3. SkipScan sees the tuple returned from `DecompressChunk`. It uses the returned tuple to `skip_scan_update_key` and also returns this tuple upstream. After that it can call `compressed_batch_discard_tuples`, I think, as we need to stop processing the current compressed batch and fetch a new one via `DecompressChunk` with the updated SkipQual.
I mean something like this:

```c
static void
skip_scan_rescan_index(SkipScanState *state)
{
	/* If the scan in the child scan has not been
	 * set up yet, which is true before the first tuple
	 * has been retrieved from the child scan, we cannot
	 * trigger a rescan; but since the child scan
	 * has not been initialized, it will pick up
	 * any ScanKey changes we did. */
	if (*state->scan_desc)
	{
		index_rescan(*state->scan_desc,
					 *state->scan_keys,
					 *state->num_scan_keys,
					 NULL /*orderbys*/,
					 0 /*norderbys*/);

		void *child = linitial(state->cscan_state.custom_ps);
		if (IsA(child, CustomScanState))
		{
			DecompressChunkState *ds = (DecompressChunkState *) child;
			TupleTableSlot *slot = ds->batch_queue->funcs->top_tuple(ds->batch_queue);
			if (slot)
			{
				compressed_batch_discard_tuples((DecompressBatchState *) slot);
			}
		}
	}
	state->needs_rescan = false;
}
```
This passes all the tests without the `DecompressContext.unique_segmentby` flag. It is kind of ugly; maybe it'll be more elegant to save the last returned child tuple in the SkipScanState. You'll still need the check for whether the child plan is a DecompressChunk, because unfortunately there's no way to tell from a TupleTableSlot whether it's actually a DecompressBatchState or not.
I'll take a look at it in the debugger to fully understand how PG execution works here. I see that it makes more sense to do it this way, i.e. from the caller.
Yes, I see how it works now. I think it's OK to get the DecompressBatchState slot this way, as SkipScanState won't need to keep the slot in any scenario but over compressed chunks.
Force-pushed from afd17ff to 02f5b96.
SkipScan for compressed chunks, address PR comment, add code coverage
SkipScan for compressed chunks, fix code format
Force-pushed from 02f5b96 to 3a14699.
@@ -34,6 +34,22 @@ ANALYZE skip_scan_ht;

```sql
ALTER TABLE skip_scan_ht SET (timescaledb.compress, timescaledb.compress_orderby='time desc', timescaledb.compress_segmentby='dev');

-- create compressed hypertable with different physical layouts in the chunks
```
This way, only the layout of the uncompressed chunk tables differs. The compressed chunk table is re-created at the moment of compression, so if you compress after all the alters, they all have the same layout. It might be good to compress between the alters to affect the compressed chunk tables as well.
Not sure it's easy to change in this test, or that it's particularly important, so just for your information.
I will definitely try it to test compressed chunks properly in the same way as uncompressed chunks are tested.
Though I would keep compressed chunk attnos out of sync with uncompressed chunk attnos in at least some of the chunks, so compress one step later or something like that.
I will need to reformat the tests to run tests with one compression setup in skip_scan_load, and then run the rest of the tests with different compression setups in a separate file. Will try that.
There is another issue with compressing a chunk and then inserting data: we adjust statistics to num_distinct = 1 row. If I do compress_chunk as the data is inserted into the table and then collect and adjust statistics, num_distinct = 4 rows (as it remembers 4 chunks), which affects costs on some plans so that they don't choose SkipScan.
I will keep the current tests as is, and then add an extra test that compresses chunks as the data is inserted and runs a limited set of tests for which num_distinct = 4 won't flip the plan. I checked correctness with the current tests and it works OK.
I will also look closer into costing SkipScan over compressed chunks, as I think the current formula doesn't do it correctly if the child of SkipScan is a DecompressChunk rather than an Index Scan.
The PR implements https://github.com/timescale/eng-database/issues/693 via a "SkipScan <- DecompressChunk <- Compressed Index" plan.
SkipScan consumes DecompressChunk data to get the next distinct value, then uses it to update the SkipScan qual on the compressed index under the DecompressChunk node.
The DecompressChunk plan has a "UniqueSegBy" flag set when it is consumed by SkipScan. When this flag is set, DecompressChunk jumps to the end of the current batch after the first tuple from the batch is retrieved.
Current unit test setup: `skip_scan_load` creates an extra table `skip_scan_htc` with the same setup as `skip_scan_ht` but with compressed data. Then the same correctness and plan tests run on `skip_scan_ht` are also run on `skip_scan_htc`. Correctness checks pass; plan tests only apply to scenarios with an index on "dev".

TODO:
- Add a `skip_scan_compressed` unit test for tables with data compressed in various ways.
- Test `DISTINCT ON (dev), time ... ORDER BY dev, time` on compressed data with `segmentby=dev, orderby=time`, with ASC/DESC and NULLS LAST/FIRST for `time`.
- Handle the `segmentby` NULL direction if the distinct column is guaranteed to be NOT NULL, for example if it's a distinct aggregate input, or there is a NOT NULL constraint on this column, or a NOT NULL predicate is pushed down into the index. Address it as a separate issue: [Enhancement]: Enable SkipScan for indexes with NULL direction not matching guaranteed not-NULL distinct column direction #7996