Add support for splitting a chunk #7946
I want to add some concurrency/isolation tests, but I might do that in another PR.
Codecov Report

Attention: Patch coverage is

```
@@           Coverage Diff            @@
##             main    #7946    +/-  ##
==========================================
+ Coverage   80.06%   82.15%   +2.08%
==========================================
  Files         190      250      +60
  Lines       37181    46504    +9323
  Branches     9450    11670    +2220
==========================================
+ Hits        29770    38205    +8435
- Misses       2997     3671     +674
- Partials     4414     4628     +214
==========================================
```

☔ View full report in Codecov by Sentry.
Going to work on expanding tests to improve the code coverage.
There seem to be a few minor things that might be good to take a look at to make sure they are not causing issues.
Looking for ways this could break, you might want to test these cases in the part of the test where you actually check the contents of the tuples (there is potential for corrupting the data, and printing out the tuples is more likely to reveal issues):
- No tuples in table and splitting
- Deleting and/or updating tuples in the table and splitting (you do vacuum as part of the split).
- Odd number of tuples in table and splitting
- Splitting table so that one part is empty and the other contains all of them (two cases, original is empty and new chunk is empty).
- Using split chunk with the start timestamp of the chunk
- Using split chunk with the end timestamp of the chunk
- Using both byval types (int, for example) and byref types (text, for example). You are using byval types, but I see no byref types.
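A few of the cases above could be sketched roughly as follows (table and chunk names are hypothetical, and this assumes a TimescaleDB instance with the split_chunk procedure from this PR):

```sql
-- Hypothetical setup: a hypertable with both a byval (int) and a byref (text) column.
CREATE TABLE readings (time timestamptz NOT NULL, device int, note text);
SELECT create_hypertable('readings', 'time', chunk_time_interval => interval '1 month');

-- Odd number of tuples (5), with byref values that must survive the split intact.
INSERT INTO readings
SELECT t, i::int, 'note-' || i
FROM generate_series('2025-03-01'::timestamptz,
                     '2025-03-05'::timestamptz,
                     interval '1 day') WITH ORDINALITY AS s(t, i);

-- Split at the chunk's start timestamp so one side ends up empty.
-- The chunk name is hypothetical; look it up with show_chunks('readings').
CALL split_chunk('_timescaledb_internal._hyper_1_1_chunk',
                 split_at => '2025-03-01 00:00');

-- Check the actual tuple contents after the split to catch corruption.
SELECT time, device, note FROM readings ORDER BY time;
```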
Hi! I wanted to ask how this works regarding compression. For example, I have a database with a chunk size that is too large for the resources it has: 180 days. This chunk ends at the end of the year. Ideally, I would not want to wait until that chunk is done to compress it, and it seems this function would let me split it into two chunks of, say, 90 days each, compress the first, and adapt the chunk policy so it compresses and segments at 90 days, and all should work fine, right? This is our use case for splitting and merging: basically fixing bad estimations for hypertable chunks in either direction (or sometimes cases where compression works really well and we could merge several old compressed chunks for performance reasons). Does this splitting support both compressed and uncompressed chunks?
Thanks for asking. The first version won't support compressed chunks, but this will be added in a follow-up PR.
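For an uncompressed chunk, the workflow described above might look like this sketch (table and chunk names are hypothetical; split_chunk is the procedure added in this PR, while compress_chunk and set_chunk_time_interval are existing TimescaleDB API):

```sql
-- Split the oversized 180-day chunk roughly in half at a chosen point.
CALL split_chunk('_timescaledb_internal._hyper_1_1_chunk',
                 split_at => '2025-07-01 00:00');

-- Compress the older half right away (it is still uncompressed at this point).
SELECT compress_chunk('_timescaledb_internal._hyper_1_1_chunk');

-- Make future chunks 90 days wide.
SELECT set_chunk_time_interval('conditions', interval '90 days');
```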
Might be good to add a test that splits into an empty new or empty old chunk. I did not see any such test.
A chunk can be split with a new procedure called split_chunk(). In this initial version, a chunk can only be split in two given a split point:

```
call split_chunk('chunk_1', split_at => '2025-03-01 00:00');
```

If no split point is given, the chunk is split into two equal-size partition ranges. The partitioning dimension/column to split along can also be specified, but only the primary dimension (time) is supported. Future updates to split_chunk() can add the ability to also split along other dimensions, e.g., space partitions.

Currently, splitting a chunk takes an AccessExclusiveLock on the chunk being split. A lock is also taken on the hypertable root to prevent new chunks from being created while the split is ongoing.
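Based on the description, the equal-size split would then need no split point at all (a sketch inferred from the description above, not a confirmed call shape):

```sql
-- Split chunk_1 at the midpoint of its primary (time) dimension range.
CALL split_chunk('chunk_1');
```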
Can't support multi-dimensional splits because of a limitation in the subspace store data structure.
Added.
I see, thank you!