Skip to content

Add support for splitting a chunk #7946

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

erimatnor
Copy link
Contributor

@erimatnor erimatnor commented Apr 10, 2025

A chunk can be split with a new procedure called split_chunk(). In this initial version, a chunk can only be split in two given a split point:

call split_chunk('chunk_1', split_at => '2025-03-01 00:00');

If no split point is given, the chunk is split in two equal size partition ranges. The partitioning dimension/column to split along can also be specified, but only the primary dimension (time) is supported. Future updates to split_chunk() can add the ability to split also along other dimensions, e.g., space partitions.

Currently, splitting a chunk takes an AccessExclusiveLock on the chunk being split. A lock is also taken on the hypertable root to prevent new chunks being created while the split is ongoing.

@erimatnor erimatnor force-pushed the split-chunk branch 2 times, most recently from 780e89e to a58b0c3 Compare April 10, 2025 13:26
@erimatnor erimatnor force-pushed the split-chunk branch 3 times, most recently from 5aefdae to db2c221 Compare April 10, 2025 13:37
@erimatnor erimatnor changed the title Add support for splitting chunks Add support for splitting a chunk Apr 10, 2025
@erimatnor
Copy link
Contributor Author

I want to add some concurrency/isolation tests. But might do it in another PR.

@erimatnor erimatnor force-pushed the split-chunk branch 2 times, most recently from a9d1f38 to a2a0f84 Compare April 10, 2025 13:56
Copy link

codecov bot commented Apr 10, 2025

Codecov Report

Attention: Patch coverage is 84.54545% with 34 lines in your changes missing coverage. Please review.

Project coverage is 82.15%. Comparing base (59f50f2) to head (2807d8a).
Report is 908 commits behind head on main.

Files with missing lines Patch % Lines
tsl/src/chunk.c 84.47% 9 Missing and 25 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7946      +/-   ##
==========================================
+ Coverage   80.06%   82.15%   +2.08%     
==========================================
  Files         190      250      +60     
  Lines       37181    46504    +9323     
  Branches     9450    11670    +2220     
==========================================
+ Hits        29770    38205    +8435     
- Misses       2997     3671     +674     
- Partials     4414     4628     +214     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@erimatnor erimatnor marked this pull request as ready for review April 10, 2025 15:14
@erimatnor
Copy link
Contributor Author

Going to work on expanding tests to improve the code coverage.

Copy link
Contributor

@mkindahl mkindahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be a few minor things that might be good to take a look at to make sure that they are not causing issues.

Looking for ways this could break, you might want to test these cases in the part of the test where you actually check the contents of the tuple (because there is potential for corrupting the data and printing out the tuples is more likely to reveal issues):

  • No tuples in table and splitting
  • Deleting and/or updating tuples in the table and splitting (you do vacuum as part of the split).
  • Odd number of tuples in table and splitting
  • Splitting table so that one part is empty and the other contains all of them (two cases, original is empty and new chunk is empty).
  • Using split chunk with the start timestamp of the chunk
  • Using split chunk with the end timestamp of the chunk
  • Using both byval types (int, for example) and byref types (text, for example). You are using byval types, but I see no byref types.

@fetchezar
Copy link

Hi! I wanted to ask how this works regarding compression. For example I have a database with a chunk size that's too large for the resources it has, 180 days. This chunk ends by the end of the year.

Ideally I wouldn't want to wait until that chunk is done to compress it, and it seems this function would let me split it in two chunks of, say, 90 days, compress the first, adapt the chunk policy so it compresses and segments at 90 days, and all should work fine, right?

This is our use case for splitting and merging. Basically fixing bad estimations for hypertable chunks in either way (or sometimes cases where compression works really well and we could merge several old compressed chunks for performance reasons).

Does this splitting support both compressed and uncompressed chunks?

@philkra philkra added this to the v2.20.0 milestone Apr 12, 2025
@erimatnor
Copy link
Contributor Author

Does this splitting support both compressed and uncompressed chunks?

Thanks for asking. The first version won't support compressed chunks, but this will be added in a follw-up PR.

@erimatnor erimatnor force-pushed the split-chunk branch 6 times, most recently from a644a88 to 4087068 Compare April 20, 2025 05:08
@erimatnor erimatnor requested a review from mkindahl April 21, 2025 15:06
Copy link
Contributor

@mkindahl mkindahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good to add a test that splits into an empty new or empty old chunk. I did not see any such test.

A chunk can be split with a new procedure called split_chunk(). In
this initial version, a chunk can only be split in two given a split
point:

```
call split_chunk('chunk_1', split_at => '2025-03-01 00:00');
```

If no split point is given, the chunk is split in two equal size
partition ranges. The partitioning dimension/column to split along can
also be specified, but only the primary dimension (time) is
supported. Future updates to split_chunk() can add the ability to
split also along other dimensions, e.g., space partitions.

Currently, splitting a chunk takes an AccessExclusiveLock on the chunk
being split. A lock is also taken on the hypertable root to prevent
new chunks being created while the split is ongoing.
Can't support multi-dimensional splits because of limitation in
substore space data structure.
@erimatnor
Copy link
Contributor Author

Might be good to add a test that splits into an empty new or empty old chunk. I did not see any such test.

Added.

@fetchezar
Copy link

Does this splitting support both compressed and uncompressed chunks?

Thanks for asking. The first version won't support compressed chunks, but this will be added in a follw-up PR.

I see, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants