
chore: tweak transient table data retention settings #15346


Draft: wants to merge 4 commits into base branch main


dantengsky
Member

dantengsky commented Apr 26, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Tweak transient table data retention settings

This PR introduces a new setting, transient_data_retention_time_in_minutes, to customize the retention period for transient tables. This setting defines how long historical data is retained; the default value is 60 minutes (i.e. 1 hour).

Additionally, when purging data from transient tables, the retention period specified by transient_data_retention_time_in_minutes is now used.

Setting transient_data_retention_time_in_minutes to 0 "restores" the behavior transient tables had before this PR.
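A minimal sketch of how a minute-based retention window could decide what gets purged, assuming each snapshot carries a write timestamp and that a purge removes every non-current snapshot older than `now - retention`. `Snapshot` and `purgeable_snapshots` are hypothetical names for illustration, not Databend's actual API.

```rust
/// Hypothetical snapshot record: an id plus the time (in seconds) it was written.
#[derive(Debug, Clone, PartialEq)]
struct Snapshot {
    id: u64,
    timestamp_secs: u64,
}

/// Returns the ids of snapshots that may be purged under a retention window
/// of `retention_minutes`. Assumption: the latest snapshot is always kept.
fn purgeable_snapshots(snapshots: &[Snapshot], now_secs: u64, retention_minutes: u64) -> Vec<u64> {
    // Anything written before this point falls outside the retention window.
    let cutoff = now_secs.saturating_sub(retention_minutes * 60);
    // The current (latest) snapshot must never be purged.
    let latest_id = snapshots.iter().max_by_key(|s| s.timestamp_secs).map(|s| s.id);
    snapshots
        .iter()
        .filter(|s| Some(s.id) != latest_id && s.timestamp_secs < cutoff)
        .map(|s| s.id)
        .collect()
}

fn main() {
    let now: u64 = 10_000;
    let history = vec![
        Snapshot { id: 1, timestamp_secs: now - 7_200 }, // 2 hours old
        Snapshot { id: 2, timestamp_secs: now - 1_800 }, // 30 minutes old
        Snapshot { id: 3, timestamp_secs: now - 60 },    // latest
    ];
    // Default of 60 minutes: only the 2-hour-old snapshot is outside the window.
    println!("{:?}", purgeable_snapshots(&history, now, 60)); // [1]
    // 0 minutes (pre-PR behavior): everything except the latest is purgeable.
    println!("{:?}", purgeable_snapshots(&history, now, 0)); // [1, 2]
}
```

With a value of 0 the cutoff equals "now", so all history except the current snapshot becomes eligible, which matches the "restore" behavior described above.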

  • Fixes #[Link the issue here]

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):


- introduce a new setting `transient_data_retention_time_in_minutes`,
  which sets the retention period (in minutes) of transient tables
- when purging data of transient tables, use the value of the setting
  `transient_data_retention_time_in_minutes` in table navigation
github-actions bot added the pr-chore label ("this PR only has small changes that no need to record, like coding styles.") Apr 26, 2024
dantengsky force-pushed the chore-tweak-transient-settings branch from 6aedc87 to 66e66ac on April 26, 2024 13:29
dantengsky force-pushed the chore-tweak-transient-settings branch from 66e66ac to 730a84c on April 26, 2024 13:35
cdmikechen

@dantengsky
Hi~
I think this feature would be useful in real-time scenarios to avoid the growth of snapshot files.
What is the current progress?

@dantengsky
Member Author

> @dantengsky Hi~ I think this feature should be useful when doing real-time scenarios to avoid the growth of snapshot files. What is the progress so far, please?

Thanks for asking!

This PR aims to use a more conservative (longer) retention period when purging history for transient tables, instead of the current value of "0". Once merged, this should mean that transient tables will keep more historical data by default than they do now.

Currently, the smallest unit for the retention period is a day (the existing setting is data_retention_time_in_days), which is too coarse for transient tables.


Right now, the way transient tables are purged risks corrupting the target table in scenarios with concurrent modifications (including append-only writes). Basically, the purge might delete data written by pending transactions that could still be committed successfully later.

Although this PR mitigates the issue for now, it does not completely solve it. Fully fixing the problem requires a further refinement: checking the table's least visible timestamp at commit time.
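The commit-time check mentioned above can be sketched as follows, under loudly stated assumptions: each purge records its cutoff as the table's "least visible timestamp", and a transaction's start time stands in for the age of the files it wrote. `Table`, `purge`, `commit`, and `CommitError` are hypothetical names; this is an illustration of the idea, not Databend's implementation.

```rust
/// Hypothetical per-table state for this sketch.
struct Table {
    /// Assumed purge cutoff: data written before this timestamp may have been purged.
    least_visible_ts: u64,
}

#[derive(Debug, PartialEq)]
enum CommitError {
    /// The transaction started before the purge cutoff, so some of the
    /// (not yet committed) files it wrote may already be gone.
    StaleTransaction { txn_start: u64, least_visible: u64 },
}

impl Table {
    /// A purge with a retention window treats data older than the cutoff
    /// as garbage and records that cutoff.
    fn purge(&mut self, now: u64, retention_secs: u64) {
        let cutoff = now.saturating_sub(retention_secs);
        // Never move the cutoff backwards.
        self.least_visible_ts = self.least_visible_ts.max(cutoff);
    }

    /// Commit-time check: reject transactions whose writes may predate the cutoff.
    fn commit(&self, txn_start: u64) -> Result<(), CommitError> {
        if txn_start < self.least_visible_ts {
            Err(CommitError::StaleTransaction {
                txn_start,
                least_visible: self.least_visible_ts,
            })
        } else {
            Ok(())
        }
    }
}

fn main() {
    let mut table = Table { least_visible_ts: 0 };
    let txn_start = 1_000; // a long-running append starts here

    // A purge at t=2_000 with a 60-minute window does not reach the txn's data.
    table.purge(2_000, 3_600);
    println!("{:?}", table.commit(txn_start)); // Ok(())

    // The txn outlives the window: a purge at t=5_000 may have removed its
    // files, and only the commit-time check prevents a corrupted commit.
    table.purge(5_000, 3_600);
    println!("{:?}", table.commit(txn_start));
}
```

The window alone only helps transactions shorter than the retention period; the check is what turns a silent corruption into a rejected commit, which is why the retention change is described as a mitigation rather than a fix.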

@sundy-li
Member

sundy-li commented Jun 12, 2024

Why not save the setting as a table option rather than as a dynamic global setting?

create table t (c int) 
row_per_block = 100000
block_per_segment = 1000
data_retention_ttl_minutes = 600    --- this could be respected by vacuum command
recluster_schedule_interval = ..

...

@dantengsky
Member Author

> Why not save the settings into table option rather than a dynamic global setting. ...

Good idea. At least data_retention_time_in_days should be adjustable at the table level (or inherited from the database or account).
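A sketch of how a per-table retention option could take precedence over inherited values, following the resolution order suggested in this exchange (table, then database, then account, then a default). The option name data_retention_ttl_minutes comes from the proposal above; `Options` and `resolve_retention_minutes` are hypothetical, and modeling options as a string map is an assumption, not how Databend actually stores table options.

```rust
use std::collections::HashMap;

/// Hypothetical option bag attached to a table, database, or account.
type Options = HashMap<String, String>;

/// Resolve a retention option by walking from the most specific scope
/// (first in `scopes`) to the least specific, falling back to `default`.
/// Assumption: a value that fails to parse is ignored and the next scope is tried.
fn resolve_retention_minutes(key: &str, scopes: &[&Options], default: u64) -> u64 {
    scopes
        .iter()
        .find_map(|opts| opts.get(key).and_then(|v| v.parse::<u64>().ok()))
        .unwrap_or(default)
}

fn main() {
    let key = "data_retention_ttl_minutes";
    let table: Options = HashMap::from([(key.to_string(), "600".to_string())]);
    let database: Options = HashMap::from([(key.to_string(), "120".to_string())]);
    let account: Options = HashMap::new();

    // The table option wins when present.
    println!("{}", resolve_retention_minutes(key, &[&table, &database, &account], 60)); // 600
    // Without a table option, the database value is inherited.
    println!("{}", resolve_retention_minutes(key, &[&Options::new(), &database, &account], 60)); // 120
    // Nothing set anywhere: fall back to the default.
    println!("{}", resolve_retention_minutes(key, &[&account], 60)); // 60
}
```

Storing the value with the table, rather than in a session-level setting, means a vacuum run from any session would see the same retention period for that table.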


3 participants