-
Notifications
You must be signed in to change notification settings - Fork 772
chore: add merge_into_resize_parallel_threads settings #15048
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
chore: add merge_into_resize_parallel_threads settings #15048
Conversation
@@ -498,6 +498,12 @@ impl DefaultSettings { | |||
mode: SettingMode::Both, | |||
range: Some(SettingRange::Numeric(0..=1)), | |||
}), | |||
("merge_into_resize_parallel_threads", DefaultSettingValue { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this value should be adjust based on CPUs not a setting, we actually don't know what's the value to set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, we have changed the adjust_io_request
multiple times, returning to the simplest method in the end:
https://github.com/datafuselabs/databend/blob/4a55ea429a148a11ca365983976abc63363e4ae5/src/query/storages/fuse/src/operations/read_data.rs#L82-L92
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/datafuselabs/databend/pull/15048/files#diff-31afd04d842c6c44f5b86a534efe227d1bd19c04b8dbad24e4cfdae42f33300dR448 see here, the default is cpu threads, and we won't change the previous query time, it will change the merge into i/o pipeline. I need to modify the merge into to make it more tunable. It's to control the block write threads.In fact to reduce blocks writes time, not the i/o times at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the i/o reuqust is used to control the number of i/o at the same time, but it can't control the whole number of complete pipeline, we need to control the block compact and reduce it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Is this setting only for our testing not the user?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pr need to check, But in default, it won't make effect, I and @wubx find out the i/o is very high. So we need this. Let me do a cloud test with him.
Docker Image for PR
|
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
if the machine has too many cores, maybe we will write too many small block, which cause high i/o cost.
Tests
Type of change
This change is