-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Set DataFusion runtime configurations through SQL interface #15594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Set DataFusion runtime configurations through SQL interface #15594
Conversation
3d743e5
to
bfb1d0e
Compare
bfb1d0e
to
8d894be
Compare
Hello @kumarlokesh. Thank you for working on this. I have 2 questions/concerns. Let's discuss on them a bit to get a future-proof design
|
8d894be
to
fa34e62
Compare
bae6d2f
to
08d038c
Compare
@berkaysynnada Thank you for the feedback!
|
let ctx = SessionContext::new(); | ||
|
||
// Set memory limit to 100MB using SQL - note the quotes around the value | ||
ctx.sql("SET datafusion.runtime.memory_limit = '100M'") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running this test can't ensure this config is set, because it can run without setting the memory limit and use the default UnboundedMemoryPool
, instead I think it should assert the spill count for a query that has spilled some intermediate data.
SET datafusion.runtime.memory_limit = '1M'
set datafusion.execution.sort_spill_reservation_bytes = 0;
select * from generate_series(1, 100000) as t1(v1) order by v1;
-- And assert spill-count from the query is > 0
You can check this
let spill_count = metrics.spill_count().unwrap(); |
Later after all configurations are added, I think we should make this test case stronger by setting more runtime configs, and do some property test to ensure all of them are properly set.
BTW, I tried the above test locally, and the generate_series
UDTF seems not registered in the PR branch, but it works in the main branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running this test can't ensure this config is set, because it can run without setting the memory limit and use the default
UnboundedMemoryPool
, instead I think it should assert the spill count for a query that has spilled some intermediate data.SET datafusion.runtime.memory_limit = '1M' set datafusion.execution.sort_spill_reservation_bytes = 0; select * from generate_series(1, 100000) as t1(v1) order by v1; -- And assert spill-count from the query is > 0
You can check this
let spill_count = metrics.spill_count().unwrap(); for how to assert the spill file count.
Later after all configurations are added, I think we should make this test case stronger by setting more runtime configs, and do some property test to ensure all of them are properly set.BTW, I tried the above test locally, and the
generate_series
UDTF seems not registered in the PR branch, but it works in the main branch.
@2010YOUY01 @berkaysynnada addressed above in 6501f21.
|
||
/// Parse memory limit from string to number of bytes | ||
/// e.g. '1.5G', '100M' | ||
fn parse_memory_limit(&self, limit: &str) -> Result<usize> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can open this function to external uses?
Thank you @kumarlokesh for addressing my comments. I don't have further suggestions or concern other than @2010YOUY01 shared |
08d038c
to
6501f21
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. I plan to leave this PR open for 2 days more before merging, in case anyone has additional comments.
|
||
let mut state = self.state.write(); | ||
let mut builder = | ||
RuntimeEnvBuilder::from_runtime_env(state.runtime_env()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation is all cloning Arc
s inside RuntimeEnv
, I think it shouldn't have any issue.
Which issue does this PR close?
Rationale for this change
What changes are included in this PR?
SET datafusion.runtime.memory_limit = '100M'
.Are these changes tested?
Are there any user-facing changes?