feat: allow passing a custom runtime into various DeltaOps
#3814
base: main
Signed-off-by: Abhi Agarwal <[email protected]>
ACTION NEEDED: delta-rs follows the Conventional Commits specification for release automation. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.
We kind of have some parts of this running already with:

let table = DeltaTableBuilder::from_url(blah).with_io_runtime(other_blah).build()?;
let op = DeltaOps::from(table).create().etc().etc();

That said, the operations aren't all using the IO runtime from the table 😒 so there's... room for improvement!
Yep, but I think that's kind of the opposite of what I want. It's pretty common to run IO on the main Tokio runtime and spawn a dedicated CPU runtime that is used sparingly.
@abhiaagarwal if you're going with that approach, why not just spawn the tasks needing DeltaOps onto that CPU-intensive runtime outside of the API? Our APIs here are already messy, and I'm trying to reel that back in whenever possible 🎣
I'm already doing that :) I just figured I'd upstream my code to make it easier for anyone else. If you feel this is API surface bloat, feel free to close it!
It's also common with DataFusion to do CPU work on the main runtime and IO on a separate one; with_io_runtime has a smaller API footprint than doing this per operation for a CPU runtime.
Description
This is based on some code I've written at work to execute various DeltaOps on a separate Tokio runtime. I've noticed that this leads to fewer Kubernetes kills, because the main Tokio IO runtime no longer gets blocked on CPU-bound work, and it also reduces tail latencies. I've only applied this to CreateBuilder as a POC.
Marked as a draft because there are a few open questions:
runtime is set to ensure that it's using a SpawnedReqwestConnector?
Related Issue(s)
Closes #3800
Documentation
Vendors this example from datafusion-examples: https://github.com/apache/datafusion/tree/main/datafusion-examples/examples/thread_pools.rs