Example for using a separate threadpool for CPU bound work (try 2) #14286

Draft

alamb wants to merge 1 commit into main from alamb/threadpool_example3

Conversation

@alamb
Contributor

alamb commented Jan 24, 2025

Note: This PR contains a (substantial) example and supporting code. It has no changes to the core.

Which issue does this PR close?

Rationale for this change

I have heard from multiple people, multiple times, that the specifics of using separate threadpools for CPU and IO work in DataFusion are confusing.

However, this is a key detail for building high-performance engines which process data directly from remote storage, which I think is a very important capability for DataFusion.

My past attempts to make this example have been bogged down trying to get consensus on details of how to transfer results across streams, the wisdom of wrapping streams, and other details.

However, there have been no other actual alternatives proposed (aka no actual code that I could write an example with). So while the approach of wrapping streams may be a bit ugly, it is working for us in production at InfluxDB and I believe @adriangb said they are using the same strategy at Pydantic.

In my opinion, the community needs something that works, even if it is not optimal, and that is what this example provides.

I would personally like to merge it in so it is easier to find (and not locked in a PR) and iterate on it afterwards.

What changes are included in this PR?

  1. thread_pools.rs example
  2. a dedicated_executor.rs module

Note that the DedicatedExecutor code is originally from

  1. InfluxDB 3.0 (source link), largely written by @tustvold and @crepererum
  2. The IoObjectStore wrapper is based on work from @matthewmturne in Add DedicatedExecutor to FlightSQL Server datafusion-contrib/datafusion-dft#247

Note that I have purposely avoided any changes to the DataFusion crates (such as adding DedicatedExecutor to physical-plan), but I have written the code with tests with an eye towards doing exactly such a thing once we have some experience / feedback on the approach. This is tracked by

Right now, I think the most important thing is to get this example in for people to try and confirm if it helps them

Are these changes tested?

Yes, the example is run as part of CI and there are tests.

TODO: Verify that the tests in the examples run in CI

Are there any user-facing changes?

Not really

@adriangb
Contributor

I think this is great Andrew. For what it's worth if this were packaged up in some installable way (even if it had to be from git, etc.) I'm sure we'd be super happy to can our custom stuff and use this :)

@alamb force-pushed the alamb/threadpool_example3 branch from df43a02 to b5d4ae1 on January 25, 2025 13:38
@alamb
Contributor Author

alamb commented Jan 25, 2025

I think this is great Andrew. For what it's worth if this were packaged up in some installable way (even if it had to be from git, etc.) I'm sure we'd be super happy to can our custom stuff and use this :)

Thank you 🙏 If this is the case I would be happy to make a PR to put DedicatedExecutor into the core. If I had some evidence that people would actually use this, I could make a more compelling case for putting it into the core (despite the hesitations from @tustvold).

@adriangb Is there any way you can test / verify that the structures in this crate would work for you (like can you temporarily copy/paste dedicated_executor.rs into your tree?)

/// Demonstrates running queries so that
/// 1. IO operations happen on the current thread pool
/// 2. CPU bound tasks happen on a different thread pool
async fn different_runtime_advanced() -> Result<()> {
Contributor Author

This is the example that shows the best practice for separating CPU and IO

//
// ctx.register_object_store(&base_url, http_store);
// A Tokio 1.x context was found, but timers are disabled. Call `enable_time` on the runtime builder to enable timers.
let http_store = dedicated_executor.wrap_object_store_for_io(http_store);
Contributor Author

This pattern is @tustvold's core concern as I understand it -- that the overhead of transferring data back and forth between runtimes by wrapping streams could be avoided with lower-level hooks for IO.
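
For illustration, here is one common way such a stream bridge can be built: drive the inner stream on the IO runtime and hand items back over a channel. This is only a minimal sketch, assuming the tokio-stream crate is available; the names (forward_stream, io_handle) are illustrative and are not the API of this PR.

    use futures::{Stream, StreamExt};
    use tokio::sync::mpsc;
    use tokio_stream::wrappers::ReceiverStream;

    /// Poll `input` on the IO runtime and forward each item over a channel so
    /// a consumer on another runtime can read the results.
    fn forward_stream<T>(
        io_handle: tokio::runtime::Handle,
        mut input: impl Stream<Item = T> + Send + Unpin + 'static,
    ) -> impl Stream<Item = T>
    where
        T: Send + 'static,
    {
        let (tx, rx) = mpsc::channel(2); // small buffer; the size here is arbitrary
        io_handle.spawn(async move {
            while let Some(item) = input.next().await {
                if tx.send(item).await.is_err() {
                    break; // receiver dropped; stop polling the inner stream
                }
            }
        });
        ReceiverStream::new(rx)
    }

Every item crosses a channel between runtimes, which is the per-item overhead referred to above.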

@adriangb
Contributor

Yes I'll do just that and report back. I don't think it should block merging this as an example though.

I understand the hesitation to put it in core, but unless there's something better that's plug and play I think it's better to have this than nothing.

Could an alternative be to put it in a contrib crate or a new workspace package (so that CI also runs on it and it can have tests), add whatever APIs core needs to make it easier to plug in and document the setup? In my mind unless it's set up by default there isn't much more value in having it in core vs any other installable package.

// specific language governing permissions and limitations
// under the License.

//! [DedicatedExecutor] for running CPU-bound tasks on a separate tokio runtime.
Contributor Author

This is a monster amount of code to put in an example -- much of it is tests and documentation, and I wrote it with an eye towards eventually putting it into DataFusion itself.

///
/// [`DedicatedExecutor::spawn_io`] for more details
#[derive(Debug)]
pub struct IoObjectStore {
Contributor Author

Here is the IoObjectStore that implements ObjectStore and wraps another ObjectStore that does IO on a different thread.
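
For illustration, a minimal sketch of the delegation idea behind such a wrapper: run the request itself on the IO runtime so only the result crosses back. The names (get_on_io_runtime, io_handle) are illustrative; the IoObjectStore in this PR routes calls through DedicatedExecutor::spawn_io instead.

    use std::sync::Arc;
    use object_store::{path::Path, GetResult, ObjectStore};

    /// Issue the GET on the IO runtime; the calling (CPU) runtime only awaits
    /// the join handle, not the network request itself.
    async fn get_on_io_runtime(
        io_handle: tokio::runtime::Handle,
        inner: Arc<dyn ObjectStore>,
        location: Path,
    ) -> object_store::Result<GetResult> {
        io_handle
            .spawn(async move { inner.get(&location).await })
            .await
            .expect("io task panicked")
    }

Note that the stream inside the returned GetResult would still be polled on the caller's runtime unless it is also wrapped, which is what the stream wrapping discussed elsewhere in this thread addresses.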

}

// -----------------------------------
// ----- Tests for IoObjectStore ------
Contributor Author

Here are tests that prove the IO is actually happening on the correct thread (both the original requests as well as the streams that are returned)

@alamb
Contributor Author

alamb commented Jan 25, 2025

Is there anyone willing to test out this approach in their application so we can get some confirmation that it avoids some of the timeouts reported while doing heavy processing of data from object store?

Some people who I think have reported seeing this issue:

I am willing to help make a draft PR to whatever repo to hook DedicatedExecutor up, if someone is willing to then test on a case that was seeing object store timeouts before

Also FYI @ozankabak as we discussed this topic recently.

@tustvold
Contributor

tustvold commented Jan 25, 2025

In the interests of avoiding confusion, as my objections appear to have gotten a little misinterpreted, I'd like to clarify that the fact this approach comes with non-trivial overheads is not what concerns me. Rather, it is that we know from experience at InfluxData that this pattern is fragile, easy to mess up, and leads to emergent behaviour that is highly non-trivial to reproduce and debug.

That being said as Andrew says, nobody has emerged who is able/willing to resolve this with a more holistic approach, e.g. something closer to what polars/DuckDb/Hyper are doing to separate IO/compute, and so proceeding with something is better than nothing. I just hoped someone might step up and run with something along the lines of #13692 (comment)

@ion-elgreco

Based on the previous discussions, and draft PRs, I ended up with this Object store wrapper to spawn the io tasks in a different handle: https://github.com/delta-io/delta-rs/blob/main/crates%2Fcore%2Fsrc%2Fstorage%2Fmod.rs#L116-L124

I do like the approach proposed in this PR, and worthwhile to have it be part of core.

Regarding testing it: since this wrapper was added, I haven't heard anyone mention these issues in delta-rs anymore, outside of @Tom-Newton. Did you end up resolving it on your end? I forgot to follow up on it.

Slightly off-topic, but I wonder if anything was done to surface the true error better? Because "error decoding response body" on its own is not clear enough if you don't have the context of why it can happen. I assume the body is incomplete because the request timed out; if that's the case, shouldn't that be the error message instead?

@Tom-Newton

We are still having quite significant problems, but we can't reproduce it reliably and I haven't been personally working on it recently.

Anecdotally we think it's more frequent when:

  1. Reading between regions (e.g. compute in Azure US South Central reading Azure US East blob storage).
  2. When spawning a large number of parallel jobs that all read the same thing using delta-rs.
  3. Reading one particular table where we need to read a larger data volume. delta-rs via object-store is still only reading metadata though (on the order of 100 MB), and the larger data volumes loaded using the pyarrow Azure filesystem don't seem to suffer the same problem.

@alamb
Contributor Author

alamb commented Jan 25, 2025

Based on the previous discussions, and draft PRs, I ended up with this Object store wrapper to spawn the io tasks in a different handle: https://github.com/delta-io/delta-rs/blob/main/crates%2Fcore%2Fsrc%2Fstorage%2Fmod.rs#L116-L124

This is neat -- I actually like that it is in terms of just a tokio runtime rather than something like DedicatedExecutor. I might try and rework IoObjectStore to follow that pattern instead.

One thing I noticed is that it doesn't actually shim the get result stream (so while the initial GET request runs on the other runtime, reading data from the returned stream will still happen on the original runtime).

I handled this by wrapping the stream like this:

    /// Wrap the result stream if necessary
    fn wrap_if_necessary(&self, payload: GetResultPayload) -> GetResultPayload {
        match payload {
            GetResultPayload::File(_, _) => payload,
            GetResultPayload::Stream(stream) => {
                let new_stream = self.dedicated_executor.run_io_stream(stream).boxed();
                GetResultPayload::Stream(new_stream)
            }
        }
    }

It might be an interesting thing to try in delta.rs 🤔

@ion-elgreco

@alamb ah good point, I missed that! Definitely good to add, will have a better look at where these payload streams are collected

Comment on lines +612 to +614
.on_thread_start(move || {
DedicatedExecutor::register_io_runtime(io_handle.clone())
})
Contributor

I'll note that this clobbers any on_thread_start set on runtime_builder. Unfortunately it is not stored publicly on RuntimeBuilder so it is not possible to pull off the existing setting and wrap it. I suggest we add an on_thread_start method to DedicatedExecutorBuilder?

Contributor

It'd be nice if there was also a way to warn users that we are clobbering their configuration...
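
For illustration, a sketch of the chaining suggested above, assuming a hypothetical builder hook; register_io_runtime here is only a stand-in for DedicatedExecutor::register_io_runtime, and build_cpu_runtime is not part of this PR.

    use tokio::runtime::{Builder, Handle, Runtime};

    /// Run both the executor's registration and the user's hook on every new
    /// worker thread, instead of silently replacing what the user configured.
    fn build_cpu_runtime(
        io_handle: Handle,
        user_hook: impl Fn() + Send + Sync + 'static,
    ) -> std::io::Result<Runtime> {
        Builder::new_multi_thread()
            .enable_all()
            .on_thread_start(move || {
                register_io_runtime(io_handle.clone());
                user_hook();
            })
            .build()
    }

    fn register_io_runtime(_handle: Handle) {
        // placeholder: the real function would stash the handle, e.g. in a thread local
    }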

@adriangb
Contributor

I've successfully made a PR to integrate this internally. It was pretty straightforward. We'll have to scrutinize it a bit to see if we can tell whether anything is missing (this is very error-prone code that can go bad in subtle ways), and if all goes well we will report back after this has been in production for a couple of days.

@davidhewitt
Contributor

This is neat -- I actually like that it is in terms of just a tokio runtime rather than somehthing like DedicatedExecutor. I might try and work on IoObjectStore to follow that pattern instead

I think that makes sense to me too. When I see the current IoObjectStore containing a DedicatedExecutor and then calling spawn_io, the abstraction feels muddled. Having just a handle to a runtime to spawn work on seems simpler and generalises better.

In the interests of avoiding confusion, as my objections appear to have gotten a little misinterpreted, I'd like to clarify that the fact this approach comes with non-trivial overheads is not what concerns me. Rather, it is that we know from experience at InfluxData that this pattern is fragile, easy to mess up, and leads to emergent behaviour that is highly non-trivial to reproduce and debug.

Just to put the brakes on rushing this into core too quickly, I have to support @tustvold in raising concern; we have had plenty of issues at Pydantic with this pattern where we have missed cases where we should have spawned IO (or CPU work) onto the other runtime.

It seems to me like most folks agree research into schedulers for the CPU work that can sit within the single tokio runtime would be much easier to integrate for downstream use cases, probably at the cost of complexity (overheads?) within datafusion itself.

We have already vendored this pattern and have it working for us, you don't need to rush this through for us.

@JanKaul
Contributor

JanKaul commented Jan 27, 2025

This looks great. I'll try to reproduce my issue and try it out.

And I have to dig a bit deeper into the topic to understand the concerns about this approach.


#[async_trait]
impl MultipartUpload for IoMultipartUpload {
fn put_part(&mut self, data: PutPayload) -> UploadPart {
Contributor

@rohitrastogi Jan 28, 2025

In the S3 implementation of ObjectStore, put_part is 'tokio spawned' on the runtime of its caller (it uses a JoinSet and currently uses spawn and not spawn_on):
https://github.com/apache/arrow-rs/blob/6aaff7e38a573f797b31f89f869c3706cbe26e37/object_store/src/upload.rs#L211

If I understand the implementation and integration pattern correctly, this means that the put_part future for S3 will be spawned on the dedicated (CPU only) executor and not the IO runtime. We may have to patch the object store crate to get true IO/CPU isolation in this case.

Contributor

I think this just needs to be wrapped in spawn_io; the overhead of the additional tokio task will be irrelevant compared to the cross-RT and IO overheads.

The fact this is so hard to spot is a good example of why this pattern is really fragile

Contributor

@rohitrastogi Jan 30, 2025

I'm trying to wrap put_part with spawn_io, but I'm encountering some issues. For multipart uploads, I believe the part_idx needs to be incremented in the same order that PutPayload instances are generated from the stream being written. The indirection introduced by sending the put_part future to the I/O runtime via spawn_io seems to cause the payloads to be written out of order.

As a result, I'm seeing failures in complete() that indicate parts with a size smaller than 5MB are being sent before the last part.

I don't actually see anything in the standard S3MultipartUpload implementation that prevents this behavior, but I haven't encountered any failures related to this in the past, so I may be mistaken.

Contributor

@tustvold Jan 30, 2025

It shouldn't matter that they're written out of order as they're assembled based on the index captured when the future is created. Could you perhaps share your code?

Edit: I created an explicit test in object_store to show this - apache/arrow-rs#7047
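
For illustration, a minimal sketch (not the object_store implementation) of why out-of-order completion is safe when each index is captured synchronously at spawn time:

    use tokio::task::JoinSet;

    #[tokio::main]
    async fn main() {
        let mut tasks = JoinSet::new();
        for (idx, part) in ["a", "b", "c"].into_iter().enumerate() {
            // idx is captured here, before any await, so the order in which
            // the uploads themselves finish does not matter.
            tasks.spawn(async move { (idx, format!("uploaded {part}")) });
        }
        let mut parts: Vec<(usize, String)> = Vec::new();
        while let Some(res) = tasks.join_next().await {
            parts.push(res.unwrap()); // completion order is arbitrary
        }
        parts.sort_by_key(|(idx, _)| *idx); // reassemble by captured index
        assert_eq!(parts[0].0, 0);
    }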

Contributor

@rohitrastogi Jan 31, 2025

Thank you, the test was very helpful and clearly illustrates that even though the async part uploads run concurrently, the synchronous part index assignment happens serially on the same thread. I accidentally changed this behavior by wrapping inner in an Arc<Mutex> (to fix some borrow checker issues) and having all of the inner put_part() calls contend for the mutex, which broke the ordering guarantees. Fixed it.

Contributor

@rohitrastogi, could you suggest the edit to this PR that resolved your issue with multipart uploads (or at least drop in a patch as a comment)? I'm just now realizing that the issue I'm seeing (#14286 (comment)) is exactly the one you were discussing... it took me a while to make the connection, but it now makes sense that there is some threshold above which object_store switches from a single put to a multipart upload, and that is exactly when I'm seeing the IO error, as you already reported.

@JanKaul
Contributor

JanKaul commented Jan 28, 2025

For your information, I was able to reproduce the IO stall error with a very simple example, in case anybody is interested and wants to try it out. I will test the dedicated executor tomorrow.

@alamb
Contributor Author

alamb commented Jan 28, 2025

It seems to me like most folks agree research into schedulers for the CPU work that can sit within the single tokio runtime would be much easier to integrate for downstream use cases, probably at the cost of complexity (overheads?) within datafusion itself.

@davidhewitt I agree in principle. I think any time one has to cross from sync code to async code the API is going to get challenging. The upside of the current DataFusion approach with a single tokio runtime is that it just "works" (for some definition of "works").

The downside is that squeezing out the maximum performance is tricky

Though I would argue postponing the pain until you actually need really good network performance might be better than forcing users to sort it out upfront before they can even run simple things

Of course, ideally it would "just work" for users and they wouldn't have to worry about it at all

@JanKaul
Contributor

JanKaul commented Jan 29, 2025

I tried the dedicated executor in my test example. I'm not entirely sure if I'm using it wrong or maybe my test is too contrived, but I'm still getting a single timeout error. This is already much better compared to using a single runtime, which produces many timeout errors.

I invite everybody to have a look, maybe you can spot a bug.

I will try the dedicated executor with a real example. The problem is that it is difficult to reproduce because the issue only occurred at a certain size.

@tustvold
Contributor

tustvold commented Jan 30, 2025

I took a look at your example @JanKaul, thank you for this as I think it very nicely demonstrates the challenge of shimming IO at the object store interface. Unfortunately I think this may be a different issue from the runtime stall issue.

At a high level the code is doing this

object_store.get().into_stream().map(|x| {
    // long CPU-bound pause while the response stream is held open
    sleep(Duration::from_secs(2));
}).try_collect()

The problem is this introduces 2-second pauses between polls of the streaming request from object storage. Regardless of runtime setup, holding a request open across a long-running task will result in timeouts, as backpressure will eventually cause the sender to time out. You would have the same issue if it were an "async" sleep, e.g. tokio::time::sleep.

There are two broad strategies to avoid this:

  1. Add an unbounded queue on the stream output
  2. Make separate requests each time for the work that can be performed, e.g. first 1MB, perform work, then fetch next MB

Option 1 will potentially buffer the entire input stream if the consumer is running behind, but should ensure the request runs to completion. Option 2 is, I think, the better approach, and is what code should probably be doing (FWIW this is what the parquet reader does).
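
For illustration, a minimal sketch of option 2: issue a bounded range request, do the CPU-bound work, then fetch the next chunk, so no response body is held open across the work. The chunk size and names are illustrative, and the exact get_range range type differs between object_store versions.

    use object_store::{path::Path, ObjectStore};

    async fn process_in_chunks(
        store: &dyn ObjectStore,
        location: &Path,
        total_len: usize,
    ) -> object_store::Result<()> {
        const CHUNK: usize = 1024 * 1024; // 1 MB per request
        let mut offset = 0;
        while offset < total_len {
            let end = (offset + CHUNK).min(total_len);
            // Short-lived request: the connection is not held across the work below.
            let bytes = store.get_range(location, offset..end).await?;
            do_cpu_work(&bytes); // stand-in for the expensive per-chunk processing
            offset = end;
        }
        Ok(())
    }

    fn do_cpu_work(_bytes: &[u8]) {
        // ... expensive processing ...
    }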

As an aside I also took the time to port over the custom executor approach, to show what that looks like - JanKaul/cpu-io-executor#1. I still think this is a cleaner approach, although, much like the DedicatedExecutor in this PR, it doesn't help with the issue this example is running into.

@JanKaul
Contributor

JanKaul commented Jan 30, 2025

Thanks a lot for having a look! I'll try to adopt strategy 2 to simulate the behavior of the parquet reader.

@djanderson
Contributor

Is there extra work I'd need to do to get this to work with the ParquetSink? This isn't a full reproducer but just a quick copy of the relevant section of the do_put_statement_ingest() method in my flightsql server.

I get

flight_sql: writing data to disk
thread 'tokio-runtime-worker' panicked at /Users/dja/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5:
A Tokio 1.x context was found, but timers are disabled. Call enable_time on the runtime builder to enable timers.

        let data_sink = ParquetSink::new(file_sink_config, table_options);

        let record_batch_stream =
            FlightRecordBatchStream::new_from_flight_data(flight_data_stream.map_err(|e| e.into()));

        let stream = Box::pin(RecordBatchStreamAdapter::new(
            schema,
            record_batch_stream.map_err(|e| DataFusionError::External(Box::new(e))),
        ));

        // This displays "IoObjectStore"
        info!(
            "{}",
            self.datafusion_state
                .runtime_env()
                .object_store(object_store_url.as_ref())
                .unwrap()
        );

        info!("writing data to disk");
        let rows_written = self
            .dedicated_rt
            .spawn(async move { data_sink.write_all(stream, &datafusion.task_ctx()).await })
            .await
            .map_err(|e| Status::data_loss(format!("error processing stream: {e}")))?
            .map_err(|e| Status::data_loss(format!("data write failed: {e}")))?;
        info!("wrote {rows_written} rows");

@alamb
Contributor Author

alamb commented Feb 12, 2025

Is there extra work I'd need to do to get this to work with the ParquetSink? This isn't a full reproducer but just a quick copy of the relevant section of the do_put_statement_ingest() method in my flightsql server.

Maybe you need to run data_sink.write_all(stream, &datafusion.task_ctx()).await somehow on the IO thread? I am not sure what it is doing.

What is the panic error message?

@djanderson
Contributor

djanderson commented Feb 12, 2025

What is the panic error message?

Panic error message is just

error=status: DataLoss, message: "data write failed: IO error: Error joining spawned task: task 93 panicked with message "A Tokio 1.x context was found, but timers are disabled. Call enable_time on the runtime builder to enable timers."

Maybe you need to run data_sink.write_all(stream, &datafusion.task_ctx()).await somehow on the IO thread? I am not sure what it is doing.

Yeah, without dedicated executor, this is where I hit executor starvation, so I can't wrap the whole call in spawn_io, I don't think... otherwise it would be the same as not having the dedicated executor at all.

I thought this would work at first review because ParquetSink::write_all retrieves the object_store from the runtime_env and uses it to initialize the async arrow writer.

Maybe there's some interaction with the file_write_tasks JoinSet that I don't understand?

https://github.com/apache/datafusion/blob/branch-44/datafusion/core/src/datasource/file_format/parquet.rs#L865

djanderson added a commit to djanderson/parquet-sink-dedicated-exec-repro that referenced this pull request Feb 16, 2025
This runs the test exactly as I wrote it in

apache/datafusion#14286 (comment)

It passes, therefore it fails to demonstrate the failure I'm trying to
reproduce. This is just checked in as the initial commit to capture the
starting point.
@djanderson
Contributor

djanderson commented Feb 16, 2025

Alright so I was able to strip down a reproducer for the JoinError panic.

  • I was unable to trigger the issue with the MockStore, but can trigger it running against a localstack S3 container
  • I was unable to trigger the issue with one or a small number of small record batches. The issue only crops up with longer streams of data.

I imagine it could be slightly simpler but at least for me this is the easiest thing I could come up with that consistently triggers the issue.

https://github.com/djanderson/parquet-sink-dedicated-exec-repro

Would @alamb or anyone else potentially be able to see if this reproduces the error for you? I haven't given up on running the issue down but because of the nature of what does and doesn't trigger the issue, help from more knowledgeable folks would be hugely appreciated.

This is what a small number of record batches looks like (a failure to reproduce the issue). Important things to note in this case are:

  • the file is successfully written to minio
  • the client gets a successful response

This is what a larger number of record batches looks like, which demonstrates the tokio panic on the server. Important things to note in this case are:

  • the file fails to be written to minio
  • the client gets an error response

@djanderson
Contributor

djanderson commented Feb 16, 2025

Incidentally, by disabling the dedicated executor in this reproducer repo, it also (finally) seems to demonstrate the problem we're looking to solve in the first place: I can consistently get it to cause a client timeout and stream RST with full data loss, even with both server and client running in release mode.

The benefit to this over other attempts at reproducing the problem (including my own previous attempts) is that this is a very simple flight server and client, straight from the examples... the type of thing a new user of the ecosystem might reasonably write to test out the "FDAP" stack. There are no artificial timeouts and the only slight cheat may be that I am pushing fully random data over the pipe just to ensure that most of it doesn't get compressed away so that there's some significant disk IO.

/// UNLIKE [`tokio::task::spawn`], the returned future is **cancelled** when
/// it is dropped. Thus, you need to ensure the returned future lives until it
/// completes (call `await`), unless you wish to cancel it.
pub fn spawn<T>(&self, task: T) -> impl Future<Output = Result<T::Output, JobError>>
Contributor

Suggested change
pub fn spawn<T>(&self, task: T) -> impl Future<Output = Result<T::Output, JobError>>
pub fn spawn<T>(&self, task: T) -> impl Future<Output = Result<T::Output, JobError>> + use<T>

Use the new RPIT lifetime precise-capturing syntax for compatibility with the newly released Rust 2024 edition; otherwise this example fails to compile.

Comment on lines +850 to +853
self.inner
.as_mut()
.expect("paths that take put inner back")
.put_part(data)
Contributor

@djanderson Feb 26, 2025

Wrap IoMultipartUpload::put_part() in spawn_io_static, otherwise this will panic.

Suggested change
self.inner
.as_mut()
.expect("paths that take put inner back")
.put_part(data)
DedicatedExecutor::spawn_io_static(
self.inner
.as_mut()
.expect("paths that take put inner back")
.put_part(data),
)
.boxed()

@djanderson
Contributor

Hey all, sorry for the noise on this thread lately, I've been working to try and understand how to apply this example to an actual DataFusion-based server, and not having much luck.

I now have the dedicated executor from this PR fully integrated and working in my executor starvation reproducer; however, I'm not observing a significant improvement. I see others mentioning that a similar approach has worked okay for them in the past, which makes me question what I'm seeing. I could really use another set of eyes on it.

I'm intentionally exercising the ingest path. It seems to more consistently exhibit the starvation issues, and the result of hitting the issue on ingest is potential data loss.... definitely something a database should avoid 😬

Does it show what I think it shows, or might it be some other phenomenon? If it shows 1) consistent executor starvation and 2) that this PR doesn't seem to improve it and 3) that we therefore have no working example as a community of how to ingest data into a server-deployed datafusion-based database, would we consider that a significant issue? Maybe even worth calling out as a priority on the roadmap?

As always, willing to keep testing with the threadpool approach or other suggestions. LMK.

@alamb
Contributor Author

alamb commented Feb 26, 2025

Hey all, sorry for the noise on this thread lately, I've been working to try and understand how to apply this example to an actual DataFusion-based server, and not having much luck.

Thank you @djanderson -- I hadn't seen https://github.com/djanderson/parquet-sink-dedicated-exec-repro

I hope to get back to this issue in the next week or two and try and come up with the next steps based on all the feedback so far.

@alamb
Contributor Author

alamb commented Mar 7, 2025

Hey all, sorry for the noise on this thread lately, I've been working to try and understand how to apply this example to an actual DataFusion-based server, and not having much luck.

@djanderson -- I played around with your example / repro and I think you may be hitting a gRPC / flight client timeout (not related to what the server is doing). I wrote up my findings here

@alamb
Contributor Author

alamb commented Mar 7, 2025

I have also made a reproducer showing how timeouts happen with slower internet connections / certain files (because DataFusion is making single large requests which can't be completed within a timeout):

@alamb
Contributor Author

alamb commented Mar 7, 2025

I have one more theory I want to chase down this afternoon and then I will write up my thoughts on next steps here

@alamb
Contributor Author

alamb commented Mar 7, 2025

I have one more theory I want to chase down this afternoon and then I will write up my thoughts on next steps here

My other theory was that DataFusion might start requests but not consume them in time, thus leading to timeout errors even though the response was ready. However, I was not able to find any evidence that this was actually happening

@adriangb
Contributor

adriangb commented Mar 7, 2025

My other theory was that DataFusion might start requests but not consume them in time, thus leading to timeout errors even though the response was ready. However, I was not able to find any evidence that this was actually happening.

I'd expect precisely something like this if CPU is blocking IO tasks because the IO struggles to make progress and eventually times out. Couple that with retries and you can end up in scenarios where it's hard to tell what's causing what.

@alamb
Contributor Author

alamb commented Mar 7, 2025

My other theory was that DataFusion might start requests but not consume them in time, thus leading to timeout errors even though the response was ready. However, I was not able to find any evidence that this was actually happening.

I'd expect precisely something like this if CPU is blocking IO tasks because the IO struggles to make progress and eventually times out. Couple that with retries and you can end up in scenarios where it's hard to tell what's causing what.

One thing that I noticed is that DataFusion uses ObjectStore::get_ranges which returns buffered Bytes -- so from the network perspective I think the data is being consumed 🤔

@alamb
Contributor Author

alamb commented Mar 7, 2025

After reading all the relevant feedback on this PR, here is my summary:

  1. Some people have found it valuable to have an ObjectStore which does I/O on a different tokio Runtime
  2. The DedicatedExecutor API seems overly complicated for what it is doing.

Thus my suggested next steps are:

  1. For the near term / workaround, break out the object store wrapper that passes work to a different runtime, perhaps based on the one in delta-rs
  2. Update this example to use that wrapper with a tokio::Runtime directly rather than a DedicatedExecutor
  3. Work on a more general-purpose IO API (see discuss: Introduce datafusion-storage as datafusion's own storage interface #14854 from @Xuanwo), which might make it easier to get this right without the shim / spawn_io

@ion-elgreco

Sounds like a prime candidate for that object-store-utils crate :)

@alamb
Contributor Author

alamb commented Mar 7, 2025

Sounds like a prime candidate for that object-store-utils crate :)

Yeah for sure. Now we just need to figure out where to put that (in apache or not 🤔 )

@ion-elgreco

Sounds like a prime candidate for that object-store-utils crate :)

Yeah for sure. Now we just need to figure out where to put that (in apache or not 🤔 )

I think it can be beneficial in Apache but I don't know these processes tbh.

It would also be great to add caching object store implementations in there as well, such as this one https://github.com/slatedb/slatedb/blob/main/src%2Fcached_object_store%2Fobject_store.rs (or one of the many others out there :p)

@tustvold
Contributor

tustvold commented Mar 8, 2025

FWIW the new HttpClient abstraction, introduced in ObjectStore 0.12, provides a potentially nicer way to spawn IO onto a separate runtime - apache/arrow-rs#7253

@alamb
Contributor Author

alamb commented Mar 13, 2025

After thinking about this more, what I am currently thinking is:

  1. Wait until [ObjectStore] Add SpawnService for running requests on different tokio runtime/Handle arrow-rs#7253 is available in DataFusion (likely April)
  2. Update this example to use that and another runtime directly (rather than dedicated executor, etc)

In parallel, I think discussions on how to get some more optimized implementations for object stores can proceed:

@alamb marked this pull request as draft on March 18, 2025 14:48
@alamb
Contributor Author

alamb commented Mar 18, 2025

Converting to a draft until we have the spawn service.


Successfully merging this pull request may close these issues.

Document DataFusion Threading / tokio runtimes (how to separate IO and CPU bound work)
9 participants