Hi,

We are using culsans in an ETL pipeline and are currently discussing the best way forward: the sync or the async interface. Essentially, the question is: what is the performance difference between using culsans' async interface and having an async wrapper function pass the data to a sync function (either in a […]

I understand there's a bit more behind the scenes, like when passing data to the […]

I can provide more information on the use case if needed, but essentially we have an abstract base class where users should be able to define a […]

The other option is to let the user decide, via a config option they pass, whether a thread or process pool should be spawned in the background (handled in the base class). In that case their synchronous […]

The former solution would use culsans' async interface and pass the data to the user's async function, which can then forward the data to a thread/process pool. In the latter solution, the sync interface would be called from within the thread/process pool owned by the base class, and the data would be passed directly to the user's function.

I hope my explanation makes sense.

P.S.: I guess this is a general design question and not purely about culsans. We are still weighing the various options for designing this system most effectively. Feel free to share any insight you have, since it seems you work a lot with asynchronous Python. If it helps: items going through our pipeline range in size from a few bytes to gigabytes, so passing data efficiently for both small and large objects is important.
Replies: 2 comments 17 replies
Could you please clarify the execution model for each solution? In particular, I would like to know the following:
Thank you for the details. I think the most appropriate solution would be as follows:
In fact, running the sync […]

However, I rejected this option for a very simple reason: what will the worker thread do while the data has not yet arrived? Right, it will reduce the concurrency set by the […]

And if the user wants to use their own pools, they will choose "local" and, if necessary, will asynchronously wait for the future via […]
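To make the "what will the worker thread do while the data has not yet arrived?" point concrete, here is a minimal sketch. It is not culsans code: the standard-library `queue.Queue` stands in for a blocking sync interface, and `blocking_consumer` and `demo_blocked_workers` are hypothetical names chosen for illustration. The idea is only that a worker parked in a blocking `get()` still occupies a pool slot.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def blocking_consumer(q: queue.Queue) -> str:
    # Blocks this worker thread until an item arrives.
    return q.get()


def demo_blocked_workers() -> tuple:
    # queue.Queue is a stand-in (assumption) for any blocking sync queue.
    q: queue.Queue = queue.Queue()
    ran = threading.Event()
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Both workers park inside q.get(), waiting for data.
        waiting = [pool.submit(blocking_consumer, q) for _ in range(2)]
        # Unrelated work is queued behind them: no free worker is left.
        other = pool.submit(ran.set)
        started_early = ran.wait(timeout=0.2)
        # Once data arrives, the workers are released and the pool drains.
        q.put("a")
        q.put("b")
        for fut in waiting:
            fut.result()
        other.result()
    return started_early, ran.is_set()


outcome = demo_blocked_workers()
```

In this sketch the unrelated task cannot start until data arrives, which is the loss of effective concurrency described above.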
Yes, so in this way, all you need to do is:
`pool.submit()`/`loop.run_in_executor(pool, ...)` to transfer items to these pools.

When using the pool, idle workers will not be a problem if you pass the appropriate `max_workers`, so it is suitable for any scenario. Moreover, it dynamically starts threads/processes, so the scheduler will not suffer unnecessarily.

At the moment, neither Culsans nor aiologic supports inter-process communication. You can find out why aiologic does not support it at https://aiologic.readthedocs.io/latest/advanced-topics/libraries.html#why-is-multiprocessing-not-supported. Ho…
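
For reference, the hand-off to a pool via `loop.run_in_executor(pool, ...)` could look roughly like the sketch below. It is only an illustration under assumptions: `asyncio.Queue` stands in for the async side of the queue, `process_item` is a hypothetical user-defined synchronous step, and `None` is used as an ad hoc end-of-stream sentinel. None of these names come from Culsans.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def process_item(item: bytes) -> int:
    # Hypothetical synchronous user function; here it just measures the payload.
    return len(item)


async def consume(q: asyncio.Queue, pool: ThreadPoolExecutor) -> list:
    loop = asyncio.get_running_loop()
    pending = []
    while True:
        # Waiting for data happens on the event loop, so no worker sits idle.
        item = await q.get()
        if item is None:  # sentinel (assumption): end of stream
            break
        # A worker is occupied only while it actually processes an item.
        pending.append(loop.run_in_executor(pool, process_item, item))
    return await asyncio.gather(*pending)


async def main() -> list:
    q: asyncio.Queue = asyncio.Queue()
    with ThreadPoolExecutor(max_workers=4) as pool:
        consumer = asyncio.create_task(consume(q, pool))
        for payload in (b"a", b"bb", b"ccc"):
            await q.put(payload)
        await q.put(None)
        return await consumer


results = asyncio.run(main())
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` keeps the same shape, but items are then pickled and copied to the worker process, which matters for gigabyte-sized payloads.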