Asynchronous parallel submission with caching

Hello!

A use case I run into frequently is submitting several pieces of code for parallel execution and then collecting all of their results in the main process once they complete. Currently I do this with SubmitInfra: I submit my jobs asynchronously in one loop, then iterate over all the job objects in a second loop and call .result() on each. This works, but it prevents me from using exca's caching, since SubmitInfra does not support caching. In many cases this leads to a lot of recomputation.
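For illustration, here is a minimal sketch of the submit-then-collect pattern described above. It uses Python's standard `concurrent.futures` as a stand-in for cluster submission, so it is not exca's SubmitInfra API, and `heavy_computation` is a hypothetical placeholder for a large job.

```python
from concurrent.futures import ThreadPoolExecutor


def heavy_computation(x: int) -> int:
    # Hypothetical placeholder for a large computation.
    return x * x


def run_all(inputs: list[int]) -> list[int]:
    # Threads stand in for cluster jobs in this sketch.
    with ThreadPoolExecutor(max_workers=4) as executor:
        # First loop: submit every job without waiting on any of them.
        futures = [executor.submit(heavy_computation, x) for x in inputs]
        # Second loop: block on each job's result, in submission order.
        return [f.result() for f in futures]


results = run_all([1, 2, 3, 4])
print(results)  # [1, 4, 9, 16]
```

Nothing in this pattern remembers earlier results, so rerunning it recomputes every job.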

I understand that MapInfra is designed for a similar operation. However, when I tried it out, I didn't really understand how it works or how to use it effectively: when I submit, say, 4 parallel jobs with it, it creates only one Slurm job that iterates over all 4 items. From the docs, MapInfra seems designed for computing a large number of small operations (like computing embeddings over a pool of images), and perhaps not for my use case (submitting a relatively small number of large computations to run in parallel).

If there is an aspect of MapInfra I'm missing that would achieve this, I would love to learn more. If not, my feature request is to allow non-blocking asynchronous computation that can return previously cached results, either by adding caching to SubmitInfra or by adding a non-blocking submission mode to TaskInfra.
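To make the requested behavior concrete, here is a rough sketch of non-blocking submission with a cache check, written against the standard library rather than exca. `CachedSubmitter`, `CachedResult` and the JSON-file cache are hypothetical names and choices made up for this illustration, not part of exca's API. The sketch assumes each job takes one JSON-serializable argument and returns a JSON-serializable result. On a cache hit it returns an already-finished object, so no job is submitted; on a miss it submits the job and the worker writes the cache file before the job's future resolves.

```python
import hashlib
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


class CachedResult:
    """Hypothetical stand-in for a finished job: .result() returns at once."""

    def __init__(self, value):
        self._value = value

    def result(self):
        return self._value


class CachedSubmitter:
    """Hypothetical wrapper: non-blocking submission that reuses cached results."""

    def __init__(self, executor, cache_dir: Path):
        self.executor = executor
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, fn, arg) -> Path:
        # Cache key: function name plus its (JSON-serializable) argument.
        key = json.dumps([fn.__qualname__, arg], sort_keys=True)
        return self.cache_dir / (hashlib.sha256(key.encode()).hexdigest() + ".json")

    @staticmethod
    def _run_and_store(fn, arg, path: Path):
        result = fn(arg)
        # Written inside the worker, before the future resolves.
        path.write_text(json.dumps(result))
        return result

    def submit(self, fn, arg):
        path = self._path(fn, arg)
        if path.exists():
            # Cache hit: nothing is submitted.
            return CachedResult(json.loads(path.read_text()))
        # Cache miss: submit without blocking.
        return self.executor.submit(self._run_and_store, fn, arg, path)


calls = []


def heavy_computation(x):
    calls.append(x)  # record which inputs were actually computed
    return x * x


with tempfile.TemporaryDirectory() as tmp, ThreadPoolExecutor(max_workers=4) as ex:
    submitter = CachedSubmitter(ex, Path(tmp))
    # First round: every job is computed.
    first = [submitter.submit(heavy_computation, x) for x in [1, 2, 3, 4]]
    first_results = [job.result() for job in first]
    # Second round: every job is served from the cache.
    second = [submitter.submit(heavy_computation, x) for x in [1, 2, 3, 4]]
    second_results = [job.result() for job in second]
```

The collecting loop stays the same whether a result was computed or loaded, because both kinds of object expose `.result()`.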

Thanks!

Asynchronous parallel submission with caching #119

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Asynchronous parallel submission with caching #119

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions