Asynchronous parallel submission with caching #119

@reesekneeland

Hello!

One use case I run into frequently is submitting multiple pieces of code for execution in parallel and then collecting all of their results in a main process once they complete. I currently do this with SubmitInfra: I submit my jobs asynchronously in a loop, then run a second loop that iterates over the job objects and calls .result() on each. This works, but it prevents me from using the caching functionality of exca, since SubmitInfra does not support caching, which leads to a lot of re-computation in many cases.
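
For illustration, here is a minimal sketch of the submit-then-collect pattern described above. It uses the standard library's `concurrent.futures` as a stand-in for cluster submission and is not exca's API. `heavy_computation` and `run_all` are hypothetical names, and the thread pool only plays the role that SubmitInfra jobs play in my real code.

```python
from concurrent.futures import ThreadPoolExecutor


def heavy_computation(x: int) -> int:
    # Hypothetical stand-in for one large, independent job.
    return x * x


def run_all(inputs: list[int]) -> list[int]:
    with ThreadPoolExecutor(max_workers=4) as pool:
        # First loop: submit every job without waiting on any of them.
        futures = [pool.submit(heavy_computation, x) for x in inputs]
        # Second loop: block on each job in turn and gather the results.
        return [f.result() for f in futures]


results = run_all([1, 2, 3, 4])
```

Results come back in submission order because the second loop walks the futures in the order they were created. Nothing here is cached, so every call recomputes every input.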

I also understand that MapInfra is designed for a similar operation. However, when I tried it, I did not really understand how it works or how to use it effectively: when I submit, say, 4 parallel jobs with it, it creates only one job in Slurm that iterates over all 4 of the items I wanted to compute. From the docs, it seems that MapInfra is designed for computing a large number of small operations (like computing embeddings over a pool of images), but perhaps not for my use case (submitting a relatively small number of large computations to run in parallel).

If there is an aspect of MapInfra that I am missing that would achieve this goal, I would love to learn more. If not, my feature request is to allow non-blocking asynchronous computation that can return previously cached results, either by adding cache functionality to SubmitInfra or by adding a non-blocking submission mode to TaskInfra.
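
To make the request concrete, here is a rough sketch of the behavior I have in mind, written against the standard library only. It says nothing about how exca implements caching. `submit_cached`, the pickle-on-disk cache layout, and the `calls` list are all assumptions made for illustration.

```python
import hashlib
import pickle
import tempfile
from concurrent.futures import Future, ThreadPoolExecutor
from pathlib import Path

calls: list[int] = []  # records which inputs were actually computed


def heavy_computation(x: int) -> int:
    # Hypothetical stand-in for one large job.
    calls.append(x)
    return x * x


def submit_cached(pool: ThreadPoolExecutor, cache_dir: Path, func, arg) -> Future:
    # Hypothetical helper: non-blocking submit that reuses results saved on disk.
    key = hashlib.sha256(pickle.dumps((func.__name__, arg))).hexdigest()
    path = cache_dir / f"{key}.pkl"
    if path.exists():
        # Cache hit: hand back an already-completed future, no job is submitted.
        done: Future = Future()
        done.set_result(pickle.loads(path.read_bytes()))
        return done

    def compute_and_store():
        result = func(arg)
        path.write_bytes(pickle.dumps(result))
        return result

    # Cache miss: submit the job and return immediately.
    return pool.submit(compute_and_store)


cache_dir = Path(tempfile.mkdtemp())
with ThreadPoolExecutor(max_workers=4) as pool:
    first = [submit_cached(pool, cache_dir, heavy_computation, x) for x in range(4)]
    first_results = [f.result() for f in first]
    # Second round: every input is already cached, so nothing is recomputed.
    second = [submit_cached(pool, cache_dir, heavy_computation, x) for x in range(4)]
    second_results = [f.result() for f in second]
```

The caller always gets a future-like object and uses the same collection loop either way. Only the submission step knows whether a result came from the cache.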

Thanks!
