Support fan-out jobs with arbitrary number of inputs

**Jira:** https://asfdaac.atlassian.net/browse/TOOL-3621

*Note: The above link is accessible only to members of ASF.*

----------------------------------------------------------------------------------------------------


There are some challenges to supporting fan-out/fan-in jobs with an arbitrary number of inputs (i.e. granules).

For one, throughput is limited by the maximum number of parallel iterations for the Step Functions Map state. The Map state defaults to Inline mode, which accepts the input list only as a JSON array and, according to the docs, runs at most 40 parallel iterations. However, we can use Distributed mode, which allows the Map state to read its input from a CSV or JSON file in S3 (or from an S3 object list) and run up to 10,000 parallel iterations. For [SRG_TIME_SERIES](https://github.com/ASFHyP3/hyp3/blob/develop/job_spec/SRG_TIME_SERIES.yml) in particular, this would allow us to take full advantage of the G-instance vCPU quota in the LAVAS account. Also see:

- https://states-language.net/#map-state
- https://docs.aws.amazon.com/step-functions/latest/dg/state-map.html
- https://docs.aws.amazon.com/step-functions/latest/dg/state-map-inline.html
- https://docs.aws.amazon.com/step-functions/latest/dg/state-map-distributed.html
- https://aws.amazon.com/blogs/compute/introducing-jsonl-support-with-step-functions-distributed-map/
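
As a rough illustration of the Distributed mode option described above, the following Python sketch builds a Map state definition that reads its items from a JSON file in S3. The field names follow the Amazon States Language documentation linked above as I understand it; the function name, bucket, key, the `ProcessGranule` placeholder state, and the concurrency value are all hypothetical and do not come from the HyP3 code base.

```python
import json


def distributed_map_state(bucket: str, key: str, max_concurrency: int = 1000) -> dict:
    """Build a Distributed-mode Map state that reads its items from a JSON file in S3.

    All names here are illustrative assumptions, not the actual HyP3 step function.
    """
    return {
        "Type": "Map",
        # Read the item list from S3 instead of passing it inline as a JSON array.
        "ItemReader": {
            "Resource": "arn:aws:states:::s3:getObject",
            "ReaderConfig": {"InputType": "JSON"},
            "Parameters": {"Bucket": bucket, "Key": key},
        },
        "ItemProcessor": {
            # DISTRIBUTED mode runs each iteration as a child workflow execution.
            "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "STANDARD"},
            "StartAt": "ProcessGranule",
            "States": {
                # Placeholder for the real per-granule work (e.g. submitting a Batch job).
                "ProcessGranule": {"Type": "Pass", "End": True},
            },
        },
        "MaxConcurrency": max_concurrency,
        "End": True,
    }


state = distributed_map_state("example-bucket", "jobs/example-job/granules.json")
print(json.dumps(state, indent=2))
```

The granule list would be written to the S3 object before the execution starts, so the list itself never has to travel through the state input.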

There are also [Step Functions service quotas](https://docs.aws.amazon.com/step-functions/latest/dg/service-quotas.html) that we may bump up against, depending on the number of inputs we want to support. We've already addressed one of these in [HyP3 v10.0.1](https://github.com/ASFHyP3/hyp3/releases/tag/v10.0.1), but there are other opportunities for reducing the size of the JSON data being passed between states. For example, we could stop passing around the job parameters (which include the full granules list) once they’re no longer needed, and we could write the `processing_times` list (which includes one float value for each Batch job) directly to the database.
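
To make the payload-size concern concrete, here is a minimal Python sketch that measures the serialized size of a state payload before and after dropping fields that are no longer needed. It assumes the 256 KiB limit on state input and output that the service quotas page lists, as I understand it. The key name `job_parameters`, the helper names, and the example data are hypothetical; only `processing_times` and the granules list are mentioned above.

```python
import json

# Assumed limit on state input/output size, per the Step Functions quotas page.
PAYLOAD_LIMIT_BYTES = 256 * 1024


def payload_size(state: dict) -> int:
    """Return the size in bytes of the state serialized as UTF-8 JSON."""
    return len(json.dumps(state).encode("utf-8"))


def drop_keys(state: dict, keys: tuple[str, ...]) -> dict:
    """Return a copy of the state without the given top-level keys."""
    return {k: v for k, v in state.items() if k not in keys}


# Hypothetical payload for a job with 20,000 granules.
example_state = {
    "job_id": "example-job",
    "job_parameters": {"granules": [f"GRANULE_{i:06d}" for i in range(20000)]},
    "processing_times": [1.5] * 20000,
}

# Drop the fields once downstream states no longer need them.
trimmed = drop_keys(example_state, ("job_parameters", "processing_times"))

print(payload_size(example_state) > PAYLOAD_LIMIT_BYTES)
print(payload_size(trimmed) > PAYLOAD_LIMIT_BYTES)
```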

Support fan-out jobs with arbitrary number of inputs #2722