
Analysis subgraphs might be too big for Azul #50

Open
@hannes-ucsc


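Azul has 15 minutes and 10 GiB of memory to index an analysis subgraph together with all of its stitched-on input subgraphs, aka the subgraph tree. In other words, each subgraph tree must be indexed within 15 min and 10 GiB. These limits are imposed by AWS Lambda: they are the maximum timeout and memory size it allows for a single function invocation.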

We hit the 10 GiB memory limit with an existing SS2 analysis subgraph containing 17,000 files, but were able to work around it by indexing that subgraph in "partitions". With partitioning, we still load the entire subgraph tree from TDR, but then only incorporate one slice of it (the partition) into the index. The partitions are later combined by a separate process, outside of the original 15 min/10 GiB "box", in another box of the same size. We can make the partitions arbitrarily small.

The problem is that we still have to load the entire subgraph tree from TDR before we can determine which partition to process. With partitioning enabled, loading alone already takes 10 of those 15 minutes, leaving only 5 minutes to process the partition. This is where we'll hit the next limit: probably at around 25,000 files, the entire 15 minutes will be spent loading, leaving no time for processing. The other problem is that we redundantly load many JSON documents multiple times (once per partition), only to discard most of them.
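To make the arithmetic behind the 25k estimate explicit, here is a back-of-envelope model. It assumes that load time grows linearly with the number of files, calibrated on the single data point above (17,000 files, about 10 minutes of loading), and it treats each file as one JSON document when counting redundant loads. Both are simplifications, and the function names are hypothetical, not part of Azul.

```python
# Back-of-envelope model of the Lambda time budget for one subgraph tree.
# ASSUMPTION: load time is linear in the number of files, calibrated on the
# single observed data point (17,000 files took about 10 minutes to load).

LAMBDA_TIMEOUT_MIN = 15.0   # AWS Lambda maximum timeout
OBSERVED_FILES = 17_000     # SS2 analysis subgraph that hit the memory limit
OBSERVED_LOAD_MIN = 10.0    # approximate time spent loading it from TDR


def load_minutes(num_files: int) -> float:
    """Estimated time to load a subgraph tree with num_files files from TDR."""
    return num_files * OBSERVED_LOAD_MIN / OBSERVED_FILES


def processing_minutes(num_files: int) -> float:
    """Time left to process one partition after loading the whole tree (never negative)."""
    return max(0.0, LAMBDA_TIMEOUT_MIN - load_minutes(num_files))


def max_files() -> int:
    """Largest tree whose loading alone still fits within the Lambda timeout."""
    return int(LAMBDA_TIMEOUT_MIN * OBSERVED_FILES / OBSERVED_LOAD_MIN)


def total_documents_loaded(num_documents: int, num_partitions: int) -> int:
    """Every partition reloads the entire tree, so total loads scale with the partition count."""
    return num_documents * num_partitions


if __name__ == "__main__":
    print(f"processing time left at 17k files: {processing_minutes(17_000):.1f} min")
    print(f"loading alone fills the timeout at ~{max_files():,} files")
    print(f"documents loaded for 17k files in 8 partitions: {total_documents_loaded(17_000, 8):,}")
```

Under these assumptions, loading alone fills the 15-minute timeout at about 25,500 files, which is consistent with the 25k estimate, and splitting a 17,000-file tree into 8 partitions means on the order of 136,000 document loads instead of 17,000.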

The important observation is that no amount of partitioning on the analysis side, say by donor, is going to alleviate this problem, because Azul would still need to stitch all of those partitions together when it indexes the subgraph that combines them into the final loom. The only thing that would help is if we didn't offer a top-level loom at all, only the donor-level looms, and that would obviously not be useful to our customers.
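The stitching argument can be illustrated with a toy model. The `Subgraph` type, the subgraph IDs and the graph shapes below are hypothetical simplifications, not Azul's data model: a subgraph tree is taken to be the set of files reachable from a root subgraph by following its stitched-on inputs. Splitting the analysis by donor makes each donor-level tree small, but the tree of the top-level subgraph that combines them still contains every file.

```python
from dataclasses import dataclass, field


@dataclass
class Subgraph:
    """Hypothetical stand-in for a subgraph: its own files plus the IDs of
    the input subgraphs that get stitched onto it."""
    files: set[str]
    inputs: list[str] = field(default_factory=list)


def tree_files(subgraphs: dict[str, Subgraph], root: str) -> set[str]:
    """Collect every file in the subgraph tree rooted at `root` by following
    stitched-on inputs; each subgraph is visited at most once."""
    seen: set[str] = set()
    files: set[str] = set()
    stack = [root]
    while stack:
        sid = stack.pop()
        if sid in seen:
            continue
        seen.add(sid)
        files |= subgraphs[sid].files
        stack.extend(subgraphs[sid].inputs)
    return files


def build(donors: int, files_per_donor: int, partitioned: bool) -> dict[str, Subgraph]:
    """Toy graph with one input subgraph per donor, analysed either by a single
    analysis subgraph or by per-donor analyses combined at the top."""
    graph = {
        f"seq-{d}": Subgraph({f"fastq-{d}-{i}" for i in range(files_per_donor)})
        for d in range(donors)
    }
    if partitioned:
        # One donor-level loom per donor, each stitched onto that donor's inputs
        for d in range(donors):
            graph[f"donor-{d}"] = Subgraph({f"loom-{d}"}, [f"seq-{d}"])
        # The top-level loom combines all donor-level subgraphs
        graph["top"] = Subgraph({"loom-all"}, [f"donor-{d}" for d in range(donors)])
    else:
        graph["top"] = Subgraph({"loom-all"}, [f"seq-{d}" for d in range(donors)])
    return graph


if __name__ == "__main__":
    flat = build(4, 100, partitioned=False)
    split = build(4, 100, partitioned=True)
    print(len(tree_files(flat, "top")), len(tree_files(split, "top")), len(tree_files(split, "donor-0")))
```

With 4 donors of 100 files each, the unpartitioned top-level tree holds 401 files while the partitioned one holds 405 (the same 400 input files, plus one loom per donor and the combined loom). Only a donor-level tree, at 101 files, is smaller.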
