Flat requirements for whole genome mode are wasteful

In a whole genome construct run, I have chromosomes 1-22, X, Y, and then a bunch of little unplaced/unlocalized contigs and decoys.

We're requesting e.g. 200 GB of memory to compute snarls for each of those chromosomes, including all the tiny ones, but I don't observe the jobs actually using nearly that much memory. It could be that chr1 takes that much memory, or even that a whole genome combined graph that we might ask to index takes that much memory, but chr21 doesn't, to say nothing of all the little unlocalized bits and decoys.

We should have some way to scale our job requirements based on file size, and/or we should do a test run with Toil's stat collection on and cut the limits down closer to what current vg really needs.
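As a rough sketch of the file-size scaling idea (not how the code works today): estimate memory as a multiple of the input size, clamped between a floor and a cap. The function name `scaled_memory_bytes`, the 20x multiplier, and the 4 GiB floor are all made-up placeholders; only the roughly 200 GB ceiling comes from the current flat setting, and real numbers would have to come from measurements.

```python
GIB = 1024 ** 3


def scaled_memory_bytes(input_size_bytes,
                        bytes_per_input_byte=20,
                        floor_bytes=4 * GIB,
                        cap_bytes=200 * GIB):
    """Estimate a job's memory requirement from its input file size.

    All defaults are hypothetical placeholders, not measured values:
    - bytes_per_input_byte: assumed memory needed per byte of input
    - floor_bytes: assumed minimum so tiny contigs still get a sane amount
    - cap_bytes: upper bound, set here to the current flat requirement
    """
    estimate = input_size_bytes * bytes_per_input_byte
    # Clamp the linear estimate into [floor_bytes, cap_bytes].
    return max(floor_bytes, min(cap_bytes, estimate))


if __name__ == "__main__":
    # Hypothetical input sizes, just to show the shape of the result.
    for label, size in [("tiny decoy", 50 * 1024), ("mid-size graph", GIB)]:
        print(label, scaled_memory_bytes(size) / GIB, "GiB")
```

The result could then be passed as the `memory` requirement when the per-chromosome job is scheduled, assuming the input file's size is known at that point. The floor keeps the decoys from getting an unusably small request, and the cap keeps the biggest inputs from asking for more than the current flat value.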

This is currently causing me to waste most of our lab Kubernetes cluster capacity as unused-but-subscribed memory, and making my run very slow.

Flat requirements for whole genome mode are wasteful #773