Flat requirements for whole genome mode are wasteful #773

@adamnovak

When doing a whole-genome construct run, I have chromosomes 1-22, X, and Y, plus a bunch of small unplaced/unlocalized contigs and decoys.

We're requesting e.g. 200 GB of memory to compute snarls for each of those contigs, including all the tiny ones, but I don't observe the jobs actually using nearly that much. It could be that chr1 takes that much memory, or even that a whole-genome combined graph that we might ask to index takes that much memory, but chr21 doesn't, to say nothing of all the little unlocalized bits and decoys.

We should have some way to scale our job requirements based on input file size, and/or we should do a test run with Toil's stats collection turned on and cut the limits down closer to what current vg really needs.
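A minimal sketch of what size-based scaling could look like, assuming memory use grows roughly linearly with input size. The function name `scaled_memory` and all three constants are hypothetical placeholders, not measured values; the real ratio and floor would have to come from a stats-enabled test run.

```python
# Hypothetical constants for illustration only; none of these are measured.
GIB = 1024 ** 3
MEMORY_PER_INPUT_BYTE = 20   # assumed ratio of peak memory to input size
MIN_MEMORY = 4 * GIB         # assumed floor so tiny contigs still get a sane amount
MAX_MEMORY = 200 * GIB       # cap at the flat requirement currently in use


def scaled_memory(input_size_bytes,
                  factor=MEMORY_PER_INPUT_BYTE,
                  floor=MIN_MEMORY,
                  ceiling=MAX_MEMORY):
    """Return a memory requirement in bytes, scaled linearly with input size
    and clamped to the range [floor, ceiling]."""
    if input_size_bytes < 0:
        raise ValueError("input size must be non-negative")
    return max(floor, min(ceiling, input_size_bytes * factor))


if __name__ == "__main__":
    # A tiny decoy contig, a mid-sized chromosome, and a very large input.
    for size in (50 * 1024, 1 * GIB, 40 * GIB):
        print(size, "->", scaled_memory(size) // GIB, "GiB")
```

The returned value could then be used as the per-job memory requirement in place of the flat number, for example when each per-chromosome job is scheduled.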

This is currently causing me to waste most of our lab Kubernetes cluster's capacity as unused-but-subscribed memory, and it is making my run very slow.
