
Consider removing combine hail table step #99

@rileyhgrant

Description


Currently, the four datasets are combined into a single Hail table with one row per gene. Each row has a struct of gene info keyed by dataset, and a struct of variants that holds each dataset's list of variants.

Then, when writing the tables, this entire table is written to a temp .tsv file, and that .tsv file is split into individual gene results and variant results files per gene, one of each for every dataset.

Unless I am missing something, combining everything into a single table does very little, given that we already validate the outputs of each individual pipeline. We could just generate the results JSON files directly from each individual dataset.
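
To make the proposal concrete, here is a minimal sketch of writing per-gene results straight from one dataset's rows, with no combined table and no temp .tsv in between. Everything in it is an assumption for illustration: the function name `write_dataset_results`, the row keys `gene_id`, `gene_results`, and `variants`, and the output file naming are hypothetical and not taken from the pipeline. It also assumes each dataset's rows can be iterated as plain dicts, which would need checking against how the per-dataset Hail tables are actually read.

```python
import json
import tempfile
from pathlib import Path


def write_dataset_results(dataset_id, gene_rows, output_dir):
    """Write gene and variant results JSON files per gene for ONE dataset.

    gene_rows is any iterable of dicts with the (hypothetical) keys
    "gene_id", "gene_results", and "variants". Rows are handled one at a
    time, so only a single gene's data is serialized at once.
    """
    dataset_dir = Path(output_dir) / dataset_id
    dataset_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for row in gene_rows:
        gene_id = row["gene_id"]
        # Hypothetical file naming, one pair of files per gene.
        gene_path = dataset_dir / f"{gene_id}_gene_results.json"
        variant_path = dataset_dir / f"{gene_id}_variant_results.json"
        gene_path.write_text(json.dumps(row["gene_results"]))
        variant_path.write_text(json.dumps(row["variants"]))
        written.extend([gene_path, variant_path])
    return written


if __name__ == "__main__":
    # Made-up example row; real rows would come from a dataset's own table.
    example_rows = [
        {
            "gene_id": "GENE1",
            "gene_results": {"n_cases": 10},
            "variants": [{"variant_id": "1-100-A-T"}],
        }
    ]
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_dataset_results("ibd", example_rows, tmp):
            print(path.name)
```

Each dataset would call this independently, so a large per-variant field in one dataset (such as transcript consequences) would not have to pass through a shared combined row.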


I was working on getting Transcript Consequences at the variant level for the IBD dataset. It worked in development with a smaller subset of genes, but it choked in production in the write results files step, which led me to look a bit closer at what's happening and file this issue.
