Consider removing combine hail table step

Currently, the four datasets are combined into a single Hail table with one row per gene. Each row has a gene-info struct with a field per dataset, and a variants struct that holds each dataset's list of variants.

Then, when writing the results, this entire table is written to a temporary .tsv file, and that .tsv file is in turn split into a gene results file and a variant results file per gene, one of each for every dataset.

Unless I am missing something, combining everything into a single table does very little, given that we already validate the outputs of each individual pipeline. We could instead generate the results JSON files directly from each individual dataset.
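As a rough sketch of what I have in mind (not the actual pipeline code): the function name, the row keys (`gene_id`, `gene_results`, `variants`), and the output layout below are all made up for illustration, and the rows are plain dicts standing in for whatever each dataset's own Hail table yields.

```python
import json
from pathlib import Path


def write_dataset_results(dataset_id, gene_rows, output_dir):
    """Write gene and variant results for ONE dataset, one pair of files per gene.

    All names here are hypothetical. `gene_rows` is any iterable of dicts with
    the keys `gene_id`, `gene_results`, and `variants`.
    """
    written = []
    for row in gene_rows:
        # Hypothetical layout: <output_dir>/<dataset_id>/<gene_id>/
        gene_dir = Path(output_dir) / dataset_id / row["gene_id"]
        gene_dir.mkdir(parents=True, exist_ok=True)

        gene_path = gene_dir / "gene_results.json"
        gene_path.write_text(json.dumps(row["gene_results"]))

        variant_path = gene_dir / "variant_results.json"
        variant_path.write_text(json.dumps(row["variants"]))

        written.extend([gene_path, variant_path])
    return written
```

Each dataset would call this on its own, so no combined table or intermediate .tsv is needed, and a large dataset only ever has to handle its own rows.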

---

I was working on adding variant-level Transcript Consequences for the IBD dataset. It worked in development with a smaller subset of genes, but it choked in production during the write results files steps. That led me to look more closely at what's happening here and to file this issue.

Consider removing combine hail table step #99
