Code to run logistic regression on v4 exomes and genomes with ancestry PCs by KoalaQin · Pull Request #616 · broadinstitute/gnomad_qc

KoalaQin · 2024-05-22T02:02:46Z

No description provided.


jkgoodrich · 2024-05-25T01:30:24Z

gnomad_qc/v4/assessment/logistic_regression.py

+        ht = get_test_intervals(ht)
+        ht = ht.checkpoint(hl.utils.new_temp_file("test_intervals", "ht"))
+        exomes_vds = hl.vds.filter_intervals(
+            exomes_vds, ht, split_reference_blocks=True


Is this needed? Did the Hail team say this is faster than just filtering the variant matrix table and doing a densify like we do in this script? https://github.com/broadinstitute/gnomad_qc/blob/main/gnomad_qc/v4/sample_qc/generate_qc_mt.py

I didn't talk to the Hail team. I tried filter_variants first, but that step took a very long time to finish. Then I remembered that once the VDS has store_max_ref_length, filter_intervals is much faster.

OK, sounds good. If it's faster, then go for it.

jkgoodrich · 2024-05-25T01:32:37Z

gnomad_qc/v4/assessment/logistic_regression.py

+    logger.info("Densifying exomes...")
+    exomes_mt = hl.vds.to_dense_mt(exomes_vds)
+    exomes_mt = exomes_mt.annotate_cols(is_genome=False)
+    exomes_mt = exomes_mt.select_entries("GT").select_rows().select_cols("is_genome")


If this is the only entry you need, then you should filter to it before the densify. For this, though, I think you probably also want to filter to only adj genotypes, so you probably need more entries than just GT.

I agree, adj seems to make more sense. I will add that.

I found that the entries are not the same as the ones we used when computing freq for exomes or genomes. Could you check the new steps for me?
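For readers following this thread, the adj criteria can be sketched in plain Python. The thresholds below (GQ >= 20, DP >= 10, allele balance >= 0.2 for heterozygous calls) are the commonly documented gnomAD adj defaults; the function name `is_adj` and its argument layout are hypothetical, and the real pipeline applies these checks as Hail expressions through helpers such as `annotate_adj` or `filter_to_adj`. The sketch mainly shows why GT alone is not enough: GQ, DP, and AD all have to survive the entry selection that happens before the densify.

```python
from typing import Sequence


def is_adj(
    gt: Sequence[int],
    gq: int,
    dp: int,
    ad: Sequence[int],
    min_gq: int = 20,
    min_dp: int = 10,
    min_ab: float = 0.2,
) -> bool:
    """Return True if a diploid genotype passes the adj-style filters.

    gt is a pair of allele indices (0 = reference) and ad holds per-allele
    read depths. All names here are illustrative, not the gnomad API.
    """
    # Genotype quality and depth thresholds apply to every call.
    if dp <= 0 or gq < min_gq or dp < min_dp:
        return False
    first, second = gt
    if first != second:
        # Heterozygous call: every non-reference allele needs enough
        # supporting reads relative to total depth (allele balance).
        for allele in (first, second):
            if allele != 0 and ad[allele] / dp < min_ab:
                return False
    return True
```

For example, `is_adj((0, 1), gq=30, dp=20, ad=(15, 5))` passes because the alternate allele has 25% of the reads, while the same call with `ad=(17, 3)` fails the allele balance check.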

jkgoodrich · 2024-05-25T01:35:29Z

gnomad_qc/v4/assessment/logistic_regression.py

+    return ht
+
+
+def densify_union_exomes_genomes(


I would split this up. I would run the exomes and genomes filter and densify in parallel, checkpoint each, and then union after those are done and checkpoint
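A minimal sketch of the split suggested here, assuming one function that filters, densifies, and checkpoints a single data type, and a second that unions the two checkpointed results. The function names and the `out_path` arguments are hypothetical; the Hail calls are the ones already used in this diff plus `union_cols` and `checkpoint`. Running the exomes and genomes steps in parallel would mean launching `densify_one` as two separate jobs, which is not shown.

```python
def densify_one(vds, intervals_ht, is_genome, out_path):
    """Filter one VDS to the test intervals, densify it, and checkpoint it."""
    import hail as hl  # imported lazily so this module loads without Hail

    vds = hl.vds.filter_intervals(vds, intervals_ht, split_reference_blocks=True)
    mt = hl.vds.to_dense_mt(vds)
    # Tag every sample with its data type; this becomes the regression outcome.
    mt = mt.annotate_cols(is_genome=is_genome)
    mt = mt.select_entries("GT").select_rows().select_cols("is_genome")
    return mt.checkpoint(out_path, overwrite=True)


def union_and_checkpoint(exomes_mt, genomes_mt, out_path):
    """Union the two checkpointed MatrixTables by column and checkpoint again."""
    joint_mt = exomes_mt.union_cols(genomes_mt)
    return joint_mt.checkpoint(out_path, overwrite=True)
```

Keeping the union in its own step means a failure there does not force either densify to be rerun, since both inputs are already on disk.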

jkgoodrich · 2024-05-25T01:38:26Z

gnomad_qc/v4/assessment/logistic_regression.py

+    :param joint_ht: Joint HT of v4 exomes and genomes.
+    :return: Test Table
+    """
+    # Filter to chr22


Before running the chr22 test, make an actual test that is only the first few partitions of chr22

Using only a few partitions was slower when I tested. When I use the set of intervals on chr22, the data is split across more partitions and densifies faster.

jkgoodrich · 2024-05-25T01:40:12Z

gnomad_qc/v4/assessment/logistic_regression.py

+        "firth",
+        y=mt.is_genome,
+        x=mt.GT.n_alt_alleles(),
+        covariates=[1] + [mt.pc[i] for i in range(10)],


I don't remember how many PCs we used for ancestry assignment off the top of my head, but I would use that number

Mike told me it was 10, but I do remember you were exploring up to 18 or 19. I will double-check the code.

Sorry if I wasn't clear. I wasn't certain it was 10, I just thought it might be. It's 20: https://app.zenhub.com/workspaces/gnomad-5f4d127ea61afc001d6be50b/issues/gh/broadinstitute/gnomad_production/496

No worries, I put 10 there temporarily because it was 10 in Julia's code. I changed it in my test.
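One way to keep the PC count from being hard-coded again is to make it a named parameter. The sketch below is an assumption about how that could look, not the code in this PR: `N_ANCESTRY_PCS` and `build_covariates` are hypothetical names, and the value 20 is simply the number stated in this thread. The helper only indexes into `pcs`, so it works on a plain list as shown and should also accept an indexable expression such as `mt.pc`.

```python
from typing import Any, List, Sequence

# Number of PCs stated in this thread; confirm against the ancestry code.
N_ANCESTRY_PCS = 20


def build_covariates(pcs: Sequence[Any], n_pcs: int = N_ANCESTRY_PCS) -> List[Any]:
    """Return an intercept term followed by the first n_pcs components."""
    # The leading 1.0 is the intercept, matching `[1] + [...]` in the diff.
    return [1.0] + [pcs[i] for i in range(n_pcs)]
```

In the regression call this would replace the inline list, for example `covariates=build_covariates(mt.pc)`.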

jkgoodrich · 2024-05-25T01:40:45Z

gnomad_qc/v4/assessment/logistic_regression.py

+    return mt
+
+
+def main(args):


Add a try/catch so the log still gets copied if the run fails with an error.

I added this.
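The shape of that change can be sketched as follows. This version uses `try`/`finally` rather than a catch, so the log is copied on both success and failure and any exception still propagates. The wrapper name `run_with_log_copy` and both callables are hypothetical; in a Hail script the `copy_log` callable would presumably wrap `hl.copy_log` with a destination path, which is an assumption here.

```python
import logging
from typing import Callable

logger = logging.getLogger("logistic_regression")


def run_with_log_copy(
    pipeline: Callable[[], None], copy_log: Callable[[], None]
) -> None:
    """Run the pipeline and always copy the log afterwards."""
    try:
        pipeline()
    finally:
        # Runs whether or not pipeline() raised, so the log is preserved
        # for debugging; the original exception is re-raised afterwards.
        logger.info("Copying log to logging bucket...")
        copy_log()
```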

KoalaQin · 2024-06-04T14:05:12Z

gnomad_qc/v4/assessment/logistic_regression.py

+        hl.utils.new_temp_file(f"temp_{data_type}_vds_filtered", "vds")
+    )
+    mt = hl.vds.to_dense_mt(vds)
+    mt = annotate_adj(mt)


After looking at your code, I think I could use filter_to_adj directly.

Code to run logistic regression on v4 exomes and genomes with ancestry PCs

071d4de

jkgoodrich self-requested a review May 23, 2024 13:46

jkgoodrich assigned jkgoodrich and KoalaQin May 23, 2024

jkgoodrich reviewed May 25, 2024


KoalaQin added 2 commits May 26, 2024 17:30

Merge branch 'main' into qh/logistic_regression

4b41e38

Address review suggestions

ad96374

KoalaQin commented Jun 4, 2024
