Add coverage entry agg function by ch-kr · Pull Request #790 · broadinstitute/gnomad_methods

ch-kr · 2025-07-15T18:01:18Z

This PR moves code out of compute_coverage_stats into a new function, get_coverage_agg_func, to allow more flexible use of compute_stats_per_ref_site: a single call to compute_stats_per_ref_site can then compute coverage, AN, and quality histograms.
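As a rough illustration of the idea, here is a pure-Python sketch of a factory that returns a (transform, aggregate) pair. It does not use Hail and is not the gnomad_methods implementation: the name make_coverage_funcs, the output keys other than coverage_counter, and the choice to treat missing or NaN depth as 0 are all assumptions made for the example.

```python
import math
from collections import Counter
from statistics import mean, median
from typing import Callable, List, Tuple


def make_coverage_funcs(
    dp_field: str = "DP", max_cov_bin: int = 100
) -> Tuple[Callable[[dict], int], Callable[[List[int]], dict]]:
    """Hypothetical stand-in: return a (transform, aggregate) pair for coverage."""

    def transform(entry: dict) -> int:
        dp = entry.get(dp_field)
        # Assumption: missing or NaN depth counts as zero coverage.
        if dp is None or (isinstance(dp, float) and math.isnan(dp)):
            return 0
        # Cap the depth at the highest coverage bin.
        return min(int(dp), max_cov_bin)

    def aggregate(values: List[int]) -> dict:
        # Summarize the already-transformed depths for one site.
        return {
            "coverage_counter": dict(Counter(values)),
            "mean": mean(values) if values else 0,
            "median": median(values) if values else 0,
            "total_DP": sum(values),
        }

    return transform, aggregate


# Toy entries for a single site (hypothetical data).
entries = [{"DP": 5}, {"DP": 250}, {"DP": None}, {"DP": 30}]
transform, aggregate = make_coverage_funcs(max_cov_bin=100)
stats = aggregate([transform(e) for e in entries])
```

Returning the two steps separately is what lets a caller plug them into a more general per-site routine alongside other statistics.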

mike-w-wilson

This LGTM. I know this was basically a restructuring of the code but it would be good to add tests whenever we touch gnomad_methods. Cursor is pretty great at this after a couple goes in my experience.

ch-kr · 2025-07-17T17:35:15Z

I completely forgot about adding tests, thank you for adding these Mike!

Copilot

Pull Request Overview

This PR refactors coverage aggregation logic by extracting reusable functions from compute_coverage_stats. The change introduces get_coverage_agg_func to allow flexible use of coverage computation in compute_stats_per_ref_site for multiple statistics types (coverage, AN, and quality histograms).

Moves coverage transformation and aggregation logic from inline code to a new reusable function
Adds customizable dp_field parameter to support different depth field names
Refactors compute_coverage_stats to use the new function with minimal behavioral changes
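The effect of a configurable depth field name can be sketched in plain Python as below. This is only an illustration: make_depth_getter is a hypothetical helper, MIN_DP is used purely as an example of an alternative field name, and returning 0 for a missing field is an assumption rather than documented library behavior.

```python
from typing import Callable, Mapping, Optional


def make_depth_getter(
    dp_field: str = "DP",
) -> Callable[[Mapping[str, Optional[int]]], int]:
    """Hypothetical helper: build a function that reads depth from `dp_field`."""

    def get_depth(entry: Mapping[str, Optional[int]]) -> int:
        value = entry.get(dp_field)
        # Assumption: an absent or missing depth is treated as 0.
        return 0 if value is None else value

    return get_depth


default_getter = make_depth_getter()                   # reads "DP"
custom_getter = make_depth_getter(dp_field="MIN_DP")   # reads "MIN_DP" instead

entry = {"DP": 12, "MIN_DP": 7}
```

With the field name passed in once at construction time, the rest of the pipeline does not need to know which depth annotation a given dataset carries.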

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
gnomad/utils/sparse_mt.py	Introduces `get_coverage_agg_func` and refactors `compute_coverage_stats` to use it
tests/utils/test_sparse_mt.py	Comprehensive test suite for the new `get_coverage_agg_func` function

Comments suppressed due to low confidence (1)

tests/utils/test_sparse_mt.py:82

The test creates a NaN value using hl.float64("nan") but the DP field is typically an integer type. This test case may not accurately reflect real-world usage where DP fields are integers.

        test_struct = hl.Struct(DP=hl.float64("nan"))
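The same concern can be seen in plain Python, where only floats can hold NaN and converting NaN to an integer fails outright. The sketch below is a non-Hail analogy; clean_depth is a hypothetical helper, and mapping NaN or missing depth to 0 is an assumption for illustration.

```python
import math


def clean_depth(dp):
    """Hypothetical helper: return 0 for missing or NaN depth, else an int."""
    if dp is None:
        return 0
    # NaN can only occur for float values; integer depths never hit this branch.
    if isinstance(dp, float) and math.isnan(dp):
        return 0
    return int(dp)


nan_dp = float("nan")

# Converting NaN straight to an integer raises, which is why a guard is needed
# if a float-typed depth field can ever carry NaN.
try:
    int(nan_dp)
    nan_converts = True
except ValueError:
    nan_converts = False
```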

ch-kr · 2025-07-18T15:46:05Z

I think I'm happy with the tests now, back to you @mike-w-wilson !

Copilot

Pull Request Overview

This PR refactors the coverage aggregation logic by extracting code from the compute_coverage_stats function into a new standalone get_coverage_agg_func function. This allows for more flexible reuse of coverage computation logic with compute_stats_per_ref_site, enabling a single call to compute multiple statistics (coverage, AN, and qual histograms) instead of separate calls.

Extracts coverage aggregation logic into get_coverage_agg_func function with configurable DP field name and max coverage bin
Updates compute_coverage_stats to use the new function instead of inline logic
Adds comprehensive test coverage for the new function including edge cases and parameter variations
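To make the "single call, multiple statistics" idea concrete, here is a conceptual pure-Python sketch. It is not the signature or behavior of compute_stats_per_ref_site: the function compute_stats_single_pass, the entry fields DP and called, and the diploid AN rule are all assumptions chosen for the example.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Tuple

# Each statistic is described by a (transform, aggregate) pair.
Pair = Tuple[Callable[[dict], object], Callable[[list], object]]


def compute_stats_single_pass(
    entries: Iterable[dict], agg_funcs: Dict[str, Pair]
) -> Dict[str, object]:
    """Hypothetical driver: apply every transform while iterating entries once."""
    transformed = {name: [] for name in agg_funcs}
    for entry in entries:  # a single pass over the data
        for name, (transform, _) in agg_funcs.items():
            transformed[name].append(transform(entry))
    return {
        name: aggregate(transformed[name])
        for name, (_, aggregate) in agg_funcs.items()
    }


agg_funcs = {
    # Coverage: cap depth at 100, then count entries per depth bin.
    "coverage": (lambda e: min(e["DP"], 100), lambda v: dict(Counter(v))),
    # AN: assume two alleles per called (diploid) genotype, then sum.
    "AN": (lambda e: 2 if e["called"] else 0, sum),
}

entries = [
    {"DP": 10, "called": True},
    {"DP": 150, "called": True},
    {"DP": 0, "called": False},
]
result = compute_stats_single_pass(entries, agg_funcs)
```

Because each statistic is just another entry in the dictionary, adding a quality histogram would mean adding one more pair rather than another pass over the data.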

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
gnomad/utils/sparse_mt.py	Adds new `get_coverage_agg_func` function and refactors `compute_coverage_stats` to use it
tests/utils/test_sparse_mt.py	Adds comprehensive test suite for the new `get_coverage_agg_func` function

Copilot · 2025-07-18T15:46:45Z

+        # Values 1, 2, 3 should be counted as themselves.
+        assert coverage_counter.get(1, 0) == 1
+        assert coverage_counter.get(2, 0) == 1
+        # Values 3-10 should all be counted as 3 (max_cov_bin).


The comment incorrectly states "Values 3-10" when it should be "Values 4-10" since value 3 is counted as itself, not capped.

Suggested change

-        # Values 3-10 should all be counted as 3 (max_cov_bin).
+        # Values 4-10 should all be counted as 3 (max_cov_bin).
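The capping behavior behind this comment fix can be checked with a small worked example in plain Python. It assumes, for illustration only, that the test input holds one entry at each depth from 1 to 10 with max_cov_bin set to 3; the actual test data is not shown in this thread.

```python
from collections import Counter

max_cov_bin = 3
depths = range(1, 11)  # assumed input: one entry at each depth 1..10

# Depths above the cap fall into the max_cov_bin bucket.
coverage_counter = Counter(min(dp, max_cov_bin) for dp in depths)

# Depths 1 and 2 are counted as themselves.
# Depth 3 is also counted as itself, and depths 4-10 are capped to 3,
# so bin 3 ends up holding 1 + 7 = 8 entries.
```

This is why "Values 4-10" is the accurate wording: depth 3 lands in bin 3 without being capped.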

mike-w-wilson

Nice -- I like the addition of _ for the unused function.

ch-kr added 2 commits July 15, 2025 13:51

create get_coverage_agg_func

de4b374

add dp_field arg to compute_coverage_stats and add periods to docstring

680c312

ch-kr requested a review from a team as a code owner July 15, 2025 18:01

ch-kr added the Changelog: new feature label Jul 15, 2025

ch-kr requested review from Copilot and removed request for a team July 16, 2025 16:00

ch-kr and others added 2 commits July 16, 2025 12:14

move dp_field per copilot suggestion

1613d0c

Update gnomad/utils/sparse_mt.py

ca3d4ff

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

mike-w-wilson reviewed Jul 16, 2025

Add tests for get_coverage_agg_func

dddcc47

mike-w-wilson requested a review from Copilot July 17, 2025 13:27

ch-kr assigned ch-kr and mike-w-wilson Jul 17, 2025

Remove print statements for debugging in test_sparse_mt

59a16d0

mike-w-wilson requested a review from Copilot July 17, 2025 15:07

Switch to hail's NaN and remove hardocded expected value -- update te…

e134017

…sts to use transofrmed DP, not raw, as it will be used in prod

mike-w-wilson requested a review from Copilot July 17, 2025 15:58

Remove pytest import and sample_ht as it wasn't being accessed

69322ed

mike-w-wilson requested a review from Copilot July 17, 2025 16:04

Remove unneeded tests and misleading test names

eb4c34b

mike-w-wilson requested a review from Copilot July 17, 2025 16:15

Update tests/utils/test_sparse_mt.py

3d54723

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

mike-w-wilson requested a review from Copilot July 17, 2025 17:35

Copilot AI reviewed Jul 17, 2025

Comment thread gnomad/utils/sparse_mt.py

Comment thread tests/utils/test_sparse_mt.py Outdated

Update tests/utils/test_sparse_mt.py

430f84e

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

mike-w-wilson and others added 8 commits July 18, 2025 09:56

Remove unneeded sparse_mt test

13284d6

asked cursor to add periods to comments

f910760

asked cursor to rename functions _ if they weren't being used

b7aa2d3

asked cursor to remove irrelevant comment

a38abff

asked cursor to rename function and remove unncessary max_cov_bin=50 …

dc87fed

…call

unused transform_func > _

26e8df2

asked cursor to remove redundant custom field name tests

36a7515

asked cursor to update test_transform_and_aggregation_integration

f948c77

ch-kr requested a review from Copilot July 18, 2025 15:29

ch-kr added 3 commits July 18, 2025 11:39

improve median approx test documentation

63c775f

ask cursor to fix its comment 'accept actual behavior'

27b4525

ask cursor to fix vague > 0 assert

23059d5

ch-kr requested a review from Copilot July 18, 2025 15:45

ch-kr requested review from Copilot and mike-w-wilson July 18, 2025 15:46

Copilot AI reviewed Jul 18, 2025

add link to hail docs

8cc1133

mike-w-wilson approved these changes Jul 18, 2025

ch-kr merged commit f8aeca2 into main Jul 18, 2025
8 of 9 checks passed

ch-kr deleted the kc/coverage_agg_func branch July 18, 2025 16:02
