Conversation
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
mike-w-wilson
left a comment
There was a problem hiding this comment.
This LGTM. I know this was basically a restructuring of the code but it would be good to add tests whenever we touch gnomad_methods. Cursor is pretty great at this after a couple goes in my experience.
…sts to use transofrmed DP, not raw, as it will be used in prod
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
I completely forgot about adding tests, thank you for adding these Mike! |
There was a problem hiding this comment.
Pull Request Overview
This PR refactors coverage aggregation logic by extracting reusable functions from compute_coverage_stats. The change introduces get_coverage_agg_func to allow flexible use of coverage computation in compute_stats_per_ref_site for multiple statistics types (coverage, AN, and quality histograms).
- Moves coverage transformation and aggregation logic from inline code to a new reusable function
- Adds customizable
dp_fieldparameter to support different depth field names - Refactors
compute_coverage_statsto use the new function with minimal behavioral changes
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| gnomad/utils/sparse_mt.py | Introduces get_coverage_agg_func and refactors compute_coverage_stats to use it |
| tests/utils/test_sparse_mt.py | Comprehensive test suite for the new get_coverage_agg_func function |
Comments suppressed due to low confidence (1)
tests/utils/test_sparse_mt.py:82
- The test creates a NaN value using
hl.float64("nan")but the DP field is typically an integer type. This test case may not accurately reflect real-world usage where DP fields are integers.
test_struct = hl.Struct(DP=hl.float64("nan"))
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
I think I'm happy with the tests now, back to you @mike-w-wilson ! |
There was a problem hiding this comment.
Pull Request Overview
This PR refactors the coverage aggregation logic by extracting code from the compute_coverage_stats function into a new standalone get_coverage_agg_func function. This allows for more flexible reuse of coverage computation logic with compute_stats_per_ref_site, enabling a single call to compute multiple statistics (coverage, AN, and qual histograms) instead of separate calls.
- Extracts coverage aggregation logic into
get_coverage_agg_funcfunction with configurable DP field name and max coverage bin - Updates
compute_coverage_statsto use the new function instead of inline logic - Adds comprehensive test coverage for the new function including edge cases and parameter variations
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| gnomad/utils/sparse_mt.py | Adds new get_coverage_agg_func function and refactors compute_coverage_stats to use it |
| tests/utils/test_sparse_mt.py | Adds comprehensive test suite for the new get_coverage_agg_func function |
| # Values 1, 2, 3 should be counted as themselves. | ||
| assert coverage_counter.get(1, 0) == 1 | ||
| assert coverage_counter.get(2, 0) == 1 | ||
| # Values 3-10 should all be counted as 3 (max_cov_bin). |
There was a problem hiding this comment.
The comment incorrectly states "Values 3-10" when it should be "Values 4-10" since value 3 is counted as itself, not capped.
| # Values 3-10 should all be counted as 3 (max_cov_bin). | |
| # Values 4-10 should all be counted as 3 (max_cov_bin). |
mike-w-wilson
left a comment
There was a problem hiding this comment.
Nice -- I like the addition of _ for the unused function - ![]()
PR moves code from within
compute_coverage_statsto new functionget_coverage_agg_functo allow for more flexible use ofcompute_stats_per_ref_site(allows for a single call tocompute_stats_per_ref_siteto compute coverage, AN, and qual hists)