
Add coverage entry agg function#790

Merged
ch-kr merged 23 commits into main from kc/coverage_agg_func on Jul 18, 2025

Conversation

@ch-kr
Contributor

@ch-kr ch-kr commented Jul 15, 2025

This PR moves code from within compute_coverage_stats into a new function, get_coverage_agg_func, to allow more flexible use of compute_stats_per_ref_site: a single call to compute_stats_per_ref_site can then compute coverage, AN, and qual hists.
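To make the motivation concrete, here is a minimal pure-Python sketch of the pattern. It is not the gnomad/Hail implementation: the real functions work on Hail expressions (the tests use hl.Struct), and make_coverage_agg, make_an_agg, stats_per_site and the n_alleles field are all hypothetical names. The idea shown is that a factory returns a (transform, aggregate) pair instead of a finished result, so a per-site driver can apply several such pairs in one call.

```python
from collections import Counter
from typing import Any, Callable, Dict, List, Tuple

# A (transform, aggregate) pair: transform maps one entry to a value,
# aggregate reduces the list of transformed values for one site.
AggPair = Tuple[Callable[[Dict[str, Any]], Any], Callable[[List[Any]], Any]]


def make_coverage_agg(dp_field: str = "DP", max_cov_bin: int = 100) -> AggPair:
    """Hypothetical pure-Python stand-in for get_coverage_agg_func."""

    def transform(entry: Dict[str, Any]) -> int:
        dp = entry.get(dp_field)
        # Assumption: a missing depth is treated as 0.
        return 0 if dp is None else dp

    def aggregate(dps: List[int]) -> Dict[str, Any]:
        return {
            # Depths above max_cov_bin are folded into the max_cov_bin bucket.
            "coverage_counter": Counter(min(dp, max_cov_bin) for dp in dps),
            "mean": sum(dps) / len(dps) if dps else 0.0,
        }

    return transform, aggregate


def make_an_agg() -> AggPair:
    """Hypothetical toy AN aggregation: sum a per-entry allele count."""
    return (lambda entry: entry.get("n_alleles", 0)), sum


def stats_per_site(entries: List[Dict[str, Any]], aggs: Dict[str, AggPair]) -> Dict[str, Any]:
    """Hypothetical per-site driver: every (transform, aggregate) pair runs in one call."""
    return {name: agg([transform(e) for e in entries]) for name, (transform, agg) in aggs.items()}


# Usage: coverage and AN for one site in a single call.
site_entries = [
    {"DP": 12, "n_alleles": 2},
    {"DP": None, "n_alleles": 0},
    {"DP": 250, "n_alleles": 2},
]
result = stats_per_site(site_entries, {"coverage": make_coverage_agg(), "AN": make_an_agg()})
```

Here result["AN"] is 4 and the coverage counter holds one count each for 12, 0 (the missing DP) and 100 (250 capped at max_cov_bin). For the real signatures and return types, check gnomad/utils/sparse_mt.py.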

@ch-kr ch-kr requested a review from a team as a code owner July 15, 2025 18:01
@ch-kr ch-kr requested review from Copilot and removed request for a team July 16, 2025 16:00

ch-kr and others added 2 commits July 16, 2025 12:14
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor

@mike-w-wilson mike-w-wilson left a comment


This LGTM. I know this was basically a restructuring of the code but it would be good to add tests whenever we touch gnomad_methods. Cursor is pretty great at this after a couple goes in my experience.

@mike-w-wilson mike-w-wilson requested a review from Copilot July 17, 2025 13:27

@mike-w-wilson mike-w-wilson requested a review from Copilot July 17, 2025 15:07

…sts to use transformed DP, not raw, as it will be used in prod
@mike-w-wilson mike-w-wilson requested a review from Copilot July 17, 2025 15:58

@mike-w-wilson mike-w-wilson requested a review from Copilot July 17, 2025 16:04

@mike-w-wilson mike-w-wilson requested a review from Copilot July 17, 2025 16:15

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@ch-kr
Contributor Author

ch-kr commented Jul 17, 2025

I completely forgot about adding tests, thank you for adding these Mike!

@mike-w-wilson mike-w-wilson requested a review from Copilot July 17, 2025 17:35
Contributor

Copilot AI left a comment


Pull Request Overview

This PR refactors coverage aggregation logic by extracting reusable functions from compute_coverage_stats. The change introduces get_coverage_agg_func to allow flexible use of coverage computation in compute_stats_per_ref_site for multiple statistics types (coverage, AN, and quality histograms).

  • Moves coverage transformation and aggregation logic from inline code to a new reusable function
  • Adds customizable dp_field parameter to support different depth field names
  • Refactors compute_coverage_stats to use the new function with minimal behavioral changes

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File | Description
gnomad/utils/sparse_mt.py | Introduces get_coverage_agg_func and refactors compute_coverage_stats to use it
tests/utils/test_sparse_mt.py | Comprehensive test suite for the new get_coverage_agg_func function
Comments suppressed due to low confidence (1)

tests/utils/test_sparse_mt.py:82

  • The test creates a NaN value using hl.float64("nan"), but the DP field is typically an integer type, so this test case may not accurately reflect real-world usage where DP fields are integers.
        test_struct = hl.Struct(DP=hl.float64("nan"))
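A hypothetical pure-Python analogue of the DP transform makes Copilot's point concrete. It assumes, as the NaN test case suggests, that a missing or NaN DP is treated as 0, and it keeps the field name configurable in the spirit of dp_field; transform_dp is an illustrative name, not a gnomad function. An integer DP can be missing but never NaN, so the NaN branch is only reached when DP arrives as a float, which integer-typed DP never would.

```python
import math
from typing import Any, Mapping, Union

Number = Union[int, float]


def transform_dp(entry: Mapping[str, Any], dp_field: str = "DP") -> Number:
    """Hypothetical DP transform: a missing or NaN depth becomes 0.

    dp_field mirrors the configurable field name described in the PR;
    the missing/NaN handling is an assumption based on the test case above.
    """
    dp = entry.get(dp_field)
    if dp is None:
        return 0
    # math.isnan accepts ints as well, but only a float can actually be NaN.
    if math.isnan(dp):
        return 0
    return dp


print(transform_dp({"DP": 7}))                       # 7
print(transform_dp({"DP": float("nan")}))            # 0
print(transform_dp({}))                              # 0
print(transform_dp({"depth": 5}, dp_field="depth"))  # 5
```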

Comment thread gnomad/utils/sparse_mt.py
Comment thread tests/utils/test_sparse_mt.py Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

@ch-kr ch-kr requested a review from Copilot July 18, 2025 15:29

@ch-kr ch-kr requested a review from Copilot July 18, 2025 15:45

@ch-kr
Contributor Author

ch-kr commented Jul 18, 2025

I think I'm happy with the tests now, back to you @mike-w-wilson !

@ch-kr ch-kr requested review from Copilot and mike-w-wilson July 18, 2025 15:46
Contributor

Copilot AI left a comment


Pull Request Overview

This PR refactors the coverage aggregation logic by extracting code from the compute_coverage_stats function into a new standalone get_coverage_agg_func function. This allows for more flexible reuse of coverage computation logic with compute_stats_per_ref_site, enabling a single call to compute multiple statistics (coverage, AN, and qual histograms) instead of separate calls.

  • Extracts coverage aggregation logic into get_coverage_agg_func function with configurable DP field name and max coverage bin
  • Updates compute_coverage_stats to use the new function instead of inline logic
  • Adds comprehensive test coverage for the new function including edge cases and parameter variations

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File | Description
gnomad/utils/sparse_mt.py | Adds new get_coverage_agg_func function and refactors compute_coverage_stats to use it
tests/utils/test_sparse_mt.py | Adds comprehensive test suite for the new get_coverage_agg_func function

Comment thread gnomad/utils/sparse_mt.py
# Values 1, 2, 3 should be counted as themselves.
assert coverage_counter.get(1, 0) == 1
assert coverage_counter.get(2, 0) == 1
# Values 3-10 should all be counted as 3 (max_cov_bin).

Copilot AI Jul 18, 2025


The comment incorrectly states "Values 3-10" when it should be "Values 4-10" since value 3 is counted as itself, not capped.

Suggested change
# Values 3-10 should all be counted as 3 (max_cov_bin).
# Values 4-10 should all be counted as 3 (max_cov_bin).
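The corrected comment can be checked with a quick worked example. Assuming the coverage counter caps each depth at max_cov_bin (as the test comments describe) and assuming depths 1 through 10 with max_cov_bin = 3, bins 1 and 2 get one count each, and bin 3 gets eight: value 3 itself plus the seven capped values 4 to 10. This sketch uses collections.Counter in place of Hail's aggregation, and capped_coverage_counter is a hypothetical name.

```python
from collections import Counter
from typing import Iterable


def capped_coverage_counter(depths: Iterable[int], max_cov_bin: int) -> Counter:
    """Count depths, folding every value above max_cov_bin into the max_cov_bin bin."""
    return Counter(min(dp, max_cov_bin) for dp in depths)


counts = capped_coverage_counter(range(1, 11), max_cov_bin=3)
# 1 and 2 keep their own bins; 3 is counted as itself, and 4-10 are capped to 3.
print(dict(counts))  # {1: 1, 2: 1, 3: 8}
```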

Comment thread tests/utils/test_sparse_mt.py
Contributor

@mike-w-wilson mike-w-wilson left a comment


Nice -- I like the addition of _ for the unused function - :shipit:

@ch-kr ch-kr merged commit f8aeca2 into main Jul 18, 2025
8 of 9 checks passed
@ch-kr ch-kr deleted the kc/coverage_agg_func branch July 18, 2025 16:02

3 participants