Fix IdentifyDuplicates to resolve NoneType mismatches by kjaisingh · Pull Request #788 · broadinstitute/gatk-sv

kjaisingh · 2025-03-12T17:56:39Z

Description

The IdentifyDuplicates task looks for a match across several INFO fields in order to determine exact or insert matches. However, some of these INFO fields may be absent from certain records, in which case they are stored as None in the tuple built for each record. The task later sorts these tuples.

As described in the Python documentation, tuples are compared lexicographically: Python first compares the first element of each tuple; if they are equal, it moves on to the second element, and so on, until it finds a pair of elements that differ or reaches the end of the tuples. When sorting a list of tuples, a tie at one position therefore causes the next position to be compared.
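A small sketch of this comparison rule, using made-up tuples rather than the task's real keys. `lt_or_error` is a hypothetical helper that returns the result of `a < b`, or the string "TypeError" when Python cannot order the pair:

```python
def lt_or_error(a: tuple, b: tuple) -> object:
    """Return a < b, or the string "TypeError" if Python cannot order the tuples."""
    try:
        return a < b
    except TypeError:
        return "TypeError"


# Equal first elements, so the second element decides.
print(lt_or_error((1, "a"), (1, "b")))                    # True
# The first differing position decides; later positions are never compared.
print(lt_or_error(("chr1", None), ("chr2", "x")))         # True
# Matching Nones compare equal, so Python moves on to the next position.
print(lt_or_error(("chr1", None, 1), ("chr1", None, 2)))  # True
# A str facing a None at the first differing position cannot be ordered.
print(lt_or_error(("chr1", "chr2"), ("chr1", None)))      # TypeError
```

The third case suggests why such a bug can stay hidden: two records that are both missing a field tie on None and the sort simply moves on, and only a pair where one record has the field and the other does not (after tying on every earlier position) triggers the error.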

Hence, if two tuples tie on every position up to an INFO field that is present in one record but missing (None) in the other, Python has to compare a str with None and fails with: TypeError: '<' not supported between instances of 'str' and 'NoneType'. This PR avoids the error by providing defaults for all INFO fields that may be missing from a record.
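A minimal sketch of the failure and the fix, assuming plain dicts in place of pysam's `record.info` and a simplified three-field key. `key_unsafe` and `key_safe` are hypothetical names, not functions from gatk-sv, and the field values are illustrative:

```python
from typing import Any, Dict, List, Tuple

# Hypothetical stand-ins for record.info: plain dicts holding a few INFO fields
# named in this PR. The values are illustrative, not real GATK-SV output.
infos: List[Dict[str, Any]] = [
    {"STRANDS": "+-", "CHR2": "chr2", "CPX_TYPE": "dDUP"},
    {"STRANDS": "+-"},  # CHR2 and CPX_TYPE absent
]


def key_unsafe(info: Dict[str, Any]) -> Tuple[Any, ...]:
    # Missing fields become None, so two keys may end up comparing str with None.
    return (info.get("STRANDS"), info.get("CHR2"), info.get("CPX_TYPE"))


def key_safe(info: Dict[str, Any]) -> Tuple[str, ...]:
    # Missing string fields default to "", which sorts before any non-empty string.
    return (info.get("STRANDS", ""), info.get("CHR2", ""), info.get("CPX_TYPE", ""))


try:
    sorted(key_unsafe(i) for i in infos)
    print("sorted without error")
except TypeError as err:
    print(f"TypeError: {err}")

print(sorted(key_safe(i) for i in infos))
# [('+-', '', ''), ('+-', 'chr2', 'dDUP')]
```

The empty-string default works here because these fields hold strings. If a field is multi-valued and comes back as a tuple, a "" default would only trade the error for a str-versus-tuple comparison, so a type-matched default such as an empty tuple would be needed instead.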

Testing

This Terra job includes a failed run that does not include this change.
This Terra job includes a successful run that does include this change.

epiercehoffman · 2025-03-12T19:45:41Z

-                record.info.get('STRANDS'),
-                record.info.get('CPX_TYPE'),
-                record.info.get('CPX_INTERVALS')
+                record.info.get('CHR2') or "",


Thanks for the fix. Can you give me access to the testing workspaces?

I would use the built-in .get(key, default) syntax for consistency across our scripts.

Suggested change

-                record.info.get('CHR2') or "",
+                record.info.get('CHR2', ""),
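The two spellings are close but not identical: `.get(key, default)` substitutes the default only when the key is absent, while `value or default` also replaces a stored None or any other falsy value. A quick check on a plain dict, used as a stand-in for `record.info` (this sketch does not assume how pysam represents a present-but-empty INFO value); `via_get` and `via_or` are hypothetical helpers:

```python
from typing import Any, Dict

# Plain dict standing in for record.info; the values are illustrative.
info: Dict[str, Any] = {"CHR2": None, "SVLEN": 0}


def via_get(key: str, default: Any) -> Any:
    # Falls back to `default` only when `key` is missing.
    return info.get(key, default)


def via_or(key: str, default: Any) -> Any:
    # Falls back to `default` whenever the stored value is falsy (None, 0, "", ()).
    return info.get(key) or default


print(repr(via_get("CPX_TYPE", "")), repr(via_or("CPX_TYPE", "")))  # '' ''
print(repr(via_get("CHR2", "")), repr(via_or("CHR2", "")))          # None ''
print(via_get("SVLEN", -1), via_or("SVLEN", -1))                    # 0 -1
```

For fields whose present values are always non-empty strings, both forms should give the same result, so the suggestion is mainly about matching the style used elsewhere in the repository.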

kjaisingh

Just shared the workspace with you; apologies for this oversight.

Thanks for the suggestion; I've modified the change to use this syntax. Given the simplicity of the change, I did a quick sanity test locally, but let me know if I should rebuild the Docker image and re-test with the .get(key, default) syntax as well.

epiercehoffman

LGTM. Thanks for the fix and for running the extra test!

…#788) * Initial commit * Removed trailing whitespace * Used .get(x, y) to standardize with repo

Initial commit

9a29442

kjaisingh self-assigned this Mar 12, 2025

kjaisingh added the bug Something isn't working label Mar 12, 2025

Removed trailing whitespace

d5eb91d

kjaisingh marked this pull request as ready for review March 12, 2025 18:47

kjaisingh requested a review from epiercehoffman March 12, 2025 18:47

epiercehoffman reviewed Mar 12, 2025


Used .get(x, y) to standardize with repo

979d0a2

epiercehoffman approved these changes Mar 13, 2025


kjaisingh merged commit 560975e into main Mar 13, 2025
8 checks passed

kjaisingh deleted the kj_fix_identify_duplicates_sorting branch March 13, 2025 16:28

MattWellie pushed a commit to populationgenomics/gatk-sv that referenced this pull request Apr 7, 2025

Fix IdentifyDuplicates to resolve NoneType mismatches (broadinstitute…

5258272

…#788) * Initial commit * Removed trailing whitespace * Used .get(x, y) to standardize with repo

kjaisingh merged 3 commits into main from kj_fix_identify_duplicates_sorting