Sanitize non-finite matcher costs before Hungarian assignment#787
Co-authored-by: Borda <6035284+Borda@users.noreply.github.com>
Codecov Report
❌ Patch coverage is
❌ Your project check has failed because the head coverage (73%) is below the target coverage (95%). You can increase the head coverage or adjust the target coverage.
Additional details and impacted files

```diff
@@          Coverage Diff          @@
##           develop    #787   +/-   ##
======================================
  Coverage       73%     73%
======================================
  Files           69      69
  Lines         8149    8162   +13
======================================
+ Hits          5965    5976   +11
- Misses        2184    2186    +2
```
Pull request overview
This PR hardens HungarianMatcher against NaN/Inf entries in the cost matrix so SciPy’s linear_sum_assignment doesn’t raise ValueError: matrix contains invalid numeric entries during training.
Changes:
- Sanitize non-finite entries in the matcher cost matrix using only finite costs as reference, with a deterministic finite fallback when all entries are non-finite.
- Ensure replacement costs are strictly larger than any valid finite cost so invalid pairs are deprioritized.
- Add regression tests covering both NaN and Inf cost contamination cases.
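The finite-only sanitization described above can be sketched without any framework dependency. This is a minimal sketch, assuming the cost matrix is a plain nested list of floats (the real matcher works on a torch tensor). The names sanitize_costs and ALL_NONFINITE_FALLBACK, the additive margin, and the fallback value are hypothetical choices, not the code merged in this PR.

```python
import math
from typing import List

Matrix = List[List[float]]

# Hypothetical constant used only when no entry is finite; any finite value
# works because every pair then has the same cost.
ALL_NONFINITE_FALLBACK = 0.0


def sanitize_costs(cost: Matrix) -> Matrix:
    """Replace NaN/+inf/-inf with a finite sentinel above every finite cost."""
    # Use only finite entries as the reference, so NaN cannot poison the max.
    finite = [v for row in cost for v in row if math.isfinite(v)]
    if finite:
        max_finite = max(finite)
        # Additive margin instead of max_finite * 2: doubling would move the
        # sentinel *below* max_finite when all costs are negative.
        # (Overflow for costs near the float maximum is ignored here.)
        sentinel = max_finite + abs(max_finite) + 1.0
    else:
        sentinel = ALL_NONFINITE_FALLBACK
    return [[v if math.isfinite(v) else sentinel for v in row] for row in cost]


nan, inf = float("nan"), float("inf")
print(sanitize_costs([[1.0, nan], [2.0, 3.0]]))      # [[1.0, 7.0], [2.0, 3.0]]
print(sanitize_costs([[-5.0, -inf], [-2.0, -3.0]]))  # [[-5.0, 1.0], [-2.0, -3.0]]
print(sanitize_costs([[nan, inf], [-inf, nan]]))     # [[0.0, 0.0], [0.0, 0.0]]
```

Because the sentinel is derived from finite entries only, a single NaN no longer turns the reference value itself into NaN, which is the failure mode of a plain C.max().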
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/rfdetr/models/matcher.py | Replaces the previous C.max()-based cleanup with finite-only sanitization and a safe fallback when all entries are non-finite. |
| tests/models/test_matcher.py | Adds a parametrized regression test ensuring matching succeeds and selects the valid query/target pair when non-finite costs are present. |
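The parametrized regression test in the table can be approximated with the standard library alone. This is a sketch only: pytest parametrization is swapped for unittest's subTest, SciPy's linear_sum_assignment for an exhaustive search over a 2x2 matrix, and sanitize and match are hypothetical helpers rather than functions from src/rfdetr/models/matcher.py.

```python
import itertools
import math
import unittest


def sanitize(cost):
    # Hypothetical helper: replace non-finite entries with a value above every finite cost.
    finite = [v for row in cost for v in row if math.isfinite(v)]
    top = max(finite) if finite else 0.0
    sentinel = top + abs(top) + 1.0
    return [[v if math.isfinite(v) else sentinel for v in row] for row in cost]


def match(cost):
    # Exhaustive stand-in for linear_sum_assignment on a tiny square matrix.
    n = len(cost)
    perm = min(itertools.permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return list(enumerate(perm))


class TestNonFiniteCosts(unittest.TestCase):
    def test_valid_pair_selected(self):
        for bad in (float("nan"), float("inf"), float("-inf")):
            with self.subTest(bad=bad):
                # Pair (0, 1) is corrupted; the intended matching is the diagonal.
                cost = [[0.1, bad], [0.9, 0.5]]
                self.assertEqual(match(sanitize(cost)), [(0, 0), (1, 1)])
```

Run it with python -m unittest. The -inf case is the one that would silently pick the corrupted pair without sanitization, since -inf is the cheapest possible cost.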
- Add logger.warning() when non-finite values are detected in the cost matrix so numerical instability surfaces early during training
- Add -inf as a third parametrize case alongside nan and inf
- Split all-nonfinite test into three focused assertions
- Add regression test for negative-cost + NaN (Bug 2: max_cost*2 amplification)
- Add batch_size>1 parametrized test exercising the C.split loop
- Extract matcher and standard_target fixtures to reduce duplication
- Wrap all tests in TestHungarianMatcherNonFiniteCosts class

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
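The negative-cost regression ("Bug 2: max_cost*2 amplification") can be illustrated with a tiny example. This sketch assumes the earlier replacement value was the largest finite cost multiplied by 2, as the commit message suggests. brute_force_assignment and fill_nonfinite are hypothetical stand-ins, with exhaustive search replacing SciPy's linear_sum_assignment. When every finite cost is negative, doubling the maximum pushes the sentinel below the largest finite cost, so the corrupted pair can win the assignment.

```python
import itertools
import math
from typing import List, Tuple

Matrix = List[List[float]]


def brute_force_assignment(cost: Matrix) -> List[Tuple[int, int]]:
    """Exhaustive min-cost matching for a small square matrix."""
    n = len(cost)
    best = min(
        itertools.permutations(range(n)),
        key=lambda perm: sum(cost[i][perm[i]] for i in range(n)),
    )
    return [(i, best[i]) for i in range(n)]


def fill_nonfinite(cost: Matrix, sentinel: float) -> Matrix:
    # Replace every NaN/inf entry with the given sentinel value.
    return [[v if math.isfinite(v) else sentinel for v in row] for row in cost]


# All finite costs are negative; the (0, 0) pair is corrupted.
cost = [[float("nan"), -1.0], [-1.0, -2.0]]
max_finite = max(v for row in cost for v in row if math.isfinite(v))  # -1.0

old = fill_nonfinite(cost, max_finite * 2)                      # sentinel -2.0
new = fill_nonfinite(cost, max_finite + abs(max_finite) + 1.0)  # sentinel 1.0

print(brute_force_assignment(old))  # [(0, 0), (1, 1)]: picks the NaN pair
print(brute_force_assignment(new))  # [(0, 1), (1, 0)]
```

An additive margin keeps the sentinel above every finite cost regardless of sign, which is what the PR description's "strictly larger than any valid finite cost" requires.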
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
```python
logger.warning(
    "Non-finite values detected in matcher cost matrix; "
    "replacing with finite sentinel. "
    "Check for numerical instability."
)
```
This logger.warning(...) is in the matcher forward pass (runs every training step) and will emit once per batch whenever non-finite costs occur, potentially spamming logs (especially under DDP, where each rank logs). Consider throttling (e.g., warn once per process / once per epoch) or gating behind is_main_process() if available, while still keeping the sanitization behavior.
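One way to act on this suggestion while keeping the sanitization unconditional is to log only the first occurrence per process and count the rest. A minimal sketch, assuming a standard logging logger. NonFiniteCostWarner is a hypothetical helper, and is_main_process is injected as a callable because the comment only says to use it "if available".

```python
import logging
from typing import Callable

logger = logging.getLogger("matcher_sketch")  # hypothetical logger name


class NonFiniteCostWarner:
    """Log the non-finite-cost warning at most once per process, but count every hit."""

    def __init__(self, is_main_process: Callable[[], bool] = lambda: True) -> None:
        # is_main_process is injected because whether such a helper exists
        # in this codebase is an assumption.
        self._is_main_process = is_main_process
        self._warned = False
        self.nonfinite_batches = 0

    def __call__(self) -> None:
        self.nonfinite_batches += 1
        # Non-main ranks never log; the main rank logs only the first hit.
        if self._warned or not self._is_main_process():
            return
        self._warned = True
        logger.warning(
            "Non-finite values detected in matcher cost matrix; "
            "replacing with finite sentinel. "
            "Check for numerical instability. "
            "Further occurrences in this process are counted but not logged."
        )


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    warn_nonfinite = NonFiniteCostWarner()
    for _ in range(3):  # e.g. three consecutive batches with NaN costs
        warn_nonfinite()
    print(warn_nonfinite.nonfinite_batches)  # 3, with a single warning logged
```

A once-per-epoch variant would reset _warned at each epoch boundary; the counter can be logged alongside epoch metrics so repeated instability is still visible.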
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Training could fail in HungarianMatcher with ValueError: matrix contains invalid numeric entries when the cost matrix contained NaN/Inf values. The existing cleanup path used C.max(), which also became NaN in those cases and left invalid entries unsanitized.

Matcher cost sanitization
linear_sum_assignment.

Regression coverage
NaN and Inf costs.
Original prompt
This section details the original issue you should resolve
<issue_title>Issue in matcher - ValueError: matrix contains invalid numeric entries</issue_title>
<issue_description>Hi, I get the following error while training a model: