📝 Description
As `supervision` matures into an industry-standard utility package for Computer Vision, the reliability and stability of the codebase become paramount. Currently, our total test coverage sits at approximately 50%, with critical modules like `metrics` and `key_points` falling well below 40%.
To ensure this package remains the go-to solution for production pipelines and research alike, I propose an initiative to systematically increase our unit test coverage to a realistic target of ~90%.
🚀 Motivation
Why is this high priority?
- Industry Standard Responsibility: `supervision` is infrastructure. Users rely on it for critical CV workflows. A subtle bug in a metric calculation (like mAP or confusion matrix) can silently invalidate research experiments.
- Refactoring Confidence: As we optimize performance, a robust test suite acts as a safety net.
- Handling Edge Cases: Computer Vision data is notoriously messy. High coverage ensures we handle empty detections, `None` values, and malformed shapes gracefully.
📊 Current State
According to the latest Codecov report, we have significant gaps in core modules.
| Module | Coverage | Status | Notes |
|---|---|---|---|
| `metrics` | 38.36% | 🔴 Critical | Over 1,100 lines untested. |
| `key_points` | 36.68% | 🔴 Critical | Lowest percentage in the library. |
| `utils` | 55.06% | 🟡 Low | |
| `dataset` | 57.32% | 🟡 Low | Formats logic needs hardening. |
| `detection` | 68.49% | 🟡 Moderate | High volume of missed lines (639) due to file size. |
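For contributors who want to reproduce these numbers locally, an invocation along these lines should work (assuming `pytest-cov` is installed; the exact flags may differ from the project's CI configuration):

```shell
# Run the test suite and print per-file coverage with untested line ranges.
pytest --cov=supervision --cov-report=term-missing
```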
🛠 Proposed Strategy
We should tackle this modularly. The immediate priority is to lift the "Red" modules out of the danger zone.
Priority 1: The Criticals
- `metrics/`: This is the highest-impact area. We need to verify that mathematical calculations for accuracy, recall, and precision are correct across various edge cases.
- `key_points/`: Needs basic functional tests for skeleton handling and visualization.
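As a concrete illustration of the kind of edge-case coverage `metrics/` needs, here is a sketch against a tiny reference implementation. Note that `precision_recall` is a hypothetical helper written for this example, not `supervision`'s actual API, and the zero-denominator convention (return `0.0`) is an assumption that a real test would pin to the library's documented behavior:

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Illustrative reference implementation of precision and recall.

    Convention (an assumption, not supervision's documented behavior):
    a zero denominator yields 0.0 rather than NaN or an exception.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

# Edge cases a metrics suite must pin down explicitly:
assert precision_recall(0, 0, 0) == (0.0, 0.0)   # empty detections, empty targets
assert precision_recall(5, 0, 0) == (1.0, 1.0)   # perfect predictions
assert precision_recall(3, 1, 2) == (0.75, 0.6)  # mixed outcome
```

The value of tests like these is that they freeze the empty-input convention in place, so a later refactor cannot silently change it.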
Priority 2: Stability Layers
- `dataset/`: Ensure saving/loading functions (YOLO, COCO, Pascal VOC) are robust against file system errors and malformed inputs.
- `utils/`: General utility functions that are used everywhere need to be bulletproof.
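To make the file-system-error point concrete, here is a minimal sketch of testing a writer against simulated disk failure. The `save_labels` function below is a hypothetical stand-in for a dataset export helper (e.g. a YOLO annotation writer), not `supervision`'s real API; the pattern is what matters:

```python
from unittest import mock

def save_labels(path: str, lines: list[str]) -> None:
    # Hypothetical stand-in for a dataset export helper.
    with open(path, "w") as f:
        f.write("\n".join(lines))

# Patch builtins.open so no real file is touched and a disk error can be simulated.
with mock.patch("builtins.open", side_effect=OSError("disk full")):
    try:
        save_labels("labels/0001.txt", ["0 0.5 0.5 0.1 0.1"])
        raised = False
    except OSError:
        raised = True

assert raised  # the I/O failure surfaces instead of being silently swallowed
```

Mocking here keeps the suite fast (no temp files) and lets us exercise failure paths that are hard to trigger with a real file system.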
✅ Guidelines for New Tests
For contributors looking to help:
- Test Behavior, Not Implementation: Focus on inputs and expected outputs.
- Edge Cases: Explicitly test for empty lists, `None` values, and mismatched array shapes.
- Mocking: Use mocking for external I/O (downloading files, displaying windows) to keep the suite fast.
- Parametrization: Use `@pytest.mark.parametrize` to cover multiple scenarios efficiently.
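Putting the parametrization and edge-case guidelines together, a test might look like the sketch below. The `iou` helper is a minimal illustrative implementation written for this example, not `supervision`'s own:

```python
import pytest

def iou(box_a, box_b):
    """Minimal IoU for axis-aligned [x1, y1, x2, y2] boxes (illustrative only)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    # Degenerate (zero-area) inputs yield 0.0 rather than dividing by zero.
    return inter / union if union > 0 else 0.0

@pytest.mark.parametrize(
    "box_a, box_b, expected",
    [
        ([0, 0, 2, 2], [0, 0, 2, 2], 1.0),    # identical boxes
        ([0, 0, 1, 1], [2, 2, 3, 3], 0.0),    # disjoint boxes
        ([0, 0, 2, 2], [1, 1, 3, 3], 1 / 7),  # partial overlap
        ([0, 0, 0, 0], [0, 0, 0, 0], 0.0),    # degenerate zero-area boxes
    ],
)
def test_iou(box_a, box_b, expected):
    assert iou(box_a, box_b) == pytest.approx(expected)
```

One decorator covers the happy path, the disjoint case, and the degenerate case in a single readable block, which is exactly the density we want from new tests.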
📋 Tracking
Please comment below if you would like to pick up a specific module so we don't duplicate work.
- `supervision/metrics` (High Priority)
- `supervision/key_points` (High Priority)
- `supervision/dataset`
- `supervision/utils`
- `supervision/detection`
The actual coverage report can be found at https://app.codecov.io/gh/roboflow/supervision/blob/develop