
[Improvement] Increase Unit Test Coverage to ~90% #2108

@Borda


📝 Description

As supervision matures into an industry-standard utility package for Computer Vision, the reliability and stability of the codebase become paramount. Currently, our total test coverage sits at approximately 50%, with critical modules like metrics and key_points falling well below 40%.

To ensure this package remains the go-to solution for production pipelines and research alike, I propose an initiative to systematically increase our unit test coverage to a realistic target of ~90%.

🚀 Motivation

Why is this high priority?

  • Industry Standard Responsibility: supervision is infrastructure. Users rely on it for critical CV workflows. A subtle bug in a metric calculation (like mAP or confusion matrix) can silently invalidate research experiments.
  • Refactoring Confidence: As we optimize performance, a robust test suite acts as a safety net.
  • Handling Edge Cases: Computer Vision data is notoriously messy. High coverage ensures we handle empty detections, None values, and malformed shapes gracefully.

📊 Current State

According to the latest Codecov report, we have significant gaps in core modules.

| Module | Coverage | Status | Notes |
| --- | --- | --- | --- |
| metrics | 38.36% | 🔴 Critical | Over 1,100 lines untested. |
| key_points | 36.68% | 🔴 Critical | Lowest percentage in the library. |
| utils | 55.06% | 🟡 Low | |
| dataset | 57.32% | 🟡 Low | Formats logic needs hardening. |
| detection | 68.49% | 🟡 Moderate | High volume of missed lines (639) due to file size. |

🛠 Proposed Strategy

We should tackle this modularly. The immediate priority is to lift the "Red" modules out of the danger zone.

Priority 1: The Criticals

  • metrics/: This is the highest impact area. We need to verify that mathematical calculations for accuracy, recall, and precision are correct across various edge cases.
  • key_points/: Needs basic functional tests for skeleton handling and visualization.
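As a sketch of the kind of edge-case coverage metrics/ needs, consider the tests below. Note that `precision` here is a small local reference implementation used purely for illustration; real tests would exercise supervision's actual metric classes with the same inputs.

```python
# Hypothetical reference implementation, for illustration only.
def precision(tp: int, fp: int) -> float:
    """Reference precision; conventionally 0.0 when there are no predictions."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


def test_precision_edge_cases():
    assert precision(10, 0) == 1.0  # perfect detector
    assert precision(0, 10) == 0.0  # all false positives
    assert precision(0, 0) == 0.0   # empty predictions: must not divide by zero
    assert precision(5, 5) == 0.5   # mixed case
```

The empty-predictions case is exactly the kind of silent failure mode the Motivation section warns about: without an explicit test, a `ZeroDivisionError` (or a `NaN` propagating into mAP) can go unnoticed.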

Priority 2: Stability Layers

  • dataset/: Ensure saving/loading functions (YOLO, COCO, Pascal VOC) are robust against file system errors and malformed inputs.
  • utils/: General utility functions that are used everywhere need to be bulletproof.
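To illustrate the "robust against malformed inputs" requirement for dataset/, here is a minimal sketch. `parse_yolo_line` is a hypothetical helper, not supervision's API; real tests would target the library's actual loaders (e.g. the YOLO/COCO/Pascal VOC dataset classes) with the same malformed fixtures.

```python
import pytest


# Hypothetical parser for a YOLO annotation line ("class cx cy w h"), for illustration.
def parse_yolo_line(line: str) -> tuple:
    """Parse one annotation line; raise ValueError on malformed input."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    cx, cy, w, h = map(float, parts[1:])
    return class_id, cx, cy, w, h


def test_valid_line_parses():
    assert parse_yolo_line("0 0.5 0.5 0.2 0.3") == (0, 0.5, 0.5, 0.2, 0.3)


def test_malformed_lines_raise():
    # Empty line, truncated line, and non-numeric field should all fail loudly.
    for bad in ["", "0 0.5 0.5", "0 0.5 0.5 0.1 abc"]:
        with pytest.raises(ValueError):
            parse_yolo_line(bad)
```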

✅ Guidelines for New Tests

For contributors looking to help:

  • Test Behavior, Not Implementation: Focus on inputs and expected outputs.
  • Edge Cases: Explicitly test for empty lists, None values, and mismatched array shapes.
  • Mocking: Use mocking for external I/O (downloading files, displaying windows) to keep the suite fast.
  • Parametrization: Use @pytest.mark.parametrize to cover multiple scenarios efficiently.
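A minimal sketch combining the last two guidelines: parametrization to cover multiple scenarios, and mocking so no real network I/O runs during the test. `fetch_model_weights` is a hypothetical download helper used purely for illustration; it is not part of the supervision API.

```python
import urllib.request
from unittest.mock import patch

import pytest


def fetch_model_weights(url: str) -> bytes:
    """Download raw bytes from `url` (the external I/O we want to mock away)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


@pytest.mark.parametrize("payload", [b"", b"fake-weights"])
def test_fetch_model_weights_mocked(payload: bytes) -> None:
    with patch("urllib.request.urlopen") as mock_urlopen:
        # Configure the context manager returned by urlopen() to yield `payload`.
        mock_urlopen.return_value.__enter__.return_value.read.return_value = payload
        assert fetch_model_weights("https://example.com/weights.pt") == payload
        mock_urlopen.assert_called_once()
```

Because the download is patched, the test is fast, deterministic, and safe to run offline, while still asserting the behavior (bytes in, bytes out) rather than the implementation.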

📋 Tracking

Please comment below if you would like to pick up a specific module so we don't duplicate work.

  • supervision/metrics (High Priority)
  • supervision/key_points (High Priority)
  • supervision/dataset
  • supervision/utils
  • supervision/detection

The full coverage report can be found at https://app.codecov.io/gh/roboflow/supervision/blob/develop

Metadata

Assignees

No one assigned

    Labels

    enhancement — New feature or request
    open-to-multiple-contributors — Multiple contributors can work on this issue, but typically, only one submission will be accepted.
