Skip to content

Comments

[375] Adding margin of error for field coverage thresholds#464

Open
NiltonGMJunior wants to merge 7 commits intoscrapinghub:masterfrom
NiltonGMJunior:375-margin-error-coverage-thresholds
Open

[375] Adding margin of error for field coverage thresholds#464
NiltonGMJunior wants to merge 7 commits intoscrapinghub:masterfrom
NiltonGMJunior:375-margin-error-coverage-thresholds

Conversation

@NiltonGMJunior
Copy link
Contributor

No description provided.

Signed-off-by: Nilton Junior <ngm.junior@outlook.com>
Signed-off-by: Nilton Junior <ngm.junior@outlook.com>
Signed-off-by: Nilton Junior <ngm.junior@outlook.com>
…ttings documentation

Signed-off-by: Nilton Junior <ngm.junior@outlook.com>
@kennyaires kennyaires requested review from Gallaecio and wRAR October 2, 2025 11:54
@kennyaires
Copy link
Contributor

@Gallaecio @wRAR could you please review? This PR was done by the Delivery monitoring chapter. Thanks!

@Gallaecio
Copy link
Member

Closing and reopening to re-trigger CI.

@VMRuiz Maybe you are a better choice to review this change?

@Gallaecio Gallaecio closed this Oct 6, 2025
@Gallaecio Gallaecio reopened this Oct 6, 2025
@Gallaecio
Copy link
Member

Gallaecio commented Oct 6, 2025

The code looks fine to me. The feature, not so sure.

I think we need to justify the reason for this feature to exist, and cover that in the docs. Why would you not just adjust the coverage expectations instead of define a tolerance? I imagine there are scenarios where this is preferred, but I think we need to give examples, and maybe even discourage the use of this setting when updating the coverage expectations makes more sense. Especially since this feature is global, while coverage can be specified per field. And when you have different per-field coverage values, 5% here makes it so that a 95% becomes a 90% but also a 10% becomes a 5% or a 5% becomes a 0%, which does not sound like a good idea to me (but no strong opinion).

@kennyaires
Copy link
Contributor

Thanks for the insights, @Gallaecio! @NiltonGMJunior, I agree with Adrian—it'd be great to have examples of scenarios for the default (non-tolerance threshold) case and a special case (highlighting caution when adding tolerance, generally discouraging its use unless it's for large-volume projects where we only want to catch significant deviations, like <5%).

@VMRuiz
Copy link
Collaborator

VMRuiz commented Oct 23, 2025

This topic was already discussed in the past — see #400.

I agree with Adrian: it doesn’t make sense to configure coverage thresholds field by field if they can all be overridden or even disabled by a higher-level setting. If you want some margin in your project, it’s better to incorporate that directly into the threshold values.

As mentioned in the linked PR, I understand the desire to avoid triggering alarms for very small decimal differences — for example, considering 49.999% close enough to 50% but not allowing 49% or even 49.9%. In that case, having to specify something like 0.998% for every field would indeed be annoying. However, rounding values isn’t the right solution here. A better approach would be to introduce a small tolerance at the decimal scale (not as large as 100%, as proposed here) and use a is_close type of check.

Signed-off-by: Nilton Junior <ngm.junior@outlook.com>
Copy link
Collaborator

@VMRuiz VMRuiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Co-authored-by: Adrián Chaves <adrian@chaves.gal>
Comment on lines +443 to +462
You can also configure a tolerance setting to handle small decimal precision differences
in field coverage calculations. This is useful to avoid false alarms when coverage values
are very close to the expected threshold due to floating-point precision issues (e.g.,
49.999% vs 50.0%).

Use the ``SPIDERMON_FIELD_COVERAGE_TOLERANCE`` setting to define the absolute tolerance
as a small float value (default: 0, no tolerance). The tolerance is used with Python's
``math.isclose()`` function to determine if the actual coverage is "close enough" to
the expected coverage.

.. note::
This setting is intended for handling decimal precision issues, not for creating
margins or error bars. If you want to allow larger variations in coverage, consider
adjusting the coverage thresholds directly in ``SPIDERMON_FIELD_COVERAGE_RULES``
instead of using this tolerance setting.

.. code-block:: python

SPIDERMON_FIELD_COVERAGE_TOLERANCE = 0.001 # Allow 0.1% difference for precision

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This duplication seems a bit wasteful. Maybe it would be better to instead link to the setting.

Signed-off-by: Nilton Junior <ngm.junior@outlook.com>
@wRAR wRAR closed this Feb 5, 2026
@wRAR wRAR reopened this Feb 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants