feat: handle parser artifacts in single-ingredient checks (#8353) #13367

Open: adityav27 wants to merge 2 commits into openfoodfacts:main from adityav27:fix/nutriscore-category-check

@adityav27 adityav27 commented Mar 28, 2026

What

This PR resolves the 1000% spike in false positives for the ingredients-single-ingredient-from-category-does-not-match-actual-ingredients data quality check.

Why it was made:
Previously, if the NLP parser extracted the correct ingredient but could not parse additional descriptive marketing text (e.g., producing both "en:olive-oil" and "en:unknown-manually-harvested"), DataQualityFood.pm raised a strict error simply because the parsed array held more than one entry. Perfectly valid products failed the check without any test of whether the correct ingredient was actually present.

What was changed:
I updated the validation block to scan the parsed ingredients array in a single pass, i.e. $O(M)$ time for an array of $M$ parsed ingredients:

  1. Maintained Strict Errors: If the expected taxonomy ingredient is completely missing from the array (e.g., finding "sugar" when expecting "olive oil"), it pushes to data_quality_errors_tags.
  2. Downgraded to Warning: If the expected ingredient is found, but the count is > 1 (due to unparsed marketing artifacts), it now pushes to data_quality_warnings_tags. This safely flags the text for human review without invalidating the product.
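
For reviewers, the two outcomes above can be sketched roughly as follows. This is an illustrative Python sketch of the decision logic only, not the Perl code in DataQualityFood.pm. The function name check_single_ingredient, the dict-based product, the "id" key on each parsed ingredient, and the reuse of the same tag name for the warning are all assumptions made for illustration.

```python
from typing import Dict, List

# Tag name taken from the PR description; reusing it for the warning is an assumption.
TAG = "ingredients-single-ingredient-from-category-does-not-match-actual-ingredients"


def check_single_ingredient(
    expected_id: str,
    parsed_ingredients: List[Dict[str, str]],
    product: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Hypothetical helper: flag a product whose category implies one ingredient.

    Assumes the caller only runs the check when the product has a parsed
    ingredients list; how the real code treats an empty list is not covered here.
    """
    # Single pass over the M parsed ingredients.
    found = any(ing.get("id") == expected_id for ing in parsed_ingredients)

    if not found:
        # Expected ingredient is missing entirely: keep the strict error.
        product.setdefault("data_quality_errors_tags", []).append(TAG)
    elif len(parsed_ingredients) > 1:
        # Expected ingredient is present, but extra entries exist
        # (e.g. unparsed marketing text): downgrade to a warning.
        product.setdefault("data_quality_warnings_tags", []).append(TAG)
    # Exactly one ingredient and it matches: nothing is flagged.
    return product


# Example: olive oil plus an unparsed marketing artifact yields a warning only.
flagged = check_single_ingredient(
    "en:olive-oil",
    [{"id": "en:olive-oil"}, {"id": "en:unknown-manually-harvested"}],
    {},
)
print(flagged)
```

In this sketch a product listing only "en:sugar" when "en:olive-oil" is expected still receives the error tag, while a product whose single parsed ingredient matches is left untouched.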

Large Language Models usage disclosure

I used GitHub Copilot Chat to analyze the repository structure and simulate edge cases based on the existing dataqualityfood.t test suite.

…cts#8353)

- Upgraded ingredient validation to scan for expected ingredients in O(M) time.
- Categorized unparsed descriptive text as a warning rather than a hard error.
- Maintained zero-false-positive strictness for actual ingredient mismatches.
@adityav27
Author

@teolemon could you please take a look when you have time?

codecov-commenter commented May 7, 2026

Codecov Report

❌ Patch coverage is 55.55556% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 54.51%. Comparing base (7e62247) to head (96815d9).
⚠️ Report is 94 commits behind head on main.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| lib/ProductOpener/DataQualityFood.pm | 55.55% | 1 Missing and 3 partials ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #13367      +/-   ##
==========================================
+ Coverage   49.25%   54.51%   +5.25%     
==========================================
  Files          97       98       +1     
  Lines       25552    25618      +66     
  Branches     6101     6113      +12     
==========================================
+ Hits        12585    13965    +1380     
+ Misses      11338     9852    -1486     
- Partials     1629     1801     +172     

| Flag | Coverage Δ |
|---|---|
| integration-test-group-1 | 8.93% <0.00%> (?) |
| integration-test-group-2 | 9.11% <0.00%> (?) |
| integration-test-group-3 | 32.81% <0.00%> (?) |
| integration-test-group-4 | 28.51% <0.00%> (?) |
| integration-test-group-5 | 9.02% <0.00%> (?) |
| integration-test-group-6 | 9.47% <0.00%> (?) |
| integration-test-group-7 | 27.67% <0.00%> (?) |
| integration-test-group-8 | 26.35% <0.00%> (?) |
| integration-test-group-9 | 12.91% <0.00%> (?) |
| unit-test-group-1 | 24.62% <0.00%> (-0.01%) ⬇️ |
| unit-test-group-2 | 34.32% <55.55%> (+<0.01%) ⬆️ |
| unit-test-group-3 | 23.32% <0.00%> (-0.01%) ⬇️ |
| unit-test-group-4 | 30.18% <0.00%> (-0.02%) ⬇️ |
| unit-test-group-5 | 15.76% <0.00%> (-0.01%) ⬇️ |
| unit-test-group-6 | 27.02% <0.00%> (-0.01%) ⬇️ |

Flags with carried forward coverage won't be shown.


Labels

🧽 Data quality https://wiki.openfoodfacts.org/Quality 🚦 Nutri-Score

Projects

Status: To discuss and validate
Status: Todo
Status: In progress

Development

Successfully merging this pull request may close these issues.

Data quality error: category and computed Nutri-Score is not coherent

3 participants