Fix SUMIF/COUNTIF exact text matching to not match substrings#2267
Conversation
Modified formulaCriteriaParser to wrap regex patterns with ^ and $. Prevents partial string matches in SUMIF/COUNTIF functions when wildcards are not explicitly used. Test verifies "administrative" matches exactly 2 cells, not 4 substring matches like "cyclical_flat_administrative".
Changed logic to only use regex matching when wildcards are present. Plain strings now use direct equality comparison, improving performance for common use cases without pattern matching.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #2267 +/- ##
========================================
Coverage 99.55% 99.56%
========================================
Files 32 32
Lines 25778 26316 +538
========================================
+ Hits 25663 26201 +538
Misses 60 60
Partials 55 55
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
Fixes SUMIF/COUNTIF criteria parsing so that plain-text criteria match whole cell values (not substrings), aligning behavior with Excel’s exact-match semantics when no wildcards are present.
Changes:
- Updated
formulaCriteriaParserto treat non-wildcard criteria as strict equality and wildcard criteria as anchored regex (^...$). - Added a regression test covering exact-match vs wildcard-match behavior for
SUMIFandCOUNTIF.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| calc.go | Adjusts criteria parsing logic to prevent substring matches for non-wildcard text criteria and anchors wildcard patterns. |
| calc_test.go | Adds coverage ensuring exact matches don’t match substrings and wildcard patterns behave as intended. |
Comments suppressed due to low confidence (1)
calc.go:1868
- In the wildcard branch, the criteria is converted into a Go regexp by only replacing
?/*and then wrapping with^...$. Any other regexp metacharacters in the original criteria (e.g..,+,[,\,^,$) are currently interpreted as regexp syntax (or can even make the pattern invalid), which diverges from Excel semantics where only*and?are special. Consider escaping the input first (e.g.regexp.QuoteMeta(val)or reusingmatchPatternToRegExp-style escaping), then translating the escaped wildcards (and ideally supporting Excel's~escape for literal*/?).
hasWildcard := strings.Contains(val, "?") || strings.Contains(val, "*")
if strings.Contains(val, "?") {
val = strings.ReplaceAll(val, "?", ".")
}
if strings.Contains(val, "*") {
val = strings.ReplaceAll(val, "*", ".*")
}
if hasWildcard {
fc.Type, fc.Condition = criteriaRegexp, newStringFormulaArg("^"+val+"$")
} else {
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
xuri
left a comment
There was a problem hiding this comment.
Thanks for your PR. I made some changes based on your branch, add more test cases and simplify code.
PR Details
Fix
formulaCriteriaParserto anchor regex patterns so that exact text criteria in SUMIF/COUNTIF do not incorrectly match substrings.Description
Modified
formulaCriteriaParserto wrap regex patterns with^and$anchors. Previously, criteria like"administrative"would match substrings such as"cyclical_flat_administrative", producing incorrect results. With this fix, exact text criteria only match full cell values, while wildcard patterns (*,?) continue to work as expected.Related Issue
N/A
Motivation and Context
When using
SUMIF(A2:A7,"administrative",B2:B7), the formula was matching cells containing "cyclical_flat_administrative" and "pre_administrative_post" in addition to "administrative". This is incorrect behavior — Excel only matches exact values when no wildcards are present. The fix ensures parity with Excel's behavior.How Has This Been Tested
Added
TestCalcSUMIFExactMatchwhich verifies:"administrative") only matches full cell values (sum = 250, count = 2)*administrative*matches all cells containing the substring (sum = 750, count = 4)administrative*matches cells starting with the term*administrativematches cells ending with the termothe?matches as expectedTypes of changes
Checklist