-
Notifications
You must be signed in to change notification settings - Fork 10
labels: cache parsed regex in FastRegexMatcher #1045
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Cache the regex cost estimation during FastRegexMatcher initialization instead of re-parsing the regex each time SingleMatchCost() is called. The cost is calculated after optimization functions simplify the parsed tree, giving more accurate estimates for the actual matching cost.
d2819ae to
71658d0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR optimizes regex cost estimation by caching the computed cost during FastRegexMatcher initialization, eliminating redundant regex parsing on every SingleMatchCost() call. The cost is now calculated once after the regex tree has been optimized by stringMatcherFromRegexp, resulting in more accurate cost estimates.
Key changes:
- Added
singleMatchCostfield toFastRegexMatcherto cache the estimated cost computed during initialization - Refactored
SingleMatchCost()method to check optimization paths first (setMatches, map-based matchers, prefix), then fall back to the cached cost - Improved documentation for
costEstimate()function with proper function comment and formatted TODO
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| model/labels/regexp.go | Added singleMatchCost field to cache cost estimation; computed after regex tree optimization in newFastRegexMatcherWithoutCache() |
| model/labels/cost.go | Refactored SingleMatchCost() to use cached cost and removed duplicate optimization checks; improved costEstimate() documentation |
| model/labels/cost_test.go | Updated test expectations to reflect new cost estimates based on optimized regex trees |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
chencs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice find!
|
Nitpicking the title: you're caching the parsed data structure not the cost. |
|
ugh, i didn't update the title again after fixing the PR |
Pull request was closed
Summary
Caches the parsed regex during
FastRegexMatcherinitialization instead of re-parsing the regex each timeSingleMatchCost()is called.Since the tree is optimized in
stringMatcherFromRegexpnow the cost of some matchers has changed because we do the estimation on the optimized tree instead of on the original tree.Note
Caches the parsed regex in
FastRegexMatcherand uses it for cost estimation (with sane fallbacks), updating test expectations accordingly.FastRegexMatchervia new fieldparsedRe; set during initialization innewFastRegexMatcherWithoutCache()and reused inSingleMatchCost()to avoid re-parsing.FastRegexMatcher.SingleMatchCost()to fall back toestimatedStringEqualityCoston parse failure and cap withmax(estimatedStringEqualityCost, costEstimate(parsed)).model/labels/cost_test.gofor several=~and!~cases to reflect estimation on the optimized parse tree.stringMatcherFromRegexp()into the matcher (m.parsedRe = parsed).Written by Cursor Bugbot for commit a963ec3. This will update automatically on new commits. Configure here.