@dimitarvdimitrov dimitarvdimitrov commented Dec 5, 2025

Summary

Caches the parsed regex during FastRegexMatcher initialization instead of re-parsing the regex each time SingleMatchCost() is called.

Because the tree is now optimized in stringMatcherFromRegexp, the estimated cost of some matchers has changed: the estimation runs on the optimized tree instead of the original one.


Note

Caches the parsed regex in FastRegexMatcher and uses it for cost estimation (with sane fallbacks), updating test expectations accordingly.

  • Labels / Matching:
    • Cache parsed regex in FastRegexMatcher via new field parsedRe; set during initialization in newFastRegexMatcherWithoutCache() and reused in SingleMatchCost() to avoid re-parsing.
    • Refine FastRegexMatcher.SingleMatchCost() to fall back to estimatedStringEqualityCost on parse failure and otherwise return max(estimatedStringEqualityCost, costEstimate(parsed)), so the estimate never drops below the string-equality cost.
  • Tests:
    • Update expected costs in model/labels/cost_test.go for several =~ and !~ cases to reflect estimation on the optimized parse tree.
  • Internals:
    • Wire parsed tree from stringMatcherFromRegexp() into the matcher (m.parsedRe = parsed).

Written by Cursor Bugbot for commit a963ec3.
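
The pattern described above (parse once at construction, reuse the tree for cost estimation, fall back on parse failure) can be sketched roughly as follows. This is an illustration only and not the code from this PR: `cachedMatcher`, `newCachedMatcher`, the node-counting `costEstimate`, and the value of `estimatedStringEqualityCost` are all assumptions made for the example, and `Simplify()` from the standard library stands in for the optimization that stringMatcherFromRegexp performs.

```go
package main

import (
	"fmt"
	"math"
	"regexp/syntax"
)

// estimatedStringEqualityCost is a placeholder baseline; the real value
// used in model/labels is not shown in this conversation.
const estimatedStringEqualityCost = 1.0

// cachedMatcher is a hypothetical stand-in for FastRegexMatcher. It keeps
// the parsed tree so cost estimation never has to re-parse the expression.
type cachedMatcher struct {
	parsedRe *syntax.Regexp // nil when the expression failed to parse
}

// newCachedMatcher parses the expression exactly once, at construction.
func newCachedMatcher(expr string) *cachedMatcher {
	parsed, err := syntax.Parse(expr, syntax.Perl)
	if err != nil {
		// Leave parsedRe nil; SingleMatchCost falls back to the baseline.
		return &cachedMatcher{}
	}
	// Simplify is only a stand-in for the real tree optimization.
	return &cachedMatcher{parsedRe: parsed.Simplify()}
}

// costEstimate is a toy estimator (assumption): one unit per tree node.
func costEstimate(re *syntax.Regexp) float64 {
	cost := 1.0
	for _, sub := range re.Sub {
		cost += costEstimate(sub)
	}
	return cost
}

// SingleMatchCost reuses the cached tree and never returns less than the
// string-equality baseline.
func (m *cachedMatcher) SingleMatchCost() float64 {
	if m.parsedRe == nil {
		return estimatedStringEqualityCost
	}
	return math.Max(estimatedStringEqualityCost, costEstimate(m.parsedRe))
}

func main() {
	for _, expr := range []string{"foo", "a.*b", "("} {
		fmt.Printf("%q -> %v\n", expr, newCachedMatcher(expr).SingleMatchCost())
	}
}
```

Because the estimate is taken from the tree as it exists after optimization, the numbers can differ from an estimate taken on the freshly parsed tree, which is consistent with the updated expectations in model/labels/cost_test.go.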

Cache the regex cost estimation during FastRegexMatcher initialization
instead of re-parsing the regex each time SingleMatchCost() is called.

The cost is calculated after optimization functions simplify the parsed
tree, giving more accurate estimates for the actual matching cost.

Copilot AI left a comment

Pull request overview

This PR optimizes regex cost estimation by caching the computed cost during FastRegexMatcher initialization, eliminating redundant regex parsing on every SingleMatchCost() call. The cost is now calculated once after the regex tree has been optimized by stringMatcherFromRegexp, resulting in more accurate cost estimates.

Key changes:

  • Added singleMatchCost field to FastRegexMatcher to cache the estimated cost computed during initialization
  • Refactored SingleMatchCost() method to check optimization paths first (setMatches, map-based matchers, prefix), then fall back to the cached cost
  • Improved documentation for costEstimate() function with proper function comment and formatted TODO

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| model/labels/regexp.go | Added singleMatchCost field to cache cost estimation; computed after regex tree optimization in newFastRegexMatcherWithoutCache() |
| model/labels/cost.go | Refactored SingleMatchCost() to use cached cost and removed duplicate optimization checks; improved costEstimate() documentation |
| model/labels/cost_test.go | Updated test expectations to reflect new cost estimates based on optimized regex trees |

@dimitarvdimitrov dimitarvdimitrov marked this pull request as ready for review December 5, 2025 16:17
chencs previously approved these changes Dec 5, 2025

@chencs chencs left a comment


Nice find!

@chencs chencs self-requested a review December 5, 2025 23:55
@chencs chencs dismissed their stale review December 5, 2025 23:56

Dismissing until CI is fixed

@dimitarvdimitrov dimitarvdimitrov enabled auto-merge (squash) December 5, 2025 23:58

bboreham commented Dec 8, 2025

Nitpicking the title: you're caching the parsed data structure not the cost.

@dimitarvdimitrov
Contributor Author

ugh, i didn't update the title again after fixing the PR

@dimitarvdimitrov dimitarvdimitrov changed the title from "labels: cache regex cost in FastRegexMatcher" to "labels: cache parsed regex in FastRegexMatcher" Dec 8, 2025
auto-merge was automatically disabled December 8, 2025 16:22

Pull request was closed

@dimitarvdimitrov dimitarvdimitrov merged commit 7bf2d26 into main Dec 8, 2025
154 of 160 checks passed
@dimitarvdimitrov dimitarvdimitrov deleted the dimitar/labels/cache-regex-cost branch December 8, 2025 17:13