labels: cache parsed regex in FastRegexMatcher #1045

dimitarvdimitrov · 2025-12-05T13:42:58Z

Summary

Caches the parsed regex during FastRegexMatcher initialization instead of re-parsing the regex each time SingleMatchCost() is called.

Because the tree is now optimized in stringMatcherFromRegexp, the cost of some matchers has changed: the estimation runs on the optimized tree instead of the original tree.
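A minimal sketch of why estimating on an optimized tree can shift the numbers. It uses the standard library's `Simplify()` as a stand-in for the optimizations in stringMatcherFromRegexp, which are not reproduced here. `nodeCount` and `countsBeforeAndAfter` are hypothetical helpers for illustration, not the PR's `costEstimate()`.

```go
package main

import "regexp/syntax"

// nodeCount is a hypothetical toy metric: the number of nodes in a parse tree.
// It is NOT the cost function used by the PR.
func nodeCount(re *syntax.Regexp) int {
	n := 1
	for _, sub := range re.Sub {
		n += nodeCount(sub)
	}
	return n
}

// countsBeforeAndAfter parses pattern and returns the node count of the tree
// as parsed and of the tree returned by Simplify().
func countsBeforeAndAfter(pattern string) (before, after int, err error) {
	parsed, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return 0, 0, err
	}
	before = nodeCount(parsed)
	after = nodeCount(parsed.Simplify())
	return before, after, nil
}
```

For a pattern such as `a{2,3}`, the two counts differ because the counted repetition is expanded during simplification. Any estimate derived from tree shape would move in the same way, which is why the test expectations had to be updated.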

Note

Caches the parsed regex in FastRegexMatcher and uses it for cost estimation (with sane fallbacks), updating test expectations accordingly.

Labels / Matching:
- Cache parsed regex in FastRegexMatcher via new field parsedRe; set during initialization in newFastRegexMatcherWithoutCache() and reused in SingleMatchCost() to avoid re-parsing.
- Refine FastRegexMatcher.SingleMatchCost() to fall back to estimatedStringEqualityCost on parse failure and cap with max(estimatedStringEqualityCost, costEstimate(parsed)).
Tests:
- Update expected costs in model/labels/cost_test.go for several =~ and !~ cases to reflect estimation on the optimized parse tree.
Internals:
- Wire parsed tree from stringMatcherFromRegexp() into the matcher (m.parsedRe = parsed).
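The caching and fallback described above could look roughly like the sketch below. It is an illustration under assumptions, not the merged code: `matcher` and `newMatcher` are hypothetical stand-ins for FastRegexMatcher and newFastRegexMatcherWithoutCache(), the value of `estimatedStringEqualityCost` is a placeholder, and `costEstimate` here is a toy node count rather than the real estimator.

```go
package main

import (
	"math"
	"regexp/syntax"
)

// estimatedStringEqualityCost is a placeholder baseline; the real value is
// not stated in this PR.
const estimatedStringEqualityCost = 1.0

// matcher is a hypothetical stand-in for FastRegexMatcher.
type matcher struct {
	// parsedRe is set once at construction; nil means parsing failed.
	parsedRe *syntax.Regexp
}

// newMatcher parses the pattern once and keeps the tree for later reuse.
// Simplify() stands in for the real optimization passes.
func newMatcher(pattern string) *matcher {
	parsed, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return &matcher{}
	}
	return &matcher{parsedRe: parsed.Simplify()}
}

// costEstimate is a toy estimator (one unit per tree node), assumed for
// illustration only.
func costEstimate(re *syntax.Regexp) float64 {
	cost := 1.0
	for _, sub := range re.Sub {
		cost += costEstimate(sub)
	}
	return cost
}

// SingleMatchCost reuses the cached tree instead of re-parsing. It falls back
// to the equality cost when no tree is available and never returns less than
// that baseline.
func (m *matcher) SingleMatchCost() float64 {
	if m.parsedRe == nil {
		return estimatedStringEqualityCost
	}
	return math.Max(estimatedStringEqualityCost, costEstimate(m.parsedRe))
}
```

Calling `newMatcher("foo.*bar").SingleMatchCost()` repeatedly walks the cached tree but never calls `syntax.Parse` again, which is the saving the PR is after.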

Written by Cursor Bugbot for commit a963ec3. This will update automatically on new commits.

Cache the regex cost estimation during FastRegexMatcher initialization instead of re-parsing the regex each time SingleMatchCost() is called. The cost is calculated after optimization functions simplify the parsed tree, giving more accurate estimates for the actual matching cost.

Copilot

Pull request overview

This PR optimizes regex cost estimation by caching the computed cost during FastRegexMatcher initialization, eliminating redundant regex parsing on every SingleMatchCost() call. The cost is now calculated once after the regex tree has been optimized by stringMatcherFromRegexp, resulting in more accurate cost estimates.

Key changes:

- Added singleMatchCost field to FastRegexMatcher to cache the estimated cost computed during initialization
- Refactored SingleMatchCost() method to check optimization paths first (setMatches, map-based matchers, prefix), then fall back to the cached cost
- Improved documentation for costEstimate() function with proper function comment and formatted TODO

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| model/labels/regexp.go | Added `singleMatchCost` field to cache cost estimation; computed after regex tree optimization in `newFastRegexMatcherWithoutCache()` |
| model/labels/cost.go | Refactored `SingleMatchCost()` to use cached cost and removed duplicate optimization checks; improved `costEstimate()` documentation |
| model/labels/cost_test.go | Updated test expectations to reflect new cost estimates based on optimized regex trees |

model/labels/cost.go

chencs

Nice find!

Dismissing until CI is fixed

bboreham · 2025-12-08T15:24:58Z

Nitpicking the title: you're caching the parsed data structure not the cost.

dimitarvdimitrov · 2025-12-08T16:21:38Z

ugh, i didn't update the title again after fixing the PR

dimitarvdimitrov requested a review from Copilot December 5, 2025 13:43

Copilot started reviewing on behalf of dimitarvdimitrov December 5, 2025 13:43

dimitarvdimitrov force-pushed the dimitar/labels/cache-regex-cost branch from d2819ae to 71658d0 December 5, 2025 13:44

Copilot finished reviewing on behalf of dimitarvdimitrov December 5, 2025 13:45

Copilot AI reviewed Dec 5, 2025

Undo slop

a963ec3

dimitarvdimitrov marked this pull request as ready for review December 5, 2025 16:17

chencs reviewed Dec 5, 2025

model/labels/cost.go (resolved)

chencs previously approved these changes Dec 5, 2025

chencs self-requested a review December 5, 2025 23:55

dimitarvdimitrov enabled auto-merge (squash) December 5, 2025 23:58

dimitarvdimitrov changed the title ~~labels: cache regex cost in FastRegexMatcher~~ labels: cache parsed regex in FastRegexMatcher Dec 8, 2025

dimitarvdimitrov closed this Dec 8, 2025

auto-merge was automatically disabled December 8, 2025 16:22
Pull request was closed

dimitarvdimitrov reopened this Dec 8, 2025

dimitarvdimitrov merged commit 7bf2d26 into main Dec 8, 2025
154 of 160 checks passed

dimitarvdimitrov deleted the dimitar/labels/cache-regex-cost branch December 8, 2025 17:13
