labels: improve regex selectivity estimation with sample values #1051

dimitarvdimitrov · 2025-12-08T18:51:18Z

Summary

Allow regex selectivity estimation to use actual sample label values instead of hardcoded 10% heuristics. This improves index lookup planning accuracy for regex matchers.

Changes

- EstimateSelectivity() now accepts a sampleValues []string parameter.
- When sample values are provided, selectivity is computed by testing the regex against them; otherwise it falls back to the existing 10% heuristic.
- Computed selectivity is cached atomically to avoid recomputation.
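
The behavior listed above can be illustrated with a minimal, standalone sketch. It is not the code from this PR: the matcher type, newMatcher, and estimate are hypothetical names, the standard library regexp package stands in for FastRegexMatcher, and the choice not to cache the fallback value is an assumption.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
	"sync/atomic"
)

// fallbackSelectivity mirrors the 10% heuristic named in the description.
const fallbackSelectivity = 0.1

// matcher is a hypothetical stand-in for a regex matcher with a cached
// selectivity. The float64 is stored as raw bits in an atomic.Uint64;
// a negative value means "not computed yet".
type matcher struct {
	re     *regexp.Regexp
	cached atomic.Uint64
}

// newMatcher compiles a fully anchored pattern and marks the cache as unset.
func newMatcher(pattern string) *matcher {
	m := &matcher{re: regexp.MustCompile("^(?:" + pattern + ")$")}
	m.cached.Store(math.Float64bits(-1))
	return m
}

// estimate returns the fraction of sampleValues matched by the regex.
// Without samples it returns the fallback and leaves the cache untouched
// (an assumption of this sketch).
func (m *matcher) estimate(sampleValues []string) float64 {
	if c := math.Float64frombits(m.cached.Load()); c >= 0 {
		return c
	}
	if len(sampleValues) == 0 {
		return fallbackSelectivity
	}
	matched := 0
	for _, v := range sampleValues {
		if m.re.MatchString(v) {
			matched++
		}
	}
	s := float64(matched) / float64(len(sampleValues))
	m.cached.Store(math.Float64bits(s))
	return s
}

func main() {
	m := newMatcher("prod-.*")
	fmt.Println(m.estimate(nil))                                                // fallback, no samples
	fmt.Println(m.estimate([]string{"prod-eu", "prod-us", "staging", "dev"})) // 2 of 4 samples match
	fmt.Println(m.estimate([]string{"dev"}))                                    // served from the cache
}
```

Note that the third call ignores its samples because a value is already cached, which is exactly the property that matters when one cached value is reused in different contexts.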

related to grafana/mimir#13782

Note

EstimateSelectivity now accepts sample label values to empirically compute regex selectivity (with caching); tests updated accordingly.

Labels matching/selectivity:
- Update labels.Matcher.EstimateSelectivity(totalLabelValues, sampleValues) to use sampleValues for complex regex selectivity; falls back to 0.1 when none.
- Add estimateComplexRegexSelectivity() and matchesN() in model/labels/cost.go.
- Clamp and invert logic preserved; existing static fast paths unchanged.
Regex matcher internals (model/labels/regexp.go):
- Add cached selectivity field FastRegexMatcher.estimatedSelectivity (atomic float), initialized to -1.
- Wire cache into selectivity estimation path.
Tests (model/labels/cost_test.go):
- Adjust calls to new EstimateSelectivity(..., nil).
- Add tests for sample-driven selectivity and caching behavior.

Written by Cursor Bugbot for commit f40a7ba. This will update automatically on new commits.

Add sampleValues parameter to EstimateSelectivity() to enable better regex selectivity estimation based on actual label values instead of hardcoded heuristics.

- Add estimatedSelectivity field to FastRegexMatcher for caching
- Add matchesN helper method to Matcher for counting matches
- Update EstimateSelectivity to use sample values when available
- Cache sample-based selectivity to avoid recomputation

model/labels/cost.go

cursor · 2025-12-08T18:55:24Z

model/labels/cost.go

+	// Cache the computed selectivity
+	m.re.estimatedSelectivity.Store(selectivity)
+	return selectivity
+}


Bug: Cached selectivity shared incorrectly across different labels

The estimatedSelectivity is cached on FastRegexMatcher, which is shared via global cache across all Matcher instances using the same regex pattern, regardless of label name. When two matchers like label_a=~"pattern" and label_b=~"pattern" are created, they share the same FastRegexMatcher. If EstimateSelectivity is called for label_a with its specific sample values, that selectivity is cached and incorrectly returned for label_b even though label_b may have completely different value distributions. The cache granularity is wrong - selectivity depends on label-specific sample values but is cached at the regex-pattern level.

Additional Locations (1)

model/labels/regexp.go#L77-L94

damn, you're good. and right. this is a problem
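
The sharing problem described in this thread can be reproduced in a small standalone sketch. Everything here is hypothetical and only models the situation: sharedRegex stands in for a FastRegexMatcher reused per pattern, patternCache for the global cache, and labelMatcher for a Matcher. The per-matcher cache shown at the end is one possible direction, not necessarily the fix this PR adopts. The sketch is single-goroutine and skips the atomics for brevity.

```go
package main

import (
	"fmt"
	"regexp"
)

// sharedRegex models a compiled regex that is reused by every matcher
// with the same pattern. A negative selectivity means "not computed yet".
type sharedRegex struct {
	re          *regexp.Regexp
	selectivity float64 // pattern-level cache
}

// patternCache models a global cache keyed only by the regex pattern.
var patternCache = map[string]*sharedRegex{}

func sharedFor(pattern string) *sharedRegex {
	if s, ok := patternCache[pattern]; ok {
		return s
	}
	s := &sharedRegex{re: regexp.MustCompile("^(?:" + pattern + ")$"), selectivity: -1}
	patternCache[pattern] = s
	return s
}

// labelMatcher models a matcher bound to one label name.
type labelMatcher struct {
	name        string
	shared      *sharedRegex
	selectivity float64 // matcher-level cache
}

func newLabelMatcher(name, pattern string) *labelMatcher {
	return &labelMatcher{name: name, shared: sharedFor(pattern), selectivity: -1}
}

// ratio returns the fraction of samples matched, or 0.1 without samples.
func ratio(re *regexp.Regexp, samples []string) float64 {
	if len(samples) == 0 {
		return 0.1
	}
	matched := 0
	for _, v := range samples {
		if re.MatchString(v) {
			matched++
		}
	}
	return float64(matched) / float64(len(samples))
}

// estimateShared caches on the shared regex: the granularity flagged as wrong.
func (m *labelMatcher) estimateShared(samples []string) float64 {
	if m.shared.selectivity < 0 {
		m.shared.selectivity = ratio(m.shared.re, samples)
	}
	return m.shared.selectivity
}

// estimatePerMatcher caches on the matcher, so each label keeps its own value.
func (m *labelMatcher) estimatePerMatcher(samples []string) float64 {
	if m.selectivity < 0 {
		m.selectivity = ratio(m.shared.re, samples)
	}
	return m.selectivity
}

func main() {
	a := newLabelMatcher("label_a", "prod-.*")
	b := newLabelMatcher("label_b", "prod-.*")
	aValues := []string{"prod-eu", "prod-us"}
	bValues := []string{"dev", "staging", "qa", "prod-eu"}

	fmt.Println(a.estimateShared(aValues))     // 2 of 2 match
	fmt.Println(b.estimateShared(bValues))     // reuses label_a's value, although only 1 of 4 match
	fmt.Println(a.estimatePerMatcher(aValues)) // 2 of 2 match
	fmt.Println(b.estimatePerMatcher(bValues)) // 1 of 4 match
}
```

Caching per matcher trades some recomputation for correctness: two matchers with the same pattern on different labels each test the regex against their own samples once.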

dimitarvdimitrov · 2025-12-08T19:02:19Z

in draft because cursor found a bug

dimitarvdimitrov added 6 commits December 8, 2025 19:20

move matchesN to cost.go

f769137

use atomic.Float64 for estimatedSelectivity

68ecfbd

refactor: extract estimateComplexRegexSelectivity method

7b840db

move matchesN below estimateComplexRegexSelectivity

32e305b

document selectivity caching behavior in EstimateSelectivity godoc

f40a7ba

cursor bot reviewed Dec 8, 2025

dimitarvdimitrov marked this pull request as draft December 8, 2025 19:02

dimitarvdimitrov mentioned this pull request Dec 9, 2025

index planning: estimate selectivity based on values grafana/mimir#13782

Open
