[common] Add PatternCache to optimize regex compilation performance #4203

Open
tchivs wants to merge 1 commit into apache:master from tchivs:feat/pattern-cache

@tchivs (Contributor) commented Dec 29, 2025

This PR introduces a PatternCache utility class that caches compiled Pattern instances, using an LRU eviction strategy, so that the same regex string is not compiled repeatedly.

Brief change log

  • feat(common): Add PatternCache class with LRU-based caching mechanism
  • perf(common): Integrate PatternCache into Predicates.setOfRegex() method
  • test: Add comprehensive unit tests including LRU eviction behavior test

Technical Details

PatternCache Implementation

  • Uses LinkedHashMap with access-order (true) for LRU behavior
  • Maximum cache size: 100 patterns
  • Thread-safe with synchronized methods
  • Only caches patterns without flags (regexFlags == 0)
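A minimal sketch of the design described in the list above, for readers who have not opened the diff. It is not the code from this PR: the class name `PatternCacheSketch` and the methods `getPattern`, `cachedCount`, and `clear` are hypothetical names chosen for illustration. Only the ingredients (an access-ordered `LinkedHashMap`, a limit of 100 entries, synchronized access) come from the description.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Illustrative sketch only; names are hypothetical, not taken from the PR. */
final class PatternCacheSketch {
    // Limit stated in the PR description.
    static final int MAX_SIZE = 100;

    // accessOrder = true makes iteration order run from least to most recently
    // accessed, so the "eldest" entry is the least recently used one.
    private static final Map<String, Pattern> CACHE =
            new LinkedHashMap<String, Pattern>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Pattern> eldest) {
                    // Called after each insertion; returning true evicts the LRU entry.
                    return size() > MAX_SIZE;
                }
            };

    private PatternCacheSketch() {}

    // Synchronized because even get() mutates an access-ordered LinkedHashMap.
    static synchronized Pattern getPattern(String regex) {
        Pattern cached = CACHE.get(regex);
        if (cached == null) {
            cached = Pattern.compile(regex);
            CACHE.put(regex, cached);
        }
        return cached;
    }

    static synchronized int cachedCount() {
        return CACHE.size();
    }

    static synchronized void clear() {
        CACHE.clear();
    }
}
```

Note that a plain read also needs the lock here: with access order enabled, `get` reorders the internal linked list, so unsynchronized concurrent reads would not be safe.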

Performance Benefits

  • Avoids repeated Pattern.compile() calls for the same regex string
  • Reduces CPU overhead in pattern-heavy operations
  • Improves throughput for table pattern matching scenarios

Integration Points

  • Predicates.setOfRegex(): Uses cache when regexFlags == 0
  • Patterns with flags are compiled directly without caching
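The branching described in these two bullets might look roughly like the sketch below. It does not reproduce `Predicates.setOfRegex()`; the class `RegexCompileSketch`, its `compile` method, and the unbounded `ConcurrentHashMap` standing in for the cache are all assumptions made to keep the example self-contained. One plausible reason for the `regexFlags == 0` restriction, not stated in the PR, is that a cache keyed by the regex string alone could otherwise return a Pattern compiled with different flags.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

/** Illustrative sketch only; not the actual Predicates integration. */
final class RegexCompileSketch {
    // Hypothetical stand-in for the cache; unbounded here for brevity,
    // whereas the PR describes an LRU cache capped at 100 entries.
    private static final Map<String, Pattern> CACHE = new ConcurrentHashMap<>();

    private RegexCompileSketch() {}

    static Pattern compile(String regex, int regexFlags) {
        if (regexFlags == 0) {
            // Flag-less patterns are keyed by the regex string alone.
            return CACHE.computeIfAbsent(regex, Pattern::compile);
        }
        // Patterns with flags bypass the cache and are compiled on every call.
        return Pattern.compile(regex, regexFlags);
    }
}
```

With this shape, two calls such as `compile("db\\.tbl_.*", 0)` return the same instance, while calls passing `Pattern.CASE_INSENSITIVE` return a fresh Pattern each time.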

- feat(common): introduce LRU-based PatternCache for compiled Pattern instances
- perf(common): integrate PatternCache into Predicates.setOfRegex()
- test: add comprehensive unit tests for PatternCache with LRU eviction test

Benefits:
- Avoid repeated Pattern.compile() calls for the same regex
- LRU eviction with max 100 cached patterns
- Thread-safe with synchronized access
- Significant performance improvement for pattern-heavy operations

Technical details:
- Uses LinkedHashMap with access-order for LRU behavior
- Synchronized methods ensure thread safety
- Only caches patterns without flags (regexFlags == 0)
- Patterns with flags are compiled directly without caching

@yuxiqian (Member) left a comment

It seems the only place Predicates is used is in the Table ID Selectors.

Could you please run some benchmark tests on this? I wonder whether adding caches for the Selectors matching method (#3994) would resolve this, too.

github-actions Bot commented May 5, 2026

This pull request has been automatically marked as stale because it has not had recent activity for 120 days. It will be closed in 60 days if no further activity occurs.

github-actions Bot added the Stale label May 5, 2026