Releases: coregx/coregex
v0.8.4 - Professional Anchor Fix
Fixed
- Bug #10: `^` anchor not working correctly in MatchString
  - Patterns like `^abc` were incorrectly matching at any position (e.g., "xabc")
  - Root cause: DFA's `epsilonClosure` didn't handle `StateLook` assertions properly
- Professional fix following Rust regex-automata approach:
  - New `LookSet` type for tracking satisfied look assertions (`dfa/lazy/look.go`)
  - `epsilonClosure` now accepts a `lookHave LookSet` parameter
  - Different start states for different positions (StartText, StartWord, StartLineLF, etc.)
  - Multiline `^` support: `LookStartLine` satisfied after `\n`
- Fixed prefilter bypass bug: don't use prefilter for start-anchored patterns
- Thanks to Ben Hoyt (GoAWK) for reporting!
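The behaviour this fix restores can be written down as a handful of checks. The sketch below runs them against the standard library's `regexp` package as the reference semantics; the assumption is that coregex, which aims for stdlib compatibility, should agree on every case. The names `startAnchorCases` and `checkStartAnchors` are made up for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// startAnchorCases pairs a pattern and an input with the result that
// standard regexp semantics require.
var startAnchorCases = []struct {
	pattern, input string
	want           bool
}{
	{`^abc`, "abc", true},
	{`^abc`, "abcdef", true},
	{`^abc`, "xabc", false},      // the case reported in Bug #10
	{`(?m)^abc`, "x\nabc", true}, // multiline: ^ also matches right after \n
	{`^abc`, "x\nabc", false},    // without (?m), ^ matches only at text start
}

// checkStartAnchors returns an error describing the first mismatching case.
func checkStartAnchors() error {
	for _, c := range startAnchorCases {
		got := regexp.MustCompile(c.pattern).MatchString(c.input)
		if got != c.want {
			return fmt.Errorf("%q on %q: got %v, want %v", c.pattern, c.input, got, c.want)
		}
	}
	return nil
}

func main() {
	if err := checkStartAnchors(); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("all start-anchor cases pass")
}
```

Running the same table against another engine only requires swapping the `regexp.MustCompile` call.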
Changed
- DFA now correctly handles start-anchored patterns (no NFA fallback needed)
- Strategy selection no longer forces NFA for `^` patterns
Technical Details
- `StateLook` transitions only followed when look assertion is satisfied
- `LookSetFromStartKind()` maps start positions to satisfied assertions
- `ComputeStartState()` uses look-aware epsilon closure
- All tests passing with race detector enabled
- golangci-lint: 0 issues
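To make the look-aware epsilon closure concrete, here is a minimal sketch of the idea, not the code in `dfa/lazy/look.go`. The names `LookSet`, `LookStartLine`, `StartText`, `StartWord` and `StartLineLF` come from the notes above; the bit layout, the toy `state` type and the lower-case helper functions are assumptions made for illustration.

```go
package main

import "fmt"

// Look is one look-around assertion, stored as a single bit (assumed layout).
type Look uint8

const (
	LookStartText Look = 1 << iota // ^ without (?m)
	LookStartLine                  // ^ with (?m)
)

// LookSet is a bitset of the assertions that hold at the current position.
type LookSet uint8

func (s LookSet) Insert(l Look) LookSet { return s | LookSet(l) }
func (s LookSet) Contains(l Look) bool  { return s&LookSet(l) != 0 }

// StartKind classifies a start position by what precedes it.
type StartKind int

const (
	StartText   StartKind = iota // beginning of the input
	StartLineLF                  // immediately after '\n'
	StartWord                    // after a word byte
)

// lookSetFromStartKind maps a start position to the assertions it satisfies.
func lookSetFromStartKind(k StartKind) LookSet {
	var s LookSet
	switch k {
	case StartText:
		s = s.Insert(LookStartText).Insert(LookStartLine)
	case StartLineLF:
		s = s.Insert(LookStartLine)
	}
	return s
}

type stateKind int

const (
	kindEpsilon stateKind = iota // unconditional transitions
	kindLook                     // transitions taken only if look is satisfied
	kindByte                     // consumes input, so the closure stops here
)

type state struct {
	kind stateKind
	look Look
	next []int
}

// epsilonClosure collects every state reachable from start without consuming
// input, following a look state only when lookHave contains its assertion.
func epsilonClosure(nfa []state, start int, lookHave LookSet) []int {
	seen := make([]bool, len(nfa))
	stack := []int{start}
	var closure []int
	for len(stack) > 0 {
		id := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[id] {
			continue
		}
		seen[id] = true
		closure = append(closure, id)
		st := nfa[id]
		if st.kind == kindEpsilon || (st.kind == kindLook && lookHave.Contains(st.look)) {
			stack = append(stack, st.next...)
		}
	}
	return closure
}

// reaches reports whether target is in the closure of state 0 for start kind k.
func reaches(nfa []state, target int, k StartKind) bool {
	for _, id := range epsilonClosure(nfa, 0, lookSetFromStartKind(k)) {
		if id == target {
			return true
		}
	}
	return false
}

func main() {
	// Toy NFA for ^a: state 0 asserts start of text, state 1 consumes 'a'.
	anchored := []state{{kind: kindLook, look: LookStartText, next: []int{1}}, {kind: kindByte}}
	fmt.Println(reaches(anchored, 1, StartText), reaches(anchored, 1, StartWord))
}
```

With this shape, a start state computed for a mid-text position never contains the byte state behind `^`, which is the property Bug #10 was missing.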
Full Changelog: v0.8.3...v0.8.4
v0.8.3: Character class bug fixes
Fixed
- Bug #6: Crash on negated character classes like `[^,]*`, `[^\n]`
  - Large complement classes (e.g., `[^\n]` = 1.1M codepoints) now use efficient Sparse state representation
  - Prevents memory explosion and "character class too large" errors
  - Optimized range-based compilation for classes >256 runes
- Bug #7: Case-insensitive character class matching
  - `[oO]+d` didn't match "food"
  - `compileLiteral()` now respects `FoldCase` flag from `regexp/syntax` parser
  - ASCII letters create proper alternation between upper/lower variants
  - Fixes patterns like `[oO]`, `[aA][bB]`, etc.
Tests
- Added comprehensive test suite `nfa/compile_bug_test.go` (402 lines, 33 test cases)
- All tests passing with race detector enabled
Maintenance
- Removed 21 unused linter directives (gosec, nestif)
- Code formatting cleanup
- golangci-lint: 0 issues
Thanks to Ben Hoyt (GoAWK) for reporting these bugs during integration testing!
Full changelog: CHANGELOG.md
v0.8.2 - Critical Bug Fix
Fixed
- Critical: Infinite loop in `onepass.Build()` for patterns like `(.*)`
  - Bug: byte overflow when iterating ranges with hi=255 caused hang during compilation
  - Affected patterns: `(.*)`, `^(.*)$`, `([_a-zA-Z][_a-zA-Z0-9]*)=(.*)`
  - Thanks to Ben Hoyt (GoAWK) for reporting!
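The notes do not show the loop in `onepass.Build()`, so the following is a general illustration of this class of bug rather than the actual fix. A `byte` counter can never exceed 255, so a condition like `b <= hi` stays true forever when `hi` is 255 and the increment wraps back to 0. Iterating with an `int` avoids the wrap. Both function names are hypothetical.

```go
package main

import "fmt"

// loopWrapsAround runs the hazardous loop shape with a step limit so the
// demonstration terminates. It returns true if the loop would not have
// stopped on its own within limit steps.
func loopWrapsAround(lo, hi byte, limit int) bool {
	steps := 0
	for b := lo; b <= hi; b++ { // with hi == 255, b++ wraps from 255 to 0
		steps++
		if steps > limit {
			return true
		}
	}
	return false
}

// forEachByte visits every byte in [lo, hi] exactly once. The int counter
// can reach 256, so the loop condition eventually becomes false.
func forEachByte(lo, hi byte, visit func(b byte)) {
	for i := int(lo); i <= int(hi); i++ {
		visit(byte(i))
	}
}

func main() {
	fmt.Println("hi=254 runs away:", loopWrapsAround(250, 254, 1000))
	fmt.Println("hi=255 runs away:", loopWrapsAround(250, 255, 1000))

	count := 0
	forEachByte(0, 255, func(byte) { count++ })
	fmt.Println("bytes visited safely:", count)
}
```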
Added
- `Longest()` method: API compatibility with stdlib `regexp.Regexp`
- `QuoteMeta()` function: Escape regex metacharacters in strings
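For readers unfamiliar with these two stdlib features, the sketch below shows what they do using the standard library's `regexp` package. It assumes the coregex versions behave the same way, since they were added for API compatibility. The helper names `longestDemo` and `quoteDemo` exist only for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// longestDemo returns the match for a(|b) on "ab" before and after Longest().
func longestDemo() (first, longest string) {
	re := regexp.MustCompile(`a(|b)`)
	first = re.FindString("ab") // leftmost-first: the empty alternative wins
	re.Longest()                // switch to leftmost-longest matching
	longest = re.FindString("ab")
	return first, longest
}

// quoteDemo checks that a quoted string matches only itself literally.
func quoteDemo() bool {
	re := regexp.MustCompile(regexp.QuoteMeta("1.5*x"))
	// Unquoted, the pattern 1.5*x would also match "125x".
	return re.MatchString("y = 1.5*x") && !re.MatchString("y = 125x")
}

func main() {
	first, longest := longestDemo()
	fmt.Printf("first=%q longest=%q\n", first, longest)
	fmt.Println("quoted pattern is literal:", quoteDemo())
}
```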
Full Changelog: v0.8.1...v0.8.2
v0.8.1 - Stdlib Compatibility
Added
- Type alias `Regexp`: Drop-in compatibility with stdlib `regexp` package
Now you can simply replace:
    import "regexp"
with:
    import regexp "github.com/coregx/coregex"
Existing code using `*regexp.Regexp` will work without changes.
Closes #5
v0.8.0 - ReverseInner Strategy (3000x+ speedup)
ReverseInner Strategy - 3000x+ Speedup for .*keyword.* Patterns
v0.8.0 introduces ReverseInner strategy with bidirectional DFA search, delivering 3,154x speedup for IsMatch and 2,857x speedup for Find on patterns like .*connection.*, .*database.*, .*error.*.
🚀 Performance Highlights
IsMatch (inner literal patterns):
- `.*connection.*` 250KB: 3,154x faster (12.6ms → 4µs)
- `.*database.*` 120KB: 1,174x faster
- Many candidates (100 occurrences): 25x faster
Find (inner literal patterns):
- `.*connection.*` 250KB: 1,894x faster (15.2ms → 8µs)
- `.*database.*` 120KB: 2,857x faster (5.7ms → 2µs)
- Many candidates (100 occurrences): 13x faster
✨ What's New
ReverseInner Strategy (OPT-010, OPT-012)
- AST Splitting: Separate prefix/suffix NFAs for bidirectional search
- Universal Match Detection: Skip DFA scans for `.*` prefix/suffix patterns
- Early Return Optimization: First confirmed match is leftmost by construction
- Prefilter + Bidirectional DFA: Inner literal prefilter → reverse DFA confirms prefix → forward DFA confirms suffix
- Zero Allocations: All optimizations maintain zero-allocation hot paths
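The pipeline described above can be sketched for the simplest case, a pattern of the form `.*literal.*`. This is a hand-written illustration, not the code in `meta/reverse_inner.go`: the reverse and forward DFAs are replaced by plain byte scans, which is only valid because `.*` matches everything except a newline. The function names are hypothetical, and the result is cross-checked against the standard library.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// findInner returns the bounds of the leftmost match of `.*lit.*`.
func findInner(haystack, lit string) (start, end int, ok bool) {
	// 1. Prefilter: locate the inner literal.
	i := strings.Index(haystack, lit)
	if i < 0 {
		return 0, 0, false
	}
	// 2. Reverse scan: the leading .* extends back to the start of the line.
	start = strings.LastIndexByte(haystack[:i], '\n') + 1
	// 3. Forward scan: the trailing .* extends to the end of the line.
	end = len(haystack)
	if j := strings.IndexByte(haystack[i+len(lit):], '\n'); j >= 0 {
		end = i + len(lit) + j
	}
	return start, end, true
}

// agreesWithRegexp compares findInner with the standard library's result.
func agreesWithRegexp(haystack, lit string) bool {
	re := regexp.MustCompile(`.*` + regexp.QuoteMeta(lit) + `.*`)
	loc := re.FindStringIndex(haystack)
	start, end, ok := findInner(haystack, lit)
	if loc == nil {
		return !ok
	}
	return ok && loc[0] == start && loc[1] == end
}

func main() {
	log := "GET /\nlost connection to db\nok"
	start, end, ok := findInner(log, "connection")
	fmt.Println(ok, log[start:end])
	fmt.Println("agrees with regexp:", agreesWithRegexp(log, "connection"))
}
```

Because the first literal occurrence lies on the earliest matching line, the first confirmed candidate is already the leftmost match, which mirrors the early-return point in the list above.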
Also Included: v0.7.0 OnePass DFA
- 10x faster captures (~700ns → 70ns for `FindSubmatch`)
- Zero allocations (vs 2-4 allocs with PikeVM)
- Automatically selected for simple anchored patterns
📦 Installation
    go get github.com/coregx/coregex@v0.8.0
📚 Documentation
🎯 Use Cases
Perfect for:
- Log parsing with patterns like `.*ERROR.*connection.*`
- Text search with inner keywords: `.*database.*`, `.*timeout.*`
- Multi-occurrence patterns in large files
- Any pattern with wildcards before AND after a literal
🔧 Technical Details
- New files: `meta/reverse_inner.go`, `meta/reverse_inner_test.go`
- Modified: `literal/extractor.go` (AST splitting), `meta/strategy.go`, `meta/meta.go`
- Code: +1,348 lines for ReverseInner
- Tests: 7 new test suites, all passing
- Coverage: 88.3% overall
Full Changelog: https://github.com/coregx/coregex/blob/v0.8.0/CHANGELOG.md
v0.7.1 - Go Proxy Fix
Minor release to update Go module proxy.
No code changes from v0.7.0 - only adds updated LICENSE and README.
For new users: Use v0.7.1 in go.mod.
Full changelog: v0.7.0...v0.7.1
v0.7.0 - OnePass DFA
OnePass DFA Engine (OPT-011)
10x faster capture group extraction for unambiguous patterns.
Highlights
- 70 ns/op for FindSubmatch (was ~700 ns with PikeVM)
- Zero allocations in hot path
- Auto-fallback to PikeVM for non-onepass patterns
Benchmark
| Operation | PikeVM | OnePass | Speedup |
|---|---|---|---|
| FindSubmatch | ~700 ns | 70 ns | 10x |
| IsMatch anchored | ~50 ns | 27 ns | 1.8x |
| Memory allocations | 2-3 | 0 | Zero alloc |
Full changelog: v0.6.0...v0.7.0
v0.6.0 - ReverseSuffix Optimization (1000x+ speedup)
What's New
OPT-009: Zero-allocation reverse DFA search for suffix patterns
This release introduces the ReverseSuffix optimization strategy for patterns like .*\.txt - a massive performance improvement for suffix-matching patterns.
Performance Improvements (vs stdlib)
| Benchmark | stdlib | coregex | Speedup |
|---|---|---|---|
| `.*\.txt` IsMatch 1KB | 40,379 ns | 307 ns | 131x |
| `.*\.txt` IsMatch 32KB | 1,323,614 ns | 855 ns | 1,549x |
| `.*\.txt` IsMatch 1MB | 27,651,042 ns | 21,041 ns | 1,314x |
Key Features
- Zero heap allocations - Backward scan without byte reversal
- Smart strategy selection - Prefers prefix literals when available
- Greedy matching - Leftmost-longest semantics
- DFA-only execution - No PikeVM fallback needed
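As a rough illustration of the backward scan, here is a hand-written matcher for patterns of the form `.*suffix`, where the suffix is a literal without a newline. It is a sketch under those assumptions, not the reverse DFA in `meta/reverse_suffix.go`: it walks backward through the haystack by index instead of building a reversed copy. `findSuffixMatch` is a hypothetical name, and the results are compared with the standard library.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// findSuffixMatch returns the bounds of the leftmost match of `.*suffix`.
// Assumes suffix contains no '\n'.
func findSuffixMatch(haystack, suffix string) (start, end int, ok bool) {
	first := strings.Index(haystack, suffix)
	if first < 0 {
		return 0, 0, false
	}
	// The match cannot cross a newline, so bound the search to this line.
	lineEnd := len(haystack)
	if j := strings.IndexByte(haystack[first:], '\n'); j >= 0 {
		lineEnd = first + j
	}
	// Greedy .*: the match ends at the last occurrence of suffix in the line.
	end = first + strings.LastIndex(haystack[first:lineEnd], suffix) + len(suffix)
	// Backward scan by index: no reversed copy of the input is allocated.
	start = first
	for start > 0 && haystack[start-1] != '\n' {
		start--
	}
	return start, end, true
}

// sameAsRegexp compares findSuffixMatch with the standard library's result.
func sameAsRegexp(haystack, suffix string) bool {
	re := regexp.MustCompile(`.*` + regexp.QuoteMeta(suffix))
	loc := re.FindStringIndex(haystack)
	start, end, ok := findSuffixMatch(haystack, suffix)
	if loc == nil {
		return !ok
	}
	return ok && loc[0] == start && loc[1] == end
}

func main() {
	files := "readme.md\nnotes.txt backup.txt old\nlast.txt"
	start, end, ok := findSuffixMatch(files, ".txt")
	fmt.Println(ok, files[start:end])
	fmt.Println("same as regexp:", sameAsRegexp(files, ".txt"))
}
```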
Files Changed
- New: `meta/reverse_suffix.go` - ReverseSuffix strategy implementation
- New: `dfa/lazy/lazy.go` - `SearchReverse()` / `IsMatchReverse()` methods
- Updated: `meta/strategy.go` - Strategy selection logic
Full Changelog
See CHANGELOG.md for complete details.
v0.5.0: Named Capture Groups
What's New in v0.5.0
Named Capture Groups ✨
Full support for (?P<name>...) syntax with stdlib-compatible API:
- `SubexpNames()` API - Returns slice of capture group names
  - Available in `Regex`, `Engine`, and `NFA`
  - Index 0 is always "" (entire match)
  - Compatible with stdlib `regexp.Regexp.SubexpNames()` behavior
Implementation Details
- NFA Compiler Enhancement:
  - `collectCaptureInfo()` collects capture names during compilation
  - Two-pass algorithm: count captures, then collect names
  - Stores names from `syntax.Regexp.Name` field
- Builder Enhancement:
  - `WithCaptureNames()` BuildOption for passing names to NFA
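The two-pass idea can be sketched directly on the `regexp/syntax` AST. This is an illustration of the approach described above, not the body of `collectCaptureInfo()`; the function names `countCaptures`, `fillNames` and `captureNames` are assumptions.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// countCaptures is pass one: it counts the capture groups in the tree.
func countCaptures(re *syntax.Regexp) int {
	n := 0
	if re.Op == syntax.OpCapture {
		n = 1
	}
	for _, sub := range re.Sub {
		n += countCaptures(sub)
	}
	return n
}

// fillNames is pass two: it stores each group's name at its capture index.
func fillNames(re *syntax.Regexp, names []string) {
	if re.Op == syntax.OpCapture {
		names[re.Cap] = re.Name // Name is "" for unnamed groups
	}
	for _, sub := range re.Sub {
		fillNames(sub, names)
	}
}

// captureNames returns one entry per group, with index 0 reserved for the
// whole match, mirroring the layout SubexpNames() uses.
func captureNames(pattern string) ([]string, error) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return nil, err
	}
	names := make([]string, countCaptures(re)+1)
	fillNames(re, names)
	return names, nil
}

func main() {
	names, err := captureNames(`(?P<year>\d+)-(\d+)-(?P<day>\d+)`)
	fmt.Printf("%q %v\n", names, err)
}
```

Counting first lets the names slice be allocated once at its final size, so the second pass can write by index without appending.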
Testing & Quality
- 18 new unit tests covering all named capture scenarios
- 2 integration examples demonstrating real-world usage
- 88.3% test coverage
- All CI checks passing
Files Changed
New files:
- `nfa/named_captures_test.go` - Comprehensive test suite
- `example_subexpnames_test.go` - Integration examples
Modified files:
- `nfa/nfa.go` - Added `captureNames` field and `SubexpNames()` method
- `nfa/compile.go` - Implemented `collectCaptureInfo()` two-pass algorithm
- `nfa/builder.go` - Added `WithCaptureNames()` BuildOption
- `meta/meta.go` - Exposed `SubexpNames()` through Engine
- `regex.go` - Public API for `SubexpNames()`
- `README.md` - Updated feature table and documentation
- `CHANGELOG.md` - Added v0.5.0 release notes
Code statistics:
- +333 lines, -20 lines (9 files changed)
- +200 lines for named captures implementation
Full Changelog: v0.4.0...v0.5.0
v0.4.0: ReverseAnchored Strategy + Core Optimizations
Highlights
205,000x speedup for end-anchored patterns ($ anchor) through reverse search strategy.
What's New
Reverse Search Engine
- `nfa.Reverse()` - Build reverse NFA from forward NFA
- `nfa.ReverseAnchored()` - Build anchored reverse NFA for `$` patterns
- `ReverseAnchoredSearcher` - Optimized search for end-anchored patterns
- Automatic strategy selection via `IsPatternEndAnchored()`
Core Optimizations (OPT-001..006)
- Start State Caching - 6 start configurations with StartByteMap
- Prefilter Effectiveness Tracking - Dynamic disabling at >90% false positives
- Early Match Termination - `searchEarliestMatch()` for IsMatch
- State Acceleration - memchr/memchr2/memchr3 in DFA loop
- ByteClasses - Alphabet compression for reduced DFA states
- Specialized Search Functions - Optimized Count/FindAllSubmatch
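Of the optimizations above, prefilter effectiveness tracking is the easiest to sketch in isolation. The code below is a minimal illustration and not the coregex implementation: only the 90% threshold comes from the notes, while the type name, the method names and the 64-sample warm-up are assumptions.

```go
package main

import "fmt"

const (
	minSamples           = 64   // assumed warm-up before judging the prefilter
	maxFalsePositiveRate = 0.90 // from the notes: disable above 90% false positives
)

// prefilterTracker counts how often a prefilter candidate turned out to be
// a real match, and switches the prefilter off once it stops paying for itself.
type prefilterTracker struct {
	candidates int
	confirmed  int
	disabled   bool
}

// record notes the outcome of verifying one prefilter candidate.
func (t *prefilterTracker) record(confirmed bool) {
	if t.disabled {
		return
	}
	t.candidates++
	if confirmed {
		t.confirmed++
	}
	if t.candidates < minSamples {
		return
	}
	falsePositives := t.candidates - t.confirmed
	if float64(falsePositives) > maxFalsePositiveRate*float64(t.candidates) {
		t.disabled = true
	}
}

// enabled reports whether the search loop should still consult the prefilter.
func (t *prefilterTracker) enabled() bool { return !t.disabled }

func main() {
	var noisy, useful prefilterTracker
	for i := 0; i < 100; i++ {
		noisy.record(false)    // every candidate is a false positive
		useful.record(i%2 == 0) // half of the candidates are real matches
	}
	fmt.Println("noisy prefilter enabled:", noisy.enabled())
	fmt.Println("useful prefilter enabled:", useful.enabled())
}
```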
Bug Fixes
- FIX-001: PikeVM visited check (prevents exponential thread explosion)
- FIX-002: ReverseAnchored unanchored prefix bug (critical fix)
Performance
| Pattern | Before | After | Speedup |
|---|---|---|---|
| Easy1 $ anchor 1MB | 340 sec | 1.6 ms | 205,000x |
| Case-insensitive 32KB | - | - | 233x vs stdlib |
| Hard1 multi-alternation | - | - | 5.2x vs stdlib |
Installation
    go get github.com/coregx/coregex@v0.4.0
Full Changelog
See CHANGELOG.md for details.