Releases · coregx/coregex


v0.8.4 - Professional Anchor Fix

kolkov released this 04 Dec 18:47

v0.8.4

04952bc

Fixed

Bug #10: ^ anchor not working correctly in MatchString
- Patterns like ^abc were incorrectly matching at any position (e.g., "xabc")
- Root cause: DFA's epsilonClosure didn't handle StateLook assertions properly
- Professional fix following Rust regex-automata approach:
  - New LookSet type for tracking satisfied look assertions (dfa/lazy/look.go)
  - epsilonClosure now accepts lookHave LookSet parameter
  - Different start states for different positions (StartText, StartWord, StartLineLF, etc.)
  - Multiline ^ support: LookStartLine satisfied after \n
- Fixed prefilter bypass bug: don't use prefilter for start-anchored patterns
- Thanks to Ben Hoyt (GoAWK) for reporting!
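
The expected behavior for the patterns above can be checked against the standard library, which these notes treat as the compatibility reference. This is a minimal sketch using stdlib `regexp` only; the `matches` helper is a name made up for the illustration, not part of coregex.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches anywhere in s, using the standard
// library as the reference for what a start-anchored pattern should do.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	fmt.Println(matches(`^abc`, "abcdef"))     // true: match at position 0
	fmt.Println(matches(`^abc`, "xabc"))       // false: the case from Bug #10
	fmt.Println(matches(`^abc`, "x\nabc"))     // false: ^ is not multiline by default
	fmt.Println(matches(`(?m)^abc`, "x\nabc")) // true: (?m) lets ^ match after \n
}
```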

Changed

DFA now correctly handles start-anchored patterns (no NFA fallback needed)
Strategy selection no longer forces NFA for ^ patterns

Technical Details

StateLook transitions only followed when look assertion is satisfied
LookSetFromStartKind() maps start positions to satisfied assertions
ComputeStartState() uses look-aware epsilon closure
All tests passing with race detector enabled
golangci-lint: 0 issues
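
To illustrate the idea of a look-aware epsilon closure, here is a simplified sketch. It is not the code from dfa/lazy/look.go: the `State` layout, the `Look` constants, and the `lookSetAt` helper are assumptions made for this example, and only the general technique (follow a look transition only when its assertion is in the satisfied set) is taken from the notes above.

```go
package main

import "fmt"

// Look is a single zero-width assertion (hypothetical constants).
type Look uint8

const (
	LookStartText Look = 1 << iota // ^ without the multiline flag
	LookStartLine                  // ^ with the multiline flag
)

// LookSet is a bitset of assertions that hold at the current position.
type LookSet uint8

func (s LookSet) Contains(l Look) bool { return s&LookSet(l) != 0 }

type StateKind int

const (
	StateEpsilon StateKind = iota
	StateLook
	StateByte
	StateMatch
)

// State is a toy NFA state; byte values are omitted because the closure
// never consumes input.
type State struct {
	Kind StateKind
	Look Look
	Next []int
}

// lookSetAt returns the assertions satisfied at position pos of input.
func lookSetAt(input string, pos int) LookSet {
	var have LookSet
	if pos == 0 {
		have |= LookSet(LookStartText) | LookSet(LookStartLine)
	} else if input[pos-1] == '\n' {
		have |= LookSet(LookStartLine)
	}
	return have
}

// epsilonClosure collects the states reachable from start without consuming
// input. A StateLook transition is followed only if its assertion is in have.
func epsilonClosure(nfa []State, start int, have LookSet) []int {
	seen := make([]bool, len(nfa))
	stack := []int{start}
	var closure []int
	for len(stack) > 0 {
		id := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[id] {
			continue
		}
		seen[id] = true
		closure = append(closure, id)
		st := nfa[id]
		switch st.Kind {
		case StateEpsilon:
			stack = append(stack, st.Next...)
		case StateLook:
			if have.Contains(st.Look) {
				stack = append(stack, st.Next...)
			}
		}
	}
	return closure
}

func main() {
	// NFA for ^a: state 0 asserts start of text, state 1 consumes 'a'.
	nfa := []State{
		{Kind: StateLook, Look: LookStartText, Next: []int{1}},
		{Kind: StateByte, Next: []int{2}},
		{Kind: StateMatch},
	}
	fmt.Println(epsilonClosure(nfa, 0, lookSetAt("xa", 0))) // [0 1]
	fmt.Println(epsilonClosure(nfa, 0, lookSetAt("xa", 1))) // [0]
}
```

At position 1 the closure stops at the look state, so the byte state for `a` is never reached and `^a` cannot match inside "xa".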

Full Changelog: v0.8.3...v0.8.4


v0.8.3: Character class bug fixes

kolkov released this 04 Dec 09:27

v0.8.3

0bc6768

Fixed

Bug #6: Crash on negated character classes like [^,]*, [^\n]
- Large complement classes (e.g., [^\n] = 1.1M codepoints) now use efficient Sparse state representation
- Prevents memory explosion and "character class too large" errors
- Optimized range-based compilation for classes >256 runes
Bug #7: Case-insensitive character class matching [oO]+d didn't match "food"
- compileLiteral() now respects FoldCase flag from regexp/syntax parser
- ASCII letters create proper alternation between upper/lower variants
- Fixes patterns like [oO], [aA][bB], etc.
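
As a reference for the two bugs above, this sketch shows what the affected patterns return under stdlib `regexp`, which is presumably what a compatible engine should reproduce. The `firstMatch` helper is a name invented for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch returns the leftmost match of pattern in s ("" if none).
func firstMatch(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

func main() {
	// Negated classes (Bug #6).
	fmt.Printf("%q\n", firstMatch(`[^,]*`, "alpha,beta"))     // "alpha"
	fmt.Printf("%q\n", firstMatch(`[^\n]+`, "first\nsecond")) // "first"

	// Classes listing both cases of a letter (Bug #7).
	fmt.Printf("%q\n", firstMatch(`[oO]+d`, "food"))   // "ood"
	fmt.Printf("%q\n", firstMatch(`[aA][bB]`, "xAby")) // "Ab"
}
```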

Tests

Added comprehensive test suite nfa/compile_bug_test.go (402 lines, 33 test cases)
All tests passing with race detector enabled

Maintenance

Removed 21 unused linter directives (gosec, nestif)
Code formatting cleanup
golangci-lint: 0 issues

Thanks to Ben Hoyt (GoAWK) for reporting these bugs during integration testing!

Full changelog: CHANGELOG.md


v0.8.2 - Critical Bug Fix

kolkov released this 03 Dec 09:47

v0.8.2

236a0b4

Fixed

Critical: Infinite loop in onepass.Build() for patterns like (.*)
- Bug: byte overflow when iterating ranges with hi=255 caused hang during compilation
- Affected patterns: (.*), ^(.*)$, ([_a-zA-Z][_a-zA-Z0-9]*)=(.*)
- Thanks to Ben Hoyt (GoAWK) for reporting!
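
The notes do not show the offending loop, so the following is only an illustration of this class of bug, not the code from onepass.Build(). A loop such as `for b := lo; b <= hi; b++` with `b` of type `byte` never terminates when `hi` is 255, because incrementing 255 wraps around to 0. One common way to avoid it, assumed here, is to iterate with a wider integer type; `countRange` is a hypothetical helper.

```go
package main

import "fmt"

// countRange visits every byte value in [lo, hi] and returns how many it saw.
// The counter is an int, so b can reach 256 and the loop ends even when
// hi == 255. With a byte counter, b++ would wrap from 255 to 0 and the
// condition b <= hi would stay true forever.
func countRange(lo, hi byte) int {
	n := 0
	for b := int(lo); b <= int(hi); b++ {
		_ = byte(b) // the value a real compiler would add a transition for
		n++
	}
	return n
}

func main() {
	fmt.Println(countRange('a', 'z')) // 26
	fmt.Println(countRange(0, 255))   // 256
}
```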

Added

Longest() method: API compatibility with stdlib regexp.Regexp
QuoteMeta() function: Escape regex metacharacters in strings

Full Changelog: v0.8.1...v0.8.2


v0.8.1 - Stdlib Compatibility

kolkov released this 03 Dec 04:09

v0.8.1

a25f1a5

Added

Type alias Regexp: Drop-in compatibility with stdlib regexp package

Now you can simply replace:

import "regexp"

with:

import regexp "github.com/coregx/coregex"

Existing code using *regexp.Regexp will work without changes.

Closes #5


v0.8.0 - ReverseInner Strategy (3000x+ speedup)

kolkov released this 29 Nov 20:46

v0.8.0

d3d3914

ReverseInner Strategy - 3000x+ Speedup for `.*keyword.*` Patterns

v0.8.0 introduces the ReverseInner strategy with bidirectional DFA search, delivering up to a 3,154x speedup for IsMatch and up to a 2,857x speedup for Find on patterns like .*connection.*, .*database.*, and .*error.*.

🚀 Performance Highlights

IsMatch (inner literal patterns):

.*connection.* 250KB: 3,154x faster (12.6ms → 4µs)
.*database.* 120KB: 1,174x faster
Many candidates (100 occurrences): 25x faster

Find (inner literal patterns):

.*connection.* 250KB: 1,894x faster (15.2ms → 8µs)
.*database.* 120KB: 2,857x faster (5.7ms → 2µs)
Many candidates (100 occurrences): 13x faster

✨ What's New

ReverseInner Strategy (OPT-010, OPT-012)

AST Splitting: Separate prefix/suffix NFAs for bidirectional search
Universal Match Detection: Skip DFA scans for .* prefix/suffix patterns
Early Return Optimization: First confirmed match is leftmost by construction
Prefilter + Bidirectional DFA: Inner literal prefilter → reverse DFA confirms prefix → forward DFA confirms suffix
Zero Allocations: All optimizations maintain zero-allocation hot paths

Also Included: v0.7.0 OnePass DFA

10x faster captures (~700ns → 70ns for FindSubmatch)
Zero allocations (vs 2-4 allocs with PikeVM)
Automatically selected for simple anchored patterns

📦 Installation

go get github.com/coregx/coregex@v0.8.0

📚 Documentation

README - Updated with new benchmarks and architecture
CHANGELOG - Full v0.8.0 and v0.7.0 details

🎯 Use Cases

Perfect for:

Log parsing with patterns like .*ERROR.*connection.*
Text search with inner keywords: .*database.*, .*timeout.*
Multi-occurrence patterns in large files
Any pattern with wildcards before AND after a literal

🔧 Technical Details

New files: meta/reverse_inner.go, meta/reverse_inner_test.go
Modified: literal/extractor.go (AST splitting), meta/strategy.go, meta/meta.go
Code: +1,348 lines for ReverseInner
Tests: 7 new test suites, all passing
Coverage: 88.3% overall

Full Changelog: https://github.com/coregx/coregex/blob/v0.8.0/CHANGELOG.md


v0.7.1 - Go Proxy Fix

kolkov released this 29 Nov 10:02

v0.7.1

5dcca92

Minor release to update Go module proxy.

No code changes from v0.7.0; this release only adds the updated LICENSE and README.

For new users: Use v0.7.1 in go.mod.

Full changelog: v0.7.0...v0.7.1


v0.7.0 - OnePass DFA

kolkov released this 29 Nov 10:02

v0.7.0

182e119

OnePass DFA Engine (OPT-011)

10x faster capture group extraction for unambiguous patterns.

Highlights

70 ns/op for FindSubmatch (was ~700 ns with PikeVM)
Zero allocations in hot path
Auto-fallback to PikeVM for non-onepass patterns

Benchmark

Operation	PikeVM	OnePass	Speedup
FindSubmatch	~700 ns	70 ns	10x
IsMatch anchored	~50 ns	27 ns	1.8x
Memory allocations	2-3	0	Zero alloc

Full changelog: v0.6.0...v0.7.0


v0.6.0 - ReverseSuffix Optimization (1000x+ speedup)

kolkov released this 28 Nov 11:48

v0.6.0

a7aa99f

What's New

OPT-009: Zero-allocation reverse DFA search for suffix patterns

This release introduces the ReverseSuffix optimization strategy for patterns like .*\.txt, a massive performance improvement for suffix-matching patterns.

Performance Improvements (vs stdlib)

Benchmark	stdlib	coregex	Speedup
`.*\.txt` IsMatch 1KB	40,379 ns	307 ns	131x
`.*\.txt` IsMatch 32KB	1,323,614 ns	855 ns	1,549x
`.*\.txt` IsMatch 1MB	27,651,042 ns	21,041 ns	1,314x

Key Features

Zero heap allocations - Backward scan without byte reversal
Smart strategy selection - Prefers prefix literals when available
Greedy matching - Leftmost-longest semantics
DFA-only execution - No PikeVM fallback needed

Files Changed

New: meta/reverse_suffix.go - ReverseSuffix strategy implementation
New: dfa/lazy/lazy.go - SearchReverse() / IsMatchReverse() methods
Updated: meta/strategy.go - Strategy selection logic

Full Changelog

See CHANGELOG.md for complete details.


v0.5.0: Named Capture Groups

kolkov released this 28 Nov 08:26

v0.5.0

5658a24

What's New in v0.5.0

Named Capture Groups ✨

Full support for (?P<name>...) syntax with stdlib-compatible API:

SubexpNames() API - Returns slice of capture group names
- Available in Regex, Engine, and NFA
- Index 0 is always "" (entire match)
- Compatible with stdlib regexp.Regexp.SubexpNames() behavior

Implementation Details

NFA Compiler Enhancement: collectCaptureInfo() collects capture names during compilation
- Two-pass algorithm: count captures, then collect names
- Stores names from syntax.Regexp.Name field
Builder Enhancement: WithCaptureNames() BuildOption for passing names to NFA

Testing & Quality

18 new unit tests covering all named capture scenarios
2 integration examples demonstrating real-world usage
88.3% test coverage
All CI checks passing

Files Changed

New files:

nfa/named_captures_test.go - Comprehensive test suite
example_subexpnames_test.go - Integration examples

Modified files:

nfa/nfa.go - Added captureNames field and SubexpNames() method
nfa/compile.go - Implemented collectCaptureInfo() two-pass algorithm
nfa/builder.go - Added WithCaptureNames() BuildOption
meta/meta.go - Exposed SubexpNames() through Engine
regex.go - Public API for SubexpNames()
README.md - Updated feature table and documentation
CHANGELOG.md - Added v0.5.0 release notes

Code statistics:

+333 lines, -20 lines (9 files changed)
+200 lines for named captures implementation

Full Changelog: v0.4.0...v0.5.0


v0.4.0: ReverseAnchored Strategy + Core Optimizations

kolkov released this 28 Nov 07:17

v0.4.0

0e330d6

Highlights

205,000x speedup for end-anchored patterns ($ anchor) through a reverse search strategy.

What's New

Reverse Search Engine

nfa.Reverse() - Build reverse NFA from forward NFA
nfa.ReverseAnchored() - Build anchored reverse NFA for $ patterns
ReverseAnchoredSearcher - Optimized search for end-anchored patterns
Automatic strategy selection via IsPatternEndAnchored()

Core Optimizations (OPT-001..006)

Start State Caching - 6 start configurations with StartByteMap
Prefilter Effectiveness Tracking - Dynamic disabling at >90% false positives
Early Match Termination - searchEarliestMatch() for IsMatch
State Acceleration - memchr/memchr2/memchr3 in DFA loop
ByteClasses - Alphabet compression for reduced DFA states
Specialized Search Functions - Optimized Count/FindAllSubmatch
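
The Prefilter Effectiveness Tracking item can be pictured with a small counter. This is a sketch of the general idea only: the type, its fields, and the `minSamples` warm-up threshold are assumptions for the example, and only the "disable above 90% false positives" rule comes from the notes.

```go
package main

import "fmt"

// prefilterTracker counts prefilter candidates and how many of them the
// regex engine rejected (hypothetical type).
type prefilterTracker struct {
	candidates     int
	falsePositives int
	minSamples     int // assumed warm-up before the ratio is trusted
	disabled       bool
}

// record notes one candidate; confirmed is true if the engine found a real
// match there. The prefilter is disabled once more than 90% of the
// candidates seen so far were false positives.
func (t *prefilterTracker) record(confirmed bool) {
	if t.disabled {
		return
	}
	t.candidates++
	if !confirmed {
		t.falsePositives++
	}
	// Integer form of falsePositives/candidates > 0.9.
	if t.candidates >= t.minSamples && t.falsePositives*10 > t.candidates*9 {
		t.disabled = true
	}
}

func main() {
	bad := &prefilterTracker{minSamples: 10}
	for i := 0; i < 10; i++ {
		bad.record(false)
	}
	fmt.Println(bad.disabled) // true: 100% false positives

	ok := &prefilterTracker{minSamples: 10}
	for i := 0; i < 10; i++ {
		ok.record(i == 0) // one real match, nine false positives
	}
	fmt.Println(ok.disabled) // false: exactly 90% is not above the limit
}
```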

Bug Fixes

FIX-001: PikeVM visited check (prevents exponential thread explosion)
FIX-002: ReverseAnchored unanchored prefix bug (critical fix)

Performance

Pattern	Before	After	Speedup
Easy1 $ anchor 1MB	340 sec	1.6 ms	205,000x
Case-insensitive 32KB	-	-	233x vs stdlib
Hard1 multi-alternation	-	-	5.2x vs stdlib

Installation

go get github.com/coregx/coregex@v0.4.0

Full Changelog

See CHANGELOG.md for details.

