Releases: coregx/coregex

v0.8.4 - Professional Anchor Fix

@kolkov kolkov released this 04 Dec 18:47

Fixed

  • Bug #10: ^ anchor not working correctly in MatchString
    • Patterns like ^abc were incorrectly matching at any position (e.g., "xabc")
    • Root cause: DFA's epsilonClosure didn't handle StateLook assertions properly
    • The fix follows the approach used by Rust's regex-automata:
      • New LookSet type for tracking satisfied look assertions (dfa/lazy/look.go)
      • epsilonClosure now accepts lookHave LookSet parameter
      • Different start states for different positions (StartText, StartWord, StartLineLF, etc.)
      • Multiline ^ support: LookStartLine satisfied after \n
    • Fixed prefilter bypass bug: don't use prefilter for start-anchored patterns
    • Thanks to Ben Hoyt (GoAWK) for reporting!
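
The intended anchor semantics can be checked against the standard library. The sketch below uses only stdlib regexp (not coregex itself), on the assumption that coregex aims to match stdlib behavior here; anchoredMatches is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
)

// anchoredMatches reports, for each input, whether pattern matches it.
func anchoredMatches(pattern string, inputs []string) []bool {
	re := regexp.MustCompile(pattern)
	out := make([]bool, len(inputs))
	for i, s := range inputs {
		out[i] = re.MatchString(s)
	}
	return out
}

func main() {
	inputs := []string{"abc", "xabc", "x\nabc"}
	// Without (?m), ^ only matches at the start of the text.
	fmt.Println(anchoredMatches(`^abc`, inputs)) // [true false false]
	// With (?m), ^ also matches right after a newline.
	fmt.Println(anchoredMatches(`(?m)^abc`, inputs)) // [true false true]
}
```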

Changed

  • DFA now correctly handles start-anchored patterns (no NFA fallback needed)
  • Strategy selection no longer forces NFA for ^ patterns

Technical Details

  • StateLook transitions only followed when look assertion is satisfied
  • LookSetFromStartKind() maps start positions to satisfied assertions
  • ComputeStartState() uses look-aware epsilon closure
  • All tests passing with race detector enabled
  • golangci-lint: 0 issues
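
A minimal sketch of a look-aware epsilon closure, to illustrate the idea behind the points above. This is not the code in dfa/lazy/look.go: the Look constants, the state layout, and lookSetAt are simplified assumptions made for illustration.

```go
package main

import "fmt"

// Look is a single look-around assertion (names are illustrative).
type Look uint8

const (
	LookStartText Look = 1 << iota // ^ without (?m), or \A
	LookStartLine                  // ^ with (?m)
)

// LookSet is a bitset of assertions satisfied at the current position.
type LookSet uint8

func (s LookSet) Contains(l Look) bool  { return s&LookSet(l) != 0 }
func (s LookSet) Insert(l Look) LookSet { return s | LookSet(l) }

// lookSetAt computes which start assertions hold just before haystack[pos].
func lookSetAt(haystack string, pos int) LookSet {
	var s LookSet
	if pos == 0 {
		s = s.Insert(LookStartText).Insert(LookStartLine)
	} else if haystack[pos-1] == '\n' {
		s = s.Insert(LookStartLine)
	}
	return s
}

// state is a toy NFA state: an epsilon move to next, optionally guarded
// by a look assertion (look == 0 means unconditional).
type state struct {
	look Look
	next []int
}

// epsilonClosure follows a guarded transition only when its assertion
// is contained in have.
func epsilonClosure(states []state, start int, have LookSet) []int {
	seen := make([]bool, len(states))
	stack := []int{start}
	var closure []int
	for len(stack) > 0 {
		id := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[id] {
			continue
		}
		seen[id] = true
		closure = append(closure, id)
		st := states[id]
		if st.look != 0 && !have.Contains(st.look) {
			continue // assertion not satisfied: do not follow
		}
		stack = append(stack, st.next...)
	}
	return closure
}

func main() {
	// State 0 asserts start-of-text, then leads to state 1 (which would consume 'a').
	states := []state{{look: LookStartText, next: []int{1}}, {}}
	fmt.Println(len(epsilonClosure(states, 0, lookSetAt("xabc", 0)))) // 2
	fmt.Println(len(epsilonClosure(states, 0, lookSetAt("xabc", 1)))) // 1
}
```

At position 1 the guarded transition is not followed, so the byte-consuming state is never reached and ^abc cannot start matching in the middle of "xabc".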

Full Changelog: v0.8.3...v0.8.4

v0.8.3: Character class bug fixes

@kolkov kolkov released this 04 Dec 09:27

Fixed

  • Bug #6: Crash on negated character classes like [^,]*, [^\n]

    • Large complement classes (e.g., [^\n] = 1.1M codepoints) now use efficient Sparse state representation
    • Prevents memory explosion and "character class too large" errors
    • Optimized range-based compilation for classes >256 runes
  • Bug #7: Case-insensitive character class matching [oO]+d didn't match "food"

    • compileLiteral() now respects FoldCase flag from regexp/syntax parser
    • ASCII letters create proper alternation between upper/lower variants
    • Fixes patterns like [oO], [aA][bB], etc.
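
To see why range-based handling matters for Bug #6, the sketch below computes the complement of a class as ranges instead of enumerating code points. It is an illustration under stated assumptions, not the coregex compiler: runeRange, complement, and size are hypothetical names, and the input is assumed sorted and non-overlapping.

```go
package main

import (
	"fmt"
	"unicode"
)

// runeRange is an inclusive range of code points.
type runeRange struct{ lo, hi rune }

// complement returns the ranges covering every code point in
// [0, unicode.MaxRune] that the sorted, non-overlapping input does not cover.
func complement(ranges []runeRange) []runeRange {
	var out []runeRange
	next := rune(0)
	for _, r := range ranges {
		if r.lo > next {
			out = append(out, runeRange{next, r.lo - 1})
		}
		next = r.hi + 1
	}
	if next <= unicode.MaxRune {
		out = append(out, runeRange{next, unicode.MaxRune})
	}
	return out
}

// size counts the code points covered by ranges.
func size(ranges []runeRange) int {
	n := 0
	for _, r := range ranges {
		n += int(r.hi-r.lo) + 1
	}
	return n
}

func main() {
	comp := complement([]runeRange{{'\n', '\n'}}) // the class [^\n]
	fmt.Println(len(comp), size(comp))            // 2 1114111
}
```

Two ranges describe the same set as roughly 1.1M individual code points, which is the kind of saving a sparse, range-based state representation relies on.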

Tests

  • Added comprehensive test suite nfa/compile_bug_test.go (402 lines, 33 test cases)
  • All tests passing with race detector enabled

Maintenance

  • Removed 21 unused linter directives (gosec, nestif)
  • Code formatting cleanup
  • golangci-lint: 0 issues

Thanks to Ben Hoyt (GoAWK) for reporting these bugs during integration testing!

Full changelog: CHANGELOG.md

v0.8.2 - Critical Bug Fix

@kolkov kolkov released this 03 Dec 09:47

Fixed

  • Critical: Infinite loop in onepass.Build() for patterns like (.*)
    • Bug: byte overflow when iterating ranges with hi=255 caused hang during compilation
    • Affected patterns: (.*), ^(.*)$, ([_a-zA-Z][_a-zA-Z0-9]*)=(.*)
    • Thanks to Ben Hoyt (GoAWK) for reporting!
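
The class of bug described above is easy to reproduce in Go: a loop of the form `for b := lo; b <= hi; b++` with b of type byte never terminates when hi is 255, because b wraps around to 0. The sketch below shows one common way to avoid it; countRange is a hypothetical helper, not the actual onepass.Build() code.

```go
package main

import "fmt"

// countRange visits every byte value in [lo, hi] and returns how many it saw.
// The loop variable is an int, so b can reach 256 and end the loop
// even when hi == 255.
func countRange(lo, hi byte) int {
	n := 0
	for b := int(lo); b <= int(hi); b++ {
		_ = byte(b) // safe: b is always in [0, 255] inside the loop
		n++
	}
	return n
}

func main() {
	fmt.Println(countRange(0, 255))   // 256
	fmt.Println(countRange('a', 'z')) // 26
}
```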

Added

  • Longest() method: API compatibility with stdlib regexp.Regexp
  • QuoteMeta() function: Escape regex metacharacters in strings
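
For reference, this is how the two corresponding stdlib calls behave. The example uses the standard regexp package only, on the assumption that the coregex additions mirror it; firstAndLongest is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstAndLongest returns the leftmost-first match, then the
// leftmost-longest match after calling Longest().
func firstAndLongest(pattern, s string) (string, string) {
	re := regexp.MustCompile(pattern)
	first := re.FindString(s)
	re.Longest()
	return first, re.FindString(s)
}

func main() {
	first, longest := firstAndLongest(`a(|b)`, "ab")
	fmt.Println(first, longest) // a ab

	// QuoteMeta escapes metacharacters so the text matches literally.
	fmt.Println(regexp.QuoteMeta("a.b*c")) // a\.b\*c
}
```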

Full Changelog: v0.8.1...v0.8.2

v0.8.1 - Stdlib Compatibility

@kolkov kolkov released this 03 Dec 04:09

Added

  • Type alias Regexp: Drop-in compatibility with stdlib regexp package

Now you can simply replace:

import "regexp"

with:

import regexp "github.com/coregx/coregex"

Existing code using *regexp.Regexp will work without changes.
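
The reason this works is that a Go type alias introduces a second name for the same type rather than a new type. A self-contained sketch of the mechanism follows; the Regex struct here is a stand-in defined for illustration, not the coregex type.

```go
package main

import "fmt"

// Regex stands in for a library's primary type (hypothetical).
type Regex struct{ pattern string }

func (r *Regex) String() string { return r.pattern }

// Regexp is an alias, not a new type: *Regexp and *Regex are identical,
// so values and methods are shared without conversion.
type Regexp = Regex

// describe is written against the alias name, as existing stdlib-style code would be.
func describe(re *Regexp) string { return "pattern: " + re.String() }

func main() {
	r := &Regex{pattern: `^abc`}
	fmt.Println(describe(r)) // pattern: ^abc
}
```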

Closes #5

v0.8.0 - ReverseInner Strategy (3000x+ speedup)

@kolkov kolkov released this 29 Nov 20:46

ReverseInner Strategy - 3000x+ Speedup for .*keyword.* Patterns

v0.8.0 introduces ReverseInner strategy with bidirectional DFA search, delivering 3,154x speedup for IsMatch and 2,857x speedup for Find on patterns like .*connection.*, .*database.*, .*error.*.

🚀 Performance Highlights

IsMatch (inner literal patterns):

  • .*connection.* 250KB: 3,154x faster (12.6ms → 4µs)
  • .*database.* 120KB: 1,174x faster
  • Many candidates (100 occurrences): 25x faster

Find (inner literal patterns):

  • .*connection.* 250KB: 1,894x faster (15.2ms → 8µs)
  • .*database.* 120KB: 2,857x faster (5.7ms → 2µs)
  • Many candidates (100 occurrences): 13x faster

✨ What's New

ReverseInner Strategy (OPT-010, OPT-012)

  • AST Splitting: Separate prefix/suffix NFAs for bidirectional search
  • Universal Match Detection: Skip DFA scans for .* prefix/suffix patterns
  • Early Return Optimization: First confirmed match is leftmost by construction
  • Prefilter + Bidirectional DFA: Inner literal prefilter → reverse DFA confirms prefix → forward DFA confirms suffix
  • Zero Allocations: All optimizations maintain zero-allocation hot paths
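
The pipeline above can be approximated with the standard library. In this simplified sketch, strings.Index plays the role of the inner literal prefilter, and two stdlib regexes anchored with \z and \A stand in for the reverse and forward DFAs. findInner is a hypothetical function; it allocates, assumes prefix and suffix are valid patterns, and does not claim to reproduce coregex's leftmost-match guarantees for every pattern.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// findInner searches for prefix+literal+suffix by locating the literal
// first, then confirming the prefix ends at it and the suffix starts after it.
func findInner(haystack, prefix, literal, suffix string) (string, bool) {
	if literal == "" {
		return "", false
	}
	prefixRe := regexp.MustCompile(`(?:` + prefix + `)\z`) // must end at the literal
	suffixRe := regexp.MustCompile(`\A(?:` + suffix + `)`) // must start after it
	from := 0
	for {
		i := strings.Index(haystack[from:], literal)
		if i < 0 {
			return "", false
		}
		litStart := from + i
		litEnd := litStart + len(literal)
		pre := prefixRe.FindStringIndex(haystack[:litStart])
		suf := suffixRe.FindStringIndex(haystack[litEnd:])
		if pre != nil && suf != nil {
			return haystack[pre[0] : litEnd+suf[1]], true
		}
		from = litStart + 1 // false candidate: try the next occurrence
	}
}

func main() {
	m, ok := findInner("x\nerr: connection lost\ny", `.*`, "connection", `.*`)
	fmt.Printf("%q %v\n", m, ok) // "err: connection lost" true
}
```

When the literal is rare in the input, almost all of the haystack is skipped by the substring search, which is where the large speedups for .*keyword.* patterns come from.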

Also Included: v0.7.0 OnePass DFA

  • 10x faster captures (~700ns → 70ns for FindSubmatch)
  • Zero allocations (vs 2-4 allocs with PikeVM)
  • Automatically selected for simple anchored patterns

📦 Installation

go get github.com/coregx/coregex@v0.8.0

📚 Documentation

  • README - Updated with new benchmarks and architecture
  • CHANGELOG - Full v0.8.0 and v0.7.0 details

🎯 Use Cases

Perfect for:

  • Log parsing with patterns like .*ERROR.*connection.*
  • Text search with inner keywords: .*database.*, .*timeout.*
  • Multi-occurrence patterns in large files
  • Any pattern with wildcards before AND after a literal

🔧 Technical Details

  • New files: meta/reverse_inner.go, meta/reverse_inner_test.go
  • Modified: literal/extractor.go (AST splitting), meta/strategy.go, meta/meta.go
  • Code: +1,348 lines for ReverseInner
  • Tests: 7 new test suites, all passing
  • Coverage: 88.3% overall

Full Changelog: https://github.com/coregx/coregex/blob/v0.8.0/CHANGELOG.md

v0.7.1 - Go Proxy Fix

@kolkov kolkov released this 29 Nov 10:02

Minor release to update Go module proxy.

No code changes from v0.7.0 - only adds updated LICENSE and README.

For new users: Use v0.7.1 in go.mod.

Full changelog: v0.7.0...v0.7.1

v0.7.0 - OnePass DFA

@kolkov kolkov released this 29 Nov 10:02

OnePass DFA Engine (OPT-011)

10x faster capture group extraction for unambiguous patterns.

Highlights

  • 70 ns/op for FindSubmatch (was ~700 ns with PikeVM)
  • Zero allocations in hot path
  • Auto-fallback to PikeVM for non-onepass patterns

Benchmark

| Operation          | PikeVM  | OnePass | Speedup    |
|--------------------|---------|---------|------------|
| FindSubmatch       | ~700 ns | 70 ns   | 10x        |
| IsMatch anchored   | ~50 ns  | 27 ns   | 1.8x       |
| Memory allocations | 2-3     | 0       | Zero alloc |

Full changelog: v0.6.0...v0.7.0

v0.6.0 - ReverseSuffix Optimization (1000x+ speedup)

@kolkov kolkov released this 28 Nov 11:48

What's New

OPT-009: Zero-allocation reverse DFA search for suffix patterns

This release introduces the ReverseSuffix optimization strategy for patterns like .*\.txt - a massive performance improvement for suffix-matching patterns.

Performance Improvements (vs stdlib)

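| Benchmark            | stdlib        | coregex   | Speedup |
|----------------------|---------------|-----------|---------|
| .*\.txt IsMatch 1KB  | 40,379 ns     | 307 ns    | 131x    |
| .*\.txt IsMatch 32KB | 1,323,614 ns  | 855 ns    | 1,549x  |
| .*\.txt IsMatch 1MB  | 27,651,042 ns | 21,041 ns | 1,314x  |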

Key Features

  • Zero heap allocations - Backward scan without byte reversal
  • Smart strategy selection - Prefers prefix literals when available
  • Greedy matching - Leftmost-longest semantics
  • DFA-only execution - No PikeVM fallback needed
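
A rough illustration of the suffix-first idea for a pattern of the form .*SUFFIX: locate the literal suffix, scan backward to the start of the line, and extend to the last occurrence on that line (since .* is greedy and . does not match a newline). findSuffixPattern is a hypothetical helper built on the strings package; it assumes the suffix is a non-empty literal without newlines and is not the reverse DFA that coregex uses.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// findSuffixPattern returns the leftmost match of `.*` followed by the
// literal suffix, without running a regex engine over the haystack.
func findSuffixPattern(haystack, suffix string) (string, bool) {
	i := strings.Index(haystack, suffix)
	if i < 0 {
		return "", false
	}
	// Backward scan: the match starts right after the previous newline.
	start := strings.LastIndexByte(haystack[:i], '\n') + 1
	// The greedy .* extends to the last suffix occurrence on the same line.
	lineEnd := len(haystack)
	if j := strings.IndexByte(haystack[i:], '\n'); j >= 0 {
		lineEnd = i + j
	}
	end := start + strings.LastIndex(haystack[start:lineEnd], suffix) + len(suffix)
	return haystack[start:end], true
}

func main() {
	h := "a\nfoo.txt bar.txt baz\nq.txt"
	got, _ := findSuffixPattern(h, ".txt")
	want := regexp.MustCompile(`.*\.txt`).FindString(h) // stdlib cross-check
	fmt.Printf("%q %v\n", got, got == want)             // "foo.txt bar.txt" true
}
```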

Files Changed

  • New: meta/reverse_suffix.go - ReverseSuffix strategy implementation
  • New: dfa/lazy/lazy.go - SearchReverse() / IsMatchReverse() methods
  • Updated: meta/strategy.go - Strategy selection logic

Full Changelog

See CHANGELOG.md for complete details.

v0.5.0: Named Capture Groups

@kolkov kolkov released this 28 Nov 08:26

What's New in v0.5.0

Named Capture Groups ✨

Full support for (?P<name>...) syntax with stdlib-compatible API:

  • SubexpNames() API - Returns slice of capture group names
    • Available in Regex, Engine, and NFA
    • Index 0 is always "" (entire match)
    • Compatible with stdlib regexp.Regexp.SubexpNames() behavior

Implementation Details

  • NFA Compiler Enhancement: collectCaptureInfo() collects capture names during compilation
    • Two-pass algorithm: count captures, then collect names
    • Stores names from syntax.Regexp.Name field
  • Builder Enhancement: WithCaptureNames() BuildOption for passing names to NFA
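
The two-pass idea can be sketched with the regexp/syntax package that the release notes mention. This is an approximation for illustration, not the collectCaptureInfo() implementation; captureNames is a hypothetical function, and MaxCap() serves as the counting pass.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// captureNames returns one name per capture group, indexed by group
// number; index 0 (the entire match) and unnamed groups are "".
func captureNames(pattern string) ([]string, error) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return nil, err
	}
	// Pass 1: count captures so the slice can be sized up front.
	names := make([]string, re.MaxCap()+1)
	// Pass 2: walk the AST and record each capture's Name field.
	var walk func(n *syntax.Regexp)
	walk = func(n *syntax.Regexp) {
		if n.Op == syntax.OpCapture {
			names[n.Cap] = n.Name
		}
		for _, sub := range n.Sub {
			walk(sub)
		}
	}
	walk(re)
	return names, nil
}

func main() {
	names, err := captureNames(`(?P<key>\w+)=(\d+)`)
	fmt.Printf("%q %v\n", names, err) // ["" "key" ""] <nil>
}
```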

Testing & Quality

  • 18 new unit tests covering all named capture scenarios
  • 2 integration examples demonstrating real-world usage
  • 88.3% test coverage
  • All CI checks passing

Files Changed

New files:

  • nfa/named_captures_test.go - Comprehensive test suite
  • example_subexpnames_test.go - Integration examples

Modified files:

  • nfa/nfa.go - Added captureNames field and SubexpNames() method
  • nfa/compile.go - Implemented collectCaptureInfo() two-pass algorithm
  • nfa/builder.go - Added WithCaptureNames() BuildOption
  • meta/meta.go - Exposed SubexpNames() through Engine
  • regex.go - Public API for SubexpNames()
  • README.md - Updated feature table and documentation
  • CHANGELOG.md - Added v0.5.0 release notes

Code statistics:

  • +333 lines, -20 lines (9 files changed)
  • +200 lines for named captures implementation

Full Changelog: v0.4.0...v0.5.0

v0.4.0: ReverseAnchored Strategy + Core Optimizations

@kolkov kolkov released this 28 Nov 07:17

Highlights

205,000x speedup for end-anchored patterns ($ anchor) through reverse search strategy.

What's New

Reverse Search Engine

  • nfa.Reverse() - Build reverse NFA from forward NFA
  • nfa.ReverseAnchored() - Build anchored reverse NFA for $ patterns
  • ReverseAnchoredSearcher - Optimized search for end-anchored patterns
  • Automatic strategy selection via IsPatternEndAnchored()
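
The intuition behind searching end-anchored patterns in reverse: a match for a pattern ending in $ must end at the end of the input, so matching the reversed pattern from the end finds it in a single pass instead of trying every start position. The sketch below fakes this with stdlib regexp by reversing the input and using a hand-reversed pattern. It is ASCII-only, allocates, and findEndAnchored is a hypothetical helper; coregex builds a reverse NFA rather than reversing bytes.

```go
package main

import (
	"fmt"
	"regexp"
)

// reverseBytes returns s with its bytes in reverse order (ASCII only).
func reverseBytes(s string) string {
	b := []byte(s)
	for i, j := 0, len(b)-1; i < j; i, j = i+1, j-1 {
		b[i], b[j] = b[j], b[i]
	}
	return string(b)
}

// findEndAnchored matches reversedPattern (which must begin with ^)
// against the reversed haystack and maps the result back.
func findEndAnchored(haystack, reversedPattern string) (string, bool) {
	re := regexp.MustCompile(reversedPattern)
	loc := re.FindStringIndex(reverseBytes(haystack))
	if loc == nil || loc[0] != 0 {
		return "", false
	}
	// A match of length loc[1] in the reversed text is a suffix of the original.
	return haystack[len(haystack)-loc[1]:], true
}

func main() {
	// `^xp\d+` is the hand-reversed form of `\d+px$`.
	m, ok := findEndAnchored("width: 120px", `^xp\d+`)
	fmt.Println(m, ok) // 120px true
}
```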

Core Optimizations (OPT-001..006)

  • Start State Caching - 6 start configurations with StartByteMap
  • Prefilter Effectiveness Tracking - Dynamic disabling at >90% false positives
  • Early Match Termination - searchEarliestMatch() for IsMatch
  • State Acceleration - memchr/memchr2/memchr3 in DFA loop
  • ByteClasses - Alphabet compression for reduced DFA states
  • Specialized Search Functions - Optimized Count/FindAllSubmatch
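
Of the optimizations above, ByteClasses is the simplest to show in isolation: bytes that no transition in the pattern can tell apart are mapped to one class, so the DFA needs one transition column per class rather than 256. The following is a minimal sketch under that assumption; byteClasses and its input format are hypothetical, not the coregex data structures.

```go
package main

import "fmt"

// byteClasses maps each byte to an equivalence class, given the inclusive
// byte ranges that the pattern's transitions distinguish. It returns the
// mapping and the number of classes.
func byteClasses(ranges [][2]byte) ([256]byte, int) {
	// boundary[b] is true when b is the last byte of a class.
	var boundary [256]bool
	for _, r := range ranges {
		if r[0] > 0 {
			boundary[r[0]-1] = true
		}
		boundary[r[1]] = true
	}
	var classes [256]byte
	class := 0
	for b := 0; b < 256; b++ {
		classes[b] = byte(class)
		if boundary[b] && b < 255 {
			class++
		}
	}
	return classes, class + 1
}

func main() {
	// A pattern that only distinguishes [a-z] needs three classes:
	// bytes below 'a', bytes in a-z, and bytes above 'z'.
	classes, n := byteClasses([][2]byte{{'a', 'z'}})
	fmt.Println(n, classes['A'], classes['m'], classes['{']) // 3 0 1 2
}
```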

Bug Fixes

  • FIX-001: PikeVM visited check (prevents exponential thread explosion)
  • FIX-002: ReverseAnchored unanchored prefix bug (critical fix)

Performance

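| Pattern                 | Before  | After  | Speedup        |
|-------------------------|---------|--------|----------------|
| Easy1 $ anchor 1MB      | 340 sec | 1.6 ms | 205,000x       |
| Case-insensitive 32KB   | -       | -      | 233x vs stdlib |
| Hard1 multi-alternation | -       | -      | 5.2x vs stdlib |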

Installation

go get github.com/coregx/coregex@v0.4.0

Full Changelog

See CHANGELOG.md for details.