Releases: coregx/coregex

v0.3.0 - Replace and Split

@kolkov released this 27 Nov 17:17

What's New

Added

  • Replace functions: Full stdlib-compatible replacement API
    • ReplaceAll() / ReplaceAllString() - replace with template expansion
    • ReplaceAllLiteral() / ReplaceAllLiteralString() - literal replacement
    • ReplaceAllFunc() / ReplaceAllStringFunc() - replace with callback
  • Split function: Split(s string, n int) - split string by regex
  • Template expansion: $0-$9 backreference support in replacement
  • FindAllIndex: FindAllIndex() / FindAllStringIndex() for batch index retrieval
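
Because the release describes this API as stdlib-compatible, the expected behavior can be illustrated with Go's standard `regexp` package. The sketch below assumes the coregex methods share these signatures and results, which is not verified here; the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// swapUserHost uses template expansion: ${1} and ${2} refer to capture groups.
func swapUserHost(s string) string {
	re := regexp.MustCompile(`(\w+)@(\w+)`)
	return re.ReplaceAllString(s, "${2}@${1}")
}

// literalReplace inserts the replacement verbatim; "$1" is not expanded.
func literalReplace(s string) string {
	re := regexp.MustCompile(`(\w+)@(\w+)`)
	return re.ReplaceAllLiteralString(s, "$1")
}

// upperWords replaces each match with the result of a callback.
func upperWords(s string) string {
	re := regexp.MustCompile(`[a-z]+`)
	return re.ReplaceAllStringFunc(s, strings.ToUpper)
}

// splitFields splits on runs of commas or whitespace; n < 0 returns all pieces.
func splitFields(s string) []string {
	re := regexp.MustCompile(`[,\s]+`)
	return re.Split(s, -1)
}

func main() {
	fmt.Println(swapUserHost("alice@example"))   // example@alice
	fmt.Println(literalReplace("alice@example")) // $1
	fmt.Println(upperWords("go 1 regex"))        // GO 1 REGEX
	fmt.Println(splitFields("a, b,c  d"))        // [a b c d]
}
```

If coregex is a drop-in replacement as stated, swapping `regexp.MustCompile` for `coregex.MustCompile` should leave these results unchanged.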

Technical

  • Pre-allocation optimization for replacement buffers
  • Proper $$ escape handling (literal $)
  • Empty match handling to prevent infinite loops
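
The two edge cases above (`$$` escapes and empty matches) are easiest to see on concrete inputs. This sketch uses the standard `regexp` package as a stand-in, on the assumption that coregex reproduces stdlib results for them; the function names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// addCurrency prefixes every number with a literal dollar sign.
// In the template, "$$" yields "$" and "${0}" is the whole match.
func addCurrency(s string) string {
	re := regexp.MustCompile(`\d+`)
	return re.ReplaceAllString(s, "$$${0}")
}

// markGaps uses a pattern that can match the empty string. The engine must
// advance past each empty match, otherwise the loop would never terminate.
func markGaps(s string) string {
	re := regexp.MustCompile(`x*`)
	return re.ReplaceAllString(s, "-")
}

func main() {
	fmt.Println(addCurrency("cost 5, tax 12")) // cost $5, tax $12
	fmt.Println(markGaps("abc"))               // -a-b-c-
}
```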

Performance

  • Case-insensitive 32KB: ~200x faster than stdlib
  • Case-insensitive 1KB: ~90x faster than stdlib

Full Changelog: v0.2.1...v0.3.0

v0.2.1: Documentation Hotfix

@kolkov released this 27 Nov 15:28

Documentation hotfix for v0.2.0 - updated README.md with correct performance numbers (263x) and feature table.

v0.2.0: Capture Groups Support

@kolkov released this 28 Nov 07:21

What's New

Capture Groups Support

Full submatch extraction via PikeVM:

  • FindSubmatch() / FindStringSubmatch() - returns all capture groups
  • FindSubmatchIndex() / FindStringSubmatchIndex() - returns group positions
  • NumSubexp() - returns number of capture groups

Example

re := coregex.MustCompile(`(\w+)@(\w+)\.(\w+)`)
match := re.FindStringSubmatch("user@example.com")
// match[0] = "user@example.com"
// match[1] = "user"
// match[2] = "example"
// match[3] = "com"
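
The index variants return byte offsets instead of substrings. As a hedged illustration, the sketch below runs the same pattern through the standard `regexp` package and assumes coregex returns the same layout (a flat slice of start/end pairs, with pair 0 covering the whole match); `groupSpans` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

var emailRe = regexp.MustCompile(`(\w+)@(\w+)\.(\w+)`)

// groupSpans returns [start, end) byte offsets: pair 0 is the whole match,
// pairs 1..NumSubexp() are the capture groups.
func groupSpans(s string) []int {
	return emailRe.FindStringSubmatchIndex(s)
}

func main() {
	s := "user@example.com"
	idx := groupSpans(s)
	fmt.Println(emailRe.NumSubexp()) // 3
	fmt.Println(idx)                 // [0 16 0 4 5 12 13 16]
	fmt.Println(s[idx[4]:idx[5]])    // example
}
```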

Performance

| Pattern | Size | vs stdlib |
|---|---|---|
| Case-insensitive | 32KB | 263x faster |
| Case-insensitive | 1KB | 92x faster |
| Case-sensitive | 1KB | 3.5x faster |

Technical Details

  • NFA StateCapture state type for group boundaries
  • Per-thread capture tracking in PikeVM (each simulated NFA thread carries its own capture slots) with copy-on-write semantics
  • Captures follow Thompson's construction as epsilon transitions
  • DFA path unchanged - captures only allocated when requested
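
Copy-on-write capture tracking can be sketched as follows. This is not the coregex implementation: the `thread` type and its methods are hypothetical, and the sketch only shows the general idea that forked threads share capture slots until one of them records a group boundary.

```go
package main

import "fmt"

// thread is a hypothetical PikeVM thread: an NFA state plus capture slots.
// caps[2*i] and caps[2*i+1] hold the start and end of group i; -1 means unset.
type thread struct {
	state int
	caps  []int
}

// fork creates a sibling thread that shares the same capture slice (no copy).
func (t thread) fork(state int) thread {
	return thread{state: state, caps: t.caps}
}

// setCap records a group boundary. It copies the slice first, so threads
// still sharing the old slice are unaffected (copy-on-write).
func (t thread) setCap(slot, pos int) thread {
	c := make([]int, len(t.caps))
	copy(c, t.caps)
	c[slot] = pos
	return thread{state: t.state, caps: c}
}

func main() {
	a := thread{state: 0, caps: []int{-1, -1}}
	b := a.fork(1)      // shares a.caps
	c := b.setCap(0, 3) // copies, then writes
	fmt.Println(a.caps, b.caps, c.caps) // [-1 -1] [-1 -1] [3 -1]
}
```

A real engine would likely recycle slices from a pool rather than allocate on every write; the sketch omits that for brevity.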

v0.1.4 - Documentation Update

@kolkov released this 27 Nov 14:12

What's Changed

Documentation Updates

  • Fixed broken benchmark/ link in README
  • Updated CHANGELOG with release notes for v0.1.1 through v0.1.4
  • Updated performance claims to reflect 143x speedup on case-insensitive patterns
  • Updated current version references throughout README

Performance Highlights

  • 143x faster than stdlib on case-insensitive patterns ((?i)...)
  • DFA prefilter working correctly after v0.1.3 cache fix

Full Changelog: v0.1.3...v0.1.4

v0.1.3 - Critical DFA Performance Fix

@kolkov released this 27 Nov 13:52

What's Fixed

Critical DFA Cache Bug

  • Problem: The cache overwrote the start state ID, so every DFA search fell back to the slow NFA
  • Impact: 200x performance regression in v0.1.0-v0.1.2 when using prefilter optimization
  • Solution: Preserve pre-assigned state IDs (StartState=0) in cache
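
The release note does not show the fix itself, so the following is only a guess at its shape: a state cache that lets callers register a pre-assigned ID and never hands that ID out again. All names here (`cache`, `register`, `getOrInsert`) are hypothetical.

```go
package main

import "fmt"

type StateID uint32

// StartState is the pre-assigned ID mentioned in the release note.
const StartState StateID = 0

// cache maps a state key to its ID. Fresh IDs start after StartState,
// so allocation can never collide with the pre-assigned start state.
type cache struct {
	ids  map[string]StateID
	next StateID
}

func newCache() *cache {
	return &cache{ids: map[string]StateID{}, next: StartState + 1}
}

// register stores a pre-assigned ID as-is instead of allocating a new one.
func (c *cache) register(key string, id StateID) {
	c.ids[key] = id
}

// getOrInsert returns the existing ID for key or allocates the next free one.
func (c *cache) getOrInsert(key string) StateID {
	if id, ok := c.ids[key]; ok {
		return id
	}
	id := c.next
	c.next++
	c.ids[key] = id
	return id
}

func main() {
	c := newCache()
	c.register("start", StartState)
	fmt.Println(c.getOrInsert("start")) // 0
	fmt.Println(c.getOrInsert("other")) // 1
	fmt.Println(c.getOrInsert("start")) // 0
}
```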

Leftmost-Longest Semantics

  • Fixed DFA search to properly implement leftmost-longest match semantics
  • Now correctly returns first match position with greedy extension
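
For readers unfamiliar with the term, the difference between leftmost-first and leftmost-longest matching can be shown with the standard `regexp` package, which supports both modes. This only illustrates the terminology; the release note does not say which mode the public coregex API exposes by default.

```go
package main

import (
	"fmt"
	"regexp"
)

// leftmostFirst prefers the earliest alternative that matches (Go's default).
func leftmostFirst(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

// leftmostLongest prefers the longest match starting at the leftmost position.
func leftmostLongest(pattern, s string) string {
	re := regexp.MustCompile(pattern)
	re.Longest()
	return re.FindString(s)
}

func main() {
	fmt.Println(leftmostFirst(`a|ab`, "xab"))   // a
	fmt.Println(leftmostLongest(`a|ab`, "xab")) // ab
}
```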

Performance Improvements

| Pattern Type | Before Fix | After Fix | Improvement |
|---|---|---|---|
| Literal (32KB) | 887,129 ns | 4,375 ns | 202x faster |
| Case-insensitive (32KB) | 842,422 ns | 5,883 ns | 143x faster vs stdlib |

Changelog

  • fix: DFA cache start state registration
  • fix: Leftmost-longest semantics in searchAt() and findWithPrefilter()
  • docs: Updated README with accurate benchmark data

Full Changelog: v0.1.2...v0.1.3

v0.1.2 - Strategy Selection & Match Bounds Fixes

@kolkov released this 27 Nov 13:21

Fixes

Strategy Selection Priority

  • Check for good literals BEFORE checking NFA size
  • Patterns with literals now use DFA+prefilter even if NFA < 20 states
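
The reordering described above might look roughly like this. The 20-state threshold comes from the release note; the strategy names, the `selectStrategy` function, and the fallback for large patterns without literals are assumptions made for illustration.

```go
package main

import "fmt"

type Strategy int

const (
	UseNFA          Strategy = iota // small pattern, plain NFA simulation
	UseDFAPrefilter                 // literal prefilter in front of the DFA
	UseDFA                          // assumed fallback for larger patterns
)

// smallNFAThreshold is the state count below which the NFA is preferred.
const smallNFAThreshold = 20

// selectStrategy checks for usable literals first, and only then looks at
// NFA size, so a small pattern with good literals still gets the prefilter.
func selectStrategy(hasGoodLiterals bool, nfaStates int) Strategy {
	if hasGoodLiterals {
		return UseDFAPrefilter
	}
	if nfaStates < smallNFAThreshold {
		return UseNFA
	}
	return UseDFA
}

func main() {
	fmt.Println(selectStrategy(true, 8) == UseDFAPrefilter) // true
	fmt.Println(selectStrategy(false, 8) == UseNFA)         // true
	fmt.Println(selectStrategy(false, 64) == UseDFA)        // true
}
```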

Match Bounds Corrections

  • Complete prefilter matches now return correct bounds (was returning only first byte)
  • DFA matches now return correct start position (was always 0)

Testing

  • All tests pass
  • O(n) complexity verified for unanchored patterns

Full Changelog: v0.1.1...v0.1.2

v0.1.1 - Critical Hotfix: O(n²) PikeVM Bug

@kolkov released this 27 Nov 10:10

🔴 Critical Bug Fix

This hotfix resolves a critical performance bug in PikeVM unanchored search.

The Bug

PikeVM had O(n²) time complexity for unanchored patterns due to restarting search at each position.

Impact (before fix):

| Input Size | stdlib | coregex | Slowdown |
|---|---|---|---|
| 16B | 3.5 ns | 3,768 ns | 1,061x |
| 32B | 40 ns | 11,797 ns | 295x |
| 1KB | 263 ns | 10.7 ms | 40,775x |
| 32KB | 3.3 µs | 11.2 sec | 3,400,000x |

The Fix

Implemented Thompson's parallel NFA simulation:

  • Add new start threads at each position (simulates `.*?` prefix)
  • Process all active threads in single O(n) pass
  • Implement leftmost-longest match semantics
  • Zero allocations in hot path

Performance after fix:

  • Consistent ~50-70 MB/s throughput for worst-case patterns
  • Linear O(n) time complexity verified by benchmarks
  • Zero allocations (0 B/op, 0 allocs/op)
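
The idea behind the fix can be sketched with a toy Pike-style simulation. This is not the code in `nfa/pikevm.go`: the instruction set and function names are hypothetical, and the sketch only answers "does it match anywhere?" rather than implementing leftmost-longest semantics. It shows the key point, which is that a fresh start thread is seeded at every position inside a single pass instead of restarting the search.

```go
package main

import "fmt"

type op int

const (
	opChar  op = iota // consume byte c, then go to x
	opSplit           // epsilon transition to both x and y
	opMatch           // accept
)

type inst struct {
	op   op
	c    byte
	x, y int
}

// addThread follows epsilon transitions from pc and adds each reachable
// state at most once; seen bounds the list length by len(prog).
func addThread(prog []inst, list []int, seen []bool, pc int) []int {
	if seen[pc] {
		return list
	}
	seen[pc] = true
	if prog[pc].op == opSplit {
		list = addThread(prog, list, seen, prog[pc].x)
		return addThread(prog, list, seen, prog[pc].y)
	}
	return append(list, pc)
}

// matchUnanchored reports whether prog matches anywhere in input, reading
// each input byte once. The lists are allocated up front and reused.
func matchUnanchored(prog []inst, input string) bool {
	n := len(prog)
	clist, nlist := make([]int, 0, n), make([]int, 0, n)
	cseen, nseen := make([]bool, n), make([]bool, n)
	for i := 0; i <= len(input); i++ {
		// Seed a new start thread here instead of restarting the search.
		clist = addThread(prog, clist, cseen, 0)
		for _, pc := range clist {
			switch prog[pc].op {
			case opMatch:
				return true
			case opChar:
				if i < len(input) && input[i] == prog[pc].c {
					nlist = addThread(prog, nlist, nseen, prog[pc].x)
				}
			}
		}
		// The next list becomes current; the old one is cleared for reuse.
		clist, nlist = nlist, clist[:0]
		cseen, nseen = nseen, cseen
		for j := range nseen {
			nseen[j] = false
		}
	}
	return false
}

// abcdProg is a hand-compiled program for the pattern a(b|c)d.
var abcdProg = []inst{
	{op: opChar, c: 'a', x: 1},
	{op: opSplit, x: 2, y: 3},
	{op: opChar, c: 'b', x: 4},
	{op: opChar, c: 'c', x: 4},
	{op: opChar, c: 'd', x: 5},
	{op: opMatch},
}

func main() {
	fmt.Println(matchUnanchored(abcdProg, "xxacd")) // true
	fmt.Println(matchUnanchored(abcdProg, "xad"))   // false
}
```

Each input byte is visited once and at most `len(prog)` threads are alive at any step, so the work is O(n·m) for input length n and program size m, which is linear in n for a fixed pattern.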

Files Changed

  • `nfa/pikevm.go` - Core fix
  • `nfa/pikevm_bench_test.go` - Complexity verification benchmarks

Upgrade

```bash
go get github.com/coregx/coregex@v0.1.1
```

Full Changelog: v0.1.0...v0.1.1

v0.1.0 - Initial Release

@kolkov released this 27 Nov 08:59

coregex v0.1.0

Production-grade regex engine for Go with SIMD optimizations.

Features

  • Multi-engine architecture (NFA/DFA/Meta) with intelligent strategy selection
  • SIMD primitives (AVX2/SSSE3): memchr, memmem, Teddy multi-pattern search
  • Literal extraction and automatic prefilter selection
  • Lazy DFA with on-demand state construction
  • 5-50x faster than stdlib for patterns with literals
  • 88% test coverage, 0 linter issues

Installation

go get github.com/coregx/coregex

Quick Start

import "github.com/coregx/coregex"

re := coregex.MustCompile(`\w+@\w+\.\w+`)
match := re.Find([]byte("email: test@example.com"))

Status

⚠️ Experimental - API may change in future versions.

See README for full documentation.