Releases: coregx/coregex
Release list
v0.12.2: fix alternation patterns misrouted to ReverseSuffixSet
Fixed
- Alternation patterns misrouted to ReverseSuffixSet (Issue #116) — Patterns like
[cgt]gggtaaa|tttaccc[acg](alternation without.*prefix) were incorrectly routed to UseReverseSuffixSet strategy, producing wrong match boundaries. Fix: addedisSafeForReverseSuffixguard. matchStartZerooptimization too aggressive — Restricted to.*prefix only viahasDotStarPrefix(). Was previously enabled for all unanchored patterns, causing wrong match start for patterns like[^\s]+\.txt.
Fixes #116
v0.12.1: Bidirectional DFA fallback, bounded repetitions fix
Performance
- DFA bidirectional fallback for BoundedBacktracker — When BoundedBacktracker can't handle large inputs (exceeds 32M entry limit), use forward DFA + reverse DFA instead of PikeVM. O(n) total vs PikeVM's O(n×states).
(\w{2,8})+on 6MB: 654ms → 184ms (3.6x vs stdlib). - Digit-run skip optimization — For
\d+-leading patterns (IP addresses, version numbers), skip entire digit run on DFA failure instead of advancing one byte at a time.
Bug Fixes
- Bounded repetitions blocked ReverseSuffix strategy (#115) —
isSafeForReverseSuffixdidn't recognizeOpRepeat{min>=1}as a wildcard, blocking UseReverseSuffix for patterns with bounded repetitions. Fix: 2500ms → 0.5ms (5000x) on 100KB no-match. - CompositeSequenceDFA overmatching for bounded patterns — Bare character classes like
\w(maxMatch=1) were treated as unbounded by the DFA.\w\won "000" returned "000" instead of "00". - AVX2 Teddy assembly correctness (#74) — Fixed
teddySlimAVX2_2returning position -1 for valid candidates in short haystacks.
Benchmarks (regex-bench, AMD EPYC, 6MB input)
| Pattern | Go stdlib | coregex | Rust regex | vs stdlib |
|---|---|---|---|---|
| inner_literal | 231 ms | 0.25 ms | 0.31 ms | 926x |
| suffix | 234 ms | 0.89 ms | 1.09 ms | 263x |
| ip | 507 ms | 2.16 ms | 12.05 ms | 235x |
| char_class | 560 ms | 41 ms | 50 ms | 13.6x |
| word_repeat | 654 ms | 184 ms | 49 ms | 3.6x |
Extreme benchmarks (6MB no-match): ip 2542x, suffix 1945x, phone 863x, inner 598x vs stdlib.
v0.12.0: Rust-inspired optimizations
Performance
- Anti-quadratic guard for reverse suffix/inner/suffix-set searches — prevents O(n²) degradation on high false-positive suffix workloads, falls back to PikeVM when quadratic detected
- Lazy DFA 4x loop unrolling — process 4 state transitions per inner loop iteration, check special states between batches
- Prefilter
IsFast()gate — skip reverse search optimizations when fast SIMD-backed prefix prefilter already exists - DFA cache clear & continue — on cache overflow, clear and fall back to PikeVM for current search instead of permanently disabling DFA
Fixed
- OnePass DFA capture limit — tighten from 17 to 16 capture groups (
uint32slot mask = 32 bits)
Benchmark (AMD EPYC, regex-bench)
| Pattern | coregex | vs stdlib | vs Rust |
|---|---|---|---|
| suffix | 0.91ms | 257x | 1.4x faster |
| 0.70ms | 383x | 1.9x faster | |
| ip | 2.19ms | 225x | 5.5x faster |
| uri | 0.76ms | 340x | 1.2x faster |
| multiline_php | 0.60ms | 171x | 1.2x faster |
| anchored_php | 0.03ms | ~1x | 12.0x faster |
v0.11.9: Fix missing first-byte prefilter in FindAll
Fixed
- Missing first-byte prefilter in FindAll state-reusing path (#107)
findIndicesBoundedBacktrackerAtWithStatewas missinganchoredFirstBytesO(1) check- Pattern
^/.*[\w-]+\.php(without$) took 377ms instead of 40µs on 6MB input - Fix: 377ms → 40µs (9000x improvement for non-matching anchored patterns)
Full Changelog
v0.11.8: Fix UseAnchoredLiteral regression
Fixed
- Critical regression in UseAnchoredLiteral strategy (#107)
FindIndices*andfindIndicesAtWithStatewere missingUseAnchoredLiteralcase- Pattern
^/.*[\w-]+\.php$fell through to slow NFA path - Regression: 0.01ms → 408ms (40,000x slower)
- Fix: 408ms → 0.5ms (O(1) anchored literal matching restored)
Full Changelog
v0.11.7: FindAll optimization - 1.08x faster than stdlib
Fixed
FindAll now uses optimized state-reusing path
- FindAll was using slow per-match loop instead of optimized findAllIndicesStreaming
- Results for
(\w{2,8})+on 6MB: 2179ms → 834ms (2.6x faster) - Now 1.08x faster than stdlib (was 2.4x slower in regex-bench)
Full Changelog
See CHANGELOG.md
v0.11.6: PikeVM 6MB optimization - 1.68x faster than stdlib
Performance
Major PikeVM optimization achieving 1.68x speedup over stdlib for large inputs (was 2.2x slower).
Key Changes
- Windowed BoundedBacktracker (V12): Search in 914KB windows before PikeVM fallback
- SlotTable architecture: Rust-style per-state slot storage
- Dynamic slot sizing: 0 (IsMatch), 2 (Find), full (Captures)
- Lightweight searchThread: 16 bytes (was 40+ bytes)
Benchmark Results
Pattern (\w{2,8})+ vs stdlib:
| Size | Speedup |
|---|---|
| 10KB | 1.68x faster |
| 50KB | 1.88x faster |
| 100KB | 2.04x faster |
| 1MB | 1.67x faster |
| 6MB | 1.68x faster |
6MB improvement: 1900ms → 628ms (3x faster)
Full Changelog
See CHANGELOG.md
v0.11.5: Fix checkHasWordBoundary catastrophic slowdown
Summary
Fixes catastrophic performance regression in patterns with \w{n,m} quantifiers (Issue #105).
Before: 3 minutes 22 seconds on 79KB input (7,000,000x slower than stdlib)
After: 3.6 µs on 79KB input (8.6x faster than stdlib)
Changes
Fixed
- checkHasWordBoundary catastrophic slowdown (Issue #105)
- Root cause: O(N*M) complexity from scanning all NFA states per byte
- Fix: Use
NewBuilderWithWordBoundary(), addhasWordBoundaryguards, anchored prefilter verification
Performance
- DFA state lookup: map → slice — 42% CPU time eliminated
- Literal extraction from capture/repeat groups — better prefilters
=($\w...){2}now extracts=$(2 bytes) instead of just=
Benchmarks (79KB input)
| Stage | Time | vs stdlib |
|---|---|---|
| Before fix | 3m 22s | 7,000,000x slower |
| After fix | 3.6 µs | 8.6x faster |
Credits
@danslo for root cause analysis and fix suggestions
Full Changelog: v0.11.4...v0.11.5
v0.11.4: FindAll multiline optimization
Fixed
- FindAll/FindAllIndex now use UseMultilineReverseSuffix strategy (Issue #102)
FindIndicesAt()was missing dispatch forUseMultilineReverseSuffixIsMatch/Findwere fast (1µs), butFindAllwas slow (78ms) — 100x gap vs Rust- After fix:
FindAllon 6MB with 2000 matches: ~1ms (was 78ms)
Performance
| Operation | Before | After | Improvement |
|---|---|---|---|
| FindAll (6MB, 2000 matches) | 78ms | ~1ms | 78x faster |
| vs Rust gap | 100x slower | ~1.3x slower | Near parity! |
Changed
- Updated
golang.org/x/sysv0.39.0 → v0.40.0
Full Changelog: v0.11.3...v0.11.4
v0.11.3: Prefix fast path 319-552x speedup
Performance
Pattern (?m)^/.*\.php now 319-552x faster than stdlib (was 3.5-5.7x in v0.11.1)
| Operation | coregex | stdlib | Speedup |
|---|---|---|---|
| IsMatch | 182 ns | 100 µs | 552x |
| Find | 240 ns | 81 µs | 338x |
| CountAll | 58 µs | 18.7 ms | 319x |
Algorithm
- Suffix prefilter finds
.phpcandidates (SIMD memmem) - SIMD backward scan to find line start (
bytes.LastIndexByte) - O(1) prefix byte check (
/at line start) - Skip-to-next-line on mismatch (avoids O(n²) worst case)
- DFA fallback for complex patterns without extractable prefix
Changes
MultilineReverseSuffixSearcher.prefixBytesfor O(1) verificationSetPrefixLiterals()extracts prefix from patternfindLineStart()uses SIMDbytes.LastIndexByte- Skip-to-next-line: on prefix mismatch, jump to next
\nposition
Fixes #99