Skip to content

Releases: coregx/coregex

v0.12.2: fix alternation patterns misrouted to ReverseSuffixSet

Choose a tag to compare

@kolkov kolkov released this 16 Feb 14:13
f6f6a76

Fixed

  • Alternation patterns misrouted to ReverseSuffixSet (Issue #116) — Patterns like [cgt]gggtaaa|tttaccc[acg] (alternation without .* prefix) were incorrectly routed to UseReverseSuffixSet strategy, producing wrong match boundaries. Fix: added isSafeForReverseSuffix guard.
  • matchStartZero optimization too aggressive — Restricted to .* prefix only via hasDotStarPrefix(). Was previously enabled for all unanchored patterns, causing wrong match start for patterns like [^\s]+\.txt.

Fixes #116

v0.12.1: Bidirectional DFA fallback, bounded repetitions fix

Choose a tag to compare

@kolkov kolkov released this 15 Feb 16:14
2516b6a

Performance

  • DFA bidirectional fallback for BoundedBacktracker — When BoundedBacktracker can't handle large inputs (exceeds 32M entry limit), use forward DFA + reverse DFA instead of PikeVM. O(n) total vs PikeVM's O(n×states). (\w{2,8})+ on 6MB: 654ms → 184ms (3.6x vs stdlib).
  • Digit-run skip optimization — For \d+-leading patterns (IP addresses, version numbers), skip entire digit run on DFA failure instead of advancing one byte at a time.

Bug Fixes

  • Bounded repetitions blocked ReverseSuffix strategy (#115) — isSafeForReverseSuffix didn't recognize OpRepeat{min>=1} as a wildcard, blocking UseReverseSuffix for patterns with bounded repetitions. Fix: 2500ms → 0.5ms (5000x) on 100KB no-match.
  • CompositeSequenceDFA overmatching for bounded patterns — Bare character classes like \w (maxMatch=1) were treated as unbounded by the DFA. \w\w on "000" returned "000" instead of "00".
  • AVX2 Teddy assembly correctness (#74) — Fixed teddySlimAVX2_2 returning position -1 for valid candidates in short haystacks.

Benchmarks (regex-bench, AMD EPYC, 6MB input)

Pattern Go stdlib coregex Rust regex vs stdlib
inner_literal 231 ms 0.25 ms 0.31 ms 926x
suffix 234 ms 0.89 ms 1.09 ms 263x
ip 507 ms 2.16 ms 12.05 ms 235x
char_class 560 ms 41 ms 50 ms 13.6x
word_repeat 654 ms 184 ms 49 ms 3.6x

Extreme benchmarks (6MB no-match): ip 2542x, suffix 1945x, phone 863x, inner 598x vs stdlib.

v0.12.0: Rust-inspired optimizations

Choose a tag to compare

@kolkov kolkov released this 06 Feb 01:15
a30fd70

Performance

  • Anti-quadratic guard for reverse suffix/inner/suffix-set searches — prevents O(n²) degradation on high false-positive suffix workloads, falls back to PikeVM when quadratic detected
  • Lazy DFA 4x loop unrolling — process 4 state transitions per inner loop iteration, check special states between batches
  • Prefilter IsFast() gate — skip reverse search optimizations when fast SIMD-backed prefix prefilter already exists
  • DFA cache clear & continue — on cache overflow, clear and fall back to PikeVM for current search instead of permanently disabling DFA

Fixed

  • OnePass DFA capture limit — tighten from 17 to 16 capture groups (uint32 slot mask = 32 bits)

Benchmark (AMD EPYC, regex-bench)

Pattern coregex vs stdlib vs Rust
suffix 0.91ms 257x 1.4x faster
email 0.70ms 383x 1.9x faster
ip 2.19ms 225x 5.5x faster
uri 0.76ms 340x 1.2x faster
multiline_php 0.60ms 171x 1.2x faster
anchored_php 0.03ms ~1x 12.0x faster

v0.11.9: Fix missing first-byte prefilter in FindAll

Choose a tag to compare

@kolkov kolkov released this 01 Feb 21:33
8b528fa

Fixed

  • Missing first-byte prefilter in FindAll state-reusing path (#107)
    • findIndicesBoundedBacktrackerAtWithState was missing anchoredFirstBytes O(1) check
    • Pattern ^/.*[\w-]+\.php (without $) took 377ms instead of 40µs on 6MB input
    • Fix: 377ms → 40µs (9000x improvement for non-matching anchored patterns)

Full Changelog

v0.11.8...v0.11.9

v0.11.8: Fix UseAnchoredLiteral regression

Choose a tag to compare

@kolkov kolkov released this 01 Feb 20:41
f0f527d

Fixed

  • Critical regression in UseAnchoredLiteral strategy (#107)
    • FindIndices* and findIndicesAtWithState were missing UseAnchoredLiteral case
    • Pattern ^/.*[\w-]+\.php$ fell through to slow NFA path
    • Regression: 0.01ms → 408ms (40,000x slower)
    • Fix: 408ms → 0.5ms (O(1) anchored literal matching restored)

Full Changelog

v0.11.7...v0.11.8

v0.11.7: FindAll optimization - 1.08x faster than stdlib

Choose a tag to compare

@kolkov kolkov released this 01 Feb 19:50
1480f40

Fixed

FindAll now uses optimized state-reusing path

  • FindAll was using slow per-match loop instead of optimized findAllIndicesStreaming
  • Results for (\w{2,8})+ on 6MB: 2179ms → 834ms (2.6x faster)
  • Now 1.08x faster than stdlib (was 2.4x slower in regex-bench)

Full Changelog

See CHANGELOG.md

v0.11.6: PikeVM 6MB optimization - 1.68x faster than stdlib

Choose a tag to compare

@kolkov kolkov released this 01 Feb 18:56
fce1691

Performance

Major PikeVM optimization achieving 1.68x speedup over stdlib for large inputs (was 2.2x slower).

Key Changes

  • Windowed BoundedBacktracker (V12): Search in 914KB windows before PikeVM fallback
  • SlotTable architecture: Rust-style per-state slot storage
  • Dynamic slot sizing: 0 (IsMatch), 2 (Find), full (Captures)
  • Lightweight searchThread: 16 bytes (was 40+ bytes)

Benchmark Results

Pattern (\w{2,8})+ vs stdlib:

Size Speedup
10KB 1.68x faster
50KB 1.88x faster
100KB 2.04x faster
1MB 1.67x faster
6MB 1.68x faster

6MB improvement: 1900ms → 628ms (3x faster)

Full Changelog

See CHANGELOG.md

v0.11.5: Fix checkHasWordBoundary catastrophic slowdown

Choose a tag to compare

@kolkov kolkov released this 01 Feb 09:46
de173be

Summary

Fixes catastrophic performance regression in patterns with \w{n,m} quantifiers (Issue #105).

Before: 3 minutes 22 seconds on 79KB input (7,000,000x slower than stdlib)
After: 3.6 µs on 79KB input (8.6x faster than stdlib)

Changes

Fixed

  • checkHasWordBoundary catastrophic slowdown (Issue #105)
    • Root cause: O(N*M) complexity from scanning all NFA states per byte
    • Fix: Use NewBuilderWithWordBoundary(), add hasWordBoundary guards, anchored prefilter verification

Performance

  • DFA state lookup: map → slice — 42% CPU time eliminated
  • Literal extraction from capture/repeat groups — better prefilters
    • =($\w...){2} now extracts =$ (2 bytes) instead of just =

Benchmarks (79KB input)

Stage Time vs stdlib
Before fix 3m 22s 7,000,000x slower
After fix 3.6 µs 8.6x faster

Credits

@danslo for root cause analysis and fix suggestions

Full Changelog: v0.11.4...v0.11.5

v0.11.4: FindAll multiline optimization

Choose a tag to compare

@kolkov kolkov released this 16 Jan 15:59
8baa0ef

Fixed

  • FindAll/FindAllIndex now use UseMultilineReverseSuffix strategy (Issue #102)
    • FindIndicesAt() was missing dispatch for UseMultilineReverseSuffix
    • IsMatch/Find were fast (1µs), but FindAll was slow (78ms) — 100x gap vs Rust
    • After fix: FindAll on 6MB with 2000 matches: ~1ms (was 78ms)

Performance

Operation Before After Improvement
FindAll (6MB, 2000 matches) 78ms ~1ms 78x faster
vs Rust gap 100x slower ~1.3x slower Near parity!

Changed

  • Updated golang.org/x/sys v0.39.0 → v0.40.0

Full Changelog: v0.11.3...v0.11.4

v0.11.3: Prefix fast path 319-552x speedup

Choose a tag to compare

@kolkov kolkov released this 16 Jan 14:45
43efbbd

Performance

Pattern (?m)^/.*\.php now 319-552x faster than stdlib (was 3.5-5.7x in v0.11.1)

Operation coregex stdlib Speedup
IsMatch 182 ns 100 µs 552x
Find 240 ns 81 µs 338x
CountAll 58 µs 18.7 ms 319x

Algorithm

  1. Suffix prefilter finds .php candidates (SIMD memmem)
  2. SIMD backward scan to find line start (bytes.LastIndexByte)
  3. O(1) prefix byte check (/ at line start)
  4. Skip-to-next-line on mismatch (avoids O(n²) worst case)
  5. DFA fallback for complex patterns without extractable prefix

Changes

  • MultilineReverseSuffixSearcher.prefixBytes for O(1) verification
  • SetPrefixLiterals() extracts prefix from pattern
  • findLineStart() uses SIMD bytes.LastIndexByte
  • Skip-to-next-line: on prefix mismatch, jump to next \n position

Fixes #99