Releases · coregx/coregex

v0.10.3: Critical capture group fix

kolkov released this 08 Jan 17:29

v0.10.3

02d3e30

Critical Bug Fix

FindStringSubmatch returned incorrect capture groups for .+ patterns

The Bug

re := coregex.MustCompile(`^(.+)-(\d+)$`)
matches := re.FindStringSubmatch("hello-123")
// Expected: matches[1] = "hello"
// Actual:   matches[1] = "hello-123" ← BUG

Root Cause

StateSplit in the PikeVM passed the same capture data to both branches without cloning it. Because the reference count stayed at 1 (refs=1), the COW (copy-on-write) mechanism treated the data as unshared and modified it in place, corrupting the other branch's capture data.

Fix

Clone captures for the right branch in StateSplit to ensure refs>1, enabling proper copy-on-write semantics.
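To illustrate the failure mode, here is a minimal Go model of reference-counted capture slots. Everything in it is hypothetical (`captures`, `share`, `set` and `split` are illustrative names, not coregex internals). It only shows why a write through one branch leaks into the other while refs=1, and why raising the count before the split lets copy-on-write take a private copy.

```go
package main

import "fmt"

// captures is a hypothetical reference-counted capture-slot array; the type
// and method names are illustrative and are not coregex's internals.
type captures struct {
	slots []int
	refs  int
}

// share hands out another reference to the same slots.
func (c *captures) share() *captures {
	c.refs++
	return c
}

// set writes slot i. If the slots are shared (refs > 1), it first makes a
// private copy, so the other holders never observe the write.
func (c *captures) set(i, pos int) *captures {
	if c.refs > 1 {
		c.refs--
		cp := &captures{slots: append([]int(nil), c.slots...), refs: 1}
		cp.slots[i] = pos
		return cp
	}
	c.slots[i] = pos
	return c
}

// split models StateSplit handing captures to its two branches.
func split(c *captures, fixed bool) (left, right *captures) {
	if fixed {
		return c, c.share() // refs becomes 2, so copy-on-write kicks in
	}
	return c, c // bug: refs stays 1, so a write via left also changes right
}

func main() {
	for _, fixed := range []bool{false, true} {
		c := &captures{slots: []int{0, -1}, refs: 1}
		left, right := split(c, fixed)
		left = left.set(1, 5) // the left branch records a capture end at 5
		fmt.Println(fixed, left.slots, right.slots)
	}
}
```

Running it prints `false [0 5] [0 5]` (the right branch sees the left branch's write) and then `true [0 5] [0 -1]` (the branches stay independent).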

Affected Patterns

Affected: .+, .+?, .*, .*? in capture groups
Not affected: explicit character classes ([a-z]+, \w+)

Added

16 regression tests for .+ capture group scenarios
Stdlib compatibility verification tests

Fixes #77

v0.10.2: Version pattern hotfix

kolkov released this 07 Jan 11:29

v0.10.2

d4ad6f9

Fixed

Version pattern regression (#75)
- Restored DigitPrefilter for digit-lead patterns like \d+\.\d+\.\d+
- v0.10.1 incorrectly selected the ReverseInner strategy with "." as the inner literal
- Performance restored: 8.2ms → 2.15ms (3.8x speedup)

Changed

Lint config: Added exclusion for unused AVX2 functions in _amd64.go files

Full Changelog: v0.10.1...v0.10.2

v0.10.1: AVX2 Slim Teddy, Version Pattern Fix

kolkov released this 07 Jan 04:18

v0.10.1

47b42ee

Changes

AVX2 Slim Teddy implementation (#69) - available for direct benchmarking; uses the shift algorithm from the Rust aho-corasick crate
Version pattern ReverseInner (#70) - improved strategy selection for digit-lead patterns
Optimization documentation (#71) - docs/OPTIMIZATIONS.md, describing 6 optimizations where coregex outperforms Rust regex

Known Issues

AVX2 Slim Teddy is not enabled in the integrated prefilter because it regresses on high false-positive workloads (#74)

v0.10.0: Fat Teddy 16-bucket SIMD for 33-64 patterns

kolkov released this 06 Jan 22:35

v0.10.0

ad4498b

Summary

This release introduces Fat Teddy, a 16-bucket AVX2 SIMD prefilter for patterns with 33-64 literals, completing the multi-pattern search tier:

Patterns	Engine	Throughput
2-32	Slim Teddy (SSSE3)	~2 GB/s
33-64	Fat Teddy (AVX2)	9+ GB/s
>64	Aho-Corasick	~150 MB/s

Key Features

Fat Teddy AVX2 implementation - 16 buckets (vs. Slim Teddy's 8), doubling pattern capacity
40x faster than scalar - AVX2 assembly using the VPALIGNR half-shift algorithm
Smart fallback for small haystacks - Aho-Corasick is used for inputs under 64 bytes (2.4x faster)
Pure Go scalar fallback - Works on non-AVX2 platforms
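The tier table and the small-haystack fallback can be summarized as a selection function. This is a hypothetical sketch: `engineFor` is an illustrative name, and the order of the checks (and applying the 64-byte rule only to the 33-64 pattern tier) is an assumption rather than coregex's actual code.

```go
package main

import "fmt"

// engineFor is a hypothetical sketch of the tiering in the v0.10.0 table.
// The name, the order of checks and applying the small-haystack rule only
// to the Fat Teddy tier are assumptions, not coregex's actual logic.
func engineFor(patterns, haystackLen int, hasAVX2 bool) string {
	switch {
	case patterns < 2:
		return "single literal (not covered by the table)"
	case patterns <= 32:
		return "Slim Teddy"
	case patterns <= 64:
		if haystackLen < 64 {
			return "Aho-Corasick" // small-haystack fallback, decided at search time
		}
		if !hasAVX2 {
			return "Fat Teddy (scalar)"
		}
		return "Fat Teddy (AVX2)"
	default:
		return "Aho-Corasick"
	}
}

func main() {
	fmt.Println(engineFor(40, 1<<20, true)) // Fat Teddy (AVX2)
	fmt.Println(engineFor(40, 37, true))    // Aho-Corasick
	fmt.Println(engineFor(70, 1<<20, true)) // Aho-Corasick
}
```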

Performance

Patterns	Engine	Throughput	vs Aho-Corasick
40	Fat Teddy AVX2	9.1 GB/s	73x faster
40	Fat Teddy scalar	228 MB/s	1.5x faster
70	Aho-Corasick	152 MB/s	baseline

Small Haystack Optimization

Before: ~267 ns/op (Fat Teddy on 37-byte input)
After: ~110 ns/op (Aho-Corasick fallback)
Improvement: 2.4x faster

Technical Details

AVX2 Algorithm (from Rust aho-corasick):

VBROADCASTI128: Load 16 bytes, duplicate to both 128-bit lanes
VPSHUFB: Parallel nibble lookup in bucket masks
VPALIGNR $15: Half-shift for 2-byte fingerprint alignment
VPMOVMSKB: Extract 32-bit candidate mask
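The last step turns a 32-byte vector of per-position results into a bitmask, then walks the set bits to find candidate offsets. The scalar Go below only emulates what VPMOVMSKB produces (bit i = high bit of byte i); `movemask` and `candidates` are hypothetical helper names, and the real work happens in the AVX2 assembly.

```go
package main

import (
	"fmt"
	"math/bits"
)

// movemask is a scalar stand-in for VPMOVMSKB: bit i of the result is the
// most significant bit of byte i of a 32-byte vector.
func movemask(v [32]byte) uint32 {
	var m uint32
	for i, b := range v {
		m |= uint32(b>>7) << i
	}
	return m
}

// candidates returns the byte offsets whose bits are set in mask, lowest
// first, by taking the trailing-zero count and then clearing that bit.
func candidates(mask uint32) []int {
	var out []int
	for mask != 0 {
		out = append(out, bits.TrailingZeros32(mask))
		mask &= mask - 1 // clear the lowest set bit
	}
	return out
}

func main() {
	// Pretend a byte-wise compare marked positions 3, 17 and 31 with 0xFF.
	var v [32]byte
	v[3], v[17], v[31] = 0xFF, 0xFF, 0xFF
	fmt.Println(candidates(movemask(v))) // [3 17 31]
}
```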

Files Changed

prefilter/teddy_fat.go - Fat Teddy core implementation
prefilter/teddy_avx2_amd64.s - AVX2 assembly (~300 lines)
meta/meta.go - Aho-Corasick fallback for small haystacks
Documentation updates (README, CHANGELOG, ROADMAP)

Breaking Changes

None. This is a purely additive feature.

Full Changelog: v0.9.5...v0.10.0

v0.9.5: Teddy 8→32 patterns, literal extraction fix

kolkov released this 06 Jan 19:45

v0.9.5

6b02713

Changes

Teddy pattern limit expanded from 8 to 32 (#67)
- Slim Teddy now handles up to 32 patterns (was 8)
- Strategy threshold updated: Aho-Corasick triggers at >32 patterns (was >8)
- Follows the Rust aho-corasick architecture

Fixed

Literal extraction for factored prefixes (#67)
- Problem: syntax.Parse factors (Wanderlust|Weltanschauung) → W(anderlust|eltanschauung), and literal extraction did not handle the factored prefix
- This caused the wrong strategy to be selected: UseReverseSuffixSet instead of UseTeddy
- Benchmark fix: 376µs → 1.7µs (220x faster)

Install

go get github.com/coregx/coregex@v0.9.5

v0.9.4: Streaming State Machine for CharClassSearcher

kolkov released this 06 Jan 14:32

v0.9.4

0d0785b

What's Changed

Changed

Streaming state machine for CharClassSearcher - single-pass FindAll/Count
- New methods: FindAllIndices(), Count() with streaming state machine
- Eliminates per-match function call overhead
- Based on Rust regex approach: SEARCHING/MATCHING states
- Integrated into public API: FindAll(), FindAllIndex() use streaming path
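The SEARCHING/MATCHING idea can be sketched for a pattern of the form `[class]+`. `classFindAll` and `classCount` are hypothetical names, and this scalar loop only illustrates the two-state approach; it is not coregex's CharClassSearcher.

```go
package main

import "fmt"

func isDigit(b byte) bool { return b >= '0' && b <= '9' }

// classFindAll is a hypothetical sketch of a single-pass, two-state scanner
// for a pattern of the form [class]+ such as [0-9]+. It returns [start, end)
// pairs for every maximal run of bytes in the class, with no per-match call.
func classFindAll(h []byte, in func(byte) bool) [][2]int {
	var out [][2]int
	matching := false // false = SEARCHING, true = MATCHING
	start := 0
	for i, b := range h {
		switch {
		case !matching && in(b): // SEARCHING -> MATCHING
			matching, start = true, i
		case matching && !in(b): // MATCHING -> SEARCHING: emit the match
			out = append(out, [2]int{start, i})
			matching = false
		}
	}
	if matching { // a match that runs to the end of the input
		out = append(out, [2]int{start, len(h)})
	}
	return out
}

// classCount counts matches with the same state machine, without allocating.
func classCount(h []byte, in func(byte) bool) int {
	n, matching := 0, false
	for _, b := range h {
		inClass := in(b)
		if inClass && !matching {
			n++ // each SEARCHING -> MATCHING transition starts a new match
		}
		matching = inClass
	}
	return n
}

func main() {
	h := []byte("ab12c345")
	fmt.Println(classFindAll(h, isDigit)) // [[2 4] [5 8]]
	fmt.Println(classCount(h, isDigit))   // 2
}
```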

Performance

CharClassFindAll: 15-30% faster (1500ns → 1100-1400ns on 1KB)
char_class gap vs Rust: reduced from 2.6x to ~1.9x
No regressions on other patterns (+0.05% geomean)

Full Changelog: v0.9.3...v0.9.4

v0.9.3: Teddy 2-byte fingerprint + strategy optimization

kolkov released this 06 Jan 12:42

v0.9.3

5f8187c

Summary

Optimizes strategy selection and implements a 2-byte Teddy fingerprint to reduce false positives.

Changes

Teddy 2-byte Fingerprint

Changed default from 1-byte to 2-byte fingerprint
New SSSE3 assembly: teddySlimSSSE3_2
Reduces false positives from ~25% to <0.5%
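How a 2-byte fingerprint filters candidates can be shown with a scalar version of Teddy's nibble masks. This is a hypothetical sketch (8 buckets in a `uint8`, naive round-robin bucket assignment, patterns assumed to be at least 2 bytes long); the real implementation is the SSSE3 assembly named above, which evaluates many positions at once.

```go
package main

import (
	"bytes"
	"fmt"
)

// teddy2 is a scalar, hypothetical sketch of Teddy-style matching with a
// 2-byte fingerprint and 8 buckets (one bit per bucket in a uint8).
// Every pattern is assumed to be at least 2 bytes long.
type teddy2 struct {
	lo, hi  [2][16]uint8 // [fingerprint byte][nibble] -> set of buckets
	buckets [8][][]byte
}

func newTeddy2(patterns [][]byte) *teddy2 {
	t := &teddy2{}
	for i, p := range patterns {
		b := i % 8 // naive round-robin bucket assignment
		t.buckets[b] = append(t.buckets[b], p)
		for k := 0; k < 2; k++ {
			t.lo[k][p[k]&0x0F] |= 1 << b
			t.hi[k][p[k]>>4] |= 1 << b
		}
	}
	return t
}

// find returns the leftmost position where some pattern matches, or -1.
func (t *teddy2) find(h []byte) int {
	for i := 0; i+1 < len(h); i++ {
		// A bucket survives only if both fingerprint bytes fit its nibble masks.
		c := uint8(0xFF)
		for k := 0; k < 2; k++ {
			c &= t.lo[k][h[i+k]&0x0F] & t.hi[k][h[i+k]>>4]
		}
		for b := 0; b < 8; b++ {
			if c&(1<<b) == 0 {
				continue
			}
			for _, p := range t.buckets[b] {
				if bytes.HasPrefix(h[i:], p) { // verify the candidate
					return i
				}
			}
		}
	}
	return -1
}

func main() {
	t := newTeddy2([][]byte{[]byte("foo"), []byte("bar"), []byte("baz")})
	fmt.Println(t.find([]byte("xxbazyy"))) // 2
}
```

With a 1-byte fingerprint, every position starting with `b` or `f` would reach verification; requiring the second byte to fit as well discards most of those before the full comparison.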

Strategy Selection Reorder

DigitPrefilter now checked before tiny NFA fallback
Added isDigitLeadPattern() helper for digit-lead pattern detection
Prevents high-frequency literals (such as ".") from being used as inner search targets
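A digit-lead check can be written over Go's parsed regex tree. The function below is a hypothetical re-creation of the `isDigitLeadPattern()` helper named above; it only checks that every match must begin with an ASCII digit, and coregex's actual rules may differ.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// isDigitLeadPattern is a hypothetical re-creation of the helper named in
// the notes: it reports whether every match of pattern must begin with an
// ASCII digit. coregex's real rules may differ.
func isDigitLeadPattern(pattern string) (bool, error) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return false, err
	}
	return leadsWithDigit(re), nil
}

func leadsWithDigit(re *syntax.Regexp) bool {
	switch re.Op {
	case syntax.OpConcat:
		for _, sub := range re.Sub {
			// Skip zero-width anchors such as ^ before the first real element.
			if sub.Op == syntax.OpBeginText || sub.Op == syntax.OpBeginLine {
				continue
			}
			return leadsWithDigit(sub)
		}
		return false
	case syntax.OpCapture, syntax.OpPlus:
		return leadsWithDigit(re.Sub[0])
	case syntax.OpRepeat:
		return re.Min >= 1 && leadsWithDigit(re.Sub[0])
	case syntax.OpAlternate:
		for _, sub := range re.Sub {
			if !leadsWithDigit(sub) {
				return false
			}
		}
		return len(re.Sub) > 0
	case syntax.OpLiteral:
		return len(re.Rune) > 0 && re.Rune[0] >= '0' && re.Rune[0] <= '9'
	case syntax.OpCharClass:
		// Rune holds inclusive [lo, hi] pairs; all must lie inside '0'..'9'.
		for i := 0; i+1 < len(re.Rune); i += 2 {
			if re.Rune[i] < '0' || re.Rune[i+1] > '9' {
				return false
			}
		}
		return len(re.Rune) > 0
	}
	return false
}

func main() {
	for _, p := range []string{`\d+\.\d+\.\d+`, `(.+)-(\d+)`} {
		ok, err := isDigitLeadPattern(p)
		fmt.Println(p, ok, err)
	}
}
```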

Performance

Pattern	v0.9.2	v0.9.3	Change
literal_alt	31ms	8ms	~4x faster
version	8.2ms	2ms	~4x faster
IP	3.9ms	5.5ms	43% slower (trade-off)

Note: IP pattern is 43% slower but remains 2.2x faster than Rust regex. See #62 for future optimization research.

Full Changelog

https://github.com/coregx/coregex/blob/main/CHANGELOG.md#093---2026-01-06

v0.9.2: Simplified DigitPrefilter (146x IP speedup)

kolkov released this 06 Jan 10:59

v0.9.2

30dbd01

What's Changed

Replaces the adaptive switching approach from v0.9.1 with a simpler, faster solution.

Background

v0.9.1 added runtime adaptive switching to handle dense digit data. Testing revealed that:

Adaptive tracking itself added overhead (~50ms on 6MB of input)
Complex patterns (like IP with 74 NFA states) are better served by a pure DFA

New Approach

Instead of runtime adaptation, we now use compile-time strategy selection:

Simple digit patterns (≤100 NFA states) → DigitPrefilter
Complex digit patterns (>100 NFA states) → LazyDFA

This eliminates runtime overhead while achieving better performance.
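The compile-time rule amounts to a single comparison against `digitPrefilterMaxNFAStates`. In this sketch the instruction count of Go's own `regexp/syntax` program stands in for coregex's NFA state count (the two are not the same measure), `chooseStrategy` is a hypothetical name, and the pattern is assumed to be already known to be digit-led.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// digitPrefilterMaxNFAStates mirrors the constant named in the v0.9.2 notes.
const digitPrefilterMaxNFAStates = 100

// chooseStrategy is a hypothetical sketch of compile-time selection for a
// pattern already known to be digit-led. It counts instructions in Go's
// regexp/syntax program as a stand-in for coregex's NFA state count.
func chooseStrategy(pattern string) (string, int, error) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return "", 0, err
	}
	prog, err := syntax.Compile(re.Simplify())
	if err != nil {
		return "", 0, err
	}
	states := len(prog.Inst)
	if states <= digitPrefilterMaxNFAStates {
		return "DigitPrefilter", states, nil
	}
	return "LazyDFA", states, nil
}

func main() {
	for _, p := range []string{`\d+`, `\d{1,3}(?:\.\d{1,3}){40}`} {
		s, n, err := chooseStrategy(p)
		fmt.Println(s, n, err)
	}
}
```

Because the decision depends only on the compiled program, it is made once per pattern and costs nothing during the search itself.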

Performance Improvements

Pattern	v0.9.1	v0.9.2	Speedup
IP	731ms	5ms	146x
char_class	183ms	113ms	1.6x
literal_alt	61ms	29ms	2.1x

Changes

Remove digitPrefilterAdaptiveThreshold (runtime tracking)
Add digitPrefilterMaxNFAStates=100 (compile-time limit)
Add PikeVM.SearchBetween for bounded search optimization
Update benchmarks in README

Full Changelog: v0.9.1...v0.9.2

v0.9.1: DigitPrefilter Adaptive Switching

kolkov released this 05 Jan 01:10

v0.9.1

d5c6862

Fixed

DigitPrefilter adaptive switching for high false-positive scenarios

Problem: DigitPrefilter was slow on dense digit data (many consecutive false positives)
Solution: Runtime adaptive switching - after 64 consecutive false positives, switch to DFA
Based on Rust regex insight: "prefilter with high FP rate makes search slower"
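The adaptive rule (later replaced in v0.9.2) can be sketched as a prefilter loop with a false-positive counter. `findRun` and `hasDigitRun` are hypothetical names; a simple "run of n digits" check stands in for full regex verification, and a plain position-by-position scan stands in for the lazy DFA.

```go
package main

import "fmt"

// abandonAfter mirrors digitPrefilterAdaptiveThreshold = 64 from the notes.
const abandonAfter = 64

func isDigit(b byte) bool { return b >= '0' && b <= '9' }

// hasDigitRun is a hypothetical verifier standing in for the full regex:
// it reports whether at least n digits start at position i.
func hasDigitRun(h []byte, i, n int) bool {
	if i+n > len(h) {
		return false
	}
	for _, b := range h[i : i+n] {
		if !isDigit(b) {
			return false
		}
	}
	return true
}

// findRun uses a digit prefilter until abandonAfter candidates fail
// verification, then abandons it and checks every remaining position (a
// stand-in for the lazy DFA). Because this sketch stops at the first match,
// every false positive it counts is consecutive. abandoned plays the role
// of Stats.PrefilterAbandoned.
func findRun(h []byte, n int) (pos int, abandoned bool) {
	falsePositives := 0
	for i := 0; i < len(h); i++ {
		if !isDigit(h[i]) {
			continue // prefilter: a match cannot start here
		}
		if hasDigitRun(h, i, n) {
			return i, false
		}
		falsePositives++
		if falsePositives >= abandonAfter {
			for j := i + 1; j < len(h); j++ {
				if hasDigitRun(h, j, n) {
					return j, true
				}
			}
			return -1, true
		}
	}
	return -1, false
}

func main() {
	fmt.Println(findRun([]byte("abc 12345 xyz"), 4)) // 4 false
	var dense []byte
	for k := 0; k < 100; k++ {
		dense = append(dense, '1', ' ') // 100 isolated digits: all false positives
	}
	dense = append(dense, "5555"...)
	fmt.Println(findRun(dense, 4)) // 200 true
}
```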

Performance (IP regex benchmarks)

Scenario	stdlib	coregex	Speedup
Sparse 64KB	833 µs	2.8 µs	300x
Dense 64KB	8.5 µs	2.4 µs	3.5x
No IPs 1MB	60.7 ms	19.8 µs	3000x

Details

Sparse data: prefilter remains fast (100-3000x speedup via SIMD skip)
Dense data: adaptively switches to lazy DFA (3-5x speedup vs stdlib)
New stat: Stats.PrefilterAbandoned tracks adaptive switching events
New constant: digitPrefilterAdaptiveThreshold = 64

Full Changelog: v0.9.0...v0.9.1

v0.9.0: UseAhoCorasick, DigitPrefilter, Paired-byte SIMD

kolkov released this 04 Jan 22:58

v0.9.0

e09f196

Highlights

UseAhoCorasick Strategy

Large literal alternations (>8 patterns) via github.com/coregx/ahocorasick
75-113x faster than stdlib on 15-20 pattern alternations
O(n) multi-pattern matching with ~1.6 GB/s throughput
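For reference, the textbook Aho-Corasick construction (a trie plus breadth-first failure links) fits in a few dozen lines of Go. This is not the github.com/coregx/ahocorasick API or implementation: it uses map transitions and reports every overlapping match, whereas a regex engine typically needs leftmost-first semantics.

```go
package main

import "fmt"

// node is one trie state. Transitions use a map for brevity; a production
// automaton would use dense tables.
type node struct {
	next map[byte]int
	fail int   // state for the longest proper suffix that is also a trie path
	out  []int // indices of patterns that end at this state
}

type automaton struct{ nodes []node }

func build(patterns []string) *automaton {
	a := &automaton{nodes: []node{{next: map[byte]int{}}}}
	for pi, p := range patterns { // 1. insert every pattern into a trie
		s := 0
		for i := 0; i < len(p); i++ {
			t, ok := a.nodes[s].next[p[i]]
			if !ok {
				a.nodes = append(a.nodes, node{next: map[byte]int{}})
				t = len(a.nodes) - 1
				a.nodes[s].next[p[i]] = t
			}
			s = t
		}
		a.nodes[s].out = append(a.nodes[s].out, pi)
	}
	// 2. Breadth-first pass: depth-1 states fail to the root (0); deeper
	// states follow their parent's failure chain.
	var queue []int
	for _, t := range a.nodes[0].next {
		queue = append(queue, t)
	}
	for len(queue) > 0 {
		s := queue[0]
		queue = queue[1:]
		for c, t := range a.nodes[s].next {
			queue = append(queue, t)
			f := a.nodes[s].fail
			for f != 0 {
				if _, ok := a.nodes[f].next[c]; ok {
					break
				}
				f = a.nodes[f].fail
			}
			if g, ok := a.nodes[f].next[c]; ok {
				a.nodes[t].fail = g
			}
			// Inherit the matches of the failure state ("she" also reports "he").
			a.nodes[t].out = append(a.nodes[t].out, a.nodes[a.nodes[t].fail].out...)
		}
	}
	return a
}

// findAll reports every (pattern index, end offset) pair, overlaps included,
// in a single left-to-right pass over h.
func (a *automaton) findAll(h string) [][2]int {
	var res [][2]int
	s := 0
	for i := 0; i < len(h); i++ {
		for s != 0 {
			if _, ok := a.nodes[s].next[h[i]]; ok {
				break
			}
			s = a.nodes[s].fail
		}
		if t, ok := a.nodes[s].next[h[i]]; ok {
			s = t
		}
		for _, pi := range a.nodes[s].out {
			res = append(res, [2]int{pi, i + 1})
		}
	}
	return res
}

func main() {
	a := build([]string{"he", "she", "his", "hers"})
	fmt.Println(a.findAll("ushers")) // [[1 4] [0 4] [3 6]]
}
```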

DigitPrefilter Strategy (#56)

AVX2 SIMD digit scanner for IP regex patterns
2500x faster on no-match scenarios
39-152x faster on sparse IP data
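At its core, a digit scanner tests many bytes for '0'..'9' at once. As a portable stand-in for the AVX2 code, the sketch below uses the SWAR (SIMD within a register) "has a byte between m and n" bit trick on 8 bytes at a time; `digitMask` and `indexDigit` are hypothetical names, not coregex functions.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

const ones = 0x0101010101010101 // 0x01 in every byte lane

// digitMask sets the high bit of each byte of x that is an ASCII digit
// ('0' = 48 .. '9' = 57), using the "byte strictly between m and n" trick
// with m = 47 and n = 58. No lane can borrow from or carry into a neighbour.
func digitMask(x uint64) uint64 {
	low7 := x & (ones * 127)
	return (ones*(127+58) - low7) & ^x & (low7 + ones*(127-47)) & (ones * 128)
}

// indexDigit returns the index of the first ASCII digit in h, or -1,
// examining 8 bytes per step and finishing the tail byte by byte.
func indexDigit(h []byte) int {
	i := 0
	for ; i+8 <= len(h); i += 8 {
		if m := digitMask(binary.LittleEndian.Uint64(h[i:])); m != 0 {
			return i + bits.TrailingZeros64(m)/8
		}
	}
	for ; i < len(h); i++ {
		if h[i] >= '0' && h[i] <= '9' {
			return i
		}
	}
	return -1
}

func main() {
	fmt.Println(indexDigit([]byte("no digits here")))   // -1
	fmt.Println(indexDigit([]byte("xxxxxxxxx9zzzzzz"))) // 9
}
```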

Paired-byte SIMD Search (#55)

Byte frequency analysis for optimal rare byte selection
AVX2 MemchrPair() searches two bytes simultaneously
Dramatically reduces false positives
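The paired-byte idea can be sketched in scalar Go: pick the two rarest bytes of the needle and check both at their offsets before verifying the whole needle. The ranking here is a made-up toy heuristic (coregex uses its own byte frequency analysis), `pickPair` and `pairFind` are hypothetical names, and MemchrPair() itself is AVX2 code that checks many positions per step.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// rank is a toy byte-frequency heuristic (lower = rarer) invented for this
// sketch; a real implementation would use a measured frequency table.
func rank(b byte) int {
	switch b {
	case ' ', 'a', 'e', 'i', 'o', 'u':
		return 3
	}
	if b >= 'a' && b <= 'z' {
		return 2
	}
	return 1
}

// pickPair returns the offsets (in increasing order) of the two rarest bytes
// of needle, which must be at least 2 bytes long.
func pickPair(needle []byte) (int, int) {
	idx := make([]int, len(needle))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool {
		return rank(needle[idx[a]]) < rank(needle[idx[b]])
	})
	o1, o2 := idx[0], idx[1]
	if o1 > o2 {
		o1, o2 = o2, o1
	}
	return o1, o2
}

// pairFind checks both rare bytes at their offsets before verifying the
// whole needle, so most non-matching positions are rejected cheaply.
func pairFind(h, needle []byte) int {
	o1, o2 := pickPair(needle)
	b1, b2 := needle[o1], needle[o2]
	for i := 0; i+len(needle) <= len(h); i++ {
		if h[i+o1] == b1 && h[i+o2] == b2 && bytes.Equal(h[i:i+len(needle)], needle) {
			return i
		}
	}
	return -1
}

func main() {
	fmt.Println(pickPair([]byte("a.b:")))                                 // 1 3
	fmt.Println(pairFind([]byte("the quick brown fox"), []byte("brown"))) // 10
}
```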

Installation

go get github.com/coregx/coregex@v0.9.0

See CHANGELOG.md for full details.
