Releases: coregx/coregex
v0.10.3: Critical capture group fix
Critical Bug Fix
FindStringSubmatch returned incorrect capture groups for .+ patterns
The Bug
re := coregex.MustCompile(`^(.+)-(\d+)$`)
matches := re.FindStringSubmatch("hello-123")
// Expected: matches[1] = "hello"
// Actual: matches[1] = "hello-123" ← BUG
Root Cause
StateSplit in PikeVM passed captures to both branches without cloning. The COW (Copy-on-Write) mechanism failed because refs=1, causing in-place modifications that corrupted the other branch's capture data.
Fix
Clone captures for the right branch in StateSplit to ensure refs>1, enabling proper copy-on-write semantics.
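The failure mode can be reproduced with a small model of a copy-on-write capture buffer. This is a minimal sketch, not coregex's actual PikeVM code: the `captures` type, `share`, `set`, `splitBuggy`, and `splitFixed` are hypothetical names, and the reference count is modeled as a shared integer.

```go
package main

// captures is a hypothetical copy-on-write capture buffer. The names and
// layout are illustrative only and do not mirror coregex internals.
type captures struct {
	slots []int
	refs  *int // reference count shared by all holders of slots
}

func newCaptures(n int) captures {
	one := 1
	return captures{slots: make([]int, n), refs: &one}
}

// share registers one more holder of the same slots (refs becomes > 1).
func (c captures) share() captures {
	*c.refs++
	return c
}

// set writes slot i, copying the buffer first if it is shared.
func (c captures) set(i, v int) captures {
	if *c.refs > 1 {
		*c.refs--
		one := 1
		cp := make([]int, len(c.slots))
		copy(cp, c.slots)
		c = captures{slots: cp, refs: &one}
	}
	c.slots[i] = v
	return c
}

// splitBuggy hands the same buffer to both branches without registering
// the second holder. refs stays at 1, so set writes in place and the
// other branch sees the change.
func splitBuggy(c captures) (left, right captures) { return c, c }

// splitFixed registers the right branch, so the first write on either
// side copies instead of mutating the shared slots.
func splitFixed(c captures) (left, right captures) { return c, c.share() }
```

With `splitBuggy`, a write through the left branch is visible through the right branch; with `splitFixed`, the right branch keeps its original slots.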
Affected Patterns
- `.+`, `.+?`, `.*`, `.*?` in capture groups
- Not affected: Explicit character classes (`[a-z]+`, `\w+`)
Added
- 16 regression tests for `.+` capture group scenarios
- Stdlib compatibility verification tests
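A compatibility check of this kind can be sketched by treating Go's standard `regexp` package as the reference. The helper name `matchesStdlib` is hypothetical and is not taken from the coregex test suite; only the stdlib calls are real.

```go
package main

import "regexp"

// matchesStdlib reports whether got equals the submatches that the
// standard library produces for pattern on input. The helper itself is
// an illustration, not part of coregex.
func matchesStdlib(got []string, pattern, input string) bool {
	want := regexp.MustCompile(pattern).FindStringSubmatch(input)
	if len(got) != len(want) {
		return false
	}
	for i := range want {
		if got[i] != want[i] {
			return false
		}
	}
	return true
}
```

For the pattern from this release, the stdlib result for "hello-123" is the full match followed by "hello" and "123", so the pre-fix output with "hello-123" in group 1 would fail the check.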
Fixes #77
v0.10.2: Version pattern hotfix
Fixed
- Version pattern regression (#75)
  - Restored DigitPrefilter for digit-lead patterns like `\d+\.\d+\.\d+`
  - v0.10.1 incorrectly chose ReverseInner with "." as inner literal
  - Performance restored: 8.2ms → 2.15ms (3.8x speedup)
Changed
- Lint config: Added exclusion for unused AVX2 functions in `_amd64.go` files
Full Changelog: v0.10.1...v0.10.2
v0.10.1: AVX2 Slim Teddy, Version Pattern Fix
Changes
- AVX2 Slim Teddy implementation (#69) - Available for direct benchmarks, uses shift algorithm from Rust aho-corasick
- Version pattern ReverseInner (#70) - Improved strategy selection for digit-lead patterns
- Optimization documentation (#71) - docs/OPTIMIZATIONS.md with 6 optimizations that beat Rust regex
Known Issues
- AVX2 Slim Teddy not enabled in integrated prefilter due to regression in high false-positive workloads (#74)
v0.10.0: Fat Teddy 16-bucket SIMD for 33-64 patterns
Summary
This release introduces Fat Teddy, a 16-bucket AVX2 SIMD prefilter for patterns with 33-64 literals, completing the multi-pattern search tier:
| Patterns | Engine | Throughput |
|---|---|---|
| 2-32 | Slim Teddy (SSSE3) | ~2 GB/s |
| 33-64 | Fat Teddy (AVX2) | 9+ GB/s |
| >64 | Aho-Corasick | ~150 MB/s |
Key Features
- Fat Teddy AVX2 implementation - 16 buckets (vs Slim Teddy's 8) = 2x pattern capacity
- 40x faster than scalar - AVX2 assembly with VPALIGNR half-shift algorithm
- Smart fallback for small haystacks - Aho-Corasick used for <64 bytes (2.4x faster)
- Pure Go scalar fallback - Works on non-AVX2 platforms
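The tiering and the small-haystack fallback described above can be summarized as a selection function. This is a sketch under stated assumptions: the `engine` type and `pickEngine` are hypothetical names, the thresholds come from the table and feature list in these notes, and inputs with fewer than 2 patterns are outside what the notes cover.

```go
package main

type engine int

const (
	slimTeddy engine = iota
	fatTeddy
	ahoCorasick
)

// Thresholds taken from the release notes; the constant names are
// illustrative.
const (
	slimMaxPatterns  = 32
	fatMaxPatterns   = 64
	fatMinHaystackSz = 64 // below this, Aho-Corasick is used instead of Fat Teddy
)

// pickEngine is a hypothetical selector for a multi-pattern literal
// search. Callers are assumed to pass numPatterns >= 2.
func pickEngine(numPatterns, haystackLen int) engine {
	switch {
	case numPatterns <= slimMaxPatterns:
		return slimTeddy
	case numPatterns <= fatMaxPatterns:
		if haystackLen < fatMinHaystackSz {
			return ahoCorasick
		}
		return fatTeddy
	default:
		return ahoCorasick
	}
}
```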
Performance
| Patterns | Engine | Throughput | vs Aho-Corasick |
|---|---|---|---|
| 40 | Fat Teddy AVX2 | 9.1 GB/s | 73x faster |
| 40 | Fat Teddy scalar | 228 MB/s | 1.5x faster |
| 70 | Aho-Corasick | 152 MB/s | baseline |
Small Haystack Optimization
- Before: ~267 ns/op (Fat Teddy on 37-byte input)
- After: ~110 ns/op (Aho-Corasick fallback)
- Improvement: 2.4x faster
Technical Details
AVX2 Algorithm (from Rust aho-corasick):
- `VBROADCASTI128`: Load 16 bytes, duplicate to both 128-bit lanes
- `VPSHUFB`: Parallel nibble lookup in bucket masks
- `VPALIGNR $15`: Half-shift for 2-byte fingerprint alignment
- `VPMOVMSKB`: Extract 32-bit candidate mask
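The same idea can be written as a scalar loop, which may help when reading the assembly. This is a simplified sketch, not the code in `prefilter/teddy_fat.go`: the `fatMasks` type, `buildMasks`, and `candidates` are hypothetical names, and assigning pattern i to bucket i%16 is an assumption made for brevity. The table lookups stand in for `VPSHUFB`, and reading `hay[i+1]` alongside `hay[i]` stands in for the `VPALIGNR` alignment step.

```go
package main

// fatMasks holds, for each of the two fingerprint bytes, a 16-bit bucket
// set per low-nibble and per high-nibble value. Bit b is set when some
// pattern in bucket b has that nibble at that fingerprint position.
type fatMasks struct {
	lo, hi [2][16]uint16
}

// buildMasks places pattern i in bucket i%16 (an assumption for this
// sketch). Every pattern must be at least 2 bytes long.
func buildMasks(patterns []string) fatMasks {
	var m fatMasks
	for i, p := range patterns {
		bit := uint16(1) << (i % 16)
		for j := 0; j < 2; j++ {
			m.lo[j][p[j]&0x0F] |= bit
			m.hi[j][p[j]>>4] |= bit
		}
	}
	return m
}

// candidates returns every position whose 2-byte fingerprint matches at
// least one bucket. Candidates still need full verification against the
// patterns in the surviving buckets.
func candidates(m fatMasks, hay string) []int {
	var out []int
	for i := 0; i+1 < len(hay); i++ {
		c := m.lo[0][hay[i]&0x0F] & m.hi[0][hay[i]>>4] &
			m.lo[1][hay[i+1]&0x0F] & m.hi[1][hay[i+1]>>4]
		if c != 0 {
			out = append(out, i)
		}
	}
	return out
}
```

A position survives only if all four lookups share at least one bucket bit, which is why a 2-byte fingerprint rejects more non-matching positions than a 1-byte one.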
Files Changed
- `prefilter/teddy_fat.go` - Fat Teddy core implementation
- `prefilter/teddy_avx2_amd64.s` - AVX2 assembly (~300 lines)
- `meta/meta.go` - Aho-Corasick fallback for small haystacks
- Documentation updates (README, CHANGELOG, ROADMAP)
Breaking Changes
None. This is a purely additive feature.
Full Changelog: v0.9.5...v0.10.0
v0.9.5: Teddy 8→32 patterns, literal extraction fix
Changes
- Teddy pattern limit expanded from 8 to 32 (#67)
  - Slim Teddy now handles up to 32 patterns (was 8)
  - Strategy threshold updated: Aho-Corasick triggers at >32 patterns (was >8)
  - Follows Rust aho-corasick architecture
Fixed
- Literal extraction for factored prefixes (#67)
  - Problem: `syntax.Parse` factors `(Wanderlust|Weltanschauung)` → `W(anderlust|eltanschauung)`
  - Caused wrong strategy selection: UseReverseSuffixSet instead of UseTeddy
  - Benchmark fix: 376µs → 1.7µs (220x faster)
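The factoring can be observed directly with the standard `regexp/syntax` package. The helper below is a hypothetical illustration, not coregex's literal extractor: it only detects the literal-then-alternation shape that the parser produces for this kind of pattern.

```go
package main

import "regexp/syntax"

// factoredPrefix returns the literal prefix that syntax.Parse pulled out
// of an alternation, if the parsed tree has the shape
// literal followed by alternation. The function name is illustrative.
func factoredPrefix(pattern string) (string, bool) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return "", false
	}
	// Look through enclosing capture groups.
	for re.Op == syntax.OpCapture {
		re = re.Sub[0]
	}
	if re.Op != syntax.OpConcat || len(re.Sub) != 2 {
		return "", false
	}
	if re.Sub[0].Op != syntax.OpLiteral || re.Sub[1].Op != syntax.OpAlternate {
		return "", false
	}
	return string(re.Sub[0].Rune), true
}
```

An extractor that expects a top-level alternation of complete literals sees a concatenation here instead, so it has to re-expand the prefix across the branches to recover the full literals.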
Install
go get github.com/coregx/coregex@v0.9.5
v0.9.4: Streaming State Machine for CharClassSearcher
What's Changed
Changed
- Streaming state machine for CharClassSearcher - single-pass FindAll/Count
  - New methods: `FindAllIndices()`, `Count()` with streaming state machine
  - Eliminates per-match function call overhead
  - Based on Rust regex approach: SEARCHING/MATCHING states
  - Integrated into public API: `FindAll()`, `FindAllIndex()` use streaming path
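A two-state scan of this kind might look as follows for a pattern such as `[a-z]+`. This is a sketch of the general technique, not the CharClassSearcher source: `rangeClass` and `findAllRuns` are hypothetical names, and the signature of the real `FindAllIndices()` is not given in these notes.

```go
package main

// rangeClass builds a 256-entry membership table for bytes in [lo, hi].
func rangeClass(lo, hi byte) *[256]bool {
	var t [256]bool
	for b := int(lo); b <= int(hi); b++ {
		t[b] = true
	}
	return &t
}

// findAllRuns walks hay once and returns a [start, end) pair for every
// maximal run of bytes in class. All matches are produced inside a single
// loop, without a separate search call per match.
func findAllRuns(class *[256]bool, hay []byte) [][2]int {
	var out [][2]int
	matching := false // false = SEARCHING, true = MATCHING
	start := 0
	for i, b := range hay {
		switch {
		case !matching && class[b]:
			matching, start = true, i
		case matching && !class[b]:
			out = append(out, [2]int{start, i})
			matching = false
		}
	}
	if matching {
		// The haystack ended inside a run.
		out = append(out, [2]int{start, len(hay)})
	}
	return out
}
```

A count-only variant would increment a counter where this version appends, which avoids allocating the result slice.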
Performance
- CharClassFindAll: 15-30% faster (1500ns → 1100-1400ns on 1KB)
- char_class gap vs Rust: reduced from 2.6x to ~1.9x
- No regressions on other patterns (+0.05% geomean)
Full Changelog: v0.9.3...v0.9.4
v0.9.3: Teddy 2-byte fingerprint + strategy optimization
Summary
Optimize strategy selection and implement Teddy 2-byte fingerprint for reduced false positives.
Changes
Teddy 2-byte Fingerprint
- Changed default from 1-byte to 2-byte fingerprint
- New SSSE3 assembly: `teddySlimSSSE3_2`
- Reduces false positives from ~25% to <0.5%
Strategy Selection Reorder
- DigitPrefilter now checked before tiny NFA fallback
- Added `isDigitLeadPattern()` helper for digit-lead pattern detection
- Prevents high-frequency literals (like `.`) from being used as inner search targets
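One way to detect a digit-lead pattern is to walk the leading edge of the parsed syntax tree. The sketch below is a hypothetical reimplementation built on the standard `regexp/syntax` package. It is not the body of `isDigitLeadPattern()`, and it is deliberately conservative: anchors, counted repeats, and alternations all report false.

```go
package main

import "regexp/syntax"

// digitLead reports whether the pattern visibly starts with an ASCII
// digit or a digit-only character class. Illustrative only.
func digitLead(pattern string) bool {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return false
	}
	return leadsWithDigit(re)
}

func leadsWithDigit(re *syntax.Regexp) bool {
	switch re.Op {
	case syntax.OpCapture, syntax.OpPlus:
		return leadsWithDigit(re.Sub[0])
	case syntax.OpConcat:
		return len(re.Sub) > 0 && leadsWithDigit(re.Sub[0])
	case syntax.OpLiteral:
		return len(re.Rune) > 0 && isDigit(re.Rune[0])
	case syntax.OpCharClass:
		// Rune holds inclusive [lo, hi] pairs; every pair must stay
		// within '0'..'9'.
		for i := 0; i+1 < len(re.Rune); i += 2 {
			if !isDigit(re.Rune[i]) || !isDigit(re.Rune[i+1]) {
				return false
			}
		}
		return len(re.Rune) > 0
	}
	return false
}

func isDigit(r rune) bool { return r >= '0' && r <= '9' }
```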
Performance
| Pattern | v0.9.2 | v0.9.3 | Change |
|---|---|---|---|
| literal_alt | 31ms | 8ms | +4x faster |
| version | 8.2ms | 2ms | +4x faster |
| IP | 3.9ms | 5.5ms | -43% (trade-off) |
Note: IP pattern is 43% slower but remains 2.2x faster than Rust regex. See #62 for future optimization research.
Full Changelog
https://github.com/coregx/coregex/blob/main/CHANGELOG.md#093---2026-01-06
v0.9.2: Simplified DigitPrefilter (146x IP speedup)
What's Changed
Replaced adaptive switching approach from v0.9.1 with a simpler and faster solution.
Background
v0.9.1 added runtime adaptive switching to handle dense digit data. Testing revealed that:
- Adaptive tracking itself added overhead (~50ms on 6MB)
- Complex patterns (like IP with 74 NFA states) are better served by pure DFA
New Approach
Instead of runtime adaptation, we now use compile-time strategy selection:
- Simple digit patterns (≤100 NFA states) → DigitPrefilter
- Complex digit patterns (>100 NFA states) → LazyDFA
This eliminates runtime overhead while achieving better performance.
Performance Improvements
| Pattern | v0.9.1 | v0.9.2 | Speedup |
|---|---|---|---|
| IP | 731ms | 5ms | 146x |
| char_class | 183ms | 113ms | 1.6x |
| literal_alt | 61ms | 29ms | 2.1x |
Changes
- Remove `digitPrefilterAdaptiveThreshold` (runtime tracking)
- Add `digitPrefilterMaxNFAStates=100` (compile-time limit)
- Add `PikeVM.SearchBetween` for bounded search optimization
- Update benchmarks in README
Full Changelog: v0.9.1...v0.9.2
v0.9.1: DigitPrefilter Adaptive Switching
Fixed
DigitPrefilter adaptive switching for high false-positive scenarios
- Problem: DigitPrefilter was slow on dense digit data (many consecutive FPs)
- Solution: Runtime adaptive switching - after 64 consecutive false positives, switch to DFA
- Based on Rust regex insight: "prefilter with high FP rate makes search slower"
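The switching rule can be modeled as a small counter. This is a sketch of the idea, not the coregex implementation: the `adaptiveScanner` type and its `report` method are hypothetical, and only the threshold of 64 comes from these notes.

```go
package main

// adaptiveThreshold is the number of consecutive false positives after
// which the prefilter is abandoned (value from the release notes).
const adaptiveThreshold = 64

// adaptiveScanner tracks prefilter candidates that failed verification.
// The type and method names are illustrative.
type adaptiveScanner struct {
	consecutiveFP int
	abandoned     bool
}

// report records the result of verifying one prefilter candidate and
// returns true while the prefilter should keep being used. Once it
// returns false, the caller would continue with the DFA.
func (s *adaptiveScanner) report(matched bool) bool {
	if s.abandoned {
		return false
	}
	if matched {
		s.consecutiveFP = 0
		return true
	}
	s.consecutiveFP++
	if s.consecutiveFP >= adaptiveThreshold {
		s.abandoned = true
	}
	return !s.abandoned
}
```

Resetting the counter on every real match keeps sparse data on the prefilter path, while an unbroken run of failed candidates on dense data trips the switch.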
Performance (IP regex benchmarks)
| Scenario | stdlib | coregex | Speedup |
|---|---|---|---|
| Sparse 64KB | 833 µs | 2.8 µs | 300x |
| Dense 64KB | 8.5 µs | 2.4 µs | 3.5x |
| No IPs 1MB | 60.7 ms | 19.8 µs | 3000x |
Details
- Sparse data: prefilter remains fast (100-3000x speedup via SIMD skip)
- Dense data: adaptively switches to lazy DFA (3-5x speedup vs stdlib)
- New stat: `Stats.PrefilterAbandoned` tracks adaptive switching events
- New constant: `digitPrefilterAdaptiveThreshold = 64`
Full Changelog: v0.9.0...v0.9.1
v0.9.0: UseAhoCorasick, DigitPrefilter, Paired-byte SIMD
Highlights
UseAhoCorasick Strategy
- Large literal alternations (>8 patterns) via `github.com/coregx/ahocorasick`
- 75-113x faster than stdlib on 15-20 pattern alternations
- O(n) multi-pattern matching with ~1.6 GB/s throughput
DigitPrefilter Strategy (#56)
- AVX2 SIMD digit scanner for IP regex patterns
- 2500x faster on no-match scenarios
- 39-152x faster on sparse IP data
Paired-byte SIMD Search (#55)
- Byte frequency analysis for optimal rare byte selection
- AVX2 `MemchrPair()` searches two bytes simultaneously
- Dramatically reduces false positives
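A scalar version of the idea is sketched below. It is not the AVX2 `MemchrPair()` routine, whose signature is not given in these notes. `rarestPair` and `indexPaired` are hypothetical names, and the frequency table is assumed to be supplied by the caller (for example, counted from sample data).

```go
package main

import "bytes"

// rarestPair returns the offsets of the two needle bytes with the lowest
// frequency according to freq. The needle must have at least 2 bytes.
func rarestPair(needle []byte, freq *[256]int) (i1, i2 int) {
	i1, i2 = 0, 1
	if freq[needle[i2]] < freq[needle[i1]] {
		i1, i2 = i2, i1
	}
	for i := 2; i < len(needle); i++ {
		f := freq[needle[i]]
		switch {
		case f < freq[needle[i1]]:
			i1, i2 = i, i1
		case f < freq[needle[i2]]:
			i2 = i
		}
	}
	return i1, i2
}

// indexPaired finds needle in hay by first requiring both rare bytes to
// match at their offsets, then verifying the whole needle. It returns -1
// when there is no match.
func indexPaired(hay, needle []byte, freq *[256]int) int {
	if len(needle) < 2 {
		return bytes.Index(hay, needle)
	}
	i1, i2 := rarestPair(needle, freq)
	for i := 0; i+len(needle) <= len(hay); i++ {
		if hay[i+i1] == needle[i1] && hay[i+i2] == needle[i2] &&
			bytes.Equal(hay[i:i+len(needle)], needle) {
			return i
		}
	}
	return -1
}
```

Checking two rare bytes before the full comparison is what cuts false positives: a position must match both before any verification work is done.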
Installation
go get github.com/coregx/coregex@v0.9.0
See CHANGELOG.md for full details.