Commit 90d77fd

Merge pull request #154 from coregx/release/v0.12.20
perf: v0.12.20 — premultiplied StateIDs, break-at-match
2 parents ab4039b + d22c05c commit 90d77fd

27 files changed

Lines changed: 712 additions & 617 deletions

CHANGELOG.md

Lines changed: 32 additions & 0 deletions
@@ -12,6 +12,38 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - ARM NEON SIMD support (Go 1.26 `simd/archsimd` intrinsics — [#120](https://github.com/coregx/coregex/issues/120))
 - SIMD prefilter for CompositeSequenceDFA (#83)

+## [0.12.20] - 2026-03-25
+
+### Performance
+- **Premultiplied State IDs** — StateID stores byte offset into flat transition table,
+  eliminating multiply from DFA hot loop. Single `flatTrans[sid+classIdx]` lookup.
+  Inspired by Rust `LazyStateID` (hybrid/id.rs).
+
+- **Tagged State IDs** — match/dead/invalid/start flags encoded in StateID high bits.
+  Single `IsTagged()` branch replaces 3 separate comparisons in DFA hot loop.
+  4x loop unrolling breaks to slow path only on tagged states.
+
+- **1-byte match delay** (Rust determinize approach) — match reporting delayed by 1 byte,
+  enabling correct look-around assertion resolution (^, $, \b) at match boundaries.
+  Reference: Rust `determinize` mod.rs:254-286.
+
+- **Rust-aligned DFA determinize: break-at-match** — replaced `filterStatesAfterMatch`
+  with Rust's `determinize::next` break-at-match semantics (mod.rs:284). Epsilon closure
+  uses add-on-pop DFS with reverse Split push, matching Rust sparse set insertion order.
+  Incremental per-target epsilon closure preserves correct state ordering for leftmost-first.
+  **Eliminates Phase 3** anchored re-scan: bidirectional DFA reduced from 3-pass to 2-pass.
+  Verified against Rust regex-automata `find_fwd` — identical results on all test patterns.
+
+- **Memmem: Memchr(rareByte) + verify** (Rust approach) — replaced `MemchrPair`-based
+  paired search in `simd.Memmem` with single rare byte Memchr scan + `bytes.Equal`
+  verify, matching Rust `memchr::memmem` architecture.
+
+### Benchmarks (LangArena LogParser, 7.2 MB, 13 patterns)
+
+| vs stdlib | vs Rust | Wins vs Rust |
+|-----------|---------|-------------|
+| **30x faster** total | 2-5x gap (local i7) | ip 18.5x, multiline_php 2.0x, char_class 1.3x |
+
 ## [0.12.19] - 2026-03-24

 ### Performance

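Editor's sketch for the "Memmem: Memchr(rareByte) + verify" entry above. This is not the coregex `simd.Memmem` code: `bytes.IndexByte` stands in for the SIMD Memchr scan, and `commonness`, `rareByteOffset`, and `memmem` are hypothetical names with a made-up frequency heuristic. It only illustrates the general shape of the approach: scan for one rare byte of the needle, then confirm each candidate with `bytes.Equal`.

```go
package main

import (
	"bytes"
	"fmt"
)

// commonness is a hypothetical, hand-rolled heuristic: a higher value
// means the byte is expected to occur more often in typical text.
func commonness(b byte) int {
	switch {
	case b == ' ' || b == 'e' || b == 't' || b == 'a':
		return 3
	case b >= 'a' && b <= 'z':
		return 2
	case (b >= '0' && b <= '9') || (b >= 'A' && b <= 'Z'):
		return 1
	default:
		return 0
	}
}

// rareByteOffset returns the offset in needle of its least common byte.
func rareByteOffset(needle []byte) int {
	best := 0
	for i, b := range needle {
		if commonness(b) < commonness(needle[best]) {
			best = i
		}
	}
	return best
}

// memmem returns the index of the first occurrence of needle in
// haystack, or -1 if there is none.
func memmem(haystack, needle []byte) int {
	if len(needle) == 0 {
		return 0
	}
	off := rareByteOffset(needle)
	rare := needle[off]
	tail := len(needle) - off // bytes needed from the rare byte onward

	// pos is a haystack position where the rare byte of a candidate may sit.
	for pos := off; pos+tail <= len(haystack); {
		i := bytes.IndexByte(haystack[pos:], rare)
		if i < 0 {
			return -1
		}
		start := pos + i - off
		if start+len(needle) > len(haystack) {
			return -1 // later candidates would overrun as well
		}
		// Verify step: compare the full candidate window.
		if bytes.Equal(haystack[start:start+len(needle)], needle) {
			return start
		}
		pos += i + 1
	}
	return -1
}

func main() {
	fmt.Println(memmem([]byte("x.foo.bar"), []byte("foo.bar")))
	fmt.Println(memmem([]byte("hello, world"), []byte("world")))
	fmt.Println(memmem([]byte("aaa"), []byte("b")))
}
```

Starting the scan at `off` rather than 0 skips rare-byte hits that sit too early in the haystack to belong to any occurrence of the needle.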
README.md

Lines changed: 11 additions & 11 deletions
@@ -64,19 +64,19 @@ Cross-language benchmarks on 6MB input, AMD EPYC ([source](https://github.com/ko

 | Pattern | Go stdlib | coregex | Rust regex | vs stdlib | vs Rust |
 |---------|-----------|---------|------------|-----------|---------|
-| Literal alternation | 475 ms | 4.4 ms | 0.7 ms | **109x** | 6.3x slower |
-| Multi-literal | 1391 ms | 12.6 ms | 4.7 ms | **110x** | 2.6x slower |
-| Inner `.*keyword.*` | 231 ms | 0.29 ms | 0.29 ms | **797x** | **~parity** |
-| Suffix `.*\.txt` | 234 ms | 1.83 ms | 1.07 ms | **128x** | 1.7x slower |
-| Multiline `(?m)^/.*\.php` | 103 ms | 0.66 ms | 0.66 ms | **156x** | **~parity** |
-| Email validation | 261 ms | 0.54 ms | 0.31 ms | **482x** | 1.7x slower |
-| URL extraction | 262 ms | 0.84 ms | 0.35 ms | **311x** | 2.4x slower |
-| IP address | 498 ms | 2.1 ms | 12.0 ms | **237x** | **5.6x faster** |
-| Char class `[\w]+` | 554 ms | 48.0 ms | 50.1 ms | **11x** | **1.0x faster** |
-| Word repeat `(\w{2,8})+` | 641 ms | 185 ms | 48.7 ms | **3x** | 3.7x slower |
+| Literal alternation | 466 ms | 4.2 ms | 0.65 ms | **110x** | 6.4x slower |
+| Multi-literal | 1391 ms | 12.4 ms | 5.3 ms | **112x** | 2.3x slower |
+| Inner `.*keyword.*` | 227 ms | 0.34 ms | 0.32 ms | **668x** | **~parity** |
+| Suffix `.*\.txt` | 228 ms | 2.9 ms | 1.3 ms | **78x** | 2.3x slower |
+| Multiline `(?m)^/.*\.php` | 101 ms | 0.35 ms | 0.72 ms | **288x** | **2.0x faster** |
+| Email validation | 258 ms | 0.51 ms | 0.27 ms | **506x** | 1.8x slower |
+| URL extraction | 259 ms | 0.71 ms | 0.35 ms | **364x** | 2.0x slower |
+| IP address | 493 ms | 0.73 ms | 13.5 ms | **675x** | **18.5x faster** |
+| Char class `[\w]+` | 483 ms | 40.9 ms | 56.0 ms | **11x** | **1.3x faster** |
+| Word repeat `(\w{2,8})+` | 628 ms | 167 ms | 54.8 ms | **3x** | 3.0x slower |

 **Where coregex excels:**
-- Multiline patterns (`(?m)^/.*\.php`) — near Rust parity, 100x+ vs stdlib
+- Multiline patterns (`(?m)^/.*\.php`) — **2x faster than Rust**, 288x vs stdlib
 - IP/phone patterns (`\d+\.\d+\.\d+\.\d+`) — SIMD digit prefilter skips non-digit regions
 - Suffix patterns (`.*\.log`, `.*\.txt`) — reverse search optimization (1000x+)
 - Inner literals (`.*error.*`, `.*@example\.com`) — bidirectional DFA (900x+)

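Editor's note on the ratio columns in the README table above: they appear to be plain quotients of the timing columns. The small sketch below reproduces the updated "IP address" row under that assumption. The helper name `speedup` is hypothetical. For other rows the last digit may differ slightly from the published figure, for example if the published ratios were derived from unrounded timings.

```go
package main

import "fmt"

// speedup returns how many times faster the second timing is than the
// first, given two wall-clock timings in the same unit.
func speedup(slowMs, fastMs float64) float64 {
	return slowMs / fastMs
}

func main() {
	// IP address row: Go stdlib 493 ms, coregex 0.73 ms, Rust regex 13.5 ms.
	fmt.Printf("vs stdlib: %.0fx\n", speedup(493, 0.73)) // prints "vs stdlib: 675x"
	fmt.Printf("vs Rust: %.1fx\n", speedup(13.5, 0.73))  // prints "vs Rust: 18.5x"
}
```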
ROADMAP.md

Lines changed: 8 additions & 3 deletions
@@ -2,7 +2,7 @@

 > **Strategic Focus**: Production-grade regex engine with RE2/rust-regex level optimizations

-**Last Updated**: 2026-03-24 | **Current Version**: v0.12.18 | **Target**: v1.0.0 stable
+**Last Updated**: 2026-03-25 | **Current Version**: v0.12.19 | **Target**: v1.0.0 stable

 ---

@@ -12,7 +12,7 @@ Build a **production-ready, high-performance regex engine** for Go that matches

 ### Current State vs Target

-| Metric | Current (v0.12.15) | Target (v1.0.0) |
+| Metric | Current (v0.12.19) | Target (v1.0.0) |
 |--------|-------------------|-----------------|
 | Inner literal speedup | **280-3154x** | ✅ Achieved |
 | Case-insensitive speedup | **263x** | ✅ Achieved |
@@ -93,7 +93,12 @@ v0.12.16 ✅ → WrapLineAnchor for (?m)^ patterns

 v0.12.17 ✅ → Fix LogParser ARM64 regression, restore DFA/Teddy for (?m)^

-v0.12.18 (Current) ✅ → Flat DFA transition table, integrated prefilter, PikeVM skip-ahead
+v0.12.18 ✅ → Flat DFA transition table, integrated prefilter, PikeVM skip-ahead
+
+v0.12.19 ✅ → Zero-alloc FindSubmatch, byte-based DFA cache, Rust-aligned visited limits
+
+v0.12.20 (Current) → Premultiplied/tagged StateIDs, break-at-match DFA determinize,
+Phase 3 elimination (2-pass bidirectional DFA)

 v1.0.0-rc → Feature freeze, API locked

dfa/lazy/accel_test.go

Lines changed: 8 additions & 6 deletions
@@ -98,18 +98,20 @@ func TestDetectAccelerationFromCached(t *testing.T) {

 func TestDetectAccelerationFromFlat(t *testing.T) {
 	// Test acceleration detection via flat transition table
+	// Using premultiplied state IDs: sid = stateIndex * stride
 	stride := 256
-	sid := StateID(1)
-	flatTrans := make([]StateID, 2*stride) // 2 states
+	sid := StateID(1 * stride) // premultiplied: state 1 at offset 256
+	state2 := StateID(2 * stride)
+	flatTrans := make([]StateID, 3*stride) // 3 states (0, 1, 2)

 	// State 1: 250 self-loops, 3 exits to state 2, 3 dead
-	base := int(sid) * stride
+	base := sid.Offset()
 	for i := 0; i < 250; i++ {
 		flatTrans[base+i] = sid // Self-loop
 	}
-	flatTrans[base+250] = StateID(2)
-	flatTrans[base+251] = StateID(2)
-	flatTrans[base+252] = StateID(2)
+	flatTrans[base+250] = state2
+	flatTrans[base+251] = state2
+	flatTrans[base+252] = state2
 	flatTrans[base+253] = DeadState
 	flatTrans[base+254] = DeadState
 	flatTrans[base+255] = DeadState

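Editor's sketch of the idea this test change exercises: a state ID that already holds `stateIndex * stride`, with flag bits in its high bits, so the hot loop needs one table lookup and one tag check per byte. This is not the coregex implementation. The `uint32` width, the tag bit positions, the 4-class alphabet, and the helpers `classOf`, `buildTable`, and `find` are all assumptions made for illustration. The toy DFA searches for the literal `ab` and does not model the 1-byte match delay or the 4x unrolling described in the changelog.

```go
package main

import "fmt"

// StateID is a hypothetical premultiplied, tagged state identifier.
// Low bits hold stateIndex*stride (the row offset in the flat table);
// the two highest bits are tag flags. Width and bit positions are
// assumptions made for this sketch.
type StateID uint32

const (
	tagMatch StateID = 1 << 31
	tagDead  StateID = 1 << 30
	tagMask          = tagMatch | tagDead
	stride           = 4 // byte classes per state: 'a', 'b', other, padding
)

// Offset strips the tag bits and returns the row offset.
func (s StateID) Offset() int { return int(s &^ tagMask) }

// IsTagged reports whether any tag bit is set.
func (s StateID) IsTagged() bool { return s&tagMask != 0 }

// IsMatch reports whether the match tag is set.
func (s StateID) IsMatch() bool { return s&tagMatch != 0 }

// classOf maps an input byte to its byte class.
func classOf(b byte) int {
	switch b {
	case 'a':
		return 0
	case 'b':
		return 1
	}
	return 2
}

// buildTable builds the flat transition table for an unanchored
// search for "ab": state 0 = start, 1 = saw 'a', 2 = match.
func buildTable() []StateID {
	s0, s1, s2 := StateID(0*stride), StateID(1*stride), StateID(2*stride)
	dead := tagDead
	flat := make([]StateID, 3*stride)
	copy(flat[s0.Offset():], []StateID{s1, s0, s0, dead})
	copy(flat[s1.Offset():], []StateID{s1, s2 | tagMatch, s0, dead})
	copy(flat[s2.Offset():], []StateID{dead, dead, dead, dead})
	return flat
}

// find returns the end offset of the first match, or -1.
func find(flat []StateID, input []byte) int {
	sid := StateID(0)
	for i, b := range input {
		// sid is untagged here, so it indexes the table directly:
		// no multiply by stride is needed.
		sid = flat[int(sid)+classOf(b)]
		if sid.IsTagged() { // single branch covers match and dead
			if sid.IsMatch() {
				return i + 1
			}
			return -1
		}
	}
	return -1
}

func main() {
	flat := buildTable()
	fmt.Println(find(flat, []byte("xxab"))) // prints 4
	fmt.Println(find(flat, []byte("ba")))   // prints -1
}
```

The tag check sits on the rarely taken path: untagged states, the common case, pay for one comparison against the mask instead of separate match, dead, and invalid comparisons.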
dfa/lazy/anchored_search_prefilter_test.go

Lines changed: 1 addition & 1 deletion
@@ -525,7 +525,7 @@ func TestFindWithPrefilterAtWordBoundary(t *testing.T) {
 // TestFindWithPrefilterAtCacheClear tests the cache-clear recovery path
 // in findWithPrefilterAt using a very small cache.
 func TestFindWithPrefilterAtCacheClear(t *testing.T) {
-	config := DefaultConfig().WithMaxStates(3).WithMaxCacheClears(10)
+	config := DefaultConfig().WithMaxStates(6).WithMaxCacheClears(20)
 	compiler := nfa.NewDefaultCompiler()
 	nfaObj, err := compiler.Compile("[a-zA-Z]+[0-9]+")
 	if err != nil {
