Commit 87d600b

Merge pull request #155 from coregx/release/v0.12.21
perf: v0.12.21 — tagged start states, zero-alloc API
2 parents 90d77fd + e147568 commit 87d600b

17 files changed

Lines changed: 782 additions & 361 deletions

CHANGELOG.md

Lines changed: 56 additions & 0 deletions
@@ -12,6 +12,62 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - ARM NEON SIMD support (Go 1.26 `simd/archsimd` intrinsics — [#120](https://github.com/coregx/coregex/issues/120))
 - SIMD prefilter for CompositeSequenceDFA (#83)
 
+## [0.12.21] - 2026-03-27
+
+### Performance
+- **Tagged start states** (Rust `LazyStateID` approach) — start states get tag bit,
+  always route to slow path. Enables prefilter skip-ahead only at start state,
+  eliminating O(n²) from start state self-loop. Unlocks UseDFA for tiny NFA patterns.
+
+- **DFA multiline $ fix** — EndLine look-ahead re-computation in determinize
+  (Rust mod.rs:131-212). `(?m)hello$` now works correctly in DFA.
+
+- **Dead-state prefilter restart** in searchEarliestMatch — IsMatch path uses
+  prefilter to skip past dead states, matching Rust find_fwd_imp approach.
+
+- **1100x fewer mallocs** — FindAllIndex/FindAllSubmatchIndex use flat buffer
+  (`compactToSliceOfSlice`): N matches → 2 allocations instead of N+1.
+
+- **Local SearchState cache** on Engine — atomic.Pointer single-slot cache
+  survives GC, avoids sync.Pool re-allocation overhead.
+
+- **Tiny NFA → UseDFA routing** — patterns with < 20 NFA states now use
+  bidirectional DFA (was PikeVM). 7x faster DFA vs PikeVM on large inputs.
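
The flat-buffer item above can be pictured with a small standalone sketch. It is not the coregex implementation: `compactPairs` is a hypothetical stand-in for `compactToSliceOfSlice`, and it assumes the matches were already collected into one flat `[start0, end0, start1, end1, ...]` slice. Under that assumption, N matches cost two allocations: the flat buffer itself and the outer slice of headers.

```go
package main

import "fmt"

// compactPairs is a hypothetical helper (not the coregex function). It exposes
// a flat [s0, e0, s1, e1, ...] buffer as [][]int. The only allocation made
// here is the outer slice; every inner slice aliases the flat buffer.
func compactPairs(flat []int) [][]int {
	n := len(flat) / 2
	if n == 0 {
		return nil
	}
	out := make([][]int, n)
	for i := range out {
		// The full slice expression caps each pair at length 2, so an
		// append on one match cannot overwrite its neighbour.
		out[i] = flat[2*i : 2*i+2 : 2*i+2]
	}
	return out
}

func main() {
	flat := []int{0, 3, 7, 9, 12, 20} // three matches stored back to back
	fmt.Println(compactPairs(flat))
}
```

The trade-off in this sketch is aliasing: the inner slices share memory with the flat buffer, so the buffer must not be reused while the result is still in use.
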
+
+### Added
+- **`AllIndex(b []byte) iter.Seq[[2]int]`** — zero-alloc match index iterator (Go 1.23+)
+- **`AllStringIndex(s string) iter.Seq[[2]int]`** — string version
+- **`All(b []byte) iter.Seq[[]byte]`** — zero-alloc match content iterator
+- **`AllString(s string) iter.Seq[string]`** — string version
+- **`AppendAllIndex(dst [][2]int, b []byte, n int) [][2]int`** — buffer-reuse API
+- **`AppendAllStringIndex(dst [][2]int, s string, n int) [][2]int`** — string version
+
+Naming follows Go proposal #61902 (regexp iterator methods) and `strconv.Append*` convention.
+
+### Fixed
+- DFA `isMatchWithPrefilter` pfSkip off-by-one — `zx+` on "zzx" now correct
+- DFA multiline `$` EndLine look-ahead — `(?m)hello$` now matches before `\n`
+
+### Benchmarks (LangArena LogParser, 7.2 MB, 13 patterns)
+
+| Metric | v0.12.20 | v0.12.21 | Improvement |
+|--------|----------|----------|-------------|
+| Total time (FindAll) | 163ms | **107ms** | **-34%** |
+| errors pattern | 23ms | **8ms** (FindAll) / **5.5ms** (AllIndex) | **-65% / -76%** |
+| vs Rust gap | 3.9x | **2.9x** (FindAll) / **1.7x** (AllIndex) | **-26% / -56%** |
+| Mallocs/iter | 203K | **182** | **-99.9%** |
+
+### Zero-Alloc API Benchmarks (new methods vs stdlib-compat)
+
+| Method | errors (33K matches) | Alloc | vs Rust |
+|--------|---------------------|-------|---------|
+| FindAllStringIndex (stdlib) | 8.2ms / 3890 KB | 19 mallocs | 2.6x slower |
+| **AllIndex (iter.Seq)** | **5.9ms / 0 KB** | **0 mallocs** | **1.7x** |
+| **AppendAllIndex (reuse)** | **5.5ms / 0 KB** | **0 mallocs** | **1.7x** |
+| Rust find_iter | 3.2ms / 0 KB | 0 mallocs | (baseline) |
+
+emails pattern: `AppendAllIndex` **2.0ms vs Rust 2.6ms**, **faster than Rust!**
+
 ## [0.12.20] - 2026-03-25
 
 ### Performance

README.md

Lines changed: 20 additions & 2 deletions
@@ -83,6 +83,7 @@ Cross-language benchmarks on 6MB input, AMD EPYC ([source](https://github.com/ko
 - Multi-pattern (`foo|bar|baz|...`) — Slim Teddy (≤32), Fat Teddy (33-64), or Aho-Corasick (>64)
 - Anchored alternations (`^(\d+|UUID|hex32)`) — O(1) branch dispatch (5-20x)
 - Concatenated char classes (`[a-zA-Z]+[0-9]+`) — DFA with byte classes (5-7x)
+- **Zero-alloc iterators** (`AllIndex`, `AppendAllIndex`) — 0 heap allocs, up to **30% faster** than FindAll. Email pattern **faster than Rust** with `AppendAllIndex`.
 
 ## Features
 
@@ -130,11 +131,28 @@ Supported methods:
 ### Zero-Allocation APIs
 
 ```go
-// Zero allocations — returns bool
+// Zero allocations — boolean match
 matched := re.IsMatch(text)
 
-// Zero allocations — returns (start, end, found)
+// Zero allocations — single match indices
 start, end, found := re.FindIndices(text)
+
+// Zero allocations — iterator over all matches (Go 1.23+)
+for m := range re.AllIndex(data) {
+	fmt.Printf("match at [%d, %d]\n", m[0], m[1])
+}
+
+// Zero allocations — match content iterator
+for s := range re.AllString(text) {
+	fmt.Println(s)
+}
+
+// Buffer-reuse — append to caller's slice (strconv.Append* pattern)
+var buf [][2]int
+for _, chunk := range chunks {
+	buf = re.AppendAllIndex(buf[:0], chunk, -1)
+	process(buf)
+}
 ```
 
 ### Configuration

ROADMAP.md

Lines changed: 5 additions & 2 deletions
@@ -97,8 +97,11 @@ v0.12.18 ✅ → Flat DFA transition table, integrated prefilter, PikeVM skip-ah
 
 v0.12.19 ✅ → Zero-alloc FindSubmatch, byte-based DFA cache, Rust-aligned visited limits
 
-v0.12.20 (Current) → Premultiplied/tagged StateIDs, break-at-match DFA determinize,
-                     Phase 3 elimination (2-pass bidirectional DFA)
+v0.12.20 ✅ → Premultiplied/tagged StateIDs, break-at-match DFA determinize,
+              Phase 3 elimination (2-pass bidirectional DFA)
+
+v0.12.21 (Current) → Tagged start states, zero-alloc API (AllIndex iter.Seq),
+                     1100x fewer mallocs, UseDFA for tiny NFA, -34% LangArena
 
 v1.0.0-rc → Feature freeze, API locked
 

dfa/lazy/builder.go

Lines changed: 24 additions & 0 deletions
@@ -64,6 +64,9 @@ func (b *Builder) Build() (*DFA, error) {
 	// Check if the NFA contains word boundary assertions
 	hasWordBoundary := b.checkHasWordBoundary()
 
+	// Check if the NFA contains EndLine ($) assertions
+	hasEndLine := b.checkHasEndLine()
+
 	// Check if the pattern is always anchored (has ^ prefix)
 	isAlwaysAnchored := b.nfa.IsAlwaysAnchored()
 
@@ -80,6 +83,7 @@ func (b *Builder) Build() (*DFA, error) {
 		byteClasses: b.nfa.ByteClasses(),
 		unanchoredStart: b.nfa.StartUnanchored(),
 		hasWordBoundary: hasWordBoundary,
+		hasEndLine: hasEndLine,
 		isAlwaysAnchored: isAlwaysAnchored,
 		startByteMap: startByteMap,
 	}
@@ -706,3 +710,23 @@ func (b *Builder) checkHasWordBoundary() bool {
 	}
 	return false
 }
+
+// checkHasEndLine checks if the NFA contains EndLine ($) look assertions.
+// When true, determinize performs look-ahead re-computation on '\n' bytes.
+// Computed once at DFA build time for O(1) check in hot loop.
+func (b *Builder) checkHasEndLine() bool {
+	numStates := b.nfa.States()
+	for i := nfa.StateID(0); int(i) < numStates; i++ {
+		state := b.nfa.State(i)
+		if state == nil {
+			continue
+		}
+		if state.Kind() == nfa.StateLook {
+			look, _ := state.Look()
+			if look == nfa.LookEndLine {
+				return true
+			}
+		}
+	}
+	return false
+}
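
The comment on `checkHasEndLine` describes a build-time scan whose result is consulted in the hot loop. A self-contained sketch of that pattern follows. All types and names here (`lookKind`, `state`, `dfa`, `build`, `needsLookAhead`) are hypothetical simplifications, not the coregex types; only the idea of a cached `hasEndLine` flag gating work on `'\n'` bytes comes from the diff above.

```go
package main

import "fmt"

// lookKind and state are simplified, hypothetical stand-ins for NFA states.
type lookKind int

const (
	lookNone lookKind = iota
	lookEndLine
	lookWordBoundary
)

type state struct{ look lookKind }

// dfa caches the result of the one-time scan.
type dfa struct{ hasEndLine bool }

// hasLook scans all states once; cost is linear in the number of states.
func hasLook(states []state, want lookKind) bool {
	for _, s := range states {
		if s.look == want {
			return true
		}
	}
	return false
}

// build performs the scan at construction time and stores the flag.
func build(states []state) dfa {
	return dfa{hasEndLine: hasLook(states, lookEndLine)}
}

// needsLookAhead is the constant-time check for the per-byte loop: extra
// work is only considered for '\n' when the pattern contains an EndLine look.
func (d dfa) needsLookAhead(b byte) bool {
	return d.hasEndLine && b == '\n'
}

func main() {
	d := build([]state{{look: lookNone}, {look: lookEndLine}})
	fmt.Println(d.needsLookAhead('\n'), d.needsLookAhead('a'))
}
```
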
