Commit bc78fa7

fix: LogParser 7x ARM64 regression — restore DFA for (?m)^ (#149)
* fix: remove false DFA downgrade for (?m)^ multiline patterns

adjustForAnchors() incorrectly routed (?m)^ patterns from UseDFA to UseNFA, claiming that the DFA can't verify multiline line anchors. The lazy DFA already handles (?m)^ correctly via StartByteMap/StartLineLF, the same approach the Rust regex crate takes. Verified with direct DFA tests.

The downgrade caused 4 LangArena patterns (api_calls, post_requests, passwords, sessions) to use a byte-by-byte NFA scan instead of the DFA, which is catastrophic on ARM64 without SIMD (LogParser 2s -> 14s on M1).

LangArena total: 2335ms -> 185ms (12.6x faster).

Root-cause analysis: docs/dev/research/v01216-arm64-regression.md

* fix: restore partial prefilter for (?i) alternation overflow + guard NFA candidate loop

The literal extractor returned an empty Seq on cross-product overflow (>250), killing all prefilter literals for patterns like (?i)(eval|system|exec|...). It now trims to 3-byte prefixes + dedup (the Rust approach) and marks the literals inexact.

Also guards the NFA candidate loop with IsComplete(): incomplete prefilters cannot be used as correctness gates, because they would miss branches whose literals were truncated. The NFA falls through to a full scan instead.

suspicious: UseNFA without prefilter (113ms) -> with FatTeddy (1ms).

Stdlib compat: 38/38 PASS.

* fix: restore UseTeddy for (?m)^ patterns (lineAnchorWrapper makes it safe)

selectLiteralStrategy blocked UseTeddy for any pattern with anchors, but adjustForAnchors already wraps the prefilter with WrapLineAnchor for (?m)^. Added hasNonLineAnchors to allow UseTeddy when the only anchors are (?m)^.

http_methods on macOS ARM64: 89ms -> <1ms (restored to the v0.12.14 level).

Stdlib compat: 38/38 PASS.

* docs: update CHANGELOG and README for v0.12.17
1 parent afd9d8d commit bc78fa7

6 files changed

Lines changed: 73 additions & 46 deletions

CHANGELOG.md

Lines changed: 27 additions & 0 deletions
@@ -12,6 +12,33 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - ARM NEON SIMD support (Go 1.26 `simd/archsimd` intrinsics — [#120](https://github.com/coregx/coregex/issues/120))
 - SIMD prefilter for CompositeSequenceDFA (#83)
 
+## [0.12.17] - 2026-03-23
+
+### Fixed
+- **Remove false DFA downgrade for `(?m)^` patterns** — `adjustForAnchors()`
+incorrectly routed `(?m)^` multiline patterns from UseDFA to UseNFA, claiming
+"DFA can't verify multiline line anchors". This is false — the lazy DFA already
+handles `(?m)^` correctly via StartByteMap/StartLineLF (identical to Rust regex).
+The downgrade caused 4 LangArena patterns (`api_calls`, `post_requests`,
+`passwords`, `sessions`) to fall back to byte-by-byte NFA scan — catastrophic
+on ARM64 without SIMD. LangArena total: 2335ms → **185ms** (12.6x faster).
+
+- **Restore partial prefilter for `(?i)` alternation overflow** — literal
+extractor returned empty Seq on cross-product overflow (>250 variants),
+killing all prefilter literals for patterns like `(?i)(eval|system|exec|...)`.
+Now trims to 3-byte prefixes + dedup (Rust approach) and marks inexact.
+Also guards NFA candidate loop with `IsComplete()` check — incomplete
+prefilters skip candidate loop (NFA scans full input), preventing
+correctness bugs from partial branch coverage.
+`suspicious` pattern: UseNFA without prefilter (113ms) → UseNFA with
+FatTeddy skip-ahead (**1ms**).
+
+- **Restore UseTeddy for `(?m)^` multiline patterns** — `selectLiteralStrategy`
+blocked UseTeddy for any pattern with anchors. But `adjustForAnchors()` already
+wraps the prefilter with `WrapLineAnchor` for `(?m)^`, making Teddy safe.
+Now allows UseTeddy when anchors are only `(?m)^` (no \b, $, etc).
+`http_methods` on macOS ARM64: 89ms → **<1ms** (restored to v0.12.14 level).
+
 ## [0.12.16] - 2026-03-21
 
 ### Performance

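The first entry above rests on how a lazy DFA chooses its start state. The Go sketch below illustrates only that step. `startKind`, `startByteMap`, `startKindAt`, and `lineStartHolds` are hypothetical names loosely modeled on the StartByteMap/StartLineLF terms in the entry; they are not coregex's API, and the word-byte test is assumed to be ASCII-only. A real DFA also has to carry the "previous byte was `\n`" fact through its transitions during a scan, which this sketch leaves out.

```go
package main

// startKind is a hypothetical label for the look-behind context of the
// position where a search begins. A lazy DFA can cache one start state per
// kind, so whether (?m)^ holds is decided once, when the start state is chosen.
type startKind uint8

const (
	startNonWord startKind = iota // previous byte is neither '\n' nor a word byte
	startWord                     // previous byte is an ASCII word byte
	startLineLF                   // previous byte is '\n', so (?m)^ holds
	startText                     // search begins at offset 0: \A and (?m)^ hold
)

// startByteMap classifies every possible previous byte (256 entries), so
// picking a start state costs a single table lookup.
var startByteMap = func() (m [256]startKind) {
	for i := range m {
		c := byte(i)
		switch {
		case c == '\n':
			m[i] = startLineLF
		case c == '_' || ('0' <= c && c <= '9') || ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z'):
			m[i] = startWord
		default:
			m[i] = startNonWord
		}
	}
	return m
}()

// startKindAt returns the start context for a search beginning at offset at.
func startKindAt(haystack []byte, at int) startKind {
	if at == 0 {
		return startText
	}
	return startByteMap[haystack[at-1]]
}

// lineStartHolds reports whether a start state for kind k should have the
// (?m)^ assertion satisfied in its epsilon closure.
func lineStartHolds(k startKind) bool {
	return k == startText || k == startLineLF
}
```

For the haystack `"a\nb"`, a search starting at offset 2 lands in the line-start context, so a pattern that begins with `(?m)^` can match there; a search starting at offset 1 (after `a`) cannot.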
README.md

Lines changed: 8 additions & 8 deletions
@@ -66,14 +66,14 @@ Cross-language benchmarks on 6MB input, AMD EPYC ([source](https://github.com/ko
 |---------|-----------|---------|------------|-----------|---------|
 | Literal alternation | 475 ms | 4.4 ms | 0.6 ms | **108x** | 7.1x slower |
 | Multi-literal | 1412 ms | 12.8 ms | 4.7 ms | **110x** | 2.7x slower |
-| Inner `.*keyword.*` | 234 ms | 0.35 ms | 0.28 ms | **667x** | 1.2x slower |
-| Suffix `.*\.txt` | 236 ms | 1.83 ms | 1.08 ms | **128x** | 1.6x slower |
-| Multiline `(?m)^/.*\.php` | 104 ms | 0.50 ms | 0.68 ms | **207x** | **1.3x faster** |
-| Email validation | 262 ms | 0.50 ms | 0.23 ms | **523x** | 2.1x slower |
-| URL extraction | 258 ms | 0.61 ms | 0.35 ms | **422x** | 1.7x slower |
-| IP address | 497 ms | 2.2 ms | 12.0 ms | **229x** | **5.5x faster** |
-| Char class `[\w]+` | 579 ms | 41.0 ms | 50.1 ms | **14x** | **1.2x faster** |
-| Word repeat `(\w{2,8})+` | 652 ms | 186 ms | 48.3 ms | **3x** | 3.8x slower |
+| Inner `.*keyword.*` | 232 ms | 0.30 ms | 0.27 ms | **774x** | 1.1x slower |
+| Suffix `.*\.txt` | 236 ms | 1.82 ms | 1.13 ms | **129x** | 1.6x slower |
+| Multiline `(?m)^/.*\.php` | 103 ms | 0.50 ms | 0.67 ms | **206x** | **1.3x faster** |
+| Email validation | 265 ms | 0.62 ms | 0.27 ms | **428x** | 2.2x slower |
+| URL extraction | 353 ms | 0.65 ms | 0.35 ms | **543x** | 1.8x slower |
+| IP address | 496 ms | 2.1 ms | 12.1 ms | **231x** | **5.6x faster** |
+| Char class `[\w]+` | 581 ms | 51.2 ms | 50.2 ms | **11x** | ~parity |
+| Word repeat `(\w{2,8})+` | 712 ms | 186 ms | 48.7 ms | **3x** | 3.8x slower |
 
 **Where coregex excels:**
 - Multiline patterns (`(?m)^/.*\.php`) — near Rust parity, 100x+ vs stdlib

literal/extractor.go

Lines changed: 9 additions & 12 deletions
@@ -264,20 +264,17 @@ func (e *Extractor) extractPrefixesAlternate(re *syntax.Regexp, depth int) *Seq
 		}
 	}
 
-	// If overflow occurred, NOT all alternation branches are represented.
-	// A partial prefilter would miss matches for unrepresented branches.
-	// Return empty Seq so no prefilter is built — NFA handles all branches.
-	// This matches Rust's approach: overflowed literal sets → no prefilter.
-	if overflowed {
-		return NewSeq()
-	}
-
 	result := NewSeq(allLits...)
 
-	if result.Len() > e.config.MaxLiterals {
-		// Too many literals but all branches represented: trim to 3-byte
-		// prefixes, dedup, mark inexact. After trim, all alternation branches
-		// have at least one prefix in the set (unlike overflow truncation).
+	if overflowed || result.Len() > e.config.MaxLiterals {
+		// Either not all branches are represented (overflow) or too many literals.
+		// Trim to 3-byte prefixes + dedup to fit prefilter capacity.
+		// Mark ALL as inexact — prefilter is used for skip-ahead only,
+		// DFA/NFA verifies each candidate (safe with partial coverage).
+		//
+		// Rust does the same: optimize_for_prefix_by_preference trims and deduplicates.
+		// A partial prefilter is much better than no prefilter — DFA with skip-ahead
+		// vs NFA byte-by-byte on 549 states is 100x+ difference on ARM64.
 		result.KeepFirstBytes(3)
 		e.markAllInexact(result)
 		result.Dedup()

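The new overflow branch applies `KeepFirstBytes(3)`, then `markAllInexact`, then `Dedup`. A minimal Go sketch of that sequence, under stated assumptions: `literal` and `trimToPrefixes` are hypothetical stand-ins for the extractor's `Seq` methods, and deduplication is assumed to keep the first occurrence of each prefix so that preference order survives. The sketch does not enforce `MaxLiterals`; it only shows the trimming.

```go
package main

// literal is a hypothetical stand-in for one entry of the extractor's Seq.
type literal struct {
	bytes []byte
	exact bool // exact: finding these bytes implies a full match
}

// trimToPrefixes cuts every literal to at most n bytes, marks it inexact,
// and drops duplicate prefixes while keeping first-seen order.
func trimToPrefixes(lits []literal, n int) []literal {
	seen := make(map[string]bool, len(lits))
	out := make([]literal, 0, len(lits))
	for _, l := range lits {
		b := l.bytes
		if len(b) > n {
			b = b[:n]
		}
		key := string(b)
		if seen[key] {
			continue // an earlier literal already produced this prefix
		}
		seen[key] = true
		// Like the diff, mark every literal inexact, truncated or not:
		// a hit is only a candidate that the DFA/NFA must still verify.
		out = append(out, literal{bytes: []byte(key), exact: false})
	}
	return out
}
```

Deduplication is what recovers capacity: case-fold variants that share their first three bytes collapse into one entry. The cost is that every hit becomes a candidate rather than a confirmed match.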
meta/compile.go

Lines changed: 6 additions & 6 deletions
@@ -630,9 +630,13 @@ func CompileRegexp(re *syntax.Regexp, config Config) (*Engine, error) {
 	}, nil
 }
 
-// adjustForAnchors fixes prefilter and strategy for patterns with anchors.
+// adjustForAnchors fixes prefilter for patterns with anchors.
 // Anchors (^, $, \b) require verification that Teddy/AC prefilter can't provide.
-// Multiline line anchors ((?m)^) need NFA because DFA doesn't verify line positions.
+//
+// Note: the lazy DFA correctly handles (?m)^ via StartByteMap — after \n it
+// selects StartLineLF which includes LookStartLine in the epsilon closure.
+// Verified with direct DFA tests and Rust source analysis (identical approach).
+// See docs/dev/research/v01216-arm64-regression.md for details.
 func adjustForAnchors(pf prefilter.Prefilter, strategy Strategy, re *syntax.Regexp) (prefilter.Prefilter, Strategy) {
 	if !hasAnchorAssertions(re) {
 		return pf, strategy
@@ -654,10 +658,6 @@ func adjustForAnchors(pf prefilter.Prefilter, strategy Strategy, re *syntax.Rege
 		}
 	}
 
-	// DFA can't verify (?m)^ multiline line anchors — use NFA
-	if strategy == UseDFA && hasMultilineAnchor {
-		strategy = UseNFA
-	}
 	return pf, strategy
 }

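`adjustForAnchors()` keeps a literal prefilter usable for `(?m)^` by wrapping it with `WrapLineAnchor`, whose body is not part of this commit. The Go sketch below shows the general idea under that assumption: `finder`, `literalFinder`, and `lineAnchored` are hypothetical names, and the single-literal finder stands in for Teddy. The extra check is exact only when `(?m)^` sits directly before the literal, as in `(?m)^(GET|POST)`.

```go
package main

import "bytes"

// finder is a hypothetical minimal prefilter interface: Find returns the
// first candidate position at or after at, or -1 if there is none.
type finder interface {
	Find(haystack []byte, at int) int
}

// literalFinder is a toy single-literal prefilter standing in for Teddy.
type literalFinder struct{ needle []byte }

func (f literalFinder) Find(haystack []byte, at int) int {
	if at > len(haystack) {
		return -1
	}
	if i := bytes.Index(haystack[at:], f.needle); i >= 0 {
		return at + i
	}
	return -1
}

// lineAnchored wraps another finder and only reports candidates where
// (?m)^ holds: offset 0 or the byte right after '\n'. The extra check is
// O(1) per candidate.
type lineAnchored struct{ inner finder }

func (w lineAnchored) Find(haystack []byte, at int) int {
	for at <= len(haystack) {
		pos := w.inner.Find(haystack, at)
		if pos < 0 {
			return -1
		}
		if pos == 0 || haystack[pos-1] == '\n' {
			return pos
		}
		at = pos + 1 // candidate is mid-line; keep scanning
	}
	return -1
}
```

On `"xGET /a\nGET /b"`, the raw finder reports offset 1 first; the wrapper skips it and returns 8, the first `GET` that begins a line.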
meta/find_indices.go

Lines changed: 14 additions & 16 deletions
@@ -119,8 +119,11 @@ func (e *Engine) findIndicesNFA(haystack []byte) (int, int, bool) {
 	state := e.getSearchState()
 	defer e.putSearchState(state)
 
-	// Use prefilter for skip-ahead if available
-	if e.prefilter != nil {
+	// Use prefilter candidate loop for skip-ahead — but ONLY when prefilter
+	// covers all possible match positions (IsComplete or all branches represented).
+	// Incomplete prefilters (partial case-fold coverage) cannot be used as
+	// correctness gates — they'd miss branches whose literals were truncated.
+	if e.prefilter != nil && e.prefilter.IsComplete() {
 		at := 0
 		for at < len(haystack) {
 			// Find next candidate position via prefilter
@@ -172,17 +175,15 @@ func (e *Engine) findIndicesNFAAt(haystack []byte, at int) (int, int, bool) {
 	state := e.getSearchState()
 	defer e.putSearchState(state)
 
-	// Use prefilter for skip-ahead if available
-	if e.prefilter != nil {
+	// Use prefilter candidate loop — only safe with complete prefilter
+	if e.prefilter != nil && e.prefilter.IsComplete() {
 		for at < len(haystack) {
-			// Find next candidate position via prefilter
 			pos := e.prefilter.Find(haystack, at)
 			if pos == -1 {
-				return -1, -1, false // No more candidates
+				return -1, -1, false
 			}
 			atomic.AddUint64(&e.stats.PrefilterHits, 1)
 
-			// Try to match at candidate position
 			var start, end int
 			var found bool
 			if useBT && e.boundedBacktracker.CanHandle(len(haystack)-pos) {
@@ -194,14 +195,13 @@ func (e *Engine) findIndicesNFAAt(haystack []byte, at int) (int, int, bool) {
 				return start, end, true
 			}
 
-			// Move past this position
 			atomic.AddUint64(&e.stats.PrefilterMisses, 1)
 			at = pos + 1
 		}
 		return -1, -1, false
 	}
 
-	// No prefilter: use BoundedBacktracker if available and safe
+	// No prefilter or incomplete: use BoundedBacktracker if available and safe
 	if useBT && e.boundedBacktracker.CanHandle(len(haystack)-at) {
 		return e.boundedBacktracker.SearchAtWithState(haystack, at, state.backtracker)
 	}
@@ -1028,17 +1028,16 @@ func (e *Engine) findIndicesNFAAtWithState(haystack []byte, at int, state *Searc
 	// BoundedBacktracker can be used for Find operations only when safe
 	useBT := e.boundedBacktracker != nil && !e.canMatchEmpty
 
-	// Use prefilter for skip-ahead if available
-	if e.prefilter != nil {
+	// Use prefilter candidate loop — only safe with complete prefilter.
+	// Incomplete prefilters (partial case-fold coverage) would miss branches.
+	if e.prefilter != nil && e.prefilter.IsComplete() {
 		for at < len(haystack) {
-			// Find next candidate position via prefilter
 			pos := e.prefilter.Find(haystack, at)
 			if pos == -1 {
-				return -1, -1, false // No more candidates
+				return -1, -1, false
 			}
 			atomic.AddUint64(&e.stats.PrefilterHits, 1)
 
-			// Try to match at candidate position
 			var start, end int
 			var found bool
 			if useBT && e.boundedBacktracker.CanHandle(len(haystack)-pos) {
@@ -1050,14 +1049,13 @@ func (e *Engine) findIndicesNFAAtWithState(haystack []byte, at int, state *Searc
 				return start, end, true
 			}
 
-			// Move past this position
 			atomic.AddUint64(&e.stats.PrefilterMisses, 1)
 			at = pos + 1
 		}
 		return -1, -1, false
 	}
 
-	// No prefilter: use BoundedBacktracker if available and safe
+	// No prefilter or incomplete: use BoundedBacktracker if available and safe
 	if useBT && e.boundedBacktracker.CanHandle(len(haystack)-at) {
 		return e.boundedBacktracker.SearchAtWithState(haystack, at, state.backtracker)
 	}

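The `IsComplete()` guard added to the three NFA search paths can be reduced to the following Go sketch. `prefilter`, `litPrefilter`, `matchAt`, and `find` are hypothetical stand-ins: a toy anchored matcher for `sys|exec` replaces the NFA and BoundedBacktracker. The sketch shows why returning "no match" when `Find` yields -1 is only sound when the literal set covers every branch.

```go
package main

import "bytes"

// prefilter is a hypothetical interface mirroring the two calls the diff
// relies on: Find (next candidate or -1) and IsComplete.
type prefilter interface {
	Find(haystack []byte, at int) int
	IsComplete() bool
}

// litPrefilter is a toy single-literal prefilter; complete says whether its
// literal set covers every branch of the pattern.
type litPrefilter struct {
	needle   []byte
	complete bool
}

func (p litPrefilter) Find(h []byte, at int) int {
	if at > len(h) {
		return -1
	}
	if i := bytes.Index(h[at:], p.needle); i >= 0 {
		return at + i
	}
	return -1
}

func (p litPrefilter) IsComplete() bool { return p.complete }

// matchAt is a toy anchored matcher for the alternation sys|exec: it returns
// the end of a match that starts exactly at pos.
func matchAt(h []byte, pos int) (int, bool) {
	for _, alt := range []string{"sys", "exec"} {
		if bytes.HasPrefix(h[pos:], []byte(alt)) {
			return pos + len(alt), true
		}
	}
	return 0, false
}

// find trusts the prefilter as a correctness gate only when it is complete;
// otherwise it tries every start position, like the NFA fallback path.
func find(h []byte, pf prefilter) (int, int, bool) {
	if pf != nil && pf.IsComplete() {
		for at := 0; at < len(h); {
			pos := pf.Find(h, at)
			if pos == -1 {
				return -1, -1, false // no candidate left, so no match
			}
			if end, ok := matchAt(h, pos); ok {
				return pos, end, true
			}
			at = pos + 1
		}
		return -1, -1, false
	}
	for pos := 0; pos < len(h); pos++ {
		if end, ok := matchAt(h, pos); ok {
			return pos, end, true
		}
	}
	return -1, -1, false
}
```

The mislabeled case is the failure mode the guard rules out: if a prefilter that only knows `sys` were trusted as complete, a search over `"run exec now"` would report no match.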
meta/strategy.go

Lines changed: 9 additions & 4 deletions
@@ -1133,7 +1133,8 @@ type literalAnalysis struct {
 	hasGoodLiterals bool // Good prefix literal (LCP >= MinLiteralLen)
 	hasTeddyLiterals bool // Suitable for Teddy (2-32 patterns, each >= 3 bytes)
 	hasAhoCorasickLiterals bool // Suitable for Aho-Corasick (>32 patterns, each >= 1 byte)
-	hasAnchors bool // Pattern has anchors (^, $, \b) that Teddy can't verify
+	hasAnchors bool // Pattern has any anchors (^, $, \b)
+	hasNonLineAnchors bool // Pattern has anchors other than (?m)^ (\b, $, \A, \z)
 }
 
 // selectLiteralStrategy selects strategy based on literal analysis.
@@ -1148,9 +1149,12 @@ func selectLiteralStrategy(literals *literal.Seq, litAnalysis literalAnalysis) S
 	// Patterns like "(foo|bar|baz)" where all literals are complete don't need
 	// DFA verification - Teddy.Find() returns exact matches.
 	// Speedup: 50-250x by skipping all DFA/NFA construction overhead.
-	// BUT: patterns with anchors (e.g., (?m)^GET|POST) need DFA to verify
-	// that the match position satisfies the anchor constraint.
-	if litAnalysis.hasTeddyLiterals && literals.AllComplete() && !litAnalysis.hasAnchors {
+	//
+	// For (?m)^ multiline anchors: adjustForAnchors() wraps the prefilter with
+	// WrapLineAnchor which adds O(1) line-start verification. This makes Teddy
+	// safe for (?m)^ patterns — no DFA needed.
+	// Only block Teddy for non-line anchors (\b, $, \A, \z) that need DFA verify.
+	if litAnalysis.hasTeddyLiterals && literals.AllComplete() && !litAnalysis.hasNonLineAnchors {
 		return UseTeddy
 	}
 
@@ -1418,6 +1422,7 @@ func SelectStrategy(n *nfa.NFA, re *syntax.Regexp, literals *literal.Seq, config
 	nfaSize := n.States()
 	litAnalysis := analyzeLiterals(literals, config)
 	litAnalysis.hasAnchors = hasAnchorAssertions(re)
+	litAnalysis.hasNonLineAnchors = litAnalysis.hasAnchors && hasNonLineAnchors(re)
 
 	// Check for simple char_class+ patterns (HIGHEST priority for character class patterns)
 	// Patterns like [\w]+, [a-z]+, \d+ use CharClassSearcher: 14-17x faster than BoundedBacktracker

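The body of `hasNonLineAnchors` is not shown in this commit. As a hedged illustration only, here is how a check with the same stated rule (block Teddy for any anchor other than `(?m)^`) could be written against Go's standard `regexp/syntax` package. `anchorKinds` and `teddyAllowedByAnchors` are hypothetical names. `(?m)$` counts as a non-line anchor here because the line-start wrapper only covers `(?m)^`.

```go
package main

import "regexp/syntax"

// anchorKinds walks a parsed pattern and reports whether it contains any
// zero-width assertion, and whether any of them is something other than a
// multiline line start (syntax.OpBeginLine, i.e. (?m)^).
func anchorKinds(re *syntax.Regexp) (hasAnchor, hasNonLine bool) {
	switch re.Op {
	case syntax.OpBeginLine:
		hasAnchor = true
	case syntax.OpEndLine, syntax.OpBeginText, syntax.OpEndText,
		syntax.OpWordBoundary, syntax.OpNoWordBoundary:
		hasAnchor, hasNonLine = true, true
	}
	for _, sub := range re.Sub {
		a, n := anchorKinds(sub)
		hasAnchor = hasAnchor || a
		hasNonLine = hasNonLine || n
	}
	return hasAnchor, hasNonLine
}

// teddyAllowedByAnchors parses pattern with Perl flags (so a bare ^ is \A
// and only (?m)^ becomes OpBeginLine) and reports whether the anchors alone
// would still permit a Teddy-only strategy.
func teddyAllowedByAnchors(pattern string) (bool, error) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return false, err
	}
	_, nonLine := anchorKinds(re)
	return !nonLine, nil
}
```

Go's parser already makes the distinction the rule needs: under `syntax.Perl`, `^` becomes `OpBeginText` unless `(?m)` is in effect, and `$` becomes `OpEndText` or `OpEndLine` the same way.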