Commit 921d193
perf: flat DFA + integrated prefilter — 35% faster than baseline (#151)
* fix: NFA candidate loop guard — use partialCoverage instead of IsComplete
IsComplete() guard blocked prefilter candidate loop for ALL incomplete
prefilters, including prefix-only ones where all alternation branches are
represented. This caused 22x regression on Kostya's errors pattern
(1984ms vs 90ms on v0.12.14).
Root cause: Rust integrates prefilter as skip-ahead INSIDE PikeVM
(pikevm.rs:1293-1299), not as external correctness gate. When NFA states
are empty, prefilter skips ahead. Partial coverage is safe because NFA
continues scanning if prefilter misses.
Fix: Added partialCoverage flag on literal.Seq (set only on overflow
truncation). NFA candidate loop uses !partialCoverage guard instead of
IsComplete(). DFA paths retain IsComplete() where needed.
errors: 1984ms -> 109ms. Stdlib compat: 38/38 PASS.
* perf: PikeVM integrated prefilter skip-ahead (Rust approach)
Integrate prefilter inside PikeVM search loop as skip-ahead (pikevm.rs:1293).
When NFA has no active threads, PikeVM jumps to next candidate via
prefilter.Find() instead of byte-by-byte scan.
Safe for partial-coverage prefilters — NFA processes all branches from
each candidate position. This is architecturally cleaner than external
candidate loop guards (partialCoverage flag still used for external BT
candidate loop as BoundedBacktracker has no integrated skip-ahead).
Also includes PR #150 changes: partialCoverage flag on literal.Seq,
NFA candidate loop guard uses partialCoverage instead of IsComplete().
errors pattern: 1984ms -> 120ms. la_suspicious: 38/38 stdlib PASS.
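
The integrated skip-ahead described above can be sketched with a toy thread-list simulation. This is not the project's PikeVM: the `thread` type, the `search` function, and the use of `bytes.Index` as a stand-in prefilter are all hypothetical, and the sketch assumes every alternative is non-empty and starts with the prefilter literal, so the prefilter never skips past a real match.

```go
package main

import (
	"bytes"
	"fmt"
)

// thread is a hypothetical NFA thread: which alternative it follows and
// how many bytes of that alternative it has already matched.
type thread struct{ alt, pos int }

// search looks for the earliest-ending match of any alternative. When no
// threads are alive it jumps to the next prefilter candidate instead of
// advancing one byte at a time. It returns the match end offset (or -1)
// and the number of haystack positions actually stepped through.
// Assumption: alternatives are non-empty and each starts with prefix.
func search(haystack []byte, alts []string, prefix []byte) (end, visited int) {
	var cur, next []thread
	for i := 0; i < len(haystack); i++ {
		if len(cur) == 0 && prefix != nil {
			j := bytes.Index(haystack[i:], prefix)
			if j < 0 {
				return -1, visited // no candidate left anywhere
			}
			i += j // skip ahead to the candidate
		}
		visited++
		// Unanchored search: start every alternative at position i.
		for a := range alts {
			cur = append(cur, thread{a, 0})
		}
		next = next[:0]
		for _, t := range cur {
			if alts[t.alt][t.pos] != haystack[i] {
				continue // thread dies
			}
			if t.pos+1 == len(alts[t.alt]) {
				return i + 1, visited
			}
			next = append(next, thread{t.alt, t.pos + 1})
		}
		cur, next = next, cur
	}
	return -1, visited
}

func main() {
	h := []byte("xxxxxxxx errno")
	alts := []string{"error", "errno"}
	fmt.Println(search(h, alts, nil))           // byte-by-byte scan
	fmt.Println(search(h, alts, []byte("err"))) // with skip-ahead
}
```

Both calls find the same match end; the second one steps through far fewer positions because the empty-thread-set check hands control to the prefilter.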
* perf: flat DFA transition table — eliminate pointer chase in hot loop
Replace double indirection (stateList[id].transitions[class]) with flat
transition table (flatTrans[sid*stride + class]) in searchFirstAt hot loop.
Also replace State.IsMatch() with compact matchFlags[sid] bool slice.
Fast path now works with state ID only — no *State pointer needed.
State struct accessed only in slow path (determinize, word boundary).
Inspired by Rust regex-automata hybrid/dfa.rs Cache.trans flat layout.
Kostya benchmark: 3.60s -> 2.56s (1.4x faster).
bots pattern restored to v0.12.14 baseline (278ms vs 287ms).
Stdlib compat: 38/38 PASS.
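
As a rough illustration of the flat layout described above, here is a minimal sketch. The names `flatTrans`, `stride`, and `matchFlags` come from the commit message; the `flatDFA` type, the `classes` table, and the hand-built three-state automaton for the literal `ab` are assumptions made for the example and are far simpler than a lazy DFA.

```go
package main

import "fmt"

// StateID mirrors the name used in the commit message.
type StateID uint32

// flatDFA is a hypothetical, fully built DFA with a row-major table.
type flatDFA struct {
	stride     int       // number of byte classes (row width)
	flatTrans  []StateID // next state at flatTrans[sid*stride + class]
	matchFlags []bool    // matchFlags[sid] is true for match states
	classes    [256]byte // byte -> equivalence class
}

// firstMatchEnd returns the offset just past the byte on which the DFA
// first enters a match state, or -1. The loop touches only the state ID
// and two flat slices; there is no per-state struct to dereference.
func (d *flatDFA) firstMatchEnd(haystack []byte) int {
	sid := StateID(0)
	for i, b := range haystack {
		sid = d.flatTrans[int(sid)*d.stride+int(d.classes[b])]
		if d.matchFlags[sid] {
			return i + 1
		}
	}
	return -1
}

// newABDFA builds a toy unanchored DFA for the literal "ab".
func newABDFA() *flatDFA {
	d := &flatDFA{stride: 3}
	// Classes: 0 = any other byte, 1 = 'a', 2 = 'b'.
	d.classes['a'] = 1
	d.classes['b'] = 2
	// States: 0 = start, 1 = seen 'a', 2 = match.
	d.flatTrans = []StateID{
		0, 1, 0, // row for state 0
		0, 1, 2, // row for state 1
		0, 1, 0, // row for state 2
	}
	d.matchFlags = []bool{false, false, true}
	return d
}

func main() {
	d := newABDFA()
	fmt.Println(d.firstMatchEnd([]byte("xxaab"))) // 5
	fmt.Println(d.firstMatchEnd([]byte("xyz")))   // -1
}
```

The toy has no sentinel state IDs, so the plain `int(sid)` cast is harmless here.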
* perf: 4x loop unrolling in searchFirstAt (Rust approach)
Unroll DFA hot loop 4x: process 4 bytes per iteration when all
transitions are in flat table (no unknown/dead states). Falls back to
single-byte slow path on any special state.
Marginal improvement on x86 with SIMD prefilters (branch predictor
handles single-byte well). May help more on ARM64 where branch
prediction is less aggressive.
Reference: Rust hybrid/search.rs:195-221.
Stdlib compat: 38/38 PASS.
* perf: apply flat DFA transition table to ALL search functions
Extend flat table optimization from searchFirstAt to all 6 DFA search
functions: searchAt, searchEarliestMatch, searchEarliestMatchAnchored,
SearchReverse, SearchReverseLimited, IsMatchReverse.
Hot loop pattern: ft[int(sid)*stride + classIdx] replaces
stateList[id].transitions[class] — eliminates pointer chase.
State struct accessed only in slow path (determinize, word boundary).
Kostya benchmark: 2.56s -> 2.28s (+12%).
errors pattern: 109ms -> 81ms (better than v0.12.14 baseline 90ms).
Stdlib compat: 38/38 PASS.
* fix: restore DFA prefilter skip-ahead for incomplete prefilters
IsComplete() guard in findIndicesDFA/findIndicesDFAAt blocked prefilter
skip-ahead for incomplete prefilters (memmem, Teddy with prefix-only
literals). But DFA verifies full pattern at candidate — skip is always safe.
This was the root cause of sessions (229ms -> 36ms), api_calls (245ms ->
95ms), post_requests (259ms -> 114ms) regressions.
Kostya benchmark total: 2.28s -> 1.62s (FASTER than v0.12.14 baseline 1.80s!).
Stdlib compat: 38/38 PASS.
* perf: DFA prefilter skip-ahead at start state (Rust approach)
When DFA returns to start state with no match in progress, use prefilter
to skip ahead to next candidate instead of byte-by-byte scanning.
Applied to searchFirstAt and searchAt (bidirectional DFA path).
This is the Rust approach (hybrid/search.rs:232-258): prefilter is called
inside the DFA loop when a start state is detected, not externally.
peak_hours: 197ms -> 90ms (2.2x faster, gap vs Rust: 9x -> 4x).
Kostya total: 1.62s -> 1.38s (15% faster).
Stdlib compat: 38/38 PASS.
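
The start-state skip-ahead can be sketched as follows. This is an illustrative toy, not the code in `searchFirstAt`: the `findEnd` function, the hard-coded three-state automaton for `ab`, and `bytes.Index` standing in for a memmem/Teddy prefilter are all assumptions.

```go
package main

import (
	"bytes"
	"fmt"
)

// findEnd searches for the literal "ab" and returns the end offset of
// the first match (or -1) plus the number of DFA steps taken. When
// usePrefilter is set and the automaton is in its start state, the loop
// asks the prefilter for the next candidate instead of stepping bytes.
func findEnd(haystack []byte, usePrefilter bool) (end, steps int) {
	const start, seenA, match = 0, 1, 2
	needle := []byte("a") // prefix literal used as the prefilter
	sid := start
	for i := 0; i < len(haystack); {
		if usePrefilter && sid == start {
			j := bytes.Index(haystack[i:], needle)
			if j < 0 {
				return -1, steps // no candidate remains
			}
			i += j // jump straight to the candidate
		}
		// One DFA transition (hard-coded for the toy automaton).
		switch b := haystack[i]; {
		case b == 'a':
			sid = seenA
		case b == 'b' && sid == seenA:
			sid = match
		default:
			sid = start
		}
		steps++
		i++
		if sid == match {
			return i, steps
		}
	}
	return -1, steps
}

func main() {
	h := []byte("xxxxxxxxab")
	fmt.Println(findEnd(h, false)) // 10 10
	fmt.Println(findEnd(h, true))  // 10 2
}
```

The DFA still verifies the whole pattern from each candidate, so a prefix-only prefilter changes how many bytes are stepped, not which match is reported.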
* docs: update CHANGELOG for v0.12.18
* perf: flat DFA transition table in SearchAtAnchored
Apply flat table to SearchAtAnchored — called for every prefilter
candidate verification in bidirectional DFA path. Eliminates pointer
chase in the most frequent DFA hot path.
Kostya benchmark: 1.38s -> 1.17s (15% faster).
Total improvement vs v0.12.14: 1.80s -> 1.17s (35% faster).
Stdlib compat: 38/38 PASS.
* perf: flat DFA transition table in isMatchWithPrefilter and findWithPrefilterAt
Apply flat table to last 2 remaining functions with old Transition() calls.
No more State pointer chase in ANY DFA hot loop.
Kostya benchmark: 1.17s -> 1.19s (stable, tokens 116ms->51ms).
All DFA search functions now use flatTrans[sid*stride+class].
Stdlib compat: 38/38 PASS.
* docs: update ROADMAP and CHANGELOG for v0.12.18
* fix: guard getState/IsMatchState against 386 int overflow
On 386, int(StateID(0xFFFFFFFF)) = -1 (int is 32-bit).
getState and IsMatchState used int(id) for slice indexing,
causing panic: index out of range [-1].
Fix: check sid >= DeadState before int cast.
DeadState (0xFFFFFFFE) and InvalidState (0xFFFFFFFF) are
sentinel values not present in stateList/matchFlags.
* fix: use safeOffset for all flat table indexing — 386 int overflow
On 386, int is 32-bit. int(StateID(0xFFFFFFFE)) = -2, causing
negative slice index panic in flat table lookups.
Added safeOffset() helper using uint arithmetic (never negative).
Replaced all 23 occurrences of int(sid)*stride in hot loops.
safeOffset inlines — zero overhead on 64-bit.
* fix: safeOffset guard for DeadState/InvalidState on 386
uint multiply overflows on 386: uint(0xFFFFFFFE)*uint(20) wraps around.
Guard with sid >= DeadState check — returns MaxInt so bounds check fails
safely. Normal state IDs (small values) take fast path without branch.
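
A minimal sketch of a guard along these lines is shown below. The sentinel values and the name `safeOffset` are taken from the messages above; the exact signature and body are assumptions, not the code in the repository.

```go
package main

import (
	"fmt"
	"math"
)

type StateID uint32

// Sentinel values as given in the commit message.
const (
	DeadState    StateID = 0xFFFFFFFE
	InvalidState StateID = 0xFFFFFFFF
)

// safeOffset returns the start of a state's row in a flat table.
// Sentinels map to math.MaxInt, so indexing with the result fails the
// bounds check instead of wrapping to a plausible-looking offset.
// Hypothetical signature; assumes real state IDs are small enough that
// sid*stride does not overflow uint.
func safeOffset(sid StateID, stride int) int {
	if sid >= DeadState {
		return math.MaxInt
	}
	return int(uint(sid) * uint(stride))
}

func main() {
	// int32 simulates the 32-bit int of GOARCH=386 on any platform.
	sid := InvalidState
	fmt.Println(int32(sid)) // -1

	fmt.Println(safeOffset(3, 20))                        // 60
	fmt.Println(safeOffset(DeadState, 20) == math.MaxInt) // true
}
```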
* docs: update README benchmark table and ROADMAP for v0.12.18
1 parent bc78fa7 commit 921d193
11 files changed
Lines changed: 921 additions & 436 deletions
File tree
- dfa/lazy
- literal
- meta
- nfa