fix: backtracking in CSS tokenizer rules by flavorjones · Pull Request #3626 · sparklemotion/nokogiri

flavorjones · 2026-04-27T15:02:17Z

What problem is this PR intended to solve?

Addresses three regular expression backtracking (ReDoS) issues.

ref: GHSA-c4rq-3m3g-8wgx

Have you included adequate test coverage?

Yes, added benchmark suite tests to assert linearity of regex performance.

Does this change affect the behavior of either the C or the Java implementations?

N/A

The STRING rule had two ambiguities that caused exponential backtracking on unterminated quoted-string input:

1. The body's negated class `[^\n\r\f"]` matched a literal `\`, overlapping with the {escape} branch. Input like `[foo="\a\a\a...` had 2**N parses for N pairs.
2. {unicode}'s `[0-9A-Fa-f]{1,6}` admitted six match lengths per escape position. Input like `\aaaaaa\aaaaaa...` had 6**N parses.

When the closing quote was missing, the engine enumerated every parse before failing, so a sub-100-byte payload could hang the process indefinitely.

The fix:

- Excludes `\` from the body's negated class, so backslashes can only enter via {escape}, removing the cross-branch ambiguity.
- Wraps the body alternation in an atomic group `(?>...)*` to lock each iteration's match decision, removing the within-escape length ambiguity.
- Adds `\\?{nl}` for CSS line continuation, previously absorbed by the loose negated class.
- Drops the `(?<!\\)(?:\\{2})*` bookkeeping that existed only to recover from the original ambiguity.

Adds two performance benchmarks asserting linear parse time for both ambiguity classes.

ref: GHSA-c4rq-3m3g-8wgx
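The cross-branch ambiguity described above can be sketched with two simplified patterns. These are illustrative stand-ins written for this example, not the actual Nokogiri tokenizer rules: they omit {unicode}, `{nl}` continuation, and the single-quoted variant. The helper name `elapsed` is also hypothetical. Timings will vary by Ruby version, and some engines may mitigate part of the blowup themselves, so treat the printed numbers as indicative only.

```ruby
# Simplified stand-ins for a double-quoted STRING rule (illustrative only,
# not the actual Nokogiri tokenizer patterns).

# Ambiguous: the negated class also accepts "\", so each "\a" pair can be
# read either as one escape or as two ordinary characters.
AMBIGUOUS_STRING = /"(?:[^\n\r\f"]|\\[^\n\r\f])*"/

# Unambiguous: "\" is excluded from the class, and each loop iteration is
# wrapped in an atomic group so its choice cannot be revisited.
ATOMIC_STRING = /"(?>[^\n\r\f\\"]|\\[^\n\r\f])*"/

# Hypothetical helper: wall-clock seconds taken by the given block.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Unterminated input: an opening quote followed by 16 "\a" pairs.
payload = '"' + ('\\a' * 16)

[AMBIGUOUS_STRING, ATOMIC_STRING].each do |pattern|
  seconds = elapsed { pattern.match?(payload) }
  puts format("%-45s %.6fs", pattern.source, seconds)
end

# The two patterns also disagree on an escaped quote inside a string.
sample = '"a\\"b"'
puts AMBIGUOUS_STRING.match(sample)[0]
puts ATOMIC_STRING.match(sample)[0]
```

On the `sample` input the ambiguous pattern stops at the escaped quote, because its class consumes the backslash as an ordinary character, while the atomic pattern matches the whole string. That early stop is presumably the kind of case the dropped `(?<!\\)(?:\\{2})*` bookkeeping had to compensate for.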

A second instance of the same backtracking pattern: `{unicode}`'s `[0-9A-Fa-f]{1,6}` admits six match lengths per escape position, and {nmchar} appears under `*` in {name}. When the `{ident}\({w}` rule fails (no `(` after an identifier-shaped prefix), the engine backtracks through `{nmchar}*` for 6**N parses. Payload `\aaaaaa\aaaaaa...X` triggers it: at n=8 it takes 330ms, at n=10 it takes 11.4s.

Wrap the body alternations of {nmchar} and {nmstart} in atomic groups, mirroring the prior STRING-rule fix. Each nmchar/nmstart match is locked once committed, so the outer `{nmchar}*` can release whole iterations but cannot try alternative inner consumption of the {1,6} hex run.

Add a benchmark test asserting linear time, similar to previous.

ref: GHSA-c4rq-3m3g-8wgx
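A minimal sketch of the within-escape length ambiguity, again using simplified patterns invented for this example rather than the real {nmchar}, {name}, and {ident} rules. The character class below is only an assumed approximation of a name character, and the group count is kept small so the ambiguous pattern finishes quickly.

```ruby
# Simplified stand-ins for a "name followed by (" rule (illustrative only,
# not the actual Nokogiri tokenizer patterns).

# Ambiguous: after "\", the hex run may take 1 to 6 digits, and whatever it
# leaves behind is also matched by the plain-character branch.
AMBIGUOUS_NAME = /\A(?:[_a-zA-Z0-9-]|\\[0-9A-Fa-f]{1,6})*\(/

# Atomic: once an iteration has matched, its hex-run length is locked, so
# the outer loop can only give back whole iterations.
ATOMIC_NAME = /\A(?>[_a-zA-Z0-9-]|\\[0-9A-Fa-f]{1,6})*\(/

# Five groups of "\aaaaaa", then a character that prevents the final "(".
payload = ('\\aaaaaa' * 5) + 'X'

[AMBIGUOUS_NAME, ATOMIC_NAME].each do |pattern|
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  matched = pattern.match?(payload)
  seconds = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  puts format("matched=%-5s %.6fs  %s", matched, seconds, pattern.source)
end

# Both patterns still accept a well-formed prefix.
puts AMBIGUOUS_NAME.match?('\\aaaaaa(')
puts ATOMIC_NAME.match?('\\aaaaaa(')
```

In this simplified form the atomic group should not change which inputs match, since any hex digits left over after a committed escape are still valid plain name characters; it only removes the alternative ways of splitting the same run.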

JRuby's JIT warmup variance makes per-call timings too noisy for the R**2 >= 0.99 linearity assertion. Observed CI failures with R**2 around 0.94-0.97 even though the regex itself is unchanged between engines. The ReDoS property is determined by the regex, not the engine (Joni and Onigmo implement the same matching semantics), so MRI coverage is sufficient evidence the fixes hold.
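For readers unfamiliar with the R**2 criterion, the following sketch shows one way such a linearity check could be computed. It is not the benchmark suite's actual implementation: the method name `r_squared` and the sample data are assumptions made for illustration, and only the 0.99 threshold comes from the description above.

```ruby
# Coefficient of determination for a least-squares straight-line fit of
# ys against xs. For a simple linear fit this equals the squared Pearson
# correlation. Hypothetical helper, not taken from the Nokogiri test suite.
# Note: returns NaN when all xs or all ys are identical.
def r_squared(xs, ys)
  n = xs.size.to_f
  mean_x = xs.sum / n
  mean_y = ys.sum / n
  sxx = xs.sum { |x| (x - mean_x)**2 }
  syy = ys.sum { |y| (y - mean_y)**2 }
  sxy = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  (sxy**2) / (sxx * syy)
end

sizes = [1, 2, 3, 4, 5, 6, 7, 8]

# Made-up timings: one series grows linearly, the other doubles each step.
linear_times      = sizes.map { |n| 0.5 * n + 1.0 }
exponential_times = sizes.map { |n| 2.0**n }

puts r_squared(sizes, linear_times) >= 0.99
puts r_squared(sizes, exponential_times) >= 0.99
```

With real measurements, noise lowers R**2 even for linear-time code, which is consistent with the JRuby variance described above.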

flavorjones force-pushed the regex-backtracking-redos branch from d9c535d to 82f06ea on April 27, 2026 15:04

flavorjones added 2 commits April 27, 2026 11:05

flavorjones force-pushed the regex-backtracking-redos branch from 82f06ea to 9bada21 on April 27, 2026 15:05

flavorjones added topic/security topic/css labels Apr 27, 2026

flavorjones mentioned this pull request Apr 27, 2026

fix: backtracking in CSS tokenizer rules (v1.19.x backport) #3627

Merged

flavorjones merged commit 52feb61 into main Apr 27, 2026
169 checks passed

flavorjones deleted the regex-backtracking-redos branch April 27, 2026 17:35

flavorjones added a commit that referenced this pull request Apr 27, 2026

fix: backtracking in CSS tokenizer rules (v1.19.x backport) (#3627)

7501a63

**What problem is this PR intended to solve?** Backport of #3626 to v1.19.x

flavorjones merged 3 commits into main from regex-backtracking-redos
