fix: backtracking in CSS tokenizer rules (v1.19.x backport)#3627
Merged
flavorjones merged 3 commits intov1.19.xfrom Apr 27, 2026
Merged
fix: backtracking in CSS tokenizer rules (v1.19.x backport)#3627flavorjones merged 3 commits intov1.19.xfrom
flavorjones merged 3 commits intov1.19.xfrom
Conversation
The STRING rule had two ambiguities that caused exponential
backtracking on unterminated quoted-string input:
1. The body's negated class `[^\n\r\f"]` matched a literal `\`,
overlapping with the {escape} branch. Input like
`[foo="\a\a\a...` had 2**N parses for N pairs.
2. {unicode}'s `[0-9A-Fa-f]{1,6}` admitted six match lengths
per escape position. Input like `\aaaaaa\aaaaaa...` had
6**N parses.
When the closing quote was missing the engine enumerated every
parse before failing, so a sub-100-byte payload could hang the
process indefinitely.
The fix:
- Excludes `\` from the body's negated class, so backslashes
can only enter via {escape}, removing the cross-branch
ambiguity.
- Wraps the body alternation in an atomic group `(?>...)*` to
lock each iteration's match decision, removing the
within-escape length ambiguity.
- Adds `\\?{nl}` for CSS line continuation, previously absorbed
by the loose negated class.
- Drops the `(?<!\\)(?:\\{2})*` bookkeeping that existed only
to recover from the original ambiguity.
Adds two performance benchmarks asserting linear parse time
for both ambiguity classes.
ref: GHSA-c4rq-3m3g-8wgx
(cherry picked from commit 807f6ee)
A second instance of the same backtracking pattern: `{unicode}`'s
`[0-9A-Fa-f]{1,6}` admits six match lengths per escape position,
and {nmchar} appears under `*` in {name}. When the `{ident}\({w}`
rule fails (no `(` after an identifier-shaped prefix), the engine
backtracks through `{nmchar}*` for 6**N parses. Payload
`\aaaaaa\aaaaaa...X` triggers it: at n=8 it takes 330ms, at n=10
it takes 11.4s.
Wrap the body alternations of {nmchar} and {nmstart} in atomic
groups, mirroring the prior STRING-rule fix. Each nmchar/nmstart
match is locked once committed, so the outer `{nmchar}*` can
release whole iterations but cannot try alternative inner
consumption of the {1,6} hex run.
Add a benchmark test asserting linear time, similar to previous.
ref: GHSA-c4rq-3m3g-8wgx
(cherry picked from commit 9bada21)
JRuby's JIT warmup variance makes per-call timings too noisy for the R**2 >= 0.99 linearity assertion. Observed CI failures with R**2 around 0.94-0.97 even though the regex itself is unchanged between engines. The ReDoS property is determined by the regex, not the engine (Joni and Onigmo implement the same matching semantics), so MRI coverage is sufficient evidence the fixes hold. (cherry picked from commit 760bde0)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What problem is this PR intended to solve?
Backport of #3626 to v1.19.x