fix: backtracking in CSS tokenizer rules#3626
Merged
flavorjones merged 3 commits intomainfrom Apr 27, 2026
Merged
Conversation
d9c535d to
82f06ea
Compare
The STRING rule had two ambiguities that caused exponential
backtracking on unterminated quoted-string input:
1. The body's negated class `[^\n\r\f"]` matched a literal `\`,
overlapping with the {escape} branch. Input like
`[foo="\a\a\a...` had 2**N parses for N pairs.
2. {unicode}'s `[0-9A-Fa-f]{1,6}` admitted six match lengths
per escape position. Input like `\aaaaaa\aaaaaa...` had
6**N parses.
When the closing quote was missing the engine enumerated every
parse before failing, so a sub-100-byte payload could hang the
process indefinitely.
The fix:
- Excludes `\` from the body's negated class, so backslashes
can only enter via {escape}, removing the cross-branch
ambiguity.
- Wraps the body alternation in an atomic group `(?>...)*` to
lock each iteration's match decision, removing the
within-escape length ambiguity.
- Adds `\\?{nl}` for CSS line continuation, previously absorbed
by the loose negated class.
- Drops the `(?<!\\)(?:\\{2})*` bookkeeping that existed only
to recover from the original ambiguity.
Adds two performance benchmarks asserting linear parse time
for both ambiguity classes.
ref: GHSA-c4rq-3m3g-8wgx
A second instance of the same backtracking pattern: `{unicode}`'s
`[0-9A-Fa-f]{1,6}` admits six match lengths per escape position,
and {nmchar} appears under `*` in {name}. When the `{ident}\({w}`
rule fails (no `(` after an identifier-shaped prefix), the engine
backtracks through `{nmchar}*` for 6**N parses. Payload
`\aaaaaa\aaaaaa...X` triggers it: at n=8 it takes 330ms, at n=10
it takes 11.4s.
Wrap the body alternations of {nmchar} and {nmstart} in atomic
groups, mirroring the prior STRING-rule fix. Each nmchar/nmstart
match is locked once committed, so the outer `{nmchar}*` can
release whole iterations but cannot try alternative inner
consumption of the {1,6} hex run.
Add a benchmark test asserting linear time, similar to previous.
ref: GHSA-c4rq-3m3g-8wgx
82f06ea to
9bada21
Compare
JRuby's JIT warmup variance makes per-call timings too noisy for the R**2 >= 0.99 linearity assertion. Observed CI failures with R**2 around 0.94-0.97 even though the regex itself is unchanged between engines. The ReDoS property is determined by the regex, not the engine (Joni and Onigmo implement the same matching semantics), so MRI coverage is sufficient evidence the fixes hold.
flavorjones
added a commit
that referenced
this pull request
Apr 27, 2026
**What problem is this PR intended to solve?** Backport of #3626 to v1.19.x
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What problem is this PR intended to solve?
Addresses three regular expression backtracking / redos issues.
ref: GHSA-c4rq-3m3g-8wgx
Have you included adequate test coverage?
Yes, added benchmark suite tests to assert linearity of regex performance.
Does this change affect the behavior of either the C or the Java implementations?
N/A