fix: backtracking in CSS tokenizer rules (v1.19.x backport)#3627

Merged
flavorjones merged 3 commits into v1.19.x from regex-backtracking-redos_v1.19.x
Apr 27, 2026

@flavorjones
Member

What problem is this PR intended to solve?

Backport of #3626 to v1.19.x

The STRING rule had two ambiguities that caused exponential
backtracking on unterminated quoted-string input:

1. The body's negated class `[^\n\r\f"]` matched a literal `\`,
   overlapping with the {escape} branch. Input like
   `[foo="\a\a\a...` had 2**N parses for N pairs.

2. {unicode}'s `[0-9A-Fa-f]{1,6}` admitted six match lengths
   per escape position. Input like `\aaaaaa\aaaaaa...` had
   6**N parses.

When the closing quote was missing the engine enumerated every
parse before failing, so a sub-100-byte payload could hang the
process indefinitely.
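
The two growth rates can be checked without timing a regex engine. The sketch below counts how many ways a string body can be split into tokens under a simplified grammar: one class character, or a `\` followed by 1 to 6 hex digits. The names `count_parses`, `unicode_lengths`, and `class_matches_backslash` are hypothetical and not part of Nokogiri, and the grammar omits the optional whitespace after a unicode escape. The count approximates the number of paths a backtracking matcher has to rule out when the closing quote is missing.

```ruby
HEX = /[0-9A-Fa-f]/

# Lengths (including the backslash) that a simplified {unicode} escape
# could consume at position i: "\" followed by 1..6 hex digits.
def unicode_lengths(s, i)
  return [] unless s[i] == "\\"

  run = 0
  run += 1 while run < 6 && s[i + 1 + run]&.match?(HEX)
  (1..run).map { |k| k + 1 }
end

# Counts the distinct ways to tile the whole string with body tokens.
# class_matches_backslash: true models the old negated class [^\n\r\f"],
# false models the fixed class that also excludes "\".
def count_parses(s, class_matches_backslash:)
  ways = Array.new(s.length + 1, 0)
  ways[s.length] = 1 # the empty suffix has exactly one parse

  (s.length - 1).downto(0) do |i|
    ch = s[i]
    total = 0
    # Branch 1: a single character from the negated class.
    if !"\n\r\f\"".include?(ch) && (class_matches_backslash || ch != "\\")
      total += ways[i + 1]
    end
    # Branch 2: a unicode escape, one parse per admissible length.
    unicode_lengths(s, i).each { |len| total += ways[i + len] }
    ways[i] = total
  end

  ways[0]
end
```

Under these assumptions, `count_parses("\\a" * 10, class_matches_backslash: true)` gives 1024 (2**10), and `count_parses("\\aaaaaa" * 3, class_matches_backslash: false)` gives 216 (6**3). The second figure shows that excluding `\` from the class is not enough on its own: the length ambiguity inside `{1,6}` remains until the iteration is made atomic.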

The fix:

- Excludes `\` from the body's negated class, so backslashes
  can only enter via {escape}, removing the cross-branch
  ambiguity.
- Wraps the body alternation in an atomic group `(?>...)*` to
  lock each iteration's match decision, removing the
  within-escape length ambiguity.
- Adds `\\?{nl}` for CSS line continuation, previously absorbed
  by the loose negated class.
- Drops the `(?<!\\)(?:\\{2})*` bookkeeping that existed only
  to recover from the original ambiguity.
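
As a rough illustration of the first three bullets, here is a simplified double-quoted string pattern written directly as a Ruby regex. It is a sketch, not the tokenizer rule from this PR: the constant names and the `quoted_string?` helper are hypothetical, and the sub-patterns are approximations of `{nl}`, `{unicode}`, and `{escape}`.

```ruby
# Simplified stand-ins for the lexer macros (assumptions, not the real rules).
NL      = /\n|\r\n|\r|\f/
UNICODE = /\\[0-9A-Fa-f]{1,6}(?:\r\n|[ \t\r\n\f])?/
ESCAPE  = /#{UNICODE}|\\[^\n\r\f0-9A-Fa-f]/

# Body alternation wrapped in an atomic group:
#   [^\n\r\f\\"]  plain characters; "\" is excluded so it can only
#                 enter through an escape or a line continuation
#   \\?NL         optional backslash followed by a newline
#   ESCAPE        unicode or single-character escape
STRING1 = /\A"(?>[^\n\r\f\\"]|\\?#{NL}|#{ESCAPE})*"\z/

# True when the whole input is one well-formed double-quoted string.
def quoted_string?(input)
  STRING1.match?(input)
end
```

With this sketch, `quoted_string?('"a\\"b"')` is true, while an unterminated payload such as `'"' + '\\a' * 30` is rejected after a single pass over each iteration, because the atomic group discards the alternative ways of matching an iteration once it has succeeded.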

Adds two performance benchmarks asserting linear parse time
for both ambiguity classes.

ref: GHSA-c4rq-3m3g-8wgx
(cherry picked from commit 807f6ee)

A second instance of the same backtracking pattern: `{unicode}`'s
`[0-9A-Fa-f]{1,6}` admits six match lengths per escape position,
and {nmchar} appears under `*` in {name}. When the `{ident}\({w}`
rule fails (no `(` after an identifier-shaped prefix), the engine
backtracks through `{nmchar}*` for 6**N parses. Payload
`\aaaaaa\aaaaaa...X` triggers it: at n=8 it takes 330ms, at n=10
it takes 11.4s.

Wrap the body alternations of {nmchar} and {nmstart} in atomic
groups, mirroring the prior STRING-rule fix. Each nmchar/nmstart
match is locked once committed, so the outer `{nmchar}*` can
release whole iterations but cannot try alternative inner
consumption of the {1,6} hex run.
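
A minimal Ruby sketch of the same idea for identifiers follows. The constants and the `function_prefix?` helper are hypothetical names, and the sub-patterns are simplified approximations of the lexer macros rather than the rules changed in this PR.

```ruby
# Simplified stand-ins for the lexer macros (assumptions, not the real rules).
UNICODE  = /\\[0-9A-Fa-f]{1,6}(?:\r\n|[ \t\r\n\f])?/
ESCAPE   = /#{UNICODE}|\\[^\n\r\f0-9A-Fa-f]/
NONASCII = /[^\x00-\x7F]/

# Each alternation is atomic: once one nmstart/nmchar has matched, the
# engine may give the whole match back but cannot retry it with a
# shorter hex run.
NMSTART  = /(?>[_A-Za-z]|#{NONASCII}|#{ESCAPE})/
NMCHAR   = /(?>[_A-Za-z0-9-]|#{NONASCII}|#{ESCAPE})/
IDENT    = /-?#{NMSTART}#{NMCHAR}*/

# Approximation of the {ident}\({w} rule, anchored at the start of input.
FUNCTION = /\A#{IDENT}\([ \t\r\n\f]*/

# True when the input begins with an identifier immediately followed by "(".
def function_prefix?(input)
  FUNCTION.match?(input)
end
```

Here `function_prefix?("nth-child(2n)")` is true, and a payload shaped like the one described above, `'\\aaaaaa' * 40 + "X"`, returns false after the outer `*` has released each iteration once.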

Add a benchmark test asserting linear time, similar to previous.

ref: GHSA-c4rq-3m3g-8wgx
(cherry picked from commit 9bada21)

JRuby's JIT warmup variance makes per-call timings too noisy for
the R**2 >= 0.99 linearity assertion. Observed CI failures with
R**2 around 0.94-0.97 even though the regex itself is unchanged
between engines.
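
For context on the threshold, the sketch below shows one common way to compute R**2 for a straight-line fit of timings against input size. The source does not say how the benchmark computes it, so the `r_squared` helper and the least-squares formulation are assumptions for illustration only.

```ruby
# Coefficient of determination for a simple least-squares line through
# the points (xs[i], ys[i]). For a one-variable linear fit this equals
# the squared Pearson correlation.
def r_squared(xs, ys)
  n = xs.length.to_f
  mean_x = xs.sum / n
  mean_y = ys.sum / n

  sxy = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  sxx = xs.sum { |x| (x - mean_x)**2 }
  syy = ys.sum { |y| (y - mean_y)**2 }

  (sxy**2) / (sxx * syy)
end

# Hypothetical measurements: input sizes and per-call timings.
sizes        = (1..8).to_a
linear_times = sizes.map { |n| 3.0 * n + 1.0 } # grows linearly
exp_times    = sizes.map { |n| 2.0**n }        # grows exponentially

linear_fit = r_squared(sizes, linear_times)
exp_fit    = r_squared(sizes, exp_times)
```

With exact linear data the value is 1.0 up to floating-point rounding, while the exponential series falls well below 0.99. Timing noise lowers the value in the same way as real curvature does, which is why a noisy but linear run can fail a 0.99 threshold.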

The ReDoS property is determined by the regex, not the engine
(Joni and Onigmo implement the same matching semantics), so MRI
coverage is sufficient evidence the fixes hold.

(cherry picked from commit 760bde0)
@flavorjones flavorjones added the topic/security, topic/css, and backport labels Apr 27, 2026
@flavorjones flavorjones merged commit 7501a63 into v1.19.x Apr 27, 2026
162 checks passed
@flavorjones flavorjones deleted the regex-backtracking-redos_v1.19.x branch April 27, 2026 18:35