Skip to content

Commit 807f6ee

Browse files
committed
fix: ReDoS in CSS tokenizer STRING rule
The STRING rule had two ambiguities that caused exponential backtracking on unterminated quoted-string input: 1. The body's negated class `[^\n\r\f"]` matched a literal `\`, overlapping with the {escape} branch. Input like `[foo="\a\a\a...` had 2**N parses for N pairs. 2. {unicode}'s `[0-9A-Fa-f]{1,6}` admitted six match lengths per escape position. Input like `\aaaaaa\aaaaaa...` had 6**N parses. When the closing quote was missing the engine enumerated every parse before failing, so a sub-100-byte payload could hang the process indefinitely. The fix: - Excludes `\` from the body's negated class, so backslashes can only enter via {escape}, removing the cross-branch ambiguity. - Wraps the body alternation in an atomic group `(?>...)*` to lock each iteration's match decision, removing the within-escape length ambiguity. - Adds `\\?{nl}` for CSS line continuation, previously absorbed by the loose negated class. - Drops the `(?<!\\)(?:\\{2})*` bookkeeping that existed only to recover from the original ambiguity. Adds two performance benchmarks asserting linear parse time for both ambiguity classes. ref: GHSA-c4rq-3m3g-8wgx
1 parent e83e991 commit 807f6ee

3 files changed

Lines changed: 44 additions & 5 deletions

File tree

lib/nokogiri/css/tokenizer.rb

Lines changed: 2 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,6 @@
1-
# frozen_string_literal: true
21
#--
32
# DO NOT MODIFY!!!!
4-
# This file is automatically generated by rex 1.0.7
3+
# This file is automatically generated by rex 1.0.8
54
# from lexical definition file "lib/nokogiri/css/tokenizer.rex".
65
#++
76

@@ -132,7 +131,7 @@ def _next_token
132131
when (text = @ss.scan(/[\s]+/))
133132
action { [:S, text] }
134133

135-
when (text = @ss.scan(/("([^\n\r\f"]|(\n|\r\n|\r|\f)|[^\0-\177]|(\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f]))*(?<!\\)(?:\\{2})*"|'([^\n\r\f']|(\n|\r\n|\r|\f)|[^\0-\177]|(\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f]))*(?<!\\)(?:\\{2})*')/))
134+
when (text = @ss.scan(/("(?>[^\n\r\f"\\]|\\?(\n|\r\n|\r|\f)|[^\0-\177]|(\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f]))*"|'(?>[^\n\r\f'\\]|\\?(\n|\r\n|\r|\f)|[^\0-\177]|(\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f]))*')/))
136135
action { [:STRING, text] }
137136

138137
when (text = @ss.scan(/./))

lib/nokogiri/css/tokenizer.rex

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -16,8 +16,8 @@ macro
1616
name {nmstart}{nmchar}*
1717
ident -?{name}
1818
charref {nmchar}+
19-
string1 "([^\n\r\f"]|{nl}|{nonascii}|{escape})*(?<!\\)(?:\\{2})*"
20-
string2 '([^\n\r\f']|{nl}|{nonascii}|{escape})*(?<!\\)(?:\\{2})*'
19+
string1 "(?>[^\n\r\f"\\]|\\?{nl}|{nonascii}|{escape})*"
20+
string2 '(?>[^\n\r\f'\\]|\\?{nl}|{nonascii}|{escape})*'
2121
string ({string1}|{string2})
2222
2323
rule

test/css/bench_tokenizer.rb

Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,40 @@
1+
# frozen_string_literal: true
2+
3+
require "helper"
4+
require "timeout"
5+
6+
class TestBenchCSSTokenizer < Nokogiri::TestBenchmark
7+
# GHSA-c4rq-3m3g-8wgx: ambiguous regex in the STRING rule backtracks
8+
# exponentially on unterminated `[foo="\a\a\a..."` input. Each sample
9+
# repeats the parse to average out per-call jitter.
10+
describe "css string tokenizer (cross-branch ambiguity)" do
11+
bench_range { bench_linear(10_000, 60_000, 10_000) }
12+
13+
bench_performance_linear("redos in STRING rule", 0.99) do |n|
14+
Timeout.timeout(5) do
15+
payload = %([foo=") + ('\\a' * n) + "x"
16+
50.times do
17+
Nokogiri::CSS.xpath_for(payload)
18+
rescue Nokogiri::CSS::SyntaxError
19+
end
20+
end
21+
end
22+
end
23+
24+
# The unicode escape's `[0-9A-Fa-f]{1,6}` quantifier admits 6 different
25+
# match lengths per escape position, which without an atomic group
26+
# multiplies into 6**N parses on unterminated `[foo="\aaaaaa\aaaaaa..."`.
27+
describe "css string tokenizer (unicode escape length ambiguity)" do
28+
bench_range { bench_linear(2_000, 12_000, 2_000) }
29+
30+
bench_performance_linear("redos in unicode escape length", 0.99) do |n|
31+
Timeout.timeout(5) do
32+
payload = %([foo=") + ('\\aaaaaa' * n) + "x"
33+
150.times do
34+
Nokogiri::CSS.xpath_for(payload)
35+
rescue Nokogiri::CSS::SyntaxError
36+
end
37+
end
38+
end
39+
end
40+
end

0 commit comments

Comments
 (0)