N4_schema_pattern_binary_poc.zip
Description
I found a reproducible crash in the JSON schema pattern validation path on the current master branch.
When validating a binary string parsed from CSEXP or MsgPack bin, ucl_schema_validate_string() passes ucl_object_tostring(obj) directly to regexec(). These binary string values are length-prefixed and are not guaranteed to be NUL-terminated, while regexec() expects a NUL-terminated C string. This can make regexec() read beyond the allocated object value and crash.
Version
Tested on current master:
ed8617c565083c81939cfb99f1594af1cb539850
Environment
- OS: Linux x86_64
- Compiler: clang
- Sanitizers: AddressSanitizer + UndefinedBehaviorSanitizer
Reproduction
Schema:
{"items":{"pattern":"^z+$"}}
This crashes when the document is an array containing a binary string parsed from CSEXP or MsgPack bin.
The attached PoC zip contains:
- N4_schema.txt: the schema above.
- N4_doc_csexp_min.bin: a minimized CSEXP document.
- N4_doc_msgpack_bin_min.bin: a minimized MsgPack bin document.
- verify_regex2_csexp.c: standalone CSEXP reproducer.
- verify_regex_msgpack.c: standalone MsgPack bin reproducer.
Example build and run for the CSEXP reproducer:
cd libucl
clang -fsanitize=address,undefined -fno-omit-frame-pointer -g -O1 \
-I ./include -I ./src -I ./uthash -I ./klib \
verify_regex2_csexp.c build-fuzz/libucl.a -lm -lrt -o verify_regex2_csexp
ASAN_OPTIONS=detect_leaks=0 ./verify_regex2_csexp
Example build and run for the MsgPack bin reproducer:
cd libucl
clang -fsanitize=address,undefined -fno-omit-frame-pointer -g -O1 \
-I ./include -I ./src -I ./uthash -I ./klib \
verify_regex_msgpack.c build-fuzz/libucl.a -lm -lrt -o verify_regex_msgpack
ASAN_OPTIONS=detect_leaks=0 ./verify_regex_msgpack
ASAN output
ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000
The signal is caused by a READ memory access.
#0 libc
#1 regexec
#2 __interceptor_regexec
#3 ucl_schema_validate_string src/ucl_schema.c:414
#4 ucl_schema_validate src/ucl_schema.c:1063
#5 ucl_schema_validate_array src/ucl_schema.c:517
#6 ucl_schema_validate src/ucl_schema.c:1056
#7 ucl_schema_validate src/ucl_schema.c:955
#8 ucl_object_validate_root_ext src/ucl_schema.c:1099
#9 ucl_object_validate src/ucl_schema.c:1075
The reproducer prints that the validated element is a binary string:
elem type=4 len=5 binary=1
Root cause hypothesis
The pattern branch in ucl_schema_validate_string() calls:
if (regexec(&re, ucl_object_tostring(obj), 0, NULL, 0) != 0) {
...
}
regexec() expects a NUL-terminated string. However, binary string values produced by CSEXP or MsgPack bin are stored as length-prefixed values and are not guaranteed to have a trailing NUL byte. Passing them directly to regexec() can therefore read outside the allocated value buffer.
Notes
This is a possible out-of-bounds read / denial-of-service issue in schema validation. One possible fix would be to copy the string value into a temporary len + 1 buffer and append a NUL byte before calling regexec(), or to reject/skip pattern validation for UCL_OBJECT_BINARY strings.
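A minimal sketch of the first option (copying into a len + 1 buffer), for illustration only. The helper name match_bounded is hypothetical and not part of libucl; a real patch would obtain the pointer and length from the object itself and would need to decide how to handle allocation failure. Note that binary data containing embedded NUL bytes would still be matched only up to the first NUL.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libucl): match a length-delimited buffer
 * against a compiled POSIX regex by first copying it into a NUL-terminated
 * temporary buffer.
 *
 * Returns 1 on match, 0 on no match (or any regexec error),
 * and -1 if the temporary buffer cannot be allocated.
 */
static int
match_bounded(const regex_t *re, const char *data, size_t len)
{
    char *tmp = malloc(len + 1);
    int rc;

    if (tmp == NULL) {
        return -1;
    }
    if (len > 0) {
        memcpy(tmp, data, len);
    }
    tmp[len] = '\0'; /* guarantee termination for regexec() */

    rc = regexec(re, tmp, 0, NULL, 0);
    free(tmp);

    return rc == 0 ? 1 : 0;
}
```

With a helper along these lines, the pattern branch could call match_bounded(&re, value, length) instead of passing the raw pointer to regexec(). The copy costs one allocation per validated string, which seems acceptable given that the regex is already compiled on each call.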