SEGV in schema pattern validation on non-NUL-terminated binary strings #390

@lbw15507

N4_schema_pattern_binary_poc.zip

Description

I found a reproducible crash in the JSON schema pattern validation path on the current master branch.

When validating a binary string parsed from CSEXP or MsgPack bin, ucl_schema_validate_string() passes ucl_object_tostring(obj) directly to regexec(). These binary string values are length-prefixed and are not guaranteed to be NUL-terminated, while regexec() expects a NUL-terminated C string. This can make regexec() read beyond the allocated object value and crash.

Version

Tested on current master:

ed8617c565083c81939cfb99f1594af1cb539850

Environment

  • OS: Linux x86_64
  • Compiler: clang
  • Sanitizers: AddressSanitizer + UndefinedBehaviorSanitizer

Reproduction

Schema:

{"items":{"pattern":"^z+$"}}

This crashes when the document is an array containing a binary string parsed from CSEXP or MsgPack bin.

The attached PoC zip contains:

  • N4_schema.txt: the schema above.
  • N4_doc_csexp_min.bin: a minimized CSEXP document.
  • N4_doc_msgpack_bin_min.bin: a minimized MsgPack bin document.
  • verify_regex2_csexp.c: standalone CSEXP reproducer.
  • verify_regex_msgpack.c: standalone MsgPack bin reproducer.

Example build and run for the CSEXP reproducer:

cd libucl
clang -fsanitize=address,undefined -fno-omit-frame-pointer -g -O1 \
  -I ./include -I ./src -I ./uthash -I ./klib \
  verify_regex2_csexp.c build-fuzz/libucl.a -lm -lrt -o verify_regex2_csexp

ASAN_OPTIONS=detect_leaks=0 ./verify_regex2_csexp

Example build and run for the MsgPack bin reproducer:

cd libucl
clang -fsanitize=address,undefined -fno-omit-frame-pointer -g -O1 \
  -I ./include -I ./src -I ./uthash -I ./klib \
  verify_regex_msgpack.c build-fuzz/libucl.a -lm -lrt -o verify_regex_msgpack

ASAN_OPTIONS=detect_leaks=0 ./verify_regex_msgpack

ASAN output

ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000
The signal is caused by a READ memory access.

#0 libc
#1 regexec
#2 __interceptor_regexec
#3 ucl_schema_validate_string src/ucl_schema.c:414
#4 ucl_schema_validate src/ucl_schema.c:1063
#5 ucl_schema_validate_array src/ucl_schema.c:517
#6 ucl_schema_validate src/ucl_schema.c:1056
#7 ucl_schema_validate src/ucl_schema.c:955
#8 ucl_object_validate_root_ext src/ucl_schema.c:1099
#9 ucl_object_validate src/ucl_schema.c:1075

The reproducer prints that the validated element is a binary string:

elem type=4 len=5 binary=1

Root cause hypothesis

The pattern branch in ucl_schema_validate_string() calls:

if (regexec(&re, ucl_object_tostring(obj), 0, NULL, 0) != 0) {
    ...
}

regexec() expects a NUL-terminated string. However, binary string values produced by CSEXP or MsgPack bin are stored as length-prefixed values and are not guaranteed to have a trailing NUL byte. Passing them directly to regexec() can therefore read outside the allocated value buffer.

Notes

This is a possible out-of-bounds read / denial-of-service issue in schema validation. One possible fix would be to copy the string value into a temporary len + 1 buffer and append a NUL byte before calling regexec(), or to reject/skip pattern validation for UCL_OBJECT_BINARY strings.
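
A minimal sketch of the first option (copy into a len + 1 buffer, then call regexec()), using only POSIX <regex.h> and the C standard library. The helper name regexec_len is hypothetical and not part of libucl. The sketch assumes the caller already has the value pointer and its length, and it does not show how libucl would obtain them for a binary object. Since the ASAN report above shows a fault at address 0x0, the helper also treats a NULL pointer as a non-match instead of passing it on.

```c
#include <regex.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libucl): match a length-delimited
 * buffer against a compiled POSIX regex by copying it into a
 * NUL-terminated temporary first.
 *
 * Returns 0 on match, REG_NOMATCH on no match or NULL input, and
 * REG_ESPACE if the temporary buffer cannot be allocated.
 */
static int
regexec_len(const regex_t *re, const char *data, size_t len)
{
    char *tmp;
    int rc;

    if (re == NULL || data == NULL) {
        return REG_NOMATCH;
    }
    if (len == SIZE_MAX) {
        /* len + 1 would overflow */
        return REG_ESPACE;
    }

    tmp = malloc(len + 1);
    if (tmp == NULL) {
        return REG_ESPACE;
    }

    memcpy(tmp, data, len);
    tmp[len] = '\0'; /* regexec() requires a NUL-terminated string */

    rc = regexec(re, tmp, 0, NULL, 0);
    free(tmp);

    return rc;
}
```

With a helper like this, the pattern branch could call regexec_len(&re, value, length) in place of the direct regexec() call. Note that regexec() still stops at the first embedded NUL byte, so a binary value containing NUL bytes would only be matched up to that point. That may be an argument for the second option, rejecting or skipping pattern validation for UCL_OBJECT_BINARY strings.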
