Bound non-null-terminated value scanners against end of input#2636
Merged
Conversation
Several JSON value scanners dereference *it without an end check, relying on the trailing '\0' sentinel that only exists for null-terminated buffers. On a non-null-terminated buffer (opts.null_terminated = false) there is no sentinel, so truncated or boundary input reads one or more bytes past the end (heap-buffer-overflow under ASAN). Each fix leaves the null-terminated fast path byte-for-byte unchanged and bounds only the non-null-terminated path: - skip_number (non-validating): gate the digit scan on it < end when not null-terminated (skip_number_opts gains a null_terminated field). - skip_number_with_validation: guard each standalone *it read with it != end (the find_if_not scans were already end-bounded). - number_of_array_elements: bound the element pre-scan loop on it == end. - skip_string (non-padded, validating): bound the scan loop and the post-backslash read (skip_string_opts gains a null_terminated field). - NDJSON read_new_lines: bound the inter-record newline scan on it != end. Adds a non_null_terminated_scanner_bounds suite exercising each scanner at its buffer boundary; these run under the ASAN CI job.
…end of input Two more value scanners share the bug class fixed in the previous commit: they dereference *it without an end check, relying on the trailing '\0' sentinel that only exists for null-terminated buffers. On a non-null-terminated buffer (opts.null_terminated = false) truncated input reads past the end of the buffer (heap-buffer-overflow under ASAN). - jmespath handle_slice (both the tuple and resizable-array overloads): the structural ']' / ',' scans now route through a local at_end() guard that compiles out entirely when null_terminated. Covers the partial-read path, the read-all fallback (negative step / negative indices), and the skip-to-slice-start pre-scan. - enum-by-name reader (GLZ_REFLECTION26 reflect_enums path): the key scan had its operands reversed (*it dereferenced before the it != end bound); swap them and guard the post-scan ++it against an unterminated key. Tests: - non_null_terminated_slice_bounds (tests/jmespath): feeds every truncation prefix of a complete array through an exact-size buffer (tuple and vector targets, partial-read and read-all paths). Runs under the ASAN CI job; each pre-fix over-read was reproduced at jmespath.hpp and is resolved. - reflect_enums non-null-terminated bounds (tests/p2996_test): reads an unterminated reflective enum key over an exact-size buffer and asserts an error. With the unbounded scan a valid name lacking its closing quote over-reads and wrongly succeeds, so this pins the bounded behavior; it runs in the dedicated C++26 reflection CI.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
A class of JSON value scanners dereference
*itwithout anendcheck, relying on the trailing'\0'sentinel that only exists for null-terminated buffers. Whenopts.null_terminated = false(no sentinel), truncated or boundary input reads past the end of the buffer — a heap-buffer-overflow under ASAN on an exact-size buffer.This is the same bug class as the recent minified
skip_wsend-guard work, but in scanners that never route throughskip_ws, so they were not covered. Each fix preserves null-terminated behavior — the gated guards compile out entirely, and the few loop restructures are behavior-equivalent — and bounds only the non-null-terminated path.Fixes
skip_number(non-validating)util/parse.hppglz::raw_jsonfield / value skipskip_number_with_validationutil/parse.hppget_view_json/validate_jsonnumbersnumber_of_array_elementsjson/read.hppemplace_backarrays (e.g.std::forward_list)skip_string(non-padded, validating)util/parse.hppget_view_json/validate_jsonstringsread_new_linesjson/ndjson.hppread_ndjsonwithnull_terminated = falsehandle_slice(JMESPath array slices)json/jmespath.hppread_jmespathslices withnull_terminated = falsejson/read.hppreflect_enumsenum values (C++26 P2996) withnull_terminated = falseskip_number/skip_string: gated viaif constexpr (not Opts.null_terminated)(their opts structs gain anull_terminatedfield); zero overhead on the default path.skip_number_with_validation: unconditionalit != endguards on the standalone*itreads (thefind_if_notscans were alreadyend-bounded).number_of_array_elements:if constexpr (not Opts.null_terminated)end-checks before each dereference; compiles out by default.read_new_lines: gated; the null-terminated path is unchanged.handle_slice(both the tuple and resizable-array overloads): a localat_end()guard —if constexpr (not Opts.null_terminated), so it compiles out by default — runs before each structural]/,scan. Covers the partial-read path, the read-all fallback (negative step / negative indices), and the skip-to-slice-start pre-scan.*itdereferenced before theit != endbound); swapped, plus a guard on the post-scan++itfor an unterminated key. Gated behindGLZ_REFLECTION26+reflect_enums.Scope note
This hardens the
null_terminated = falsecontract. Reading a non-null-terminated buffer under default options (null_terminated = true, e.g. astd::string_view/std::spanover an exact-size region with no terminator) remains unsupported, as it is for the rest of the reader — the value parsers rely on the sentinel there.Tests
non_null_terminated_scanner_bounds(tests/json_test): exercises each scanner above at its buffer boundary (truncated and complete inputs over exact-size buffers).non_null_terminated_slice_bounds(tests/jmespath): feeds every truncation prefix of a complete array through an exact-size buffer (tuple and vector targets, partial-read and read-all paths); the complete array still resolves.reflect_enums non-null-terminated bounds(tests/p2996_test): reads an unterminated reflective enum key over an exact-size buffer and asserts an error. With the unbounded scan, a valid name lacking its closing quote over-reads and wrongly succeeds, so the error assertion pins the bounded behavior even where that CI job runs without sanitizers; it runs in the dedicated C++26 reflection CI.The JSON suites run under the existing ASAN CI job, which is what catches the over-reads. Verified locally:
json_testpasses under ASAN (685 tests, 0 failures), and the jmespath suite passes under ASAN (18 tests, 136 asserts); each pre-fix over-read was reproduced and is resolved.