Heap Out-of-Bounds Read in IR String Constant Deserialization#34489
Heap Out-of-Bounds Read in IR String Constant Deserialization#34489bumbosiepsak wants to merge 4 commits intoopenvinotoolkit:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes a heap out-of-bounds read vulnerability in OpenVINO IR packed string-constant deserialization by adding stronger validation of the packed buffer header and per-string offsets before constructing std::string instances.
Changes:
- Validate
num_stringsis non-negative and ensure the computed header fits within the provided buffer. - Validate per-string begin/end offsets (non-negative, monotonic, and within the data region) before reading.
- Use validated
size_toffsets when constructing unpackedstd::stringvalues.
|
@bumbosiepsak |
01760f4 to
8c65373
Compare
| // Calculate header size: [num_strings][0][offset1...offsetN] | ||
| const size_t num_elements = static_cast<size_t>(num_strings); | ||
| // Check addition overflow for header element count (2 + num_elements) | ||
| OPENVINO_ASSERT(num_elements <= std::numeric_limits<size_t>::max() - 2, |
There was a problem hiding this comment.
Is it possible if non-negative i32 is converted to the max value should be 1/2 or 1/4 of size_t
There was a problem hiding this comment.
I'm not sure if I understand 😊 Could you elaborate a little, please?
| // first run over all elements: calculate total memory required to hold all strings | ||
| header_size = sizeof(int32_t) * (1 + 1 + num_elements); | ||
| // Check addition overflow for header element count (1 + 1 + num_elements) | ||
| OPENVINO_ASSERT(num_elements <= std::numeric_limits<size_t>::max() - 2, |
There was a problem hiding this comment.
I removed this check in favour of checking, that size_t is at least as big, as int32_t. (I hope we don't need to support 16-bit platforms...)
Then the positive range of size_t always sufficient then (twice as big, as int32_t).
There was a problem hiding this comment.
I think 16-bit platforms are not a case for OV
There was a problem hiding this comment.
Test can put into string_align_buffer_test.cpp file
There was a problem hiding this comment.
Actually this test file grew a little. Is it OK if I keep it separate?
I'll move content to string_align_buffer_test.cpp if you say so ;)
There was a problem hiding this comment.
Is better keep all of them in same file.
| OPENVINO_ASSERT(int32_t(size) >= 4 + 4 + 4 * num_strings, | ||
| "Incorrect packed string tensor format: the packed string tensor must contain first " | ||
| "string offset and end indices"); | ||
| OPENVINO_ASSERT(num_strings >= 0, "Incorrect packed string tensor format: negative number of strings"); |
There was a problem hiding this comment.
is required to check?
if is negative it will be large number maybe updated check std::numeric_limits<size_t>::max() can detect it also?
There was a problem hiding this comment.
I sticked to the convention already established in this function, that the header consists of signed integers.
Question: can we imply, that they are unsigned (and switch to treating them as such everywhere)?
- Vulnerability fixed
- Additional overflow protection added - Variable usage streamlined with type unification
e36bd58 to
adeba47
Compare
| header_element_t* header = reinterpret_cast<header_element_t*>(data.get()); | ||
| header[0] = static_cast<header_element_t>(strings_count); | ||
|
|
||
| if (strings_count > 0) { |
There was a problem hiding this comment.
Question: what is the wanted buffer shape in case of zero strings provided?
Do we still want to create and fill the tail (i.e. begin and end of a string, that doesn't exist)?
There was a problem hiding this comment.
If Constant has shape [2] and her string count is zero the constant will be not correct as shape and number of data not match.
It looks like some kind of error case
adeba47 to
2f32e86
Compare
| data = std::shared_ptr<uint8_t>(new uint8_t[header_size], std::default_delete<uint8_t[]>()); | ||
|
|
||
| header_element_t* header = reinterpret_cast<header_element_t*>(data.get()); | ||
| header[0] = static_cast<header_element_t>(strings_count); |
There was a problem hiding this comment.
Both strings_count (from get_num_elements()) and current_string_end are size_t values cast to int32_t via static_cast without overflow validation. If either value exceeds INT32_MAX, this is signed integer overflow — undefined behavior in C++. Since this is a security-focused PR, the packing side should also be hardened:
OPENVINO_ASSERT(strings_count <= static_cast<size_t>(std::numeric_limits<header_element_t>::max()),
"Too many strings to pack: strings_count exceeds int32_t range");
Similarly, current_string_end should be validated before each cast:
OPENVINO_ASSERT(current_string_end <= static_cast<size_t>(std::numeric_limits<header_element_t>::max()),
"Cumulative string size exceeds int32_t range");
- Reworked according to review remarks - Naming cleaned - Checks made tighter
2f32e86 to
52cd61e
Compare
| using header_element_t = int32_t; // Type of a single element in the header (strings_count and offsets) | ||
|
|
||
| static_assert(sizeof(header_element_t) <= sizeof(size_t), | ||
| "Header element type must be able to represent offsets and number of strings as size_t"); |
|
|
||
| /// @brief Test case for string offset exceeding buffer bounds in packed string tensor. | ||
| /// Expecting AssertFailure with message about string offset exceeds buffer bounds. | ||
| /// num_strings = 1, header: [1, 0, end0=10], but buffer too small |
| "Incorrect packed string tensor format: negative string offset in the packed string tensor"); | ||
|
|
||
| OPENVINO_ASSERT(begin_signed <= end_signed, | ||
| "Incorrect packed string tensor format: begin offset greater than end offset"); |
There was a problem hiding this comment.
"Incorrect packed string tensor format: begin offset greater than end offset");
begin offset greater than end offset this give no information as condition is serialized
Optional
Is always keep as number and message of assert as small as possible (but explain error) to not add which is not is usually used.
|
|
||
| } // namespace | ||
|
|
||
| namespace ov { |
There was a problem hiding this comment.
| namespace ov { | |
| namespace ov::test { |
| using header_element_t = int32_t; // Type of a single element in the header (strings_count and offsets) | ||
|
|
||
| namespace { |
| /// Expecting AssertFailure with message about begin offset greater than end offset. | ||
| /// num_strings = 2, header: [2, 0, end0=-3, end1=5] | ||
| TEST(StringUnpackTensorTest, NegativeOffsetsFails) { | ||
| using testing::HasSubstr; |
There was a problem hiding this comment.
It can be used once in this file
| /// @brief Test case for missing number of strings in packed string tensor. | ||
| /// Expecting AssertFailure with message about missing strings count. | ||
| /// num_strings = <missing> |
There was a problem hiding this comment.
These comment are not required
| // first run over all elements: calculate total memory required to hold all strings | ||
| header_size = sizeof(int32_t) * (1 + 1 + num_elements); | ||
| // Check addition overflow for header element count (1 + 1 + num_elements) | ||
| OPENVINO_ASSERT(num_elements <= std::numeric_limits<size_t>::max() - 2, |
There was a problem hiding this comment.
I think 16-bit platforms are not a case for OV
aux_unpack_string_tensor()Tickets: