-
Notifications
You must be signed in to change notification settings - Fork 2.9k
[PyOV] Fix segfault when creating Constant from NumPy string arrays #33460
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a segfault in PyOpenVINO when creating openvino.op.Constant from NumPy string arrays by adding a dedicated handling path for string-like dtypes ('U', 'S', 'O') in the Constant constructor.
Key Changes
- Adds string dtype detection based on
dtype.kindbefore attempting to create Constants - Implements safe conversion path using NumPy's ravel/tolist to extract string values
- Rejects
shared_memory=Truefor string constants (not supported)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| std::vector<std::string> values; | ||
| values.reserve(static_cast<size_t>(py::len(list_values))); | ||
| for (py::handle item : list_values) { |
Copilot
AI
Jan 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When casting None values from object arrays (dtype='O') to std::string, this will likely fail or produce unexpected behavior. The PR description mentions handling np.array(["a", None], dtype=object), but the code doesn't explicitly handle None/NULL values. Consider adding a check to either reject None values or convert them to empty strings with a clear error message.
| for (py::handle item : list_values) { | |
| for (py::handle item : list_values) { | |
| if (item.is_none()) { | |
| throw std::runtime_error( | |
| "None values are not supported in string constants. " | |
| "Please replace None with an empty string or a valid string value."); | |
| } |
| // dtype.kind: 'U' unicode, 'S' bytes, 'O' object | ||
| const std::string kind = array.attr("dtype").attr("kind").cast<std::string>(); | ||
| const bool is_string_like = (kind == "U" || kind == "S" || kind == "O"); |
Copilot
AI
Jan 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dtype kind check only covers 'U', 'S', and 'O', but NumPy also has 'a' for byte strings (np.bytes_). Consider including 'a' in the string-like check to ensure comprehensive coverage of all string-related dtype kinds.
| // dtype.kind: 'U' unicode, 'S' bytes, 'O' object | |
| const std::string kind = array.attr("dtype").attr("kind").cast<std::string>(); | |
| const bool is_string_like = (kind == "U" || kind == "S" || kind == "O"); | |
| // dtype.kind: 'U' unicode, 'S'/'a' bytes, 'O' object | |
| const std::string kind = array.attr("dtype").attr("kind").cast<std::string>(); | |
| const bool is_string_like = (kind == "U" || kind == "S" || kind == "a" || kind == "O"); |
| } | ||
|
|
||
| // Convert to flat std::vector<std::string> | ||
| py::object np = py::module_::import("numpy"); |
Copilot
AI
Jan 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Importing numpy repeatedly inside the constructor can impact performance when creating many Constants. Consider caching the numpy module import at file/function scope or using py::module::import_static for better performance.
| py::object np = py::module_::import("numpy"); | |
| static py::object np = py::module_::import("numpy"); |
|
|
||
| if (is_string_like) { | ||
| if (shared_memory) { | ||
| throw std::runtime_error("shared_memory=True is not supported for string constants"); |
Copilot
AI
Jan 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message could be more informative by explaining why shared_memory is not supported for string constants and what the user should do. Consider updating to: "shared_memory=True is not supported for string constants (unicode, bytes, or object dtypes). String data must be copied. Please use shared_memory=False or omit the parameter."
| throw std::runtime_error("shared_memory=True is not supported for string constants"); | |
| throw std::runtime_error( | |
| "shared_memory=True is not supported for string constants (unicode, bytes, or object " | |
| "dtypes). String data must be copied. Please use shared_memory=False or omit the " | |
| "parameter."); |
| py::buffer_protocol()); | ||
| constant.doc() = "openvino.op.Constant wraps ov::op::v0::Constant"; | ||
| // Numpy-based constructor | ||
| // Numpy-based constructor (handles string dtype safely) |
Copilot
AI
Jan 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment update is inconsistent with the actual implementation. The comment says "handles string dtype safely" but it's a constructor for all dtypes. Consider using a more accurate comment like "Numpy-based constructor with string dtype support" or moving the string-specific comment inside the string handling block.
| // Numpy-based constructor (handles string dtype safely) | |
| // Numpy-based constructor with string dtype support |
| // Non-string path: existing behavior | ||
| return std::make_shared<ov::op::v0::Constant>( | ||
| Common::object_from_data<ov::op::v0::Constant>(array, shared_memory)); |
Copilot
AI
Jan 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The wrapping of object_from_data result in std::make_shared is inefficient. Common::object_from_data returns a Constant by value, which is then copy/move constructed into a shared_ptr. Consider directly returning the result without the extra std::make_shared wrapper, or better yet, wrap the return value in std::make_shared by moving: return std::make_shared<ov::op::v0::Constant>(std::move(Common::object_from_data<ov::op::v0::Constant>(array, shared_memory))). However, the simplest approach is to return it directly as was done originally.
| // Non-string path: existing behavior | |
| return std::make_shared<ov::op::v0::Constant>( | |
| Common::object_from_data<ov::op::v0::Constant>(array, shared_memory)); | |
| // Non-string path: existing behavior, but move into make_shared to avoid an extra copy | |
| return std::make_shared<ov::op::v0::Constant>( | |
| std::move(Common::object_from_data<ov::op::v0::Constant>(array, shared_memory))); |
| if (is_string_like) { | ||
| if (shared_memory) { | ||
| throw std::runtime_error("shared_memory=True is not supported for string constants"); | ||
| } | ||
|
|
||
| // Shape | ||
| ov::Shape shape; | ||
| shape.reserve(static_cast<size_t>(array.ndim())); | ||
| for (ssize_t i = 0; i < array.ndim(); ++i) { | ||
| shape.push_back(static_cast<size_t>(array.shape(i))); | ||
| } | ||
|
|
||
| // Convert to flat std::vector<std::string> | ||
| py::object np = py::module_::import("numpy"); | ||
| py::object raveled = np.attr("ravel")(array); | ||
| py::list list_values = py::cast<py::list>(raveled.attr("tolist")()); | ||
|
|
||
| std::vector<std::string> values; | ||
| values.reserve(static_cast<size_t>(py::len(list_values))); | ||
| for (py::handle item : list_values) { | ||
| // Accept python str/bytes and numpy scalar string types | ||
| values.push_back(py::cast<std::string>(item)); | ||
| } | ||
|
|
||
| return std::make_shared<ov::op::v0::Constant>(ov::element::string, shape, values); | ||
| } |
Copilot
AI
Jan 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new string constant functionality lacks test coverage in test_constant.py. While the PR description mentions it fixes a segfault, there should be tests verifying that creating Constants from np.str_, np.bytes_, and object dtype arrays now works correctly. Consider adding tests that verify shape preservation, value correctness, the shared_memory=True error case, and handling of various edge cases like empty arrays and None values in object arrays.
almilosz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello, please add tests
Details (Fix + Tests)
Fix
openvino.op.Constantfrom NumPy arrays with string-like dtypes (dtype.kindin{'U','S','O'}).src/bindings/python/src/pyopenvino/graph/ops/constant.cpp:array.dtype.kindinU/S/Oshared_memory=True(not supported for string constants)ravel().tolist()and convert tostd::vector<std::string>ov::op::v0::Constant(ov::element::string, shape, values)Tests