Description
Environment
Computer: MacBook Pro M1 Pro
macOS version: 26.2 (Tahoe)
Python version: 3.11
LiteRT version: 2.1.2
This issue has also been observed on an x86_64 Ubuntu 24 CUDA environment and is likely present on other platforms using the Python API.
Issue
The Python TensorBuffer.read() method is very slow, likely due to unoptimized type conversion in the Python wrapper.
A call to this method triggers a call to the BuildPyListFromFloat() C function (or a similar function for other dtypes).
LiteRT/litert/python/litert_wrapper/tensor_buffer_wrapper/tensor_buffer_wrapper.cc
Lines 301 to 334 in 1a2d726
```cpp
PyObject* TensorBufferWrapper::ReadTensor(PyObject* buffer_capsule,
                                          int num_elements,
                                          const std::string& dtype) {
  if (!PyCapsule_CheckExact(buffer_capsule)) {
    return ReportError("ReadTensor: invalid capsule");
  }
  void* ptr = PyCapsule_GetPointer(
      buffer_capsule, litert_wrapper_utils::kLiteRtTensorBufferName.data());
  if (!ptr) {
    return ReportError("ReadTensor: null pointer in capsule");
  }
  TensorBuffer tb = TensorBuffer::WrapCObject(
      static_cast<LiteRtTensorBuffer>(ptr), OwnHandle::kNo);
  if (dtype == "float32") {
    std::vector data(num_elements, 0.f);
    if (auto status = tb.Read<float>(absl::MakeSpan(data)); !status)
      return ConvertErrorToPyExc(status.Error());
    return BuildPyListFromFloat(data);
  }
  if (dtype == "int32") {
    std::vector data(num_elements, 0);
    if (auto status = tb.Read<int32_t>(absl::MakeSpan(data)); !status)
      return ConvertErrorToPyExc(status.Error());
    return BuildPyListFromInt32(data);
  }
  if (dtype == "int8") {
    std::vector<int8_t> data(num_elements, 0);
    if (auto status = tb.Read<int8_t>(absl::MakeSpan(data)); !status)
      return ConvertErrorToPyExc(status.Error());
    return BuildPyListFromInt8(data);
  }
  return ReportError("ReadTensor: unsupported dtype '" + dtype + "'");
}
```
This C function is likely the bottleneck: it sequentially converts each element to a Python float and inserts it into a Python list.
LiteRT/litert/python/litert_wrapper/tensor_buffer_wrapper/tensor_buffer_wrapper.cc
Lines 121 to 127 in 1a2d726
```cpp
PyObject* BuildPyListFromFloat(absl::Span<const float> data) {
  PyObject* py_list = PyList_New(data.size());
  for (size_t i = 0; i < data.size(); i++) {
    PyList_SetItem(py_list, i, PyFloat_FromDouble(data[i]));
  }
  return py_list;
}
```
Finally, the Python list is converted to a NumPy array in the Python wrapper, making the previous conversion unnecessary.
LiteRT/litert/python/litert_wrapper/tensor_buffer_wrapper/tensor_buffer.py
Lines 118 to 142 in 1a2d726
```python
def read(self, num_elements: int, output_dtype):
  """Reads data from this tensor buffer.

  Args:
    num_elements: Number of elements to read.
    output_dtype: NumPy dtype for the output (e.g., np.float32, np.int8).

  Returns:
    A NumPy array containing the tensor data.

  Example:
    # Get output as NumPy array
    output_array = tensor_buffer.read(4, np.float32).reshape((1, 4))

  Raises:
    ValueError: If output_dtype is not a NumPy dtype or is not supported.
  """
  if not isinstance(output_dtype, type) or not hasattr(
      np, output_dtype.__name__
  ):
    raise ValueError(f"output_dtype must be a NumPy dtype (e.g., np.float32)")
  dtype_str = self._dtype_to_str(output_dtype)
  data_list = _tb.ReadTensor(self._capsule, num_elements, dtype_str)
  return np.array(data_list, dtype=output_dtype)
```
This issue makes reading data from a TensorBuffer in Python orders of magnitude slower than inference itself on some models. Some read operations took more than 100 ms even for relatively small tensors (around 20 MB).
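One possible direction for a fix, sketched below under my own assumptions (this is not an existing LiteRT API): if the C wrapper exposed the tensor memory as a bytes-like object instead of a Python list, `read()` could build the array with a single `np.frombuffer` call. `read_bytes_fn` here is a hypothetical stand-in for such a C-level function:

```python
import numpy as np


def read_fast(read_bytes_fn, capsule, num_elements, output_dtype):
    """Hypothetical read() variant that skips the intermediate Python list.

    read_bytes_fn is assumed to return the tensor memory as a bytes object
    (e.g. a hypothetical ReadTensorBytes C function); NumPy then
    reinterprets those bytes in one bulk operation instead of boxing each
    element into a Python object.
    """
    nbytes = num_elements * np.dtype(output_dtype).itemsize
    raw = read_bytes_fn(capsule, nbytes)
    return np.frombuffer(raw, dtype=output_dtype, count=num_elements)


# Demo with a fake in-memory "tensor" standing in for the C function:
fake_memory = np.arange(4, dtype=np.float32).tobytes()
out = read_fast(lambda cap, n: fake_memory[:n], None, 4, np.float32)
print(out)
```

A bytes copy is still one full pass over the data, but it avoids creating `num_elements` Python float objects and the list that holds them, which is where the current path spends its time.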