
Python TensorBuffer.read() is very slow #5755

@laclouis5

Description


Environment

Computer: MacBook Pro M1 Pro
macOS version: 26.2 (Tahoe)
Python version: 3.11
LiteRT version: 2.1.2

This issue has also been observed on a x86_64 Ubuntu 24 CUDA environment and is likely present on other platforms using the Python API.

Issue

The Python TensorBuffer.read() method is very slow, likely due to unoptimized type conversion in the Python wrapper.

A call to this method triggers a call to the C++ function BuildPyListFromFloat() (or one of its counterparts for the other supported dtypes).

PyObject* TensorBufferWrapper::ReadTensor(PyObject* buffer_capsule,
                                          int num_elements,
                                          const std::string& dtype) {
  if (!PyCapsule_CheckExact(buffer_capsule)) {
    return ReportError("ReadTensor: invalid capsule");
  }
  void* ptr = PyCapsule_GetPointer(
      buffer_capsule, litert_wrapper_utils::kLiteRtTensorBufferName.data());
  if (!ptr) {
    return ReportError("ReadTensor: null pointer in capsule");
  }
  TensorBuffer tb = TensorBuffer::WrapCObject(
      static_cast<LiteRtTensorBuffer>(ptr), OwnHandle::kNo);
  if (dtype == "float32") {
    std::vector<float> data(num_elements, 0.f);
    if (auto status = tb.Read<float>(absl::MakeSpan(data)); !status)
      return ConvertErrorToPyExc(status.Error());
    return BuildPyListFromFloat(data);
  }
  if (dtype == "int32") {
    std::vector<int32_t> data(num_elements, 0);
    if (auto status = tb.Read<int32_t>(absl::MakeSpan(data)); !status)
      return ConvertErrorToPyExc(status.Error());
    return BuildPyListFromInt32(data);
  }
  if (dtype == "int8") {
    std::vector<int8_t> data(num_elements, 0);
    if (auto status = tb.Read<int8_t>(absl::MakeSpan(data)); !status)
      return ConvertErrorToPyExc(status.Error());
    return BuildPyListFromInt8(data);
  }
  return ReportError("ReadTensor: unsupported dtype '" + dtype + "'");
}

This function is likely the bottleneck: it converts each element to a Python float and inserts it into a Python list, one element at a time.

PyObject* BuildPyListFromFloat(absl::Span<const float> data) {
  PyObject* py_list = PyList_New(data.size());
  for (size_t i = 0; i < data.size(); i++) {
    PyList_SetItem(py_list, i, PyFloat_FromDouble(data[i]));
  }
  return py_list;
}
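To see why this dominates, here is a quick micro-benchmark (independent of LiteRT; the array is synthetic) comparing per-element boxing, which mirrors what BuildPyListFromFloat() does, against a single bulk reinterpretation of the same bytes:

```python
import timeit

import numpy as np

# Synthetic stand-in for a ~4 MB float32 tensor.
n = 1_000_000
arr = np.random.rand(n).astype(np.float32)

# Per-element path: box every value into a Python float and collect the
# results in a Python list, mirroring BuildPyListFromFloat().
t_list = timeit.timeit(lambda: [float(x) for x in arr], number=1)

# Bulk path: reinterpret the raw bytes in one shot.
t_bulk = timeit.timeit(
    lambda: np.frombuffer(arr.tobytes(), dtype=np.float32), number=1)

print(f"per-element: {t_list:.3f} s, bulk: {t_bulk:.4f} s")
```

On typical hardware the per-element path is orders of magnitude slower, consistent with the timings reported below.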

Finally, the Python wrapper converts the Python list back into a NumPy array, making the intermediate list construction unnecessary work.

def read(self, num_elements: int, output_dtype):
  """Reads data from this tensor buffer.

  Args:
    num_elements: Number of elements to read.
    output_dtype: NumPy dtype for the output (e.g., np.float32, np.int8).

  Returns:
    A NumPy array containing the tensor data.

  Example:
    # Get output as NumPy array
    output_array = tensor_buffer.read(4, np.float32).reshape((1, 4))

  Raises:
    ValueError: If output_dtype is not a NumPy dtype or is not supported.
  """
  if not isinstance(output_dtype, type) or not hasattr(
      np, output_dtype.__name__
  ):
    raise ValueError("output_dtype must be a NumPy dtype (e.g., np.float32)")
  dtype_str = self._dtype_to_str(output_dtype)
  data_list = _tb.ReadTensor(self._capsule, num_elements, dtype_str)
  return np.array(data_list, dtype=output_dtype)

This issue makes reading data from a TensorBuffer in Python orders of magnitude slower than inference itself on some models. Some read operations took more than 100 ms even for relatively small tensors (around 20 MB).
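One possible direction for a fix, sketched at the Python level: if the C++ layer returned the tensor's raw bytes instead of a Python list, the wrapper could hand them to NumPy directly. The raw-bytes source below is simulated with synthetic data; a function like ReadTensorBytes() does not exist in LiteRT today and would need to be added on the C++ side.

```python
import numpy as np


def read_via_frombuffer(raw_bytes: bytes, num_elements: int,
                        output_dtype) -> np.ndarray:
  """Hypothetical fast path: reinterpret raw tensor bytes with NumPy.

  Assumes the C++ wrapper exposed the buffer contents as a bytes object
  (or anything supporting the buffer protocol) instead of a Python list.
  """
  arr = np.frombuffer(raw_bytes, dtype=output_dtype, count=num_elements)
  return arr.copy()  # detach the result from the transient bytes object


# Demo with synthetic data standing in for the tensor buffer contents:
raw = np.arange(4, dtype=np.float32).tobytes()
out = read_via_frombuffer(raw, 4, np.float32)
print(out)  # [0. 1. 2. 3.]
```

np.frombuffer performs a single O(n) reinterpretation with no per-element Python object creation, which is why it avoids the cost measured above.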
