Description
Things to check first
- I have searched the existing issues and didn't find my bug already reported there
- I have checked that my bug is still present in the latest release
cbor2 version
5.6.6.dev1
Python version
3.12
What happened?
When the C extension fails to decode certain kinds of corrupted data, it leaks memory. The pure Python implementation (built with CBOR2_BUILD_C_EXTENSION=0) does not. The case I was able to reproduce is a string that fails to decode inside a nested map. For reference, this is where it fails in the pure Python implementation:
if length <= 65536:
    try:
        result = self.read(length).decode("utf-8", self._str_errors)
    except UnicodeDecodeError as exc:
        raise CBORDecodeValueError("error decoding unicode string") from exc
The equivalent CBORDecodeValueError path in the C extension is the likely culprit: it probably exits without cleaning up, possibly because of a missing Py_DECREF in the failure path.
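To make the failing branch concrete, here is a minimal standalone sketch of how a short CBOR text string is decoded. It is not cbor2's code: `decode_short_text` and `DecodeValueError` are hypothetical names used only for illustration, and the sketch handles only lengths that fit in the initial byte.

```python
class DecodeValueError(ValueError):
    """Hypothetical stand-in for the decoder's error type (not imported from cbor2)."""


def decode_short_text(data: bytes) -> str:
    """Decode a CBOR text string whose length (0-23) is stored in the initial byte."""
    initial = data[0]
    major_type = initial >> 5   # top 3 bits: 3 means "text string"
    length = initial & 0x1F     # low 5 bits: payload length when <= 23
    if major_type != 3 or length > 23:
        raise DecodeValueError("not a short text string")
    payload = data[1:1 + length]
    if len(payload) < length:
        raise DecodeValueError("premature end of input")
    try:
        return payload.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Same shape as the failure path quoted above.
        raise DecodeValueError("error decoding unicode string") from exc
```

With this sketch, `decode_short_text(b'\x63foo')` returns the string `foo`, while `decode_short_text(b'\x63fo\xff')` raises: the header 0x63 declares three bytes, and the third byte 0xff is not valid UTF-8.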
How can we reproduce the bug?
cbor2_repro.py
import cbor2

# malformed string at the leaf of nested maps
bad_cbor = (
    b'\xbf' +           # indefinite-length map
    b'\x01' +           # key = 1
    b'\xa1' +           # value = map of 1 pair
    b'\x02' +           # key = 2
    b'\x63foo'[:-1] +   # text string header declaring 3 bytes, payload truncated to "fo"
    b'\xff'             # intended break; consumed as the string's 3rd byte → invalid UTF-8
)

for _ in range(10_000_000):
    try:
        cbor2.loads(bad_cbor)
    except Exception:
        pass

Memory usage grows steadily under mprof. The pure Python implementation shows no increase in memory usage over time, while the C extension version shows a continuous increase.
mprof run python cbor2_repro.py
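As a lighter-weight cross-check that does not need mprof, the growth can also be measured in-process with the standard library's tracemalloc. This is only a sketch: `measure_growth` is a hypothetical helper, and tracemalloc only sees allocations made through Python's own allocators, so it assumes the leak is a leaked Python object (as the missing Py_DECREF hypothesis suggests) rather than a raw malloc.

```python
import gc
import tracemalloc


def measure_growth(func, iterations=10_000):
    """Return the net change in traced memory (bytes) after calling func repeatedly.

    Exceptions raised by func are swallowed, mirroring the reproduction loop.
    """
    gc.collect()
    tracemalloc.start()
    try:
        before, _peak = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            try:
                func()
            except Exception:
                pass
        gc.collect()  # drop anything that is merely awaiting collection
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


# Intended usage with the reproduction payload (requires cbor2 to be installed):
#   import cbor2
#   print(measure_growth(lambda: cbor2.loads(bad_cbor), 100_000))
```

A result that scales with the iteration count would point to a per-call leak, while a small, roughly constant value would suggest no leak on that path.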

