@topher200

Summary

Fixes #255 - Restores support for the str_errors parameter in the C extension when decoding CBOR strings containing invalid UTF-8 byte sequences. The parameter has been broken since v5.6.0.

Problem Description

Since version 5.6.0, the str_errors parameter of cbor2.loads() has had no effect in the C extension (_cbor2). Decoding CBOR data that contains invalid UTF-8 sequences behaves as follows:

Expected behavior (v5.5.1 and earlier):

cbor2.loads(b"cfo\x90", str_errors="replace")  # Returns: 'fo�'

Broken behavior (v5.6.0 - v5.8.0):

cbor2.loads(b"cfo\x90", str_errors="replace")  # Raises: CBORDecodeValueError

The pure Python implementation (cbor2._decoder) continued to work correctly; only the C extension was affected.
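
For readers unfamiliar with the payload: the leading byte `c` (0x63) is a CBOR header for a text string (major type 3) of length 3, so the string body is `fo\x90`, and 0x90 is a lone UTF-8 continuation byte. The sketch below uses only the Python standard library to show what each error handler would be expected to produce for that body. `decode_short_text` is a hypothetical helper written for illustration; it is not cbor2's decoder.

```python
def decode_short_text(data: bytes, str_errors: str = "strict") -> str:
    """Decode a CBOR text string whose length fits in the initial byte (0-23).

    Hypothetical helper for illustration only; not part of cbor2.
    """
    initial = data[0]
    major_type = initial >> 5   # top 3 bits select the CBOR major type
    length = initial & 0x1F     # low 5 bits hold the length for short strings
    if major_type != 3 or length > 23:
        raise ValueError("only short definite-length text strings are handled")
    payload = data[1:1 + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    # The error handler is forwarded to the UTF-8 decoder unchanged.
    return payload.decode("utf-8", errors=str_errors)


data = b"cfo\x90"
replaced = decode_short_text(data, "replace")  # "fo" followed by U+FFFD
ignored = decode_short_text(data, "ignore")    # "fo"
try:
    decode_short_text(data)                    # default "strict" handler
except UnicodeDecodeError:
    strict_failed = True
```

With the default `strict` handler the invalid byte is an error, which is why cbor2 reports a decode failure when `str_errors` is not honored.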

Root Cause

Commit 387755e (PR #204) - "Fixed MemoryError when decoding large definite strings"
Date: Jan 14, 2024

This commit rewrote the string decoding logic in source/decoder.c to handle large strings in chunks, fixing memory issues. However, it accidentally removed the str_errors parameter from UTF-8 decoding calls:

Before (v5.5.1 - working):

ret = PyUnicode_DecodeUTF8(buf, length, PyBytes_AS_STRING(self->str_errors));

After (v5.6.0 - broken):

// Short strings (≤65536 bytes)
PyObject *ret = PyUnicode_FromStringAndSize(bytes, length);  // No error handler support

// Long strings (>65536 bytes)
string = PyUnicode_DecodeUTF8Stateful(source_buffer, chunk_length, NULL, &consumed);  // NULL instead of str_errors
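
Because the long-string path decodes in chunks, a multi-byte character can straddle a chunk boundary, and a stateful decoder copes by holding back incomplete trailing bytes. As a rough Python analogy (not the C code in `source/decoder.c`), the standard library's incremental UTF-8 decoder accepts an `errors` argument in the same spirit as the third argument of `PyUnicode_DecodeUTF8Stateful`. The name `decode_chunked` and the tiny chunk size are assumptions made for illustration.

```python
import codecs


def decode_chunked(data: bytes, str_errors: str = "strict", chunk_size: int = 4) -> str:
    """Decode UTF-8 in fixed-size chunks while honoring an error handler.

    Hypothetical sketch; the real decoder works on much larger chunks.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors=str_errors)
    parts = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # Incomplete trailing sequences are buffered until the next chunk.
        parts.append(decoder.decode(chunk))
    # Flush: any bytes still buffered here are a truncated sequence.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# Valid input whose multi-byte characters cross chunk boundaries.
valid = "h\u00e9llo w\u00f6rld \u20ac".encode("utf-8")
assert decode_chunked(valid) == "h\u00e9llo w\u00f6rld \u20ac"

# Invalid byte 0x90 followed by a valid 3-byte character split across chunks.
bad = b"fo\x90" + "\u20ac".encode("utf-8")
assert decode_chunked(bad, "replace") == "fo\ufffd\u20ac"
```

Passing `"strict"` for `bad` raises `UnicodeDecodeError`, which corresponds to the behavior users saw when the handler was dropped.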

Changes

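Re-add the missing parameter so that both UTF-8 decoding paths in source/decoder.c (short and long strings) honor str_errors again.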

Fixes #255

Test Plan

Add a new unit test for this functionality.
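
A regression test along these lines might look like the sketch below. It uses the standard library's `unittest` rather than the project's own test setup, which is an assumption, and it skips itself when cbor2 is not installed. The class and constant names are hypothetical; the payload is the one from the problem description.

```python
import importlib.util
import unittest

# Skip cleanly when cbor2 is not available in the current environment.
HAVE_CBOR2 = importlib.util.find_spec("cbor2") is not None

# 0x63 = text string of length 3, followed by "fo" and the invalid byte 0x90.
INVALID_UTF8_TEXT = b"cfo\x90"


@unittest.skipUnless(HAVE_CBOR2, "cbor2 is not installed")
class StrErrorsTest(unittest.TestCase):
    def test_replace_substitutes_invalid_bytes(self):
        import cbor2

        result = cbor2.loads(INVALID_UTF8_TEXT, str_errors="replace")
        self.assertEqual(result, "fo\ufffd")

    def test_strict_default_raises(self):
        import cbor2

        # The default handler is strict, so invalid UTF-8 should be rejected.
        with self.assertRaises(cbor2.CBORDecodeError):
            cbor2.loads(INVALID_UTF8_TEXT)
```

Saved as a test module, this can be run with `python -m unittest`; on an affected release with the C extension active, the first test would be expected to fail.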

Checklist

If this is a user-facing code change, like a bugfix or a new feature, please ensure that
you've fulfilled the following conditions (where applicable):

  • You've added tests (in tests/) which would fail without your patch
  • You've updated the documentation (in docs/), in case of behavior changes or new
    features
  • You've added a new changelog entry (in docs/versionhistory.rst).

@agronholm
Owner

This is addressed by an earlier open PR, #270, yes?

@coveralls

coveralls commented Jan 3, 2026

Coverage Status: 94.58% (remained the same) when pulling ab091c7 on memfault:error-handling-fix into b480757 on agronholm:master.

@topher200
Author

What a fun coincidence! I started on this PR last week; funny to see you fix it yesterday.

Yes, it looks like your PR is a superset of mine. I confirmed that if I bring the test from this PR into the impl from your PR, it passes. I'll close this PR.

@topher200 closed this on Jan 3, 2026