Fix str_errors parameter regression in C extension (#255) #271

topher200 · 2026-01-03T16:39:31Z

Summary

Fixes #255 - Restores support for the str_errors parameter in the C extension when decoding CBOR strings containing invalid UTF-8 byte sequences. The parameter has been broken since v5.6.0.

Problem Description

Since version 5.6.0, the str_errors parameter in cbor2.loads() stopped working in the C extension (_cbor2). When decoding CBOR data containing invalid UTF-8 sequences:

Expected behavior (v5.5.1 and earlier):

cbor2.loads(b"cfo\x90", str_errors="replace")  # Returns: 'fo�'

Broken behavior (v5.6.0 - v5.8.0):

cbor2.loads(b"cfo\x90", str_errors="replace")  # Raises: CBORDecodeValueError

The pure Python implementation (cbor2._decoder) continued to work correctly; only the C extension was affected.
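For reference, the expected short-string behavior can be approximated in a few lines of pure Python. This is a minimal sketch, not cbor2's code: `decode_cbor_text` is a hypothetical helper that only handles definite-length text strings (major type 3) whose length fits in the initial byte or a 1- or 2-byte argument. It raises a plain `UnicodeDecodeError` in strict mode, where cbor2 reports a `CBORDecodeValueError`.

```python
def decode_cbor_text(data: bytes, str_errors: str = "strict") -> str:
    """Decode one definite-length CBOR text string (major type 3).

    Hypothetical helper for illustration only; not part of cbor2.
    """
    initial = data[0]
    major_type, info = initial >> 5, initial & 0x1F
    if major_type != 3:
        raise ValueError(f"expected major type 3 (text string), got {major_type}")
    if info < 24:  # length stored directly in the initial byte
        length, offset = info, 1
    elif info == 24:  # 1-byte length argument
        length, offset = data[1], 2
    elif info == 25:  # 2-byte big-endian length argument
        length, offset = int.from_bytes(data[1:3], "big"), 3
    else:
        raise ValueError("length encoding not supported by this sketch")
    payload = data[offset:offset + length]
    if len(payload) != length:
        raise ValueError("premature end of data")
    # Forwarding the error handler here is the step the C extension skipped
    # from v5.6.0 onward.
    return payload.decode("utf-8", errors=str_errors)


result = decode_cbor_text(b"cfo\x90", str_errors="replace")
assert result == "fo\ufffd"  # invalid byte 0x90 became U+FFFD
```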

Root Cause

Commit 387755e (PR #204), "Fixed MemoryError when decoding large definite strings", dated Jan 14, 2024.

This commit rewrote the string decoding logic in source/decoder.c to decode large strings in chunks, which fixed the MemoryError. However, it accidentally dropped the str_errors parameter from the UTF-8 decoding calls:

Before (v5.5.1 - working):

ret = PyUnicode_DecodeUTF8(buf, length, PyBytes_AS_STRING(self->str_errors));

After (v5.6.0 - broken):

// Short strings (≤65536 bytes)
PyObject *ret = PyUnicode_FromStringAndSize(bytes, length);  // No error handler support

// Long strings (>65536 bytes)
string = PyUnicode_DecodeUTF8Stateful(source_buffer, chunk_length, NULL, &consumed);  // NULL instead of str_errors
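The chunked path has to pass the error handler on every decoding call, and has to carry incomplete multi-byte sequences across chunk boundaries. As a rough Python analogue, assuming a `codecs` incremental decoder as a stand-in for repeated `PyUnicode_DecodeUTF8Stateful()` calls (`decode_utf8_chunked` is a hypothetical name, not part of cbor2):

```python
import codecs


def decode_utf8_chunked(data: bytes, errors: str = "strict", chunk_size: int = 65536) -> str:
    """Decode UTF-8 in fixed-size chunks, honoring `errors` for every chunk.

    Hypothetical helper for illustration only; not part of cbor2.
    """
    # The incremental decoder buffers a trailing partial multi-byte sequence
    # between calls, similar to the `consumed` out-parameter in the C API.
    decoder = codecs.getincrementaldecoder("utf-8")(errors=errors)
    parts = []
    for start in range(0, len(data), chunk_size):
        parts.append(decoder.decode(data[start:start + chunk_size]))
    # final=True pushes any leftover partial sequence through the error handler.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# "é" (b"\xc3\xa9") split across a chunk boundary still decodes correctly.
assert decode_utf8_chunked(b"a\xc3\xa9b", chunk_size=2) == "a\u00e9b"
```

Passing `errors` once, when the decoder is created, means no chunk can silently fall back to strict handling; in the C code the equivalent is passing `str_errors` on each call instead of `NULL`.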

Changes

Re-add the missing str_errors parameter to the UTF-8 decoding calls in source/decoder.c.

Fixes #255

Test Plan

Add a new unit test for this functionality.

Checklist

If this is a user-facing code change, like a bugfix or a new feature, please ensure that you've fulfilled the following conditions (where applicable):

You've added tests (in tests/) which would fail without your patch
You've updated the documentation (in docs/), in case of behavior changes or new features
You've added a new changelog entry (in docs/versionhistory.rst).

agronholm · 2026-01-03T16:41:22Z

This is addressed by an earlier open PR, #270, yes?

coveralls · 2026-01-03T16:42:27Z

Coverage remained the same at 94.58% when pulling ab091c7 on memfault:error-handling-fix into b480757 on agronholm:master.


topher200 · 2026-01-03T17:13:47Z

What a fun coincidence! I started on this PR last week, funny to see you fix it yesterday.

Yes, it looks like your PR is a superset of mine. I confirmed that if I bring the test from this PR into the impl from your PR, it passes. I'll close this PR.

topher200 force-pushed the error-handling-fix branch from 25cbc6c to ab091c7 on January 3, 2026 16:46

topher200 closed this Jan 3, 2026
