Skip to content

Fix binary data corruption in vBinary for non-UTF-8 byte input#1356

Open
uwezkhan wants to merge 6 commits into
collective:mainfrom
uwezkhan:fix/vbinary-raw-bytes
Open

Fix binary data corruption in vBinary for non-UTF-8 byte input#1356
uwezkhan wants to merge 6 commits into
collective:mainfrom
uwezkhan:fix/vbinary-raw-bytes

Conversation

@uwezkhan
Copy link
Copy Markdown
Contributor

@uwezkhan uwezkhan commented May 3, 2026

vBinary currently coerces byte input through Unicode conversion before Base64 serialization, which corrupts arbitrary non-UTF-8 binary payloads.

This change preserves raw byte input in vBinary and serializes it directly to Base64, ensuring binary values round-trip losslessly.

Related image handling and jCal serialization were updated accordingly to maintain compatibility with the new raw-bytes storage.

Regression tests were added to cover non-UTF-8 binary payloads and verify that image handling continues to work as expected.

@read-the-docs-community
Copy link
Copy Markdown

read-the-docs-community Bot commented May 3, 2026

@uwezkhan
Copy link
Copy Markdown
Contributor Author

uwezkhan commented May 3, 2026

Thanks for pointing that out — this PR is the refined follow-up to my earlier attempt.

I’ll close #1353 to avoid duplicate review and keep this PR as the active version of the fix.

@stevepiercy
Copy link
Copy Markdown
Member

@angatha @SashankBhamidi @niccokunzmann would y'all please do a technical review? Thank you!

@SashankBhamidi
Copy link
Copy Markdown
Member

I'd like to learn if you're using assistance of any AI tool since this is interesting, could you let us know? Thanks!

@uwezkhan
Copy link
Copy Markdown
Contributor Author

uwezkhan commented May 4, 2026

Thankyou for the positive response .

I manually research, validate and test the program before raising any PR . also i do use AI for the purpose of wordings to enhance my description of PR.

@uwezkhan
Copy link
Copy Markdown
Contributor Author

any update on this PR?

@stevepiercy
Copy link
Copy Markdown
Member

@uwezkhan thanks for your patience and for adding the change log entry per the 7.1.1 release.

This PR will require technical review. @SashankBhamidi @angatha or @niccokunzmann would one of you please take a look and let @uwezkhan know? Thank you!



def test_param():
assert isinstance(vBinary("txt").params, Parameters)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep the old test cases in place. They should still be valid or do you want to do a breaking change?


def test_ical_value():
"""ical_value property returns the string value."""
magic_string = base64.b64encode(b"magic string")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same: Please keep the old test cases in place. They should still be valid or do you want to do a breaking change?

Copy link
Copy Markdown
Member

@SashankBhamidi SashankBhamidi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. The bug is real, confirmed it locally. vBinary(bytes(range(256))) round-trips to 512 bytes on main and 256 on this branch. The jCal output is fixed too, the old 'Hello World!' becomes 'SGVsbG8gV29ybGQh', which matches the RFC 7265 example.

@angatha's concern stands. The PR isn't just refactoring internal storage; it also changes the contract of ical_value, and that isn't called out in the news entry.

  • ical_value used to decode the stored value as base64. vBinary("SGVsbG8=").ical_value returned b"Hello" on main, and returns b"SGVsbG8=" here.
  • vBinary("!!!invalid!!!").ical_value used to raise ValueError. Now it silently returns the bytes.

ical_value was only added in 7.1.0, so the external blast radius is small, but the new behavior is the opposite of what the previous version did. This needs to be deliberate.

The new "always raw bytes" semantics are cleaner than what main had, the old design was internally inconsistent (to_ical treated self.obj as raw data while ical_value treated it as base64). I'd lean toward keeping the new semantics, but:

  • The news entry should explicitly mention the ical_value change.
  • The category should be breaking, not bugfix. Pinging @niccokunzmann for a sanity check on that.

On the test removals @angatha flagged:

  • test_param should stay as it was. vBinary("txt") still works on this branch, the original assertions still pass.
  • test_ical_value_rejects_non_base64_characters can go, but the news entry should explain why.

Procedural note: if any part of this was AI-assisted, please follow the Responsible AI use policy. Standard reminder.

@SashankBhamidi
Copy link
Copy Markdown
Member

The underlying bug is tracked in #1289, where @niccokunzmann proposed a specific migration path (introduce vBinary.bytes, keep vBinary.obj as a property returning string data, optionally deprecate). This PR takes a different approach by changing .obj to bytes directly. Worth aligning with that thread before continuing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants