Fix binary data corruption in vBinary for non-UTF-8 byte input #1356
uwezkhan wants to merge 6 commits into
Thanks for pointing that out. This PR is the refined follow-up to my earlier attempt. I’ll close #1353 to avoid duplicate review and keep this PR as the active version of the fix.
@angatha @SashankBhamidi @niccokunzmann would y'all please do a technical review? Thank you!
I'd like to know whether you're using the assistance of any AI tool, since this is interesting. Could you let us know? Thanks!
Thank you for the positive response. I manually research, validate, and test the program before raising any PR. I also use AI for wording, to improve my PR descriptions.
Any update on this PR?
@uwezkhan thanks for your patience and for adding the change log entry per the 7.1.1 release. This PR will require technical review. @SashankBhamidi @angatha or @niccokunzmann would one of you please take a look and let @uwezkhan know? Thank you!
def test_param():
    assert isinstance(vBinary("txt").params, Parameters)
Please keep the old test cases in place. They should still be valid, or do you intend to make a breaking change?
def test_ical_value():
    """ical_value property returns the string value."""
    magic_string = base64.b64encode(b"magic string")
Same: Please keep the old test cases in place. They should still be valid, or do you intend to make a breaking change?
SashankBhamidi left a comment
Thanks. The bug is real; I confirmed it locally. vBinary(bytes(range(256))) round-trips to 512 bytes on main and 256 on this branch. The jCal output is fixed too: the old 'Hello World!' becomes 'SGVsbG8gV29ybGQh', which matches the RFC 7265 example.
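The 512-versus-256 figure can be reproduced with the standard library alone. This is a minimal sketch, assuming the coercion on main behaves like a UTF-8 decode that replaces undecodable bytes, followed by a re-encode; that mechanism is an assumption for illustration, not code taken from the repository. It also checks the Base64 form of 'Hello World!' quoted above.

```python
import base64

# Every possible byte value; the upper half is not valid UTF-8 on its own.
raw = bytes(range(256))

# Assumed lossy path: bytes -> str (undecodable bytes become U+FFFD) -> bytes.
# Each U+FFFD re-encodes to three bytes, so the payload grows.
coerced = raw.decode("utf-8", errors="replace").encode("utf-8")

# Lossless path: Base64-encode the raw bytes directly, then decode.
restored = base64.b64decode(base64.b64encode(raw))

print(len(coerced))   # 512
print(len(restored))  # 256
print(base64.b64encode(b"Hello World!"))  # b'SGVsbG8gV29ybGQh'
```

Under that assumption, the 128 ASCII bytes survive unchanged and the other 128 bytes each turn into a three-byte replacement character, which accounts for the doubled size.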
@angatha's concern stands. The PR isn't just refactoring internal storage; it also changes the contract of ical_value, and that isn't called out in the news entry.
- ical_value used to decode the stored value as base64. vBinary("SGVsbG8=").ical_value returned b"Hello" on main, and returns b"SGVsbG8=" here.
- vBinary("!!!invalid!!!").ical_value used to raise ValueError. Now it silently returns the bytes.
ical_value was only added in 7.1.0, so the external blast radius is small, but the new behavior is the opposite of what the previous version did. This needs to be deliberate.
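The difference between the two contracts can be sketched without the library. The two functions below are hypothetical stand-ins (their names and bodies are assumptions for illustration, not the project's implementation): one treats the stored value as Base64 text and validates it, the other returns the stored bytes unchanged.

```python
import base64


def old_style_value(stored: bytes) -> bytes:
    """Hypothetical stand-in for the previous contract: decode stored data as Base64."""
    return base64.b64decode(stored, validate=True)


def new_style_value(stored: bytes) -> bytes:
    """Hypothetical stand-in for the new contract: return the stored bytes as-is."""
    return stored


print(old_style_value(b"SGVsbG8="))  # b'Hello'
print(new_style_value(b"SGVsbG8="))  # b'SGVsbG8='

# Input that is not Base64 is rejected by the old-style contract only.
# binascii.Error, raised by b64decode, is a subclass of ValueError.
try:
    old_style_value(b"!!!invalid!!!")
    rejected = False
except ValueError:
    rejected = True

print(rejected)  # True
print(new_style_value(b"!!!invalid!!!"))  # b'!!!invalid!!!'
```

Callers that relied on the decoding or on the ValueError would see different results after the change, which is why the category of the news entry matters.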
The new "always raw bytes" semantics are cleaner than what main had; the old design was internally inconsistent (to_ical treated self.obj as raw data while ical_value treated it as base64). I'd lean toward keeping the new semantics, but:
- The news entry should explicitly mention the ical_value change.
- The category should be breaking, not bugfix. Pinging @niccokunzmann for a sanity check on that.
On the test removals @angatha flagged:
- test_param should stay as it was. vBinary("txt") still works on this branch, the original assertions still pass.
- test_ical_value_rejects_non_base64_characters can go, but the news entry should explain why.
Procedural note: if any part of this was AI-assisted, please follow the Responsible AI use policy. Standard reminder.
The underlying bug is tracked in #1289, where @niccokunzmann proposed a specific migration path (introduce |
vBinary currently coerces byte input through Unicode conversion before Base64 serialization, which corrupts arbitrary non-UTF-8 binary payloads.
This change preserves raw byte input in vBinary and serializes it directly to Base64, ensuring binary values round-trip losslessly.
Related image handling and jCal serialization were updated accordingly to maintain compatibility with the new raw-bytes storage.
Regression tests were added to cover non-UTF-8 binary payloads and verify that image handling continues to work as expected.