
Remove a potential source of UB from potential_utf #7871

Merged
robertbastian merged 1 commit into unicode-org:main from GiGainfosystems:fix/potential_utf_ub
Apr 15, 2026

@weiznich
Contributor

@weiznich weiznich commented Apr 14, 2026

This commit removes a transmute from PotentialUtf8::from_boxed_bytes as this might rely on undefined behaviour.

It is only sound to transmute between two types like that if their layout is guaranteed to be the same. For this particular case it's guaranteed that [u8] and PotentialUtf8 have the same layout via the #[repr(transparent)] attribute on PotentialUtf8. This guarantee doesn't seem to extend to Box as:

  • Box itself is not marked as #[repr(transparent)] (or #[repr(C)])
  • The memory layout section of the standard library documentation for Box only guarantees a certain layout for sized types:

So long as T: Sized, a Box<T> is guaranteed to be represented as a single pointer and is also ABI-compatible with C pointers (i.e. the C type T*).

  • This specific case involves unsized types, so that guarantee doesn't help us here.

Given that, it seems safer to perform a pointer cast there instead, which only involves types with a known compatible layout.

Practically speaking I wouldn't expect that to ever become a problem, but it's better to use a way that's guaranteed to work.
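
As an illustration of the pointer-cast approach described above, here is a minimal sketch. `Wrapper` is a hypothetical stand-in for `PotentialUtf8` and `as_bytes` is a helper added only for the example; this is not the code from the diff. The transmute it replaces is shown only as a comment.

```rust
/// Hypothetical transparent wrapper around an unsized byte slice,
/// standing in for `PotentialUtf8` (an assumption for illustration).
#[repr(transparent)]
pub struct Wrapper([u8]);

impl Wrapper {
    /// Converts a boxed byte slice into a boxed `Wrapper` without copying.
    pub fn from_boxed_bytes(bytes: Box<[u8]>) -> Box<Self> {
        // The variant being avoided would be:
        //     unsafe { core::mem::transmute::<Box<[u8]>, Box<Wrapper>>(bytes) }
        // which relies on `Box<[u8]>` and `Box<Wrapper>` having the same
        // layout, something the `Box` docs only spell out for `T: Sized`.

        // Take ownership of the allocation as a raw (fat) pointer.
        let raw: *mut [u8] = Box::into_raw(bytes);

        // SAFETY: `Wrapper` is `#[repr(transparent)]` over `[u8]`, so the
        // pointee layout and the slice-length metadata are identical, and
        // the pointer came from a `Box` using the same (global) allocator.
        unsafe { Box::from_raw(raw as *mut Self) }
    }

    /// Borrows the underlying bytes (hypothetical helper for the example).
    pub fn as_bytes(&self) -> &[u8] {
        &self.0
    }
}
```

Calling `Wrapper::from_boxed_bytes(vec![1u8, 2, 3].into_boxed_slice())` hands back the same allocation under the new type; only the raw pointer is cast, never the `Box` itself.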

Changelog

potential_utf: Use Box::from_raw() instead of transmute for converting unsized transparent boxes.

@weiznich weiznich requested a review from a team as a code owner April 14, 2026 12:09
@CLAassistant

CLAassistant commented Apr 14, 2026

CLA assistant check
All committers have signed the CLA.

Member

@robertbastian robertbastian left a comment

thanks for finding

please add a safety comment for the usage of Box::from_raw. I think you want to say that PotentialUtf8 and [u8] have the same memory layout, and both Box types use the same allocator
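
A safety comment along those lines might look roughly like the sketch below. `Bytes` and `layouts_match` are hypothetical names used only for illustration, and the comment wording is an assumption rather than the text that was merged. The helper compares the allocation layout seen through both types, which is the property the allocator argument relies on.

```rust
use std::alloc::Layout;

/// Hypothetical transparent wrapper, standing in for `PotentialUtf8`.
#[repr(transparent)]
pub struct Bytes([u8]);

impl Bytes {
    pub fn from_boxed(bytes: Box<[u8]>) -> Box<Self> {
        // SAFETY:
        // * `Bytes` is `#[repr(transparent)]` over `[u8]`, so both types
        //   have the same size, alignment, and pointer metadata.
        // * The pointer comes from `Box::into_raw` on a box using the
        //   global allocator, and `Box::from_raw` rebuilds a box using
        //   that same allocator, so it is later freed with the layout
        //   it was allocated with.
        unsafe { Box::from_raw(Box::into_raw(bytes) as *mut Self) }
    }
}

/// Checks that the allocation layout is unchanged by the conversion
/// (hypothetical helper, not part of any real API).
pub fn layouts_match(len: usize) -> bool {
    let raw: Box<[u8]> = vec![0u8; len].into_boxed_slice();
    let before = Layout::for_value(&*raw);
    let wrapped = Bytes::from_boxed(raw);
    let after = Layout::for_value(&*wrapped);
    before == after
}
```

A check like `layouts_match` is only a sanity test; the soundness argument itself rests on the `#[repr(transparent)]` guarantee stated in the comment.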

@robertbastian
Member

we seem to have three other Box transmutes in the repo that we also need to fix.

@sffc this also needs to be fixed in your PR #7788

Member

@Manishearth Manishearth left a comment

Yeah, this is a good idea, even though I agree that this is practically not a problem. If we asked, Rust would probably document this for us.

Still, for unsafe code, it is good to be technically correct as much as possible, for future-proofness and also to improve the quality of the code in the eyes of reviewers.

r+ with Robert's comments addressed.

This commit removes a transmute from `PotentialUtf8::from_boxed_bytes`
as this might rely on undefined behaviour.

It is only sound to transmute between two types like that if their
layout is guaranteed to be the same. For this particular case it's
guaranteed that `[u8]` and `PotentialUtf8` have the same layout via the
`#[repr(transparent)]` attribute on `PotentialUtf8`. This guarantee
doesn't seem to extend to `Box` as:

* `Box` itself is not marked as `#[repr(transparent)]` (or `#[repr(C)]`)
* The [memory
  layout](https://doc.rust-lang.org/std/boxed/index.html#memory-layout)
  section of the standard library documentation for `Box` only guarantees a
  certain layout for sized types:
  > So long as T: Sized, a Box<T> is guaranteed to be represented as a single pointer and is also ABI-compatible with C pointers (i.e. the C type T*).
* This specific case involves unsized types, so that guarantee doesn't
  help us here.

Given that, it seems safer to perform a pointer cast there instead,
which only involves types with a known compatible layout.

Practically speaking I wouldn't expect that to ever become a problem,
but it's better to use a way that's guaranteed to work.
@weiznich weiznich force-pushed the fix/potential_utf_ub branch from da43689 to c43905f Compare April 14, 2026 14:25
@weiznich
Contributor Author

Hopefully these changes fix the safety comment.

For reference: I've also asked about this on Zulip: #t-opsem > Is transmuting between `Box<A>` and `Box<B>` UB?

@robertbastian robertbastian merged commit 43e7580 into unicode-org:main Apr 15, 2026
34 of 35 checks passed

4 participants