Design for Rust object by-value passing to C++ and back #1185

In our project, which requires tight C++/Rust integration (we consume a Rust library in a large C++ project), we extended the `cxx` crate to handle Rust types by value, so that an instance of a Rust object can be safely handed to C++ (and back from C++ to Rust). We'd now like to upstream the change so that everyone can benefit.

The major issue is that the type layout is unknown on the C++ side. The parser in `cxx::bridge` cannot know the size and alignment of the data type, nor which traits the type implements. Our solution is pretty simple:

- Explicitly define the layout of the C++ type by specifying `#[repr(layout(<size>, <alignment>))]` on the `extern "Rust"` type. The size and alignment are checked at compile time to be at least as large as those of the Rust type ("at least" because the bridge can target multiple platforms with different pointer sizes; for now, one can simply pick the maximum size/alignment across the targets).
- Allow `#[derive(Copy)]` and/or `#[derive(Clone)]` on the type (checked at compile time to ensure that the original Rust type indeed implements these traits). Supporting `Default` the same way would also be easy.
- Generate the respective constructors, assignment operators and destructors as needed on the C++ side, plus the bridge functions they use to call back into Rust.
- Reserve aligned space in the C++ object for the type.
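The compile-time checks from the list above can be approximated in plain Rust with const assertions and trait bounds. This is only a sketch: `Handle`, `LAYOUT_SIZE`, `LAYOUT_ALIGN`, `Reserved` and `assert_clone` are hypothetical names, the `#[repr(layout(...))]` attribute is the proposed syntax (shown in a comment only, since upstream `cxx` does not accept it), and the real macro expansion may look different.

```rust
use std::mem::{align_of, size_of};
use std::sync::Arc;

// Hypothetical handle type that should cross the bridge by value.
#[derive(Clone)]
pub struct Handle {
    pub inner: Arc<String>,
}

// Values the user would write in the proposed attribute, e.g.
//     #[repr(layout(16, 8))]
//     #[derive(Clone)]
//     type Handle;
// (proposed syntax, an assumption of this sketch).
pub const LAYOUT_SIZE: usize = 16;
pub const LAYOUT_ALIGN: usize = 8;

// "At least" semantics: the declared layout may be larger than the real
// one, but never smaller. A violation fails the build.
const _: () = assert!(size_of::<Handle>() <= LAYOUT_SIZE);
const _: () = assert!(align_of::<Handle>() <= LAYOUT_ALIGN);

// Trait check: this only compiles if `Handle` really implements `Clone`.
const fn assert_clone<T: Clone>() {}
const _: () = assert_clone::<Handle>();

// Rust-side mirror of the aligned space reserved in the C++ object
// (roughly `alignas(8) unsigned char storage[16];`).
#[repr(C, align(8))]
pub struct Reserved(pub [u8; LAYOUT_SIZE]);
```

Declaring a size or alignment that is too small, or deriving a trait the Rust type lacks, then becomes a build error rather than a runtime surprise.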

The data stored in the reserved space in the C++ object is then either `T` or `Option<T>` on the Rust side, depending on whether the Rust type implements `Copy` or not.

- If `Copy` is implemented, there is no issue at all: Rust's `Copy` objects can be copied and moved freely, so the data is simply `memcpy`'ed as needed on the C++ side.
- If `Copy` is *not* implemented, the stored type is `Option<T>`, and `drop`, `forget` and optionally `clone` (for types implementing `Clone`) callbacks are generated.
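For the non-`Copy` case, the three callbacks could look roughly like the following. This is a sketch under assumptions: the function names and the `Handle` type are hypothetical, and a real bridge would export the functions with a C ABI and mangled names instead of plain Rust functions.

```rust
use std::ptr;
use std::sync::Arc;

// Hypothetical non-`Copy` handle type.
#[derive(Clone)]
pub struct Handle(pub Arc<u32>);

/// Called from the C++ destructor. Dropping `None` does nothing,
/// dropping `Some(handle)` releases the handle.
///
/// Safety: `slot` must point to an initialized `Option<Handle>`.
pub unsafe fn handle_drop(slot: *mut Option<Handle>) {
    unsafe { ptr::drop_in_place(slot) }
}

/// Called on the source after C++ has moved its bytes elsewhere.
/// Overwrites the slot with `None` WITHOUT running the destructor.
///
/// Safety: `slot` must be valid for writes of `Option<Handle>`.
pub unsafe fn handle_forget(slot: *mut Option<Handle>) {
    unsafe { ptr::write(slot, None) }
}

/// Called from the C++ copy constructor: clones `src` into the
/// still uninitialized storage at `dst`.
///
/// Safety: `src` must point to an initialized `Option<Handle>` and
/// `dst` must be valid for writes of `Option<Handle>`.
pub unsafe fn handle_clone(src: *const Option<Handle>, dst: *mut Option<Handle>) {
    unsafe { ptr::write(dst, (&*src).clone()) }
}
```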

The reasoning behind `Option<T>` is the following: C++ doesn't have a good notion of object ownership. If an object is moved to another location via a move constructor or move assignment, the original object will still be destroyed by its destructor. Calling `drop` on this moved-out object would be fatal. Therefore, the object is represented by `Option<T>`, and moving out of the object calls the `forget` callback on it, which writes the `Option<T>::None` pattern into it. The subsequent destruction of the object still goes through the `drop` callback, but dropping `None` is a no-op, so it's safe.
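The move-then-destroy sequence can be simulated entirely in Rust to show that the payload's destructor runs exactly once. In this sketch, `CppObject` is a hypothetical stand-in for the generated C++ class (its `Drop` impl plays the role of the C++ destructor), and `Tracked` is a made-up payload that counts its drops.

```rust
use std::cell::Cell;
use std::mem::MaybeUninit;
use std::ptr;
use std::rc::Rc;

/// Hypothetical payload that counts how often its destructor runs.
pub struct Tracked(pub Rc<Cell<u32>>);

impl Drop for Tracked {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Stand-in for the generated C++ class: reserved storage that always
/// holds an initialized `Option<T>`.
pub struct CppObject<T> {
    storage: MaybeUninit<Option<T>>,
}

impl<T> CppObject<T> {
    /// Rust -> C++ by value: the object is wrapped in `Some`.
    pub fn new(value: T) -> Self {
        CppObject { storage: MaybeUninit::new(Some(value)) }
    }

    /// Emulates the C++ move constructor: copy the bytes, then run the
    /// `forget` step on the source, which writes `None` into it.
    pub fn move_from(source: &mut CppObject<T>) -> Self {
        let mut target = CppObject { storage: MaybeUninit::uninit() };
        unsafe {
            ptr::copy_nonoverlapping(source.storage.as_ptr(), target.storage.as_mut_ptr(), 1);
            ptr::write(source.storage.as_mut_ptr(), None);
        }
        target
    }

    /// True if the object has been moved out of.
    pub fn is_moved_out(&self) -> bool {
        unsafe { self.storage.assume_init_ref().is_none() }
    }
}

/// Emulates the C++ destructor, which also runs for moved-from objects.
impl<T> Drop for CppObject<T> {
    fn drop(&mut self) {
        unsafe { ptr::drop_in_place(self.storage.as_mut_ptr()) }
    }
}

/// Moves an object once, destroys both C++ objects and reports how
/// often the payload's destructor ran.
pub fn drops_after_move() -> u32 {
    let counter = Rc::new(Cell::new(0));
    {
        let mut original = CppObject::new(Tracked(counter.clone()));
        let _moved = CppObject::move_from(&mut original);
        // Both `_moved` and `original` are destroyed at the end of this scope.
    }
    counter.get()
}
```

Without the `forget` step, both destructors would drop the same `Tracked` value, which is the fatal double drop described above.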

When passing objects by value from Rust to C++, they are wrapped in `Option<T>::Some`. When they are returned to Rust, the `Option` is unwrapped, so even if someone tries to return a moved-out object to Rust (which is UB), we'll detect it. Similarly, passing references or pointers to `T` from C++ to Rust effectively means passing references or pointers to `Option<T>`, which can also be checked for moved-out objects (still UB, but better to report it). We haven't implemented that check yet.
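The wrapping and unwrapping at the boundary amounts to something like the following. This is a sketch with hypothetical function names; how the detected misuse is reported (panic, abort, error value) is an assumption here, and the reference check in `borrow_from_cpp` corresponds to the part described above as not implemented.

```rust
/// Rust -> C++ by value: the object is stored as `Some(value)`.
pub fn send_to_cpp<T>(value: T) -> Option<T> {
    Some(value)
}

/// C++ -> Rust by value: unwrap, reporting a moved-out object
/// instead of silently handing garbage to Rust code.
pub fn receive_from_cpp<T>(slot: Option<T>) -> T {
    slot.expect("C++ returned a moved-out object to Rust")
}

/// C++ -> Rust by reference: the same check applied to a borrowed slot.
pub fn borrow_from_cpp<T>(slot: &Option<T>) -> &T {
    slot.as_ref()
        .expect("C++ passed a reference to a moved-out object")
}
```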

Another limitation of our implementation is that `T` and `Option<T>` are required to have the same size (i.e., `Option<T>` must use a niche or an otherwise invalid bit pattern to represent `None`). The reasoning is that this is the case for most practical purposes anyway (we typically want to pass *handle* types, which contain an `Arc` or the like), and it's fairly easy to add a member with a niche if needed. On the plus side, the binary representation/layout is then exactly the same for `T` and `Option<T>::Some`, so passing references to C++ is also well-defined: simply pass the reference as-is.
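The size requirement can be expressed as a const assertion, and the effect of adding a niche member is easy to observe. A sketch with hypothetical types: note that the language only guarantees the niche optimization for a short list of types (such as `Option<Box<T>>` or `Option<NonZeroU32>`); for structs that merely contain such a field it is what current `rustc` does in practice, which is why a compile-time check is useful.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;
use std::sync::Arc;

/// Typical handle type: the non-null pointer inside `Arc` provides a niche.
pub struct Handle {
    pub inner: Arc<String>,
}

/// No niche: every bit pattern is a valid `Plain`, so `Option<Plain>`
/// needs extra space for its discriminant.
pub struct Plain {
    pub a: u32,
    pub b: u32,
}

/// Same data, but one member can never be zero, which gives
/// `Option<WithNiche>` a spare bit pattern for `None`.
pub struct WithNiche {
    pub a: u32,
    pub b: NonZeroU32,
}

/// True if `Option<T>` fits into exactly the space of `T`.
pub const fn option_is_free<T>() -> bool {
    size_of::<Option<T>>() == size_of::<T>()
}

// The kind of check the bridge could emit for a non-`Copy` type.
const _: () = assert!(option_is_free::<Handle>());

/// True if the payload of `Some` starts at the address of the `Option`
/// itself, i.e. a `&Option<T>` can be handed out as a `&T` unchanged.
pub fn payload_address_matches<T>(slot: &Option<T>) -> bool {
    match slot {
        Some(value) => std::ptr::eq(
            value as *const T as *const u8,
            slot as *const Option<T> as *const u8,
        ),
        None => false,
    }
}
```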

There is one danger, though. Passing a mutable reference to C++ would allow moving out of the object on the C++ side. Again, this would be UB from our PoV, but C++ doesn't care. We can, however, check for this after the C++ call returns: the binary pattern of the object passed by reference must not correspond to the `None` pattern. With that, we can also catch this kind of UB (i.e., the C++ side casting the mutable reference to an rvalue and moving out of the object). Similarly, if rvalue references were to be allowed, they could be implemented by using `&mut Option<T>` as a parameter on the Rust side, which would then correspond to an rvalue reference on the C++ side. We didn't implement it, but it would be possible. This would also make it trivial to address https://github.com/dtolnay/cxx/issues/561.
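The post-call check for mutable references could be sketched as follows. Everything here is hypothetical: `call_with_mut_ref` stands for the generated Rust-side shim, the `foreign` closure stands in for the C++ function, and `well_behaved` / `moves_out` merely simulate what C++ code might do with the reference.

```rust
/// Reported when the foreign side moved out of a mutably borrowed object.
#[derive(Debug, PartialEq)]
pub struct MovedOut;

/// Passes the slot to the foreign function and verifies afterwards that
/// it does not contain the `None` pattern.
pub fn call_with_mut_ref<T>(
    slot: &mut Option<T>,
    foreign: impl FnOnce(*mut Option<T>),
) -> Result<(), MovedOut> {
    debug_assert!(slot.is_some(), "object was already moved out before the call");
    foreign(slot as *mut Option<T>);
    if slot.is_none() {
        Err(MovedOut)
    } else {
        Ok(())
    }
}

/// Simulates C++ code that only mutates the object in place.
pub fn well_behaved(ptr: *mut Option<String>) {
    let slot = unsafe { &mut *ptr };
    if let Some(text) = slot.as_mut() {
        text.push('!');
    }
}

/// Simulates C++ code that casts the reference to an rvalue and moves
/// out of it: the value is taken and `None` is left behind.
pub fn moves_out(ptr: *mut Option<String>) {
    let slot = unsafe { &mut *ptr };
    drop(slot.take());
}
```

A real shim would more likely abort or panic than return a `Result`, since the caller only ever sees a `&mut T`; the `Result` here just makes the outcome easy to inspect.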

Another issue that could be addressed fairly easily is https://github.com/dtolnay/cxx/issues/251. It might also help a bit with https://github.com/dtolnay/cxx/issues/171 (by providing `Option` support).

Any comments/ideas on the aforementioned design?

As mentioned, we'd like to upstream the changes, which already exist. Since picking the right subset is not trivial, I'd like to first clarify at least the minimal interface and the minimal useful feature set.

Thanks.
