Design for Rust object by-value passing to C++ and back #1185

Open
@schreter

In our project, which requires tight C++/Rust integration (we consume a Rust library in a large C++ project), we extended the cxx crate to handle Rust types by value, so that we can safely hand an instance of a Rust object to C++ (and back from C++ to Rust). We'd now like to upstream the change so that everyone can benefit.

The major issue is that the type layout is unknown on the C++ side. The parser in cxx::bridge simply cannot know the size and alignment of the data type, nor which traits the type implements. Our solution is pretty simple:

  • Explicitly define the layout of the C++ type by specifying #[repr(layout(<size>, <alignment>))] on the extern "Rust" type. A compile-time check verifies that the declared size and alignment are at least those of the actual type ("at least" because the bridge can target multiple platforms with different pointer sizes; for now, one can simply pick the maximum size/alignment).
  • Allow #[derive(Copy)] and/or #[derive(Clone)] on the type (checked at compile time to ensure that the original type indeed implements these traits). Deriving Default would also be easily possible.
  • Generate the respective constructors, assignment operators and destructors as needed on the C++ side, along with the appropriate bridge functions to call back into Rust.
  • Reserve aligned space in the C++ object for the type.
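The compile-time checks from the list above could look roughly like the following on the Rust side. This is only a sketch of what the generated code might contain, not our actual implementation: Handle, DECLARED_SIZE, DECLARED_ALIGN, fits_declared_layout and assert_clone are hypothetical names used for illustration, and the values 8/8 stand in for what a user would write in #[repr(layout(8, 8))].

```rust
use std::mem::{align_of, size_of};
use std::sync::Arc;

/// Hypothetical Rust type that should be passed to C++ by value.
#[allow(dead_code)]
#[derive(Clone)]
pub struct Handle {
    inner: Arc<String>,
}

/// Values a user would declare via the proposed `#[repr(layout(8, 8))]`
/// (assumed here; chosen as the maximum over 32-bit and 64-bit targets).
pub const DECLARED_SIZE: usize = 8;
pub const DECLARED_ALIGN: usize = 8;

/// True if a buffer with the declared layout can hold a `T`.
pub const fn fits_declared_layout<T>(size: usize, align: usize) -> bool {
    size_of::<T>() <= size && align_of::<T>() <= align
}

// Compilation fails here if the declared layout is too small for the type.
const _: () = assert!(fits_declared_layout::<Handle>(DECLARED_SIZE, DECLARED_ALIGN));

/// Only compiles when instantiated with a type that implements `Clone`.
pub const fn assert_clone<T: Clone>() {}

// Compilation fails here if `#[derive(Clone)]` was requested in the bridge
// but the original type does not implement `Clone`.
const _: () = assert_clone::<Handle>();
```

Because the checks are const items, a mismatch shows up as a build error on the affected target rather than as a runtime failure.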

The data stored in the reserved space in the C++ object is, on the Rust side, T if the Rust type implements Copy and Option<T> otherwise.

  • If Copy is implemented, then there is no issue whatsoever, since Rust's Copy objects can be copied/moved freely, so the data is just memcpy'ed as needed on the C++ side.
  • If Copy is not implemented, then the data type is Option<T> and drop, forget and optionally clone (for types implementing Clone) callbacks are generated.

The reasoning behind Option<T> is the following: C++ doesn't have a good notion of object ownership. If an object is moved to another location via move constructor or move assignment, the original object will still be destroyed by its destructor. Calling drop on this moved-out object would be fatal, since the same value would be dropped twice. Therefore, the object is represented by Option<T>, and moving out of the object calls the forget callback on it, which writes the Option<T>::None pattern into it. The subsequent destruction of the object still calls the drop callback, but on the None pattern, which is a no-op and thus safe.
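To illustrate the forget/drop interplay, here is a minimal sketch that simulates a C++ move construction followed by both destructors, entirely in Rust. The names slot_forget, slot_drop, Tracked and simulate_cpp_move are made up for this example, and the bitwise copy stands in for what the generated C++ move constructor would do; the real bridge functions would be extern "C" callbacks.

```rust
use std::cell::Cell;
use std::mem::MaybeUninit;
use std::ptr;
use std::rc::Rc;

/// Hypothetical payload that counts how many times it has been dropped.
pub struct Tracked {
    drops: Rc<Cell<u32>>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Sketch of the `forget` callback: overwrite the slot with `None`
/// without running the destructor of the value it held.
pub unsafe fn slot_forget<T>(slot: *mut Option<T>) {
    unsafe { ptr::write(slot, None) }
}

/// Sketch of the `drop` callback: drop whatever the slot holds.
/// For a slot containing `None` this does nothing.
pub unsafe fn slot_drop<T>(slot: *mut Option<T>) {
    unsafe { ptr::drop_in_place(slot) }
}

/// Simulates `T dst(std::move(src));` followed by both C++ destructors
/// and returns how often the payload was dropped.
pub fn simulate_cpp_move() -> u32 {
    let drops = Rc::new(Cell::new(0));
    // Stand-ins for the aligned storage reserved inside the C++ objects.
    let mut src = MaybeUninit::new(Some(Tracked { drops: drops.clone() }));
    let mut dst = MaybeUninit::<Option<Tracked>>::uninit();
    unsafe {
        // Move constructor: bitwise copy, then mark the source as moved-out.
        ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr(), 1);
        slot_forget(src.as_mut_ptr());
        // C++ still destroys both objects.
        slot_drop(src.as_mut_ptr()); // holds None, so nothing happens
        slot_drop(dst.as_mut_ptr()); // drops the payload exactly once
    }
    drops.get()
}
```

Without the slot_forget call, both slot_drop calls would see Some and the payload would be dropped twice.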

When passing objects by value from Rust to C++, they are wrapped in Option<T>::Some. When returning them to Rust, the Option is unwrapped, so even if someone tries to return a moved-out object to Rust (which is UB), we'll detect it. Similarly, passing references or pointers to T from C++ to Rust effectively means passing references or pointers to Option<T>, which can also be checked for moved-out objects (which is still UB, but it is better to report it). We haven't implemented that check yet.
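The wrap/unwrap step can be sketched as follows. The function names and the MovedOut error type are hypothetical; generated code might just as well panic or abort instead of returning a Result.

```rust
/// Hypothetical error reported when C++ hands back a moved-out object.
#[derive(Debug, PartialEq)]
pub struct MovedOut;

/// Rust -> C++: the value is stored in the C++ object as `Some(value)`.
pub fn wrap_for_cpp<T>(value: T) -> Option<T> {
    Some(value)
}

/// C++ -> Rust: take the value out of the slot, leaving `None` behind.
/// A slot that already holds `None` means the C++ side moved out of it.
pub fn unwrap_from_cpp<T>(slot: &mut Option<T>) -> Result<T, MovedOut> {
    slot.take().ok_or(MovedOut)
}
```

Taking the value (rather than copying it) leaves None in the slot, so a later C++ destructor call on the same object remains harmless.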

Another limitation of our implementation is that T and Option<T> are required to have the same size (i.e., Option<T> must use some niche or invalid bit pattern to represent None). The reasoning is that this is the case for most practical purposes anyway (since we often want to pass handle types, which contain some Arc or the like), and it's fairly easy to add a member with a niche if needed. On the plus side, the binary representation/layout is then exactly the same for T and Option<T>::Some, so passing references to C++ is also well-defined: simply pass the reference as-is.
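The size requirement can itself be enforced at compile time. Below is a sketch with hypothetical types (Handle, CounterWithNiche) and a hypothetical helper option_is_free. Note that the niche optimization for a struct wrapping an Arc is what current rustc does in practice rather than a documented language guarantee, which is one more reason to assert it instead of assuming it.

```rust
use std::mem::size_of;
use std::num::NonZeroU8;
use std::sync::Arc;

/// True if `Option<T>` needs no extra space, i.e. `None` fits in a niche of `T`.
pub const fn option_is_free<T>() -> bool {
    size_of::<Option<T>>() == size_of::<T>()
}

/// Typical handle type: the non-null pointer inside `Arc` provides the niche.
#[allow(dead_code)]
pub struct Handle {
    inner: Arc<String>,
}

/// A type made only of plain integers has no niche, so a member with one
/// is added (here a `NonZeroU8`, at the cost of some padding).
#[allow(dead_code)]
pub struct CounterWithNiche {
    value: u64,
    niche: NonZeroU8,
}

// Compilation fails if either type cannot represent `None` for free.
const _: () = assert!(option_is_free::<Handle>());
const _: () = assert!(option_is_free::<CounterWithNiche>());
```

A bare u64, by contrast, has no invalid bit pattern, so Option<u64> is necessarily larger than u64 and would be rejected by such a check.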

There is one danger, though. Passing a mutable reference to C++ would allow moving out of the object on the C++ side. Again, this would be UB from our point of view, but C++ doesn't care. We can, however, check for this after the C++ call returns: the binary pattern of the object passed by reference must not correspond to the None pattern. With that, we can also catch this kind of UB (i.e., the C++ side reinterpreting the mutable reference as an rvalue and moving out of the object). Similarly, if rvalue references were allowed, this could be implemented by using &mut Option<T> as a parameter on the Rust side, which would then correspond to an rvalue reference on the C++ side. We didn't implement it, but it would be possible. This would also help address #561 trivially.
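The post-call check could be sketched like this. For simplicity the slot is modeled directly as &mut Option<T> and the C++ function as a Rust closure; in the real bridge the reference would be a &mut T reinterpreted via the shared layout, and call_cpp_with_mut_ref and MovedOut are hypothetical names.

```rust
/// Hypothetical error reported when C++ moved out of a borrowed object.
#[derive(Debug, PartialEq)]
pub struct MovedOut;

/// Calls `cpp_call` (a stand-in for the bridged C++ function) with a mutable
/// reference and afterwards verifies that the slot still holds a value.
pub fn call_cpp_with_mut_ref<T>(
    slot: &mut Option<T>,
    cpp_call: impl FnOnce(&mut Option<T>),
) -> Result<(), MovedOut> {
    cpp_call(&mut *slot);
    // If the slot now holds the None pattern, the callee moved out of it.
    if slot.is_some() {
        Ok(())
    } else {
        Err(MovedOut)
    }
}
```

A callee that only mutates the value passes the check, while one that behaves like std::move on the reference (modeled by take() on the slot) is reported.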

Another issue which could be addressed fairly easily is #251. It might also help a bit with issue #171 (by providing Option support).

Any comments/ideas on the aforementioned design?

As mentioned, we'd like to upstream the changes, which already exist. Since picking the right subset is not trivial, I'd like to first clarify at least the minimal interface and the minimal useful feature set.

Thanks.
