Description
In our project, which requires seamless C++/Rust integration (we consume a Rust library in a large C++ project), we extended the `cxx` crate to handle Rust by-value types, so that we can safely pass an instance of a Rust object to C++ (and back from C++ to Rust). Now we'd like to upstream the change so that everyone can benefit.
The major issue is the unknown type layout on the C++ side. The parser in `cxx::bridge` simply cannot know the size and alignment of the data type, nor the traits the type implements. Our solution is pretty simple:
- Explicitly define the layout of the C++ type by specifying `#[repr(layout(<size>, <alignment>))]` on the `extern "Rust"` type. The size and alignment are checked at compile time to be at least as large as those of the actual type ("at least" because the bridge can target multiple platforms with different pointer sizes; one can then pick the maximum size/alignment for now).
- Allow `#[derive(Copy)]` and/or `#[derive(Clone)]` on the type (these are checked at compile time to ensure that the original type indeed implements the traits). Deriving `Default` would also be easily possible.
- Generate the respective constructors, assignment operators and destructors as needed on the C++ side, together with the appropriate bridge functions to call back into Rust.
- Reserve aligned space in the C++ object for the type.
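The compile-time checks described in the list above could look roughly like the following on the Rust side. This is only a sketch, not the code we generate: `Handle`, `fits`, `assert_clone`, `DECLARED_SIZE` and `DECLARED_ALIGN` are hypothetical names chosen for illustration, and the constants stand in for the values a user would write in `#[repr(layout(16, 8))]`.

```rust
use std::mem::{align_of, size_of};
use std::sync::Arc;

/// Hypothetical Rust type that should be passed to C++ by value.
#[derive(Clone)]
pub struct Handle {
    pub inner: Arc<u32>,
}

/// True if a reserved slot of `size` bytes aligned to `align` can hold a `T`.
pub const fn fits<T>(size: usize, align: usize) -> bool {
    size_of::<T>() <= size && align_of::<T>() <= align
}

/// Only compiles for `T: Clone`; stands in for the check behind `#[derive(Clone)]`.
pub const fn assert_clone<T: Clone>() {}

// Assumed user-declared layout, picked as the maximum over all target platforms.
pub const DECLARED_SIZE: usize = 16;
pub const DECLARED_ALIGN: usize = 8;

// Both checks are evaluated at compile time; a violation is a build error.
const _: () = assert!(fits::<Handle>(DECLARED_SIZE, DECLARED_ALIGN));
const _: () = assert_clone::<Handle>();
```

Because the check is "at least" rather than "exactly", the same declaration works on 32-bit and 64-bit targets, at the cost of some padding in the C++ object.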
The data stored in the reserved space in the C++ object is then either `T` or `Option<T>` on the Rust side, depending on whether the Rust object implements `Copy` or not.
- If `Copy` is implemented, there is no issue whatsoever: Rust's `Copy` objects can be copied/moved freely, so the data is simply memcpy'ed as needed on the C++ side.
- If `Copy` is not implemented, the data type is `Option<T>`, and `drop`, `forget` and optionally `clone` (for types implementing `Clone`) callbacks are generated.
The reasoning behind `Option<T>` is the following: C++ doesn't have a good notion of object ownership. If an object is moved to another location via a move constructor or move assignment, the original object will still be destroyed by its destructor. Calling `drop` on this moved-out object would be fatal. Therefore, the object is represented by `Option<T>`, and moving out of the object calls the `forget` callback on it, which writes the `Option<T>::None` pattern into it without running the destructor. The subsequent destruction of the object still goes through the `drop` callback, but on the `None` pattern, which is a no-op and thus safe.
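To make the move/destroy sequence concrete, here is a minimal sketch of what the three callbacks could look like, with the C++ move constructor simulated in Rust. The names `Handle`, `handle_forget`, `handle_drop`, `handle_clone` and `demo_move_then_destroy` are made up for this example; in the real bridge the callbacks would be exported with a C ABI and called from the generated C++ special member functions.

```rust
use std::mem::MaybeUninit;
use std::ptr;
use std::sync::Arc;

/// Hypothetical non-`Copy` type stored in the C++ object as `Option<Handle>`.
#[derive(Clone)]
pub struct Handle(pub Arc<u32>);

/// `forget`: overwrite the slot with `None` WITHOUT dropping the old value.
pub unsafe fn handle_forget(slot: *mut Option<Handle>) {
    unsafe { ptr::write(slot, None) }
}

/// `drop`: run the destructor in place; a no-op if the slot holds `None`.
pub unsafe fn handle_drop(slot: *mut Option<Handle>) {
    unsafe { ptr::drop_in_place(slot) }
}

/// `clone`: copy-construct `dst` (uninitialized) from `src`.
pub unsafe fn handle_clone(src: *const Option<Handle>, dst: *mut Option<Handle>) {
    unsafe { ptr::write(dst, (*src).clone()) }
}

/// Simulates `B b(std::move(a));` followed by both C++ destructors.
/// Returns the remaining strong count of the shared `Arc`.
pub fn demo_move_then_destroy() -> usize {
    let shared = Arc::new(7u32);
    let mut a = MaybeUninit::<Option<Handle>>::uninit();
    let mut b = MaybeUninit::<Option<Handle>>::uninit();
    unsafe {
        // Rust hands the object to C++ wrapped in `Some`.
        ptr::write(a.as_mut_ptr(), Some(Handle(shared.clone())));
        // C++ move constructor: bitwise copy, then `forget` on the source.
        ptr::copy_nonoverlapping(a.as_ptr(), b.as_mut_ptr(), 1);
        handle_forget(a.as_mut_ptr());
        // Both destructors still run; only `b` actually owns a value.
        handle_drop(a.as_mut_ptr());
        handle_drop(b.as_mut_ptr());
    }
    Arc::strong_count(&shared)
}
```

If `forget` did not write `None` into the source, the second `handle_drop` would release the same `Arc` twice; with it, exactly one reference is released and only the local `shared` remains.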
When passing objects by value from Rust to C++, they are wrapped in `Option<T>::Some`. When returning them back to Rust, the `Option` is unwrapped, so even if someone tries to return a moved-out object to Rust (which is UB), we'll detect it. Similarly, passing references or pointers to `T` from C++ to Rust effectively entails passing references or pointers to `Option<T>`, which can also be checked for moved-out objects (this is still UB, but it is better to report it). We haven't implemented that check yet.
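The wrap/unwrap step at the boundary can be sketched as follows. The helper names `wrap_for_cpp` and `unwrap_from_cpp` and the `MovedOut` error type are hypothetical; the sketch returns a `Result` to show where the moved-out case is detected, whereas generated bridge code could just as well panic or abort at that point.

```rust
/// Hypothetical error reported when C++ hands back a moved-out object.
#[derive(Debug, PartialEq)]
pub struct MovedOut;

/// Rust -> C++: the value is stored in the reserved space as `Some(value)`.
pub fn wrap_for_cpp<T>(value: T) -> Option<T> {
    Some(value)
}

/// C++ -> Rust: unwrap the slot, detecting the `None` (moved-out) pattern.
pub fn unwrap_from_cpp<T>(slot: Option<T>) -> Result<T, MovedOut> {
    slot.ok_or(MovedOut)
}
```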
Another limitation of our implementation is that `T` and `Option<T>` are required to have the same size (i.e., `Option<T>` must use a niche or some otherwise invalid bit pattern to represent `None`). The reasoning is that this is typically the case anyway for all practical purposes (we usually want to pass handle types, which contain an `Arc` or the like), and it's fairly easy to add a member with a niche if needed. On the plus side, the binary representation/layout is then exactly the same for `T` and `Option<T>::Some`, so passing references to C++ is also well-defined: simply pass the reference as-is.
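The same-size requirement can be expressed as a compile-time assertion. In the sketch below, `option_is_free`, `Handle`, `Plain` and `PlainWithNiche` are illustrative names only. It relies on rustc's niche optimization, which places `None` in an unused bit pattern of a field; for structs this is what the compiler does in practice rather than a documented guarantee, which is exactly why an explicit check is useful.

```rust
use std::mem::size_of;
use std::num::NonZeroU8;
use std::sync::Arc;

/// True if `Option<T>` needs no extra space compared to `T`.
pub const fn option_is_free<T>() -> bool {
    size_of::<Option<T>>() == size_of::<T>()
}

/// Handle type: the non-null pointer inside `Arc` provides the niche.
pub struct Handle {
    pub inner: Arc<u32>,
}

/// Every bit pattern of a `u64` is valid, so `Option<Plain>` needs a separate tag.
pub struct Plain {
    pub value: u64,
}

/// Same payload plus a member with a niche (zero is an invalid `NonZeroU8`).
pub struct PlainWithNiche {
    pub value: u64,
    pub tag: NonZeroU8,
}

// Checks the bridge could emit for each by-value type.
const _: () = assert!(option_is_free::<Handle>());
const _: () = assert!(option_is_free::<PlainWithNiche>());
```

A type like `Plain` would be rejected by such a check until a niche-carrying member is added.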
There is one danger, though. Passing a mutable reference to C++ would allow moving out of the object on the C++ side. Again, this would be UB from our point of view, but C++ doesn't care. We can, however, check for this after the C++ call returns: the binary pattern of the object passed by reference must not correspond to the `None` pattern. With that, we can also catch this kind of UB (i.e., the C++ side reinterpreting the mutable reference as an rvalue and moving out of the object). Similarly, if rvalue references were to be allowed, this could be implemented by using `&mut Option<T>` as a parameter on the Rust side, which would then correspond to an rvalue reference on the C++ side. We didn't implement it, but it would be possible. This would also help to address #561 trivially.
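A sketch of the post-call check could look like this. It is a simplification under stated assumptions: the Rust side keeps the value as `Option<T>` here (instead of reinterpreting a `&mut T`), the C++ function is simulated by a closure taking a raw pointer, and `call_cpp_with_mut` and `MovedOut` are hypothetical names.

```rust
/// Hypothetical error reported when C++ moved out of a mutable reference.
#[derive(Debug, PartialEq)]
pub struct MovedOut;

/// Calls `cpp_call` with a pointer to the slot and verifies afterwards
/// that the slot does not hold the `None` (moved-out) pattern.
pub fn call_cpp_with_mut<T, F>(slot: &mut Option<T>, cpp_call: F) -> Result<(), MovedOut>
where
    F: FnOnce(*mut Option<T>),
{
    // What C++ would see as `T&`.
    let raw: *mut Option<T> = &mut *slot;
    cpp_call(raw);
    // After the call returns, the object must still be present.
    if slot.is_some() {
        Ok(())
    } else {
        Err(MovedOut)
    }
}
```

For the rvalue-reference variant, the same function would simply skip the check, since leaving `None` behind is the expected outcome there.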
Another issue that could be addressed fairly easily is #251. It might also help a bit with issue #171 (by providing `Option` support).
Any comments/ideas on the aforementioned design?
As mentioned, we'd like to upstream the changes, which already exist. Since picking the right subset is not trivial, I'd like to first clarify at least the minimal interface and the minimal useful feature set.
Thanks.