Add upcast_hook for exposing non-primary base relationships #920
Conversation
Hi @oremanj,

While indeed very compact, I am not looking to include a feature that specifically enables multiple inheritance (or approximations thereof). However, there is another feature that I am looking to add at some point, and it might be possible to implement it in such a way that it subsumes both use cases, so I thought it might be good to discuss it here.

Specifically, I would like to add a compatibility layer that enables interoperability between pybind11- and nanobind-based bindings (e.g., a nanobind function accepting an instance of a type that has been bound in pybind11). This, too, requires some kind of hook to check type compatibility and return a pointer, which seems quite similar to what you are doing here.

A first small step towards this is already done: both nanobind and pybind11 (on master) switched to a new and more relaxed ABI tag that is now consistent between the two projects (so, binding layers aside, we can check whether conversion of an instance from one framework to the other risks a crash). The next step will be to add a hook to bindings that extracts the C++ pointer associated with a Python object, e.g. a callback with signature
Since we can't stash this in the type object (pybind11 doesn't have any scratch space there), this might need to go into a static capsule or similar.
Storing the foreign cast hook on the Python instance seems like a good convention if you only need to support from-Python conversions. If you need to support to-Python conversions (or implicit conversions in from-Python), you need a registry of some sort in order to be able to look up the converter by its C++ type. That's of course more complex and heavier-weight in terms of code size, though plausibly faster at runtime.

It looks like pybind11 did something similar to your proposal (pybind/pybind11#5296), but theirs is defined as a Python callable rather than a capsule. The capsule is certainly more "nano", and it would be nice if both projects could agree on a unified approach there.

I have a local implementation of the registry-based solution for nanobind/pybind11 interop specifically, which I'd be happy to clean up and upload if it's of interest. It involves a structure along with "import/export foreign type" functions defined in each binding library that accept or produce copies of this structure. You ask the native library for a certain type to export hooks for it, and then pass them to the import side of the other library/ies you want to use it with. Each library's internals structure has a map from C++ type to foreign_caster_hooks, which it consults when it would otherwise be about to fail a conversion.

There are some other details that arise around exception translation and synchronizing the full list of types between the libraries (so you don't have to write a separate import/export statement for each one). We've been using this in production for a good year now, and it's allowed us to convert some huge extension modules from pybind11 to nanobind pretty much one compilation unit at a time, with relatively few surprises. I don't think the capsule/conduit approach could do that, although it seems far easier to implement and would probably work great for simple cases.

I've been working on refining the registry-based approach some, and I think it's plausible to support automatic cross-library (including cross-ABI-version-of-same-library) type discovery and exception translation with reasonable performance and with almost no overhead when foreign types are not used. I'm reluctant to put too much more effort in without knowing what level of solution the relevant library maintainers are interested in, though.
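To make the shape of this more concrete, here is a minimal sketch of what such a per-type hook record and registry map could look like. All names (`foreign_caster_hooks`, `foreign_registry`, the `context` member) are illustrative assumptions, not taken from any released library; the hook signatures mirror the `from_python`/`to_python` callbacks proposed later in this thread.

```cpp
#include <cstdint>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

// Hypothetical stand-in for the CPython object type; a real build would
// include <Python.h> instead.
struct PyObject;

// Sketch of a per-type hook record exported by one binding library and
// imported by another. Names are illustrative.
struct foreign_caster_hooks {
    void *context;  // opaque pointer owned by the exporting framework
    void *(*from_python)(void *context, PyObject *pyobj, uint8_t convert,
                         void (*keep_referenced)(PyObject *));
    PyObject *(*to_python)(void *context, void *cppobj, uint8_t rvp,
                           PyObject *parent);
};

// Each binding library would keep a map from C++ type to the foreign hooks,
// consulted only when one of its own conversions is about to fail.
using foreign_registry =
    std::unordered_map<std::type_index, foreign_caster_hooks>;
```

Looking the hooks up by `std::type_index` is what makes the to-Python direction (and implicit conversions) possible, since those start from a C++ type rather than a Python instance.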
Dear @oremanj, this sounds really exciting!

I think there is a general question of how such a feature could be incorporated. Failing a type cast is a pretty common situation, so it's good if extra logic in there does not unduly slow down nanobind. If the extra code path is costly, then it might be necessary to place the extra code behind an optional opt-in flag. But of course this is an extra flag which is nice to avoid. Therefore, I'm also really curious to hear what you mean by "type discovery and exception translation with reasonable performance and with almost no overhead when foreign types are not used" (i.e., how such a mechanism could be added with almost no overhead).

pybind11 is currently facing significant ABI incompatibility issues (as in: ABI incompatibility with itself; there were 2 ABI bumps and another one is coming up). The "conduit" feature was intended to smooth over such transitions but was found to be insufficient. The main problem is that it only handles the from-Python direction.
Regarding the registry approach: this does make sense to me and seems necessary if
What is the advantage compared to a Python dictionary?
I'm glad to hear this would be of interest!
The basic idea is that if you only have one binding library in the address space, or if you haven't told it to go look for other libraries' types, then the extra logic when you're about to fail a cast is just testing a flag ("any foreign types?"). If you have enabled the interop at runtime, then it will check for new types it doesn't yet know about (one word read) and, assuming there are none, do a map lookup keyed on the C++ type.
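The fast path described above can be sketched as follows. Everything here is a hypothetical illustration (the names `interop_state`, `registry_revision`, and `lookup_foreign` are assumptions, not real nanobind or pybind11 APIs): when interop is disabled the only cost is a single branch, and when it is enabled a one-word revision read detects not-yet-imported types before the map lookup.

```cpp
#include <atomic>
#include <cstddef>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

// Revision counter the shared registry would bump whenever a foreign
// type is added or removed (hypothetical, for illustration).
std::atomic<std::size_t> registry_revision{0};

// Per-binding-library interop state.
struct interop_state {
    bool enabled = false;           // "any foreign types?" flag
    std::size_t seen_revision = 0;  // last registry revision we synced
    std::unordered_map<std::type_index, void *> foreign;  // type -> hooks
};

// Called only when a native conversion is about to fail.
void *lookup_foreign(interop_state &st, const std::type_info &ti) {
    if (!st.enabled)  // the sole cost when interop is off: one branch
        return nullptr;
    // One word read to detect types we haven't imported yet.
    std::size_t rev = registry_revision.load(std::memory_order_acquire);
    if (rev != st.seen_revision) {
        // (Re)scan the shared registry for new types -- omitted here.
        st.seen_revision = rev;
    }
    auto it = st.foreign.find(std::type_index(ti));
    return it == st.foreign.end() ? nullptr : it->second;
}
```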
You could use such a dictionary if you want. I figure each binding library already has its own preferred way of looking up information from a C++ type (such as nanobind's combination of
This is the part still under development. I'll let you know when I get to the point of uploading something usable!
This sounds great! If there is such a registry that lives outside of nanobind/pybind11, then it seems to me that this would either require:
Thinking a bit more about this...
This is an excellent point. I think we basically don't want the registry to implement any fancy data structures. It should just organize what's available and notify binding libraries if something changed. So this can be just a simple list together with some callback mechanism. Here is an idea of what such an API could look like (cheesily named after the Rosetta stone):

#if defined(__cplusplus)
extern "C" {
#endif
struct RosettaBinding;
struct RosettaCallback;
/*
* The registry holds bindings and callbacks to notify other frameworks about
* potential changes. It is protected by a mutex in free-threaded builds, and by
* the GIL in regular builds.
*
* The pointer to the registry is stored in some place where extension
* frameworks can access it while being hidden from Python "userspace"
* (e.g. PyInterpreterState_GetDict), and the link to the `RosettaRegistry`
* can be stored under some name that bakes in the new
* inter-pybind11-nanobind relaxed ABI name
* (`NB_PLATFORM_ABI_TAG` in `src/nb_abi.h`).
*/
struct RosettaRegistry {
#if defined(Py_GIL_DISABLED)
PyMutex mutex;
#endif
RosettaBinding *bindings;
RosettaCallback *callbacks;
};
/*
* List of type bindings and supported functionality. This is a C doubly linked
* list to provide a lowest-common-denominator ABI and permit cheap addition
* and removal. Each framework *owns* its associated records and is, e.g.,
* responsible for eventually deallocating 'RosettaBinding::name'. After
* unregistering a type, it should call the associated removal callback.
*/
struct RosettaBinding {
const char *name;
/* std::type_info pointer and mangled name */
const void *type;
const char *type_name_mangled;
/* copied from your code, I changed 'bool' -> uint8_t */
void* (*from_python)(void* context, PyObject* pyobj, uint8_t convert,
void (*keep_referenced)(PyObject*));
PyObject* (*to_python)(void* context, void* cppobj, uint8_t rvp,
PyObject* parent);
/* potentially more stuff here */
/* doubly linked list */
struct RosettaBinding *prev, *next;
};
/* Conventions: callbacks provide the framework with the opportunity to
* update internal data structures based on newly added or removed types.
*
* They should only be called while RosettaRegistry::mutex or the GIL is held.
*
* The callee should not perform Python C API calls, and it should not
* modify the registry while executing the callback. */
struct RosettaCallback {
void (*notify_added)(RosettaBinding *binding);
void (*notify_removed)(RosettaBinding *binding);
RosettaCallback *prev, *next;
};
#if defined(__cplusplus)
} /* extern "C" */
#endif
Yep, this is my plan.
I considered this initially, but ran into the fact that a C library would have difficulty specifying the `std::type_info`. Still, we might be able to figure out a C structure that's sufficiently ABI-compatible with `std::type_info`.
I don't know enough about the MSVC ABI to make it portable to that; it looks somewhat more involved, but still possible.

The name "Rosetta" is appealing, but it collides with https://www.pyrosetta.org/ and with Apple's dynamic binary translation tool. I've been using "pymetabind" so far, which is less fun but probably won't collide with anything...
pymetabind sounds good too!
Leaving some notes-to-self about MSVC typeinfo ABI for further investigation:
Mm, I see how providing both the typeinfo ptr and the name would help with that. Even without typeinfo, a C library could look up by the name of what it's trying to extract.
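This is why carrying both the `std::type_info` pointer and the mangled name is useful: across shared-library boundaries, two `type_info` objects for the same type need not be the same address, but their name strings still compare equal, and a C library that cannot spell `std::type_info` at all can match purely on the string. A small illustrative helper (`same_type` is a hypothetical name; note that on MSVC the mangled form comes from `raw_name()` rather than `name()`, so a portable version would need per-ABI handling):

```cpp
#include <cstring>
#include <typeinfo>

// Compare a type_info against a name string exported by another framework.
// Pointer equality of type_info can fail across DSOs; string comparison of
// the names is the portable fallback on Itanium-ABI platforms.
bool same_type(const std::type_info *a, const char *name) {
    if (a == nullptr || name == nullptr)
        return false;
    return std::strcmp(a->name(), name) == 0;
}
```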
Force-pushed from 9063be4 to 61e044d.
I think this is the minimal internals change that would allow users to roll their own emulation of some multiple inheritance semantics. Maybe it's still more catering to MI than you're interested in, but I figured I'd give it a shot.