Build Python packages using the limited API #42

Open
@vyasr

Python has a limited API that is guaranteed to be stable across minor releases. Any code using the Python C API that restricts itself to the limited API is guaranteed to also compile on future minor versions of Python within the same major family. More importantly, all symbols in the current (and some historical) versions of the limited API are part of Python's stable ABI, which also does not change between Python minor versions. This allows extensions compiled against one Python version to continue working on future versions of Python.
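For reference, a C extension opts in to the limited API by defining the `Py_LIMITED_API` macro to the hex-encoded minimum Python version before including `Python.h`. The sketch below only shows how that value is laid out; the helper name `limited_api_hex` is hypothetical and not part of any library.

```python
def limited_api_hex(major: int, minor: int) -> str:
    """Return the Py_LIMITED_API value for a minimum CPython version.

    The value uses the same layout as PY_VERSION_HEX: the major version in
    the top byte, the minor version in the next byte, and the remaining
    (micro / release level) bytes left at zero.
    """
    return f"0x{(major << 24) | (minor << 16):08X}"


# Value to pass as -DPy_LIMITED_API=... when targeting Python 3.11 and later.
print(limited_api_hex(3, 11))
```

An extension built this way only sees the declarations that were in the limited API as of that minimum version, which is what makes the resulting binary usable on later minor releases.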

Currently RAPIDS builds a single wheel per Python version. If we were to compile using the Python stable ABI, we would be able to instead build a single wheel that works for all Python versions that we support. There would be a number of benefits here:

  • Reduced build time: This benefit is largely diminished by #33 (Support dynamic linking between RAPIDS wheels), since if we build the C++ components as standalone wheels they are already Python-independent (except where we actually use the Python C API in our own C libraries; the only example that I'm currently aware of in RAPIDS is ucxx). The Python components alone are generally small and easy to build. We'll still benefit, but the benefits will be much smaller.
  • Reduced testing time: Currently we run tests across a number of Python versions for our packages on every PR. We often struggle to decide which versions need to be tested each time. If we were to only build a single wheel that runs on all Python versions, it would be much easier to justify a consistent strategy of always testing e.g. the earliest and latest Python versions. We may still want to test more broadly in nightlies, but really the only failure mode here is if a patch release is made for a Python version that is neither the earliest nor the latest, and that patch release contains breaking changes. That is certainly possible (e.g. the recent dask failure that forced us to make a last-minute patch), but it's infrequent enough that we don't need to be testing for it regularly.
  • Wider support matrix: Since we'll have a single binary that works for all Python versions, maintaining the full support matrix will be a lot easier and we won't feel as much pressure to drop earlier versions in order to support newer ones.
  • Day 0 support: Our wheels will work for new Python versions as soon as they're released. Of course, if there are breaking changes then we'll have to address those, but in the average case where things do work users won't be stuck waiting on us.
  • Better installation experience: Having a wheel that automatically works across Python versions will reduce the frequency of issues that are raised around our pip installs.
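To make the "single wheel" point concrete, here is a simplified sketch of how wheel tags decide compatibility: a `cp311-abi3` wheel is accepted by CPython 3.11 and every later 3.x release, while a `cp311-cp311` wheel is accepted only by 3.11. The function `wheel_supports` is a hypothetical helper for illustration; real installers use the full tag logic in `packaging.tags`, and platform tags are ignored here.

```python
import re


def wheel_supports(python_tag: str, abi_tag: str, interpreter: tuple[int, int]) -> bool:
    """Simplified check of whether a CPython wheel can be installed.

    python_tag / abi_tag are the first two components of the wheel's
    compatibility tag, e.g. ("cp311", "abi3") or ("cp311", "cp311").
    """
    match = re.fullmatch(r"cp(\d)(\d+)", python_tag)
    if match is None:
        return False  # only CPython tags are handled in this sketch
    built_for = (int(match.group(1)), int(match.group(2)))
    if abi_tag == "abi3":
        # Stable ABI: any later minor version in the same major family works.
        return interpreter[0] == built_for[0] and interpreter >= built_for
    # Version-specific ABI: the interpreter must match exactly.
    return abi_tag == python_tag and interpreter == built_for


for version in [(3, 10), (3, 11), (3, 12), (3, 13)]:
    print(version, wheel_supports("cp311", "abi3", version),
          wheel_supports("cp311", "cp311", version))
```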

Here are the tasks (some ours, some external) that need to be accomplished to make this possible:

  • Making Cython compatible with the limited API: Cython has preliminary support for the limited API. However, this support is still experimental, and most code still won't compile. I have been making improvements to Cython itself to fix this, and I now have a local development branch of Cython where I can compile most of RAPIDS (with additional changes to RAPIDS libraries). We won't be able to move forward with releasing production abi3 wheels until this support in Cython is released. This is going to be the biggest bottleneck for us.
  • nanobind support for the limited API: nanobind can already produce abi3 wheels when compiled with Python 3.12 or later. Right now we use nanobind in pylibcugraphops, and nowhere else.
  • Removing C API usage in our code: RAPIDS makes very minimal direct usage of the Python C API. The predominant use case that I see is creating memoryviews in order to access some buffers directly. We can fix this by constructing buffers directly. The other thing we'll want to do is remove usage of the NumPy C API, which has no promise of supporting the limited API AFAIK. That will be addressed in #41 (Remove usage of the NumPy C API). Other use cases can be addressed incrementally.
  • Intermediate vs. long-term: If Cython support for the limited API ends up being released before RAPIDS drops support for Python 3.10, we may be in an intermediate state where we still need to build a version-specific wheel for 3.10 while building an abi3 wheel for 3.11+ (and 3.12+ for pylibcugraphops due to nanobind). If that is the case, it shouldn't cause much difficulty since it'll just involve adding a tiny bit of logic on top of our existing GH workflows.
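The intermediate state described in the last item amounts to a small piece of build-matrix logic. Below is a rough sketch of what that could look like; `plan_wheels` is a hypothetical helper, and the list of supported versions is illustrative rather than the actual RAPIDS support matrix.

```python
def plan_wheels(supported, abi3_floor):
    """Return the (python_tag, abi_tag) pairs that need to be built.

    supported:  iterable of (major, minor) versions the package supports.
    abi3_floor: earliest (major, minor) for which an abi3 build is possible.
    """
    plan = []
    abi3_versions = []
    for version in sorted(supported):
        if version < abi3_floor:
            # Too old for the abi3 build: needs its own version-specific wheel.
            tag = f"cp{version[0]}{version[1]}"
            plan.append((tag, tag))
        else:
            abi3_versions.append(version)
    if abi3_versions:
        # One abi3 wheel, built against the oldest eligible version,
        # covers every later version as well.
        oldest = abi3_versions[0]
        plan.append((f"cp{oldest[0]}{oldest[1]}", "abi3"))
    return plan


versions = [(3, 10), (3, 11), (3, 12), (3, 13)]  # illustrative only
print(plan_wheels(versions, (3, 11)))  # Cython-based packages
print(plan_wheels(versions, (3, 12)))  # nanobind-based packages
```

Once the oldest supported version reaches the abi3 floor, the plan collapses to a single wheel and the special case disappears.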

At this stage, it is not yet clear whether the tradeoffs required will be worthwhile, or at what point the ecosystem's support for the limited API will be reliable enough for us to use in production. However, it shouldn't be too much work to get us to the point of at least being able to experiment with limited API builds, so we can start answering questions around performance and complexity fairly soon. I expect that we can pretty easily remove explicit reliance on any APIs that are not part of the stable ABI, at which point this really becomes a question of the level of support our binding tools provide and if/when we're comfortable with those.