
[RFC] Cross-Platform Refactor: Build System and Binary Distribution #1032

Closed
@Titus-von-Koeller


We're confirming the decision to migrate to CMake for cross-platform builds and are in the process of moving everything to GitHub workflows.

The goal of this issue is to have a central place to discuss, agree on the work still needed, and track progress on the following topics:

  • the build + distribution of binaries
  • security implications
  • the distribution of binaries in light of PyPI's size restrictions, considering alternatives such as splitting binaries per platform or using an extra index, akin to PyTorch's approach.

Strategies for binary distribution and size constraints: we need to weigh the options for distributing binaries given PyPI's size restrictions, including separate packages for each platform or an additional index. We will likely ask for an exemption from the PyPI size limit for the time being, but we need a sustainable way forward. I am aware of useful discussions around these topics in PRs and issues over the last week, but since I am about to leave on vacation, I have not yet been able to integrate everything into this page.

The idea, however, is for this issue to be the central hub for such discussions. I'll integrate any feedback into this description to keep an up-to-date summary of decisions, challenges, tasks, etc.

Distribution of binaries

Right now we're already hitting PyPI's 100MB size restriction, so we cannot add new functionality yet, as it would increase the binary size.
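To make the size problem visible before an upload fails, a CI step could flag build artifacts that exceed the limit. The following is a minimal sketch, not the project's actual tooling: the function name, the `dist` directory, and the interpretation of the limit as 100 MiB are all assumptions.

```python
from pathlib import Path

# Assumption: the 100MB limit is treated as 100 MiB here; adjust if needed.
PYPI_LIMIT_BYTES = 100 * 1024 * 1024


def oversized_artifacts(dist_dir, limit_bytes=PYPI_LIMIT_BYTES):
    """Return (file name, size in bytes) for each wheel or sdist above the limit."""
    results = []
    for path in sorted(Path(dist_dir).iterdir()):
        # Wheels end in .whl; sdists usually end in .tar.gz (suffix ".gz").
        if path.is_file() and path.suffix in {".whl", ".gz"}:
            size = path.stat().st_size
            if size > limit_bytes:
                results.append((path.name, size))
    return results
```

A workflow could call `oversized_artifacts("dist")` after the build and fail the job if the returned list is not empty.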

There are two approaches that could solve this:

  1. Split out the binaries for each platform into separate binary distribution packages and keep using PyPI.
  2. Akin to PyTorch, use an extra index (we would like to avoid this).
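One way the first approach could look, as a rough sketch: a thin main package depends on per-platform binary packages through PEP 508 environment markers, so the installer fetches only the one matching the current platform. The package names below are hypothetical placeholders, and the split by `sys_platform` alone is an assumption (a real split might also need to consider CPU architecture or CUDA version).

```python
# Hypothetical package names; they do not refer to packages that exist on PyPI.
PLATFORM_PACKAGES = {
    "linux": "examplelib-binaries-linux",
    "win32": "examplelib-binaries-windows",
    "darwin": "examplelib-binaries-macos",
}


def platform_requirements(version):
    """Build one PEP 508 requirement string per platform-specific binary package."""
    return [
        f'{name}=={version}; sys_platform == "{platform}"'
        for platform, name in sorted(PLATFORM_PACKAGES.items())
    ]
```

The resulting list could be passed as the dependency list of the main package. Each binary package would then stay under the size limit on its own, at the cost of publishing and versioning several packages in lockstep.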

@albanD had a few valuable comments about that (feel free to engage in this discussion as well :) ):

AFAIK the main reason we use this is because of the binary size limit on PyPi. If you are not hitting these limits, you should use PyPi directly for sure.

There were security concerns with --extra-index-url but not --index-url. The main downside is that we must vendor on our page all the packages we depend on.

The main thing you want to ensure if you're suggesting extra-index-url is that you control the package name both in pypi AND your url to make sure you control all versions.

[... otherwise someone can] create a package with the same name on PyPI and [take] precedence over ours because they bumped the version. (could also [happen] without the version bump btw.)
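The version-bump scenario can be illustrated with a deliberately simplified model of how an installer chooses among candidates when an extra index is configured: candidates from all indexes are pooled and the highest version wins, regardless of which index it came from. This is a sketch for illustration only, not pip's implementation; the index names and version numbers are made up, and the model does not cover the equal-version case mentioned above.

```python
def parse_version(text):
    """Parse a plain dotted numeric version such as '1.2.3' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def pick_candidate(candidates):
    """Pick the highest version from (index_name, version) pairs.

    Simplified model: the index a candidate came from plays no role.
    """
    return max(candidates, key=lambda item: parse_version(item[1]))


# Hypothetical scenario: the project publishes 0.42.0 on its own index, and
# someone else uploads 0.99.0 under the same name on the public index.
candidates = [
    ("project-index", "0.42.0"),
    ("public-index", "0.99.0"),
]
chosen = pick_candidate(candidates)
```

In this model `chosen` comes from the public index, which is why controlling the package name on both PyPI and the custom URL matters. With `--index-url` only one index is consulted, so the pooling does not happen, but every dependency then has to be served from that index.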

Security implications

Also from @albanD:

The [risks] that are more critical I think are about people polluting our binaries or getting users to download malicious binaries from our usual channels.
The blogpost (https://johnstawinski.com/2024/01/11/playing-with-fire-how-we-executed-a-critical-supply-chain-attack-on-pytorch/) is definitely one of these. There are a few variants of this depending on your CI system. But in general, how do you protect any secret that you use in your CI/CD (in particular, the ones with write access to pypi, github release, custom website where you ask people to download from, etc.)?


Labels: RFC (request for comments on proposed library improvements), cross-platform
