Summary:
Migrates Monarch to modern Python packaging standards by moving static metadata to `pyproject.toml` and adopting a declarative configuration approach.
Ideally, everything is kept within `pyproject.toml` and the project installs with a plain `pip install .`
Concretely, this means getting rid of `setup.py` altogether, along with all `*-requirements.txt` files. But there are a few challenges that I want to cover:
Why we still need setup.py:
`setup.py` cannot be fully eliminated because Monarch has dynamic build requirements:
1. C++ extensions that link against PyTorch (requires torch library paths and headers)
2. Rust extensions via setuptools-rust (requires LIBTORCH_LIB environment variables)
3. Dynamic CUDA detection and CXX11 ABI detection at build time
4. Platform-specific rpath configuration for linking
5. Environment variable overrides (MONARCH_PACKAGE_NAME, MONARCH_VERSION,
USE_TENSOR_ENGINE)
6. Custom build commands (Clean command)
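The dynamic pieces listed above can be illustrated with a minimal sketch. This is not Monarch's actual `setup.py`; the helper names (`package_name`, `cuda_available`, `rpath_link_args`) and defaults are hypothetical, chosen only to show why this logic cannot live in static TOML:

```python
# Illustrative sketch of the dynamic build logic that keeps setup.py
# alive; names and defaults are hypothetical, not Monarch's real code.
import os
import shutil


def package_name() -> str:
    # Point 5: env-var override of package metadata.
    return os.environ.get("MONARCH_PACKAGE_NAME", "monarch")


def tensor_engine_enabled() -> bool:
    # Point 5: feature toggle read at build time.
    return os.environ.get("USE_TENSOR_ENGINE", "1") == "1"


def cuda_available() -> bool:
    # Point 3: build-time CUDA detection, e.g. an explicit CUDA_HOME
    # or nvcc discoverable on PATH.
    return "CUDA_HOME" in os.environ or shutil.which("nvcc") is not None


def rpath_link_args(torch_lib_dir: str) -> list[str]:
    # Point 4: embed an rpath so the built extension can locate
    # libtorch at runtime without LD_LIBRARY_PATH tricks.
    return [f"-Wl,-rpath,{torch_lib_dir}"]
```

Because each of these values depends on the build machine's environment, they must be computed in code rather than declared statically.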
Why we still need build-requirements.txt in this change:
build-requirements.txt (mirroring [build-system.requires]) is required because:
1. setup.py detects torch paths at module import time (lines 99-103) BEFORE the build
2. This requires torch to be pre-installed in the build environment
3. torch cannot be declared in [build-system.requires] because it needs custom index URLs (e.g., --index-url https://download.pytorch.org/whl/nightly/cu126)
4. Therefore we use `--no-build-isolation`, which disables automatic installation of `[build-system.requires]` and forces manual installation via build-requirements.txt
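Concretely, the resulting install flow is two manual steps before the build. This is a command sketch, not documented Monarch instructions; the index URL is the example from point 3 and varies by platform and CUDA version:

```
# torch first, from its own index (it cannot go in [build-system.requires]):
pip install torch --index-url https://download.pytorch.org/whl/nightly/cu126
# then the remaining build deps, mirroring [build-system.requires]:
pip install -r build-requirements.txt
# finally build against the pre-installed torch, skipping isolation:
pip install --no-build-isolation .
```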
Eliminating build-requirements.txt would require enabling standard isolated builds. Concretely:
1. Refactor setup.py to move torch detection from module-level into build command methods (e.g., inside CustomBuildExt.run())
2. Delay all torch path detection until the build_ext command actually executes
3. This would allow build isolation to work: the user installs torch before running `pip install .`, and the remaining build deps are auto-installed
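The refactor in steps 1-2 can be sketched as follows. This is a hypothetical illustration of the pattern (lazy torch detection inside the build command) rather than Monarch's actual `CustomBuildExt`; `find_torch_lib_dir` is an invented helper:

```python
# Sketch: move torch detection out of module scope and into the build
# command, so import-time side effects no longer block build isolation.
import importlib.util
import os
from typing import Optional

from setuptools.command.build_ext import build_ext


def find_torch_lib_dir() -> Optional[str]:
    # Resolved lazily: torch only needs to be importable once
    # build_ext actually runs, not when setup.py is imported.
    spec = importlib.util.find_spec("torch")
    if spec is None or not spec.submodule_search_locations:
        return None
    return os.path.join(list(spec.submodule_search_locations)[0], "lib")


class CustomBuildExt(build_ext):
    def run(self):
        torch_lib = find_torch_lib_dir()
        if torch_lib is None:
            raise RuntimeError("torch must be installed before building")
        for ext in self.extensions:
            ext.library_dirs.append(torch_lib)
            ext.extra_link_args.append(f"-Wl,-rpath,{torch_lib}")
        super().run()
```

With detection deferred this way, pip can create an isolated build environment from `[build-system.requires]`, and only the user-installed torch remains an external prerequisite.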
Changes:
- Add [build-system] configuration per PEP 517/518
- Move all static metadata to [project] section per PEP 621 (dependencies, authors, license, entry points, etc.)
- Migrate runtime dependencies from requirements.txt to pyproject.toml
- Migrate test dependencies to [project.optional-dependencies.test]
- Simplify setup.py to only contain dynamic configuration (torch detection, C++/Rust extensions, environment variables)
- Update CI and READMEs
- Migrate from `python setup.py bdist_wheel` to `python -m build`, since direct setup.py invocation is deprecated: https://packaging.python.org/en/latest/discussions/setup-py-deprecated/
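As a sketch, the declarative split described above might look like this. All names, version pins, and dependencies here are illustrative placeholders, not Monarch's actual metadata:

```toml
[build-system]
# torch is deliberately absent: it must come from a custom index URL,
# so it cannot live in [build-system.requires] (see discussion above).
requires = ["setuptools", "setuptools-rust", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "monarch"
description = "A distributed programming framework for PyTorch"
requires-python = ">=3.10"
# version stays dynamic so setup.py can apply MONARCH_VERSION overrides
dynamic = ["version"]
dependencies = [
    # runtime deps migrated from requirements.txt are listed here
]

[project.optional-dependencies]
test = [
    # test-only deps migrated from *-requirements.txt
    "pytest",
]
```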
Differential Revision: D89066231
Changed file: README.md (53 additions, 33 deletions)
````diff
@@ -3,12 +3,18 @@
 **Monarch** is a distributed programming framework for PyTorch based on scalable
 actor messaging. It provides:
 
-1. Remote actors with scalable messaging: Actors are grouped into collections called meshes and messages can be broadcast to all members.
-2. Fault tolerance through supervision trees: Actors and processes form a tree and failures propagate up the tree, providing good default error behavior and enabling fine-grained fault recovery.
-3. Point-to-point RDMA transfers: cheap registration of any GPU or CPU memory in a process, with the one-sided transfers based on libibverbs
-4. Distributed tensors: actors can work with tensor objects sharded across processes
-
-Monarch code imperatively describes how to create processes and actors using a simple python API:
+1. Remote actors with scalable messaging: Actors are grouped into collections
+   called meshes and messages can be broadcast to all members.
+2. Fault tolerance through supervision trees: Actors and processes form a tree
+   and failures propagate up the tree, providing good default error behavior and
+   enabling fine-grained fault recovery.
+3. Point-to-point RDMA transfers: cheap registration of any GPU or CPU memory in
+   a process, with the one-sided transfers based on libibverbs
+4. Distributed tensors: actors can work with tensor objects sharded across
+   processes
+
+Monarch code imperatively describes how to create processes and actors using a
+simple python API:
 
 ```python
 from monarch.actor import Actor, endpoint, this_host
@@ -33,8 +39,9 @@ fut = trainers.train.call(step=0)
 fut.get()
 ```
 
-
-The [introduction to monarch concepts](https://meta-pytorch.org/monarch/generated/examples/getting_started.html) provides an introduction to using these features.
+The
+[introduction to monarch concepts](https://meta-pytorch.org/monarch/generated/examples/getting_started.html)
+provides an introduction to using these features.
 
 > ⚠️ **Early Development Warning** Monarch is currently in an experimental
 > stage. You should expect bugs, incomplete features, and APIs that may change
@@ -45,16 +52,21 @@ The [introduction to monarch concepts](https://meta-pytorch.org/monarch/generate
 
 ## 📖 Documentation
 
-View Monarch's hosted documentation [at this link](https://meta-pytorch.org/monarch/).
+View Monarch's hosted documentation
+[at this link](https://meta-pytorch.org/monarch/).
 
 ## Installation
-Note for running distributed tensors and RDMA, the local torch version must match the version that monarch was built with.
-Stable and nightly distributions require libmxl and libibverbs (runtime).
+
+Note for running distributed tensors and RDMA, the local torch version must
+match the version that monarch was built with. Stable and nightly distributions
+require libmxl and libibverbs (runtime).
````
# If you are building with RDMA support, build monarch with `USE_TENSOR_ENGINE=1 pip install --no-build-isolation .` and dnf install the following packages