Commit 27b4d4c

Shifted README contents to documentation, fixed doc build error. (#114)
* Shifted README contents to documentation.
* Changed README back.
1 parent 7be1af8 commit 27b4d4c

2 files changed

Lines changed: 4 additions & 197 deletions

README.md

Lines changed: 3 additions & 196 deletions
@@ -2,8 +2,7 @@
 [![OEQ CUDA C++ Extension Build Verification](https://github.com/PASSIONLab/OpenEquivariance/actions/workflows/verify_extension_build.yml/badge.svg?event=push)](https://github.com/PASSIONLab/OpenEquivariance/actions/workflows/verify_extension_build.yml)
 [![License](https://img.shields.io/badge/License-BSD_3--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)
 
-[[Examples]](#show-me-some-examples) [[Installation]](#installation)
-[[Supported Tensor Products]](#tensor-products-we-accelerate)
+[[Examples]](#show-me-some-examples)
 [[Citation and Acknowledgements]](#citation-and-acknowledgements)
 
 OpenEquivariance is a CUDA and HIP kernel generator for the Clebsch-Gordon tensor product,
@@ -27,9 +26,8 @@ We also offer fused equivariant graph
 convolutions that can reduce
 computation and memory consumption significantly.
 
-We currently support NVIDIA GPUs and just added beta support on AMD GPUs for
-all tensor products! See [the coverage table](#tensor-products-we-accelerate) for more
-details.
+For detailed instructions on tests, benchmarks, MACE / Nequip, and our API,
+check out the [documentation](https://passionlab.github.io/OpenEquivariance).
 
 📣 📣 OpenEquivariance was accepted to the 2025 SIAM Conference on Applied and
 Computational Discrete Algorithms (Proceedings Track)! Catch the talk in
@@ -129,197 +127,6 @@ print(torch.norm(Z))
 `deterministic=False`, the `sender` and `receiver` indices can have
 arbitrary order.
 
-**New:** If you're working in FP32 precision and want
-higher accuracy during graph convolution, we offer a Kahan
-summation variant of our deterministic algorithm:
-
-```python
-tp_conv_kahan = oeq.TensorProductConv(problem, torch_op=True, deterministic=True, kahan=True)
-Z = tp_conv_kahan.forward(X, Y[receiver_perm], W[receiver_perm], edge_index[0], edge_index[1], sender_perm)
-print(torch.norm(Z))
-```
-
-## Installation
-We currently support Linux systems only.
-Before installation and the first library import,
-ensure that the command
-`c++ --version` returns GCC 9+; if not, set the
-`CC` and `CXX` environment variables to point to
-valid compilers. On NERSC Perlmutter,
-`module load gcc` will set up your environment
-correctly.
-
-To install, run
-```bash
-pip install git+https://github.com/PASSIONLab/OpenEquivariance
-```
-After installation, the very first library
-import will trigger a build of a C++ extension we use,
-which takes longer than usual.
-All subsequent imports will not retrigger compilation.
-
-## Replicating our benchmarks
-To run our benchmark suite, you'll also need the following packages:
-- `e3nn`,
-- `cuEquivariance`
-- `cuEquivariance-torch`
-- `cuEquivariance-ops-torch-cu11` OR `cuEquivariance-ops-torch-cu12`
-- `matplotlib` (to reproduce our figures)
-
-You can get all the necessary dependencies via our optional dependencies `[bench]`
-
-```bash
-pip install "git+https://github.com/PASSIONLab/OpenEquivariance[bench]"
-```
-
-We conducted our benchmarks on an NVIDIA A100-SXM-80GB GPU at
-Lawrence Berkeley National Laboratory. Your results may differ
-a different GPU.
-
-The file `tests/benchmark.py` can reproduce the figures in
-our paper an A100-SXM4-80GB GPU.
-Run it with the following invocations:
-```bash
-python tests/benchmark.py -o outputs/uvu uvu --plot
-python tests/benchmark.py -o outputs/uvw uvw --plot
-python tests/benchmark.py -o outputs/roofline roofline --plot
-python tests/benchmark.py -o outputs/conv conv --plot --data data/molecular_structures
-python tests/benchmark.py -o outputs/kahan_conv kahan_conv --data data/molecular_structures/
-```
-
-If your GPU has limited memory, you might want to try
-the `--limited-memory` flag to disable some expensive
-tests and / or reduce the batch size with `-b`. Run
-`python tests/benchmark.py --help` for a full list of flags.
-
-Here's a set
-of invocations for an A5000 GPU:
-
-```bash
-python tests/benchmark.py -o outputs/uvu uvu --limited-memory --plot
-python tests/benchmark.py -o outputs/uvw uvw -b 25000 --plot
-python tests/benchmark.py -o outputs/roofline roofline --plot
-python tests/benchmark.py -o outputs/conv conv --data data/molecular_structures --limited-memory
-```
-Note that for GPUs besides the one we used in our
-testing, the roofline slope / peak will be incorrect, and your results
-may differ from the ones we've reported. The plots for the convolution fusion
-experiments also require a GPU with a minimum of 40GB of memory.
-
-## Testing Correctness
-See the `dev` dependencies in `pyproject.toml`; you'll need `e3nn`,
-`pytest`, `torch_geometric`, and `pytest-check` installed. You can test batch
-tensor products and fused convolution tensor products as follows:
-```bash
-pytest tests/batch_test.py
-pytest tests/conv_test.py
-```
-Browse the file to select specific tests.
-
-## Compilation with JITScript, Export, and AOTInductor
-OpenEquivariance supports model compilation with
-`torch.compile`, JITScript, `torch.export`, and AOTInductor.
-Demo the C++ model exports with
-```bash
-pytest tests/export_test.py
-```
-NOTE: the AOTInductor test (and possibly export) fail
-unless you are using a Nightly
-build of PyTorch past 4/10/2025 due to incomplete support for
-TorchBind in earlier versions.
-
-## Running MACE
-**NOTE**: If you're revisiting this page, the repo containing
-our up-to-date MACE integration has changed! See the instructions
-below; we use a branch off a fork of MACE to facilitate
-PRs into the main codebase.
-
-We have modified MACE to use our accelerated kernels instead
-of the standard e3nn backend. Here are the steps to replicate
-our MACE benchmark:
-
-1. Install `oeq` and our modified version of MACE:
-```bash
-pip uninstall mace-torch
-pip install git+https://github.com/PASSIONLab/OpenEquivariance
-pip install git+https://github.com/vbharadwaj-bk/mace_oeq_integration.git@oeq_experimental
-```
-
-2. Download the `carbon.xyz` data file, available at <https://portal.nersc.gov/project/m1982/equivariant_nn_graphs/>.
-This graph has 158K edges. With the original e3nn backend, you would need a GPU with 80GB
-of memory to run the experiments. `oeq` provides a memory-efficient equivariant convolution, so we expect
-the test to succeed.
-
-3. Benchmark OpenEquivariance:
-```bash
-python tests/mace_driver.py carbon.xyz -o outputs/mace_tests -i oeq
-```
-
-4. If you have a GPU with 80GB of memory OR supply a smaller molecular graph
-as the input file, you can run the full benchmark that includes `e3nn` and `cue`:
-```bash
-python tests/mace_driver.py carbon.xyz -o outputs/mace_tests -i e3nn cue oeq
-```
-
-## Tensor products we accelerate
-
-| Operation | CUDA | HIP |
-|--------------------------|----------|-----|
-| UVU |||
-| UVW |||
-| UVU + Convolution |||
-| UVW + Convolution |||
-| Symmetric Tensor Product | ✅ (beta) | ✅ (beta) |
-
-e3nn supports a variety of connection modes for CG tensor products. We support
-two that are commonly used in equivariant graph neural networks:
-"uvu" and "uvw". Our JIT compiled kernels should handle:
-
-1. Pure "uvu" tensor products, which are most efficient when the input with higher
-multiplicities is the first argument. Our results are identical to e3nn when irreps in
-the second input have multiplicity 1, and otherwise identical up to a reordering
-of the input weights.
-
-2. Pure "uvw" tensor products, which are currently more efficient when the input with
-higher multiplicities is the first argument. Our results are identical to e3nn up to a reordering
-of the input weights.
-
-Our code includes correctness checks, but the configuration space is large. If you notice
-a bug, let us know in a Github issue. We'll try our best to correct it or document the problem here.
-
-We do not (yet) support:
-
-- Mixing different instruction types in the same tensor product.
-- Instruction types besides "uvu" and "uvw".
-- Non-trainable instructions: all of your instructions must have weights associated.
-
-If you have a use case for any of the unsupported features above, let us know.
-
-We have recently added beta support for symmetric
-contraction acceleration. Because this is a kernel
-specific to MACE, we require e3nn as dependency
-to run it, and there is currently no support for
-compile / export (coming soon!), we
-do not expose it in the package
-toplevel. You can test out our implementation by
-running
-
-```python
-from openequivariance.implementations.symmetric_contraction import SymmetricContraction as OEQSymmetricContraction
-```
-
-## Multidevice / Stream Support
-To use OpenEquivariance on multiple GPUs of a single
-compute node, we currently require that all GPUs
-share the same compute capability. This is because
-our kernels are compiled based on the shared memory
-capacity of the numerically first visible GPU card.
-On heterogeneous systems, you can still
-use OpenEquivariance on all GPUs that match the
-compute capability of the first visible device.
-
-We are working on support for CUDA streams!
-
 ## Citation and Acknowledgements
 If you find this code useful, please cite our paper:
 
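
Note on the removed Kahan passage: the README text moved by this commit mentions a Kahan summation variant of the deterministic convolution for FP32 inputs. As a rough illustration of the general technique only (not OpenEquivariance's kernel code, and using Python's double-precision floats rather than FP32), compensated summation can be sketched as follows. The function names and the test data here are made up for the example.

```python
import math


def naive_sum(values):
    """Plain left-to-right accumulation; small terms can be rounded away."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Kahan (compensated) summation: carry the rounding error forward."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = v - compensation            # re-inject the previously lost part
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was just lost
        total = t
    return total


# One large term followed by many terms too small to register individually.
values = [1.0] + [1e-16] * 100_000
reference = math.fsum(values)  # correctly rounded sum from the standard library

naive_error = abs(naive_sum(values) - reference)
kahan_error = abs(kahan_sum(values) - reference)
print(f"naive error: {naive_error:.3e}, Kahan error: {kahan_error:.3e}")
```

The extra subtraction per term is the cost of the method; the benefit is that the accumulated error stays near machine precision instead of growing with the number of terms.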

docs/conf.py

Lines changed: 1 addition & 1 deletion
@@ -34,5 +34,5 @@
 
 sys.path.insert(0, str(Path("..").resolve()))
 
-autodoc_mock_imports = ["torch", "openequivariance.extlib", "jinja2"]
+autodoc_mock_imports = ["torch", "openequivariance.extlib", "jinja2", "numpy"]
 autodoc_typehints = "description"
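
Note on the `docs/conf.py` change: `autodoc_mock_imports` is the Sphinx autodoc option that lists modules to replace with stand-ins while the documented package is imported, so adding `"numpy"` presumably lets the doc build import the package on a machine without NumPy installed. The sketch below illustrates the underlying idea with the standard library only. It is not how Sphinx implements the option, and `hypothetical_heavy_dep` is a made-up module name chosen because it is not installed.

```python
import importlib
import sys
from contextlib import contextmanager
from unittest import mock


@contextmanager
def mocked_imports(names):
    """Temporarily register stand-in modules so `import <name>` succeeds."""
    stubs = {name: mock.MagicMock(name=name) for name in names}
    # patch.dict restores sys.modules to its previous contents on exit.
    with mock.patch.dict(sys.modules, stubs):
        yield stubs


def can_import(name):
    """Return True if importing `name` works in the current environment."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


MISSING = "hypothetical_heavy_dep"  # made-up name; assumed not installed

before = can_import(MISSING)       # real import attempt fails
with mocked_imports([MISSING]):
    during = can_import(MISSING)   # the stand-in satisfies the import
after = can_import(MISSING)        # stand-in removed, import fails again

print(before, during, after)
```

A stand-in only needs to survive import time, which is why mocking suits documentation builds: autodoc reads signatures and docstrings and does not run the numerical code.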
