Commit 5984d6d

Reverted README for now.
1 parent 7059563 commit 5984d6d

2 files changed

Lines changed: 250 additions & 3 deletions

File tree

.github/workflows/docs.yaml

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
1+
name: Deploy documentation to GitHub Pages
2+
on:
3+
workflow_dispatch:
4+
5+
# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages
6+
permissions: write-all
7+
8+
# Allow only one concurrent deployment, skipping runs queued between the run in-progress and latest queued.
9+
# However, do NOT cancel in-progress runs as we want to allow these production deployments to complete.
10+
concurrency:
11+
group: "pages"
12+
cancel-in-progress: false
13+
14+
jobs:
15+
build:
16+
runs-on: ubuntu-latest
17+
steps:
18+
- name: Checkout
19+
uses: actions/checkout@v4
20+
- name: Setup Pages
21+
uses: actions/configure-pages@v3
22+
- name: Set up Python 3.10
23+
uses: actions/setup-python@v5
24+
with:
25+
python-version: "3.10"
26+
- name: Install dependencies
27+
run: |
28+
python -m pip install --upgrade pip
29+
pip install sphinx furo
30+
- name: Build website
31+
run: |
32+
cd docs
33+
sphinx-build -M dirhtml . _build
34+
35+
- name: Fix permissions
36+
run: |
37+
chmod -c -R +rX "docs/_build/dirhtml" | while read line; do
38+
echo "::warning title=Invalid file permissions automatically fixed::$line"
39+
done
40+
- name: Upload artifact
41+
uses: actions/upload-pages-artifact@v3
42+
with:
43+
path: './docs/_build/dirhtml'
44+
deploy:
45+
environment:
46+
name: github-pages
47+
url: ${{ steps.deployment.outputs.page_url }}
48+
runs-on: ubuntu-latest
49+
needs: build
50+
steps:
51+
- name: Deploy to GitHub Pages
52+
id: deployment
53+
uses: actions/deploy-pages@v4

README.md

Lines changed: 197 additions & 3 deletions
@@ -2,7 +2,8 @@
22
[![OEQ CUDA C++ Extension Build Verification](https://github.com/PASSIONLab/OpenEquivariance/actions/workflows/verify_extension_build.yml/badge.svg?event=push)](https://github.com/PASSIONLab/OpenEquivariance/actions/workflows/verify_extension_build.yml)
33
[![License](https://img.shields.io/badge/License-BSD_3--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)
44

5-
[[Examples]](#show-me-some-examples)
5+
[[Examples]](#show-me-some-examples) [[Installation]](#installation)
6+
[[Supported Tensor Products]](#tensor-products-we-accelerate)
67
[[Citation and Acknowledgements]](#citation-and-acknowledgements)
78

89
OpenEquivariance is a CUDA and HIP kernel generator for the Clebsch-Gordan tensor product,
@@ -128,6 +129,197 @@ print(torch.norm(Z))
128129
`deterministic=False`, the `sender` and `receiver` indices can have
129130
arbitrary order.
130131

132+
**New:** If you're working in FP32 precision and want
133+
higher accuracy during graph convolution, we offer a Kahan
134+
summation variant of our deterministic algorithm:
135+
136+
```python
137+
tp_conv_kahan = oeq.TensorProductConv(problem, torch_op=True, deterministic=True, kahan=True)
138+
Z = tp_conv_kahan.forward(X, Y[receiver_perm], W[receiver_perm], edge_index[0], edge_index[1], sender_perm)
139+
print(torch.norm(Z))
140+
```
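
For intuition about why compensated summation helps in FP32, here is a minimal NumPy sketch of the Kahan technique itself. It is an illustration only and not the OpenEquivariance kernel; the helper names `naive_sum_fp32` and `kahan_sum_fp32` are hypothetical.

```python
import numpy as np


def naive_sum_fp32(values):
    """Accumulate left to right, rounding to FP32 after every addition."""
    total = np.float32(0.0)
    for v in values:
        total = np.float32(total + v)
    return total


def kahan_sum_fp32(values):
    """Kahan (compensated) summation carried out entirely in FP32."""
    total = np.float32(0.0)
    comp = np.float32(0.0)  # running estimate of the low-order bits lost so far
    for v in values:
        y = np.float32(v - comp)            # re-inject the previously lost bits
        t = np.float32(total + y)           # low-order bits of y are lost here
        comp = np.float32((t - total) - y)  # recover what was just lost
        total = t
    return total


values = np.full(100_000, 0.1, dtype=np.float32)
reference = float(np.sum(values, dtype=np.float64))  # FP64 reference sum

naive_error = abs(float(naive_sum_fp32(values)) - reference)
kahan_error = abs(float(kahan_sum_fp32(values)) - reference)
print(f"naive FP32 error: {naive_error:.3e}")
print(f"Kahan FP32 error: {kahan_error:.3e}")
```

The compensated sum keeps one extra FP32 value per accumulator, which is the usual trade of a few more operations for a smaller rounding error.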
141+
142+
## Installation
143+
We currently support Linux systems only.
144+
Before installation and the first library import,
145+
ensure that the command
146+
`c++ --version` returns GCC 9+; if not, set the
147+
`CC` and `CXX` environment variables to point to
148+
valid compilers. On NERSC Perlmutter,
149+
`module load gcc` will set up your environment
150+
correctly.
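
The compiler requirement above can be checked before the first import with a short script. This is a sketch under assumptions: the function names are hypothetical, and it identifies GCC by the "Free Software Foundation" copyright line that GCC prints in its `--version` output.

```python
import re
import shutil
import subprocess


def parse_gcc_major(version_text):
    """Return the GCC major version from `c++ --version` output, or None if not GCC."""
    if "Free Software Foundation" not in version_text:
        return None  # e.g. clang, which prints no FSF copyright line
    first_line = version_text.splitlines()[0]
    # Drop parenthesized distro tags such as "(Ubuntu 11.4.0-1ubuntu1~22.04)".
    first_line = re.sub(r"\([^)]*\)", "", first_line)
    match = re.search(r"(\d+)\.\d+", first_line)
    return int(match.group(1)) if match else None


def check_default_compiler(min_major=9):
    """Return (ok, message) describing whether `c++` on PATH is GCC >= min_major."""
    compiler = shutil.which("c++")
    if compiler is None:
        return False, "No `c++` found on PATH; set CC and CXX to a valid compiler."
    result = subprocess.run(
        [compiler, "--version"], capture_output=True, text=True, check=False
    )
    major = parse_gcc_major(result.stdout)
    if major is None or major < min_major:
        return False, f"`c++` is not GCC {min_major}+; set CC and CXX before importing."
    return True, f"Found GCC {major} at {compiler}."


if __name__ == "__main__":
    ok, message = check_default_compiler()
    print(message)
```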
151+
152+
To install, run
153+
```bash
154+
pip install git+https://github.com/PASSIONLab/OpenEquivariance
155+
```
156+
After installation, the very first library
157+
import will trigger a build of a C++ extension we use,
158+
so that first import takes longer than usual.
159+
Subsequent imports do not retrigger compilation.
160+
161+
## Replicating our benchmarks
162+
To run our benchmark suite, you'll also need the following packages:
163+
- `e3nn`
164+
- `cuEquivariance`
165+
- `cuEquivariance-torch`
166+
- `cuEquivariance-ops-torch-cu11` OR `cuEquivariance-ops-torch-cu12`
167+
- `matplotlib` (to reproduce our figures)
168+
169+
You can install all of these through our optional `[bench]` dependency group:
170+
171+
```bash
172+
pip install "git+https://github.com/PASSIONLab/OpenEquivariance[bench]"
173+
```
174+
175+
We conducted our benchmarks on an NVIDIA A100-SXM4-80GB GPU at
176+
Lawrence Berkeley National Laboratory. Your results may differ
177+
on a different GPU.
178+
179+
The file `tests/benchmark.py` can reproduce the figures in
180+
our paper on an A100-SXM4-80GB GPU.
181+
Run it with the following invocations:
182+
```bash
183+
python tests/benchmark.py -o outputs/uvu uvu --plot
184+
python tests/benchmark.py -o outputs/uvw uvw --plot
185+
python tests/benchmark.py -o outputs/roofline roofline --plot
186+
python tests/benchmark.py -o outputs/conv conv --plot --data data/molecular_structures
187+
python tests/benchmark.py -o outputs/kahan_conv kahan_conv --data data/molecular_structures/
188+
```
189+
190+
If your GPU has limited memory, you might want to try
191+
the `--limited-memory` flag to disable some expensive
192+
tests and / or reduce the batch size with `-b`. Run
193+
`python tests/benchmark.py --help` for a full list of flags.
194+
195+
Here's a set
196+
of invocations for an A5000 GPU:
197+
198+
```bash
199+
python tests/benchmark.py -o outputs/uvu uvu --limited-memory --plot
200+
python tests/benchmark.py -o outputs/uvw uvw -b 25000 --plot
201+
python tests/benchmark.py -o outputs/roofline roofline --plot
202+
python tests/benchmark.py -o outputs/conv conv --data data/molecular_structures --limited-memory
203+
```
204+
Note that for GPUs besides the one we used in our
205+
testing, the roofline slope / peak will be incorrect, and your results
206+
may differ from the ones we've reported. The plots for the convolution fusion
207+
experiments also require a GPU with a minimum of 40GB of memory.
208+
209+
## Testing Correctness
210+
See the `dev` dependencies in `pyproject.toml`; you'll need `e3nn`,
211+
`pytest`, `torch_geometric`, and `pytest-check` installed. You can test batch
212+
tensor products and fused convolution tensor products as follows:
213+
```bash
214+
pytest tests/batch_test.py
215+
pytest tests/conv_test.py
216+
```
217+
Browse these files to select specific tests.
218+
219+
## Compilation with JITScript, Export, and AOTInductor
220+
OpenEquivariance supports model compilation with
221+
`torch.compile`, JITScript, `torch.export`, and AOTInductor.
222+
Demo the C++ model exports with
223+
```bash
224+
pytest tests/export_test.py
225+
```
226+
NOTE: the AOTInductor test (and possibly the export test) will fail
227+
unless you are using a nightly
228+
build of PyTorch newer than 4/10/2025, due to incomplete support for
229+
TorchBind in earlier versions.
230+
231+
## Running MACE
232+
**NOTE**: If you're revisiting this page, the repo containing
233+
our up-to-date MACE integration has changed! See the instructions
234+
below; we use a branch off a fork of MACE to facilitate
235+
PRs into the main codebase.
236+
237+
We have modified MACE to use our accelerated kernels instead
238+
of the standard e3nn backend. Here are the steps to replicate
239+
our MACE benchmark:
240+
241+
1. Install `oeq` and our modified version of MACE:
242+
```bash
243+
pip uninstall mace-torch
244+
pip install git+https://github.com/PASSIONLab/OpenEquivariance
245+
pip install git+https://github.com/vbharadwaj-bk/mace_oeq_integration.git@oeq_experimental
246+
```
247+
248+
2. Download the `carbon.xyz` data file, available at <https://portal.nersc.gov/project/m1982/equivariant_nn_graphs/>.
249+
This graph has 158K edges. With the original e3nn backend, you would need a GPU with 80GB
250+
of memory to run the experiments. `oeq` provides a memory-efficient equivariant convolution, so we expect
251+
the test to succeed.
252+
253+
3. Benchmark OpenEquivariance:
254+
```bash
255+
python tests/mace_driver.py carbon.xyz -o outputs/mace_tests -i oeq
256+
```
257+
258+
4. If you have a GPU with 80GB of memory OR supply a smaller molecular graph
259+
as the input file, you can run the full benchmark that includes `e3nn` and `cue`:
260+
```bash
261+
python tests/mace_driver.py carbon.xyz -o outputs/mace_tests -i e3nn cue oeq
262+
```
263+
264+
## Tensor products we accelerate
265+
266+
| Operation | CUDA | HIP |
267+
|--------------------------|----------|-----|
268+
| UVU | ✅ | ✅ |
269+
| UVW | ✅ | ✅ |
270+
| UVU + Convolution | ✅ | ✅ |
271+
| UVW + Convolution | ✅ | ✅ |
272+
| Symmetric Tensor Product | ✅ (beta) | ✅ (beta) |
273+
274+
e3nn supports a variety of connection modes for CG tensor products. We support
275+
two that are commonly used in equivariant graph neural networks:
276+
"uvu" and "uvw". Our JIT compiled kernels should handle:
277+
278+
1. Pure "uvu" tensor products, which are most efficient when the input with higher
279+
multiplicities is the first argument. Our results are identical to e3nn when irreps in
280+
the second input have multiplicity 1, and otherwise identical up to a reordering
281+
of the input weights.
282+
283+
2. Pure "uvw" tensor products, which are currently more efficient when the input with
284+
higher multiplicities is the first argument. Our results are identical to e3nn up to a reordering
285+
of the input weights.
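
To make the two connection modes concrete, here is a NumPy sketch restricted to scalar irreps, where the Clebsch-Gordan coefficients reduce to a constant and can be dropped. It follows the e3nn index convention (`u` indexes the first input's multiplicity, `v` the second's, `w` the output's) and omits e3nn's path normalization; the shapes and variable names are assumptions chosen for illustration, and this is not the OpenEquivariance kernel.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, mul_x, mul_y, mul_out = 4, 3, 2, 5

x = rng.standard_normal((batch, mul_x))  # first input, multiplicity index u
y = rng.standard_normal((batch, mul_y))  # second input, multiplicity index v

# "uvu": one weight per (u, v) pair; the output keeps the first input's index u.
w_uvu = rng.standard_normal((mul_x, mul_y))
z_uvu = np.einsum("uv,zu,zv->zu", w_uvu, x, y)

# "uvw": one weight per (u, v, w) triple; the output has its own index w.
w_uvw = rng.standard_normal((mul_x, mul_y, mul_out))
z_uvw = np.einsum("uvw,zu,zv->zw", w_uvw, x, y)

print(z_uvu.shape)  # (batch, mul_x)
print(z_uvw.shape)  # (batch, mul_out)
```

The weight tensor for "uvw" is larger by a factor of `mul_out`, which is one reason the two modes have different performance characteristics.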
286+
287+
Our code includes correctness checks, but the configuration space is large. If you notice
288+
a bug, let us know in a GitHub issue. We'll try our best to correct it or document the problem here.
289+
290+
We do not (yet) support:
291+
292+
- Mixing different instruction types in the same tensor product.
293+
- Instruction types besides "uvu" and "uvw".
294+
- Non-trainable instructions: all of your instructions must have weights associated.
295+
296+
If you have a use case for any of the unsupported features above, let us know.
297+
298+
We have recently added beta support for symmetric
299+
contraction acceleration. Because this kernel is
300+
specific to MACE, requires e3nn as a dependency
301+
to run, and does not yet support
302+
compile / export (coming soon!), we
303+
do not expose it at the package
304+
top level. You can test out our implementation by
305+
running
306+
307+
```python
308+
from openequivariance.implementations.symmetric_contraction import SymmetricContraction as OEQSymmetricContraction
309+
```
310+
311+
## Multidevice / Stream Support
312+
To use OpenEquivariance on multiple GPUs of a single
313+
compute node, we currently require that all GPUs
314+
share the same compute capability. This is because
315+
our kernels are compiled based on the shared memory
316+
capacity of the numerically first visible GPU card.
317+
On heterogeneous systems, you can still
318+
use OpenEquivariance on all GPUs that match the
319+
compute capability of the first visible device.
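
On a heterogeneous node, the usable devices can be listed with a small helper like the one below. This is a sketch: `devices_matching_first` and `visible_cuda_capabilities` are hypothetical names, and the only PyTorch calls used are `torch.cuda.device_count` and `torch.cuda.get_device_capability`.

```python
def devices_matching_first(capabilities):
    """Return indices of devices whose compute capability equals that of device 0."""
    if not capabilities:
        return []
    first = capabilities[0]
    return [i for i, cap in enumerate(capabilities) if cap == first]


def visible_cuda_capabilities():
    """Query PyTorch for the (major, minor) capability of every visible CUDA device."""
    import torch  # imported lazily so the helper above works without a GPU stack

    return [
        torch.cuda.get_device_capability(i)
        for i in range(torch.cuda.device_count())
    ]


# Hypothetical node: devices 0 and 2 report (8, 0), device 1 reports (7, 0).
example = [(8, 0), (7, 0), (8, 0)]
print(devices_matching_first(example))  # [0, 2]
```

On a real machine, pass `visible_cuda_capabilities()` instead of the hard-coded list, or restrict `CUDA_VISIBLE_DEVICES` to the matching indices.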
320+
321+
We are working on support for CUDA streams!
322+
131323
## Citation and Acknowledgements
132324
If you find this code useful, please cite our paper:
133325

@@ -137,7 +329,9 @@ author={Vivek Bharadwaj and Austin Glover and Aydin Buluc and James Demmel},
137329
title={An Efficient Sparse Kernel Generator for O(3)-Equivariant Deep Networks},
138330
booktitle = {SIAM Conference on Applied and Computational Discrete Algorithms (ACDA25)},
139331
chapter = {},
140-
url={https://arxiv.org/abs/2501.13986}
332+
url={https://arxiv.org/abs/2501.13986},
333+
publisher={Society for Industrial and Applied Mathematics},
334+
year={2025}
141335
}
142336
```
143337

@@ -154,4 +348,4 @@ Copyright (c) 2025, The Regents of the University of California, through Lawrenc
154348

155349
If you have questions about your rights to use or distribute this software, please contact Berkeley Lab's Intellectual Property Office at IPO@lbl.gov.
156350

157-
NOTICE. This Software was developed under funding from the U.S. Department of Energy and the U.S. Government consequently retains certain rights. As such, the U.S. Government has been granted for itself and others acting on its behalf a paid-up, nonexclusive, irrevocable, worldwide license in the Software to reproduce, distribute copies to the public, prepare derivative works, and perform publicly and display publicly, and to permit others to do so.
351+
NOTICE. This Software was developed under funding from the U.S. Department of Energy and the U.S. Government consequently retains certain rights. As such, the U.S. Government has been granted for itself and others acting on its behalf a paid-up, nonexclusive, irrevocable, worldwide license in the Software to reproduce, distribute copies to the public, prepare derivative works, and perform publicly and display publicly, and to permit others to do so.
