22[ ![ OEQ CUDA C++ Extension Build Verification] ( https://github.com/PASSIONLab/OpenEquivariance/actions/workflows/verify_extension_build.yml/badge.svg?event=push )] ( https://github.com/PASSIONLab/OpenEquivariance/actions/workflows/verify_extension_build.yml )
33[ ![ License] ( https://img.shields.io/badge/License-BSD_3--Clause-blue.svg )] ( https://opensource.org/licenses/BSD-3-Clause )
44
5- [[ Examples]] ( #show-me-some-examples ) [[ Installation]] ( #installation )
6- [[ Supported Tensor Products]] ( #tensor-products-we-accelerate )
5+ [[ Examples]] ( #show-me-some-examples )
76[[ Citation and Acknowledgements]] ( #citation-and-acknowledgements )
87
98OpenEquivariance is a CUDA and HIP kernel generator for the Clebsch-Gordan tensor product,
@@ -27,9 +26,8 @@ We also offer fused equivariant graph
2726convolutions that can reduce
2827computation and memory consumption significantly.
2928
30- We currently support NVIDIA GPUs and just added beta support on AMD GPUs for
31- all tensor products! See [ the coverage table] ( #tensor-products-we-accelerate ) for more
32- details.
29+ For detailed instructions on tests, benchmarks, MACE / Nequip, and our API,
30+ check out the [ documentation] ( https://passionlab.github.io/OpenEquivariance ) .
3331
3432📣 📣 OpenEquivariance was accepted to the 2025 SIAM Conference on Applied and
3533Computational Discrete Algorithms (Proceedings Track)! Catch the talk in
@@ -129,197 +127,6 @@ print(torch.norm(Z))
129127` deterministic=False ` , the ` sender ` and ` receiver ` indices can have
130128arbitrary order.
131129
132- ** New:** If you're working in FP32 precision and want
133- higher accuracy during graph convolution, we offer a Kahan
134- summation variant of our deterministic algorithm:
135-
136- ``` python
137- tp_conv_kahan = oeq.TensorProductConv(problem, torch_op = True , deterministic = True , kahan = True )
138- Z = tp_conv_kahan.forward(X, Y[receiver_perm], W[receiver_perm], edge_index[0 ], edge_index[1 ], sender_perm)
139- print (torch.norm(Z))
140- ```
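
As a rough illustration of what Kahan (compensated) summation buys, here is a minimal sketch in plain Python. It is not OpenEquivariance's GPU kernel, and the helper names `naive_sum` and `kahan_sum` are hypothetical, not part of the `oeq` API. Python floats are FP64, so the example uses very small addends to mimic the rounding loss that FP32 accumulation runs into much sooner.

```python
import math

def naive_sum(values):
    """Plain left-to-right accumulation."""
    total = 0.0
    for v in values:
        total += v
    return total

def kahan_sum(values):
    """Compensated summation: carry the rounding error of each add forward."""
    total = 0.0
    comp = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = v - comp            # apply the correction to the next addend
        t = total + y           # low-order bits of y may be lost here
        comp = (t - total) - y  # recover what was just lost
        total = t
    return total

# Each tiny addend is below half an ulp of 1.0, so naive accumulation drops it.
values = [1.0] + [1e-16] * 100_000
exact = math.fsum(values)  # correctly rounded reference sum

print("naive error:", abs(naive_sum(values) - exact))
print("kahan error:", abs(kahan_sum(values) - exact))
```

The compensated version costs a few extra floating-point operations per element, which is the usual trade-off for the tighter error bound.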
141-
142- ## Installation
143- We currently support Linux systems only.
144- Before installation and the first library import,
145- ensure that the command
146- ` c++ --version ` returns GCC 9+; if not, set the
147- ` CC ` and ` CXX ` environment variables to point to
148- valid compilers. On NERSC Perlmutter,
149- ` module load gcc ` will set up your environment
150- correctly.
151-
152- To install, run
153- ``` bash
154- pip install git+https://github.com/PASSIONLab/OpenEquivariance
155- ```
156- After installation, the very first library
157- import will trigger a build of a C++ extension we use,
158- which takes longer than usual.
159- All subsequent imports will not retrigger compilation.
160-
161- ## Replicating our benchmarks
162- To run our benchmark suite, you'll also need the following packages:
163- - ` e3nn ` ,
164- - ` cuEquivariance `
165- - ` cuEquivariance-torch `
166- - ` cuEquivariance-ops-torch-cu11 ` OR ` cuEquivariance-ops-torch-cu12 `
167- - ` matplotlib ` (to reproduce our figures)
168-
169- You can get all the necessary dependencies via our optional dependencies ` [bench] `
170-
171- ``` bash
172- pip install " git+https://github.com/PASSIONLab/OpenEquivariance[bench]"
173- ```
174-
175- We conducted our benchmarks on an NVIDIA A100-SXM4-80GB GPU at
176- Lawrence Berkeley National Laboratory. Your results may differ on
177- a different GPU.
178-
179- The file ` tests/benchmark.py ` can reproduce the figures in
180- our paper on an A100-SXM4-80GB GPU.
181- Run it with the following invocations:
182- ``` bash
183- python tests/benchmark.py -o outputs/uvu uvu --plot
184- python tests/benchmark.py -o outputs/uvw uvw --plot
185- python tests/benchmark.py -o outputs/roofline roofline --plot
186- python tests/benchmark.py -o outputs/conv conv --plot --data data/molecular_structures
187- python tests/benchmark.py -o outputs/kahan_conv kahan_conv --data data/molecular_structures/
188- ```
189-
190- If your GPU has limited memory, you might want to try
191- the ` --limited-memory ` flag to disable some expensive
192- tests and / or reduce the batch size with ` -b ` . Run
193- ` python tests/benchmark.py --help ` for a full list of flags.
194-
195- Here's a set
196- of invocations for an A5000 GPU:
197-
198- ``` bash
199- python tests/benchmark.py -o outputs/uvu uvu --limited-memory --plot
200- python tests/benchmark.py -o outputs/uvw uvw -b 25000 --plot
201- python tests/benchmark.py -o outputs/roofline roofline --plot
202- python tests/benchmark.py -o outputs/conv conv --data data/molecular_structures --limited-memory
203- ```
204- Note that for GPUs besides the one we used in our
205- testing, the roofline slope / peak will be incorrect, and your results
206- may differ from the ones we've reported. The plots for the convolution fusion
207- experiments also require a GPU with a minimum of 40GB of memory.
208-
209- ## Testing Correctness
210- See the ` dev ` dependencies in ` pyproject.toml ` ; you'll need ` e3nn ` ,
211- ` pytest ` , ` torch_geometric ` , and ` pytest-check ` installed. You can test batch
212- tensor products and fused convolution tensor products as follows:
213- ``` bash
214- pytest tests/batch_test.py
215- pytest tests/conv_test.py
216- ```
217- Browse the files to select specific tests.
218-
219- ## Compilation with JITScript, Export, and AOTInductor
220- OpenEquivariance supports model compilation with
221- ` torch.compile ` , JITScript, ` torch.export ` , and AOTInductor.
222- Demo the C++ model exports with
223- ``` bash
224- pytest tests/export_test.py
225- ```
226- NOTE: the AOTInductor test (and possibly the export test) fails
227- unless you are using a nightly
228- build of PyTorch from after 4/10/2025, due to incomplete support for
229- TorchBind in earlier versions.
230-
231- ## Running MACE
232- ** NOTE** : If you're revisiting this page, the repo containing
233- our up-to-date MACE integration has changed! See the instructions
234- below; we use a branch off a fork of MACE to facilitate
235- PRs into the main codebase.
236-
237- We have modified MACE to use our accelerated kernels instead
238- of the standard e3nn backend. Here are the steps to replicate
239- our MACE benchmark:
240-
241- 1 . Install ` oeq ` and our modified version of MACE:
242- ``` bash
243- pip uninstall mace-torch
244- pip install git+https://github.com/PASSIONLab/OpenEquivariance
245- pip install git+https://github.com/vbharadwaj-bk/mace_oeq_integration.git@oeq_experimental
246- ```
247-
248- 2 . Download the ` carbon.xyz ` data file, available at < https://portal.nersc.gov/project/m1982/equivariant_nn_graphs/ > .
249- This graph has 158K edges. With the original e3nn backend, you would need a GPU with 80GB
250- of memory to run the experiments. ` oeq ` provides a memory-efficient equivariant convolution, so we expect
251- the test to succeed.
252-
253- 3 . Benchmark OpenEquivariance:
254- ``` bash
255- python tests/mace_driver.py carbon.xyz -o outputs/mace_tests -i oeq
256- ```
257-
258- 4 . If you have a GPU with 80GB of memory OR supply a smaller molecular graph
259- as the input file, you can run the full benchmark that includes ` e3nn ` and ` cue ` :
260- ``` bash
261- python tests/mace_driver.py carbon.xyz -o outputs/mace_tests -i e3nn cue oeq
262- ```
263-
264- ## Tensor products we accelerate
265-
266- | Operation | CUDA | HIP |
267- | --------------------------| ----------| -----|
268- | UVU | ✅ | ✅ |
269- | UVW | ✅ | ✅ |
270- | UVU + Convolution | ✅ | ✅ |
271- | UVW + Convolution | ✅ | ✅ |
272- | Symmetric Tensor Product | ✅ (beta) | ✅ (beta) |
273-
274- e3nn supports a variety of connection modes for CG tensor products. We support
275- two that are commonly used in equivariant graph neural networks:
276- "uvu" and "uvw". Our JIT compiled kernels should handle:
277-
278- 1 . Pure "uvu" tensor products, which are most efficient when the input with higher
279- multiplicities is the first argument. Our results are identical to e3nn when irreps in
280- the second input have multiplicity 1, and otherwise identical up to a reordering
281- of the input weights.
282-
283- 2 . Pure "uvw" tensor products, which are currently more efficient when the input with
284- higher multiplicities is the first argument. Our results are identical to e3nn up to a reordering
285- of the input weights.
286-
287- Our code includes correctness checks, but the configuration space is large. If you notice
288- a bug, let us know in a GitHub issue. We'll try our best to correct it or document the problem here.
289-
290- We do not (yet) support:
291-
292- - Mixing different instruction types in the same tensor product.
293- - Instruction types besides "uvu" and "uvw".
294- - Non-trainable instructions: all of your instructions must have weights associated.
295-
296- If you have a use case for any of the unsupported features above, let us know.
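
The unsupported cases listed above can be sketched as a small pre-flight check. This is an illustrative helper only: `unsupported_reasons` is a hypothetical name, not an OpenEquivariance function, and the tuple layout `(i_in1, i_in2, i_out, connection_mode, has_weight)` is an assumption modeled on e3nn-style instruction tuples.

```python
SUPPORTED_MODES = {"uvu", "uvw"}

def unsupported_reasons(instructions):
    """Return reasons a list of instructions falls outside the supported set.

    Each instruction is assumed to be a tuple
    (i_in1, i_in2, i_out, connection_mode, has_weight).
    An empty result means none of the three listed restrictions is violated.
    """
    reasons = []
    modes = {ins[3] for ins in instructions}
    if len(modes) > 1:
        reasons.append("mixed instruction types: " + ", ".join(sorted(modes)))
    unknown = modes - SUPPORTED_MODES
    if unknown:
        reasons.append("unsupported instruction types: " + ", ".join(sorted(unknown)))
    if any(not ins[4] for ins in instructions):
        reasons.append("non-trainable instruction (no weights)")
    return reasons

# A pure "uvu" product with weights on every instruction passes the check.
print(unsupported_reasons([(0, 0, 0, "uvu", True), (1, 0, 1, "uvu", True)]))
# Mixing "uvu" and "uvw" in one tensor product is flagged.
print(unsupported_reasons([(0, 0, 0, "uvu", True), (1, 0, 1, "uvw", True)]))
```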
297-
298- We have recently added beta support for symmetric
299- contraction acceleration. This kernel is specific
300- to MACE, requires e3nn as a dependency, and does
301- not yet support compile / export (coming soon!),
302- so we do not expose it at the top level of
303- the package.
304- You can try our implementation by
305- importing it directly:
306-
307- ``` python
308- from openequivariance.implementations.symmetric_contraction import SymmetricContraction as OEQSymmetricContraction
309- ```
310-
311- ## Multidevice / Stream Support
312- To use OpenEquivariance on multiple GPUs of a single
313- compute node, we currently require that all GPUs
314- share the same compute capability. This is because
315- our kernels are compiled based on the shared memory
316- capacity of the numerically first visible GPU card.
317- On heterogeneous systems, you can still
318- use OpenEquivariance on all GPUs that match the
319- compute capability of the first visible device.
320-
321- We are working on support for CUDA streams!
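
The compute-capability restriction above can be sketched as a filter over the visible devices. This is a minimal, hypothetical helper (`devices_matching_first` is not part of OpenEquivariance); it takes the per-device `(major, minor)` capabilities as plain tuples so it runs without a GPU.

```python
def devices_matching_first(capabilities):
    """Return indices of devices whose compute capability equals device 0's.

    `capabilities` is a list of (major, minor) tuples ordered by device index.
    """
    if not capabilities:
        return []
    first = tuple(capabilities[0])
    return [i for i, cap in enumerate(capabilities) if tuple(cap) == first]

# Hypothetical heterogeneous node: devices 0 and 2 are sm_80, device 1 is sm_86.
print(devices_matching_first([(8, 0), (8, 6), (8, 0)]))  # [0, 2]
```

With PyTorch installed, the input list could be gathered by calling `torch.cuda.get_device_capability(i)` for each `i` in `range(torch.cuda.device_count())`.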
322-
323130## Citation and Acknowledgements
324131If you find this code useful, please cite our paper:
325132