Shifted README contents to documentation, fixed doc build error. (#114)

vbharadwaj-bk · web-flow · commit 27b4d4c97595 · 2025-06-05T01:26:51.000-07:00
* Shifted README contents to documentation.

* Changed README back.
diff --git a/README.md b/README.md
@@ -2,8 +2,7 @@
 [![OEQ CUDA C++ Extension Build Verification](https://github.com/PASSIONLab/OpenEquivariance/actions/workflows/verify_extension_build.yml/badge.svg?event=push)](https://github.com/PASSIONLab/OpenEquivariance/actions/workflows/verify_extension_build.yml)
 [![License](https://img.shields.io/badge/License-BSD_3--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)
 
-[[Examples]](#show-me-some-examples) [[Installation]](#installation)
-[[Supported Tensor Products]](#tensor-products-we-accelerate)
+[[Examples]](#show-me-some-examples) 
 [[Citation and Acknowledgements]](#citation-and-acknowledgements)
 
 OpenEquivariance is a CUDA and HIP kernel generator for the Clebsch-Gordon tensor product, 
@@ -27,9 +26,8 @@ We also offer fused equivariant graph
 convolutions that can reduce 
 computation and memory consumption significantly. 
 
-We currently support NVIDIA GPUs and just added beta support on AMD GPUs for
-all tensor products! See [the coverage table](#tensor-products-we-accelerate) for more 
-details.
+For detailed instructions on tests, benchmarks, MACE / Nequip, and our API,
+check out the [documentation](https://passionlab.github.io/OpenEquivariance).
 
 📣 📣 OpenEquivariance was accepted to the 2025 SIAM Conference on Applied and 
 Computational Discrete Algorithms (Proceedings Track)! Catch the talk in 
@@ -129,197 +127,6 @@ print(torch.norm(Z))
 `deterministic=False`, the `sender` and `receiver` indices can have
 arbitrary order.
 
-**New:** If you're working in FP32 precision and want
-higher accuracy during graph convolution, we offer a Kahan 
-summation variant of our deterministic algorithm:
-
-```python
-tp_conv_kahan = oeq.TensorProductConv(problem, torch_op=True, deterministic=True, kahan=True) 
-Z = tp_conv_kahan.forward(X, Y[receiver_perm], W[receiver_perm], edge_index[0], edge_index[1], sender_perm) 
-print(torch.norm(Z))
-```
-
-## Installation 
-We currently support Linux systems only. 
-Before installation and the first library import, 
-ensure that the command 
-`c++ --version` returns GCC 9+; if not, set the
-`CC` and `CXX` environment variables to point to
-valid compilers. On NERSC Perlmutter,
-`module load gcc` will set up your environment
-correctly. 
-
-To install, run
-```bash
-pip install git+https://github.com/PASSIONLab/OpenEquivariance
-```
-After installation, the very first library
-import will trigger a build of a C++ extension we use,
-which takes longer than usual.
-All subsequent imports will not retrigger compilation.
-
-## Replicating our benchmarks 
-To run our benchmark suite, you'll also need the following packages: 
-- `e3nn`, 
-- `cuEquivariance`
-- `cuEquivariance-torch` 
-- `cuEquivariance-ops-torch-cu11` OR `cuEquivariance-ops-torch-cu12` 
-- `matplotlib` (to reproduce our figures) 
-
-You can get all the necessary dependencies via our optional dependencies `[bench]`
-
-```bash
-pip install "git+https://github.com/PASSIONLab/OpenEquivariance[bench]"
-```
-
-We conducted our benchmarks on an NVIDIA A100-SXM-80GB GPU at
-Lawrence Berkeley National Laboratory. Your results may differ 
-a different GPU.
-
-The file `tests/benchmark.py` can reproduce the figures in 
-our paper an A100-SXM4-80GB GPU. 
-Run it with the following invocations: 
-```bash
-python tests/benchmark.py -o outputs/uvu uvu --plot
-python tests/benchmark.py -o outputs/uvw uvw --plot
-python tests/benchmark.py -o outputs/roofline roofline --plot
-python tests/benchmark.py -o outputs/conv conv --plot --data data/molecular_structures
-python tests/benchmark.py -o outputs/kahan_conv kahan_conv --data data/molecular_structures/
-```
-
-If your GPU has limited memory, you might want to try
-the `--limited-memory` flag to disable some expensive
-tests and / or reduce the batch size with `-b`. Run
-`python tests/benchmark.py --help` for a full list of flags.
-
-Here's a set
-of invocations for an A5000 GPU:
-
-```bash
-python tests/benchmark.py -o outputs/uvu uvu --limited-memory --plot
-python tests/benchmark.py -o outputs/uvw uvw -b 25000 --plot
-python tests/benchmark.py -o outputs/roofline roofline --plot
-python tests/benchmark.py -o outputs/conv conv --data data/molecular_structures --limited-memory
-```
-Note that for GPUs besides the one we used in our 
-testing, the roofline slope / peak will be incorrect, and your results
-may differ from the ones we've reported. The plots for the convolution fusion
-experiments also require a GPU with a minimum of 40GB of memory. 
-
-## Testing Correctness
-See the `dev` dependencies in `pyproject.toml`; you'll need `e3nn`,
-`pytest`, `torch_geometric`, and `pytest-check` installed. You can test batch 
-tensor products and fused convolution tensor products as follows:
-```bash
-pytest tests/batch_test.py 
-pytest tests/conv_test.py 
-```
-Browse the file to select specific tests.
-
-## Compilation with JITScript, Export, and AOTInductor 
-OpenEquivariance supports model compilation with
-`torch.compile`, JITScript, `torch.export`, and AOTInductor. 
-Demo the C++ model exports with
-```bash
-pytest tests/export_test.py 
-```
-NOTE: the AOTInductor test (and possibly export) fail 
-unless you are using a Nightly
-build of PyTorch past 4/10/2025 due to incomplete support for 
-TorchBind in earlier versions.
-
-## Running MACE
-**NOTE**: If you're revisiting this page, the repo containing
-our up-to-date MACE integration has changed! See the instructions
-below; we use a branch off a fork of MACE to facilitate
-PRs into the main codebase.
-
-We have modified MACE to use our accelerated kernels instead
-of the standard e3nn backend. Here are the steps to replicate
-our MACE benchmark:
-
-1. Install `oeq` and our modified version of MACE:
-```bash
-pip uninstall mace-torch
-pip install git+https://github.com/PASSIONLab/OpenEquivariance
-pip install git+https://github.com/vbharadwaj-bk/mace_oeq_integration.git@oeq_experimental
-```
-
-2. Download the `carbon.xyz` data file, available at <https://portal.nersc.gov/project/m1982/equivariant_nn_graphs/>. 
-   This graph has 158K edges. With the original e3nn backend, you would need a GPU with 80GB
-   of memory to run the experiments. `oeq` provides a memory-efficient equivariant convolution, so we expect
-   the test to succeed.
-
-3. Benchmark OpenEquivariance: 
-```bash
-python tests/mace_driver.py carbon.xyz -o outputs/mace_tests -i oeq
-```
-
-4. If you have a GPU with 80GB of memory OR supply a smaller molecular graph
-   as the input file, you can run the full benchmark that includes `e3nn` and `cue`: 
-```bash
-python tests/mace_driver.py carbon.xyz -o outputs/mace_tests -i e3nn cue oeq
-```
-
-## Tensor products we accelerate
-
-| Operation                | CUDA     | HIP |
-|--------------------------|----------|-----|
-| UVU                      | ✅        | ✅    |
-| UVW                      | ✅        | ✅    |
-| UVU + Convolution        | ✅        | ✅    |
-| UVW + Convolution        | ✅        | ✅    |
-| Symmetric Tensor Product | ✅ (beta) | ✅ (beta)  |
-
-e3nn supports a variety of connection modes for CG tensor products. We support 
-two that are commonly used in equivariant graph neural networks:
-"uvu" and "uvw". Our JIT compiled kernels should handle:
-
-1. Pure "uvu" tensor products, which are most efficient when the input with higher
-multiplicities is the first argument. Our results are identical to e3nn when irreps in
-the second input have multiplicity 1, and otherwise identical up to a reordering
-of the input weights.
-
-2. Pure "uvw" tensor products, which are currently more efficient when the input with
-higher multiplicities is the first argument. Our results are identical to e3nn up to a reordering
-of the input weights. 
-
-Our code includes correctness checks, but the configuration space is large. If you notice
-a bug, let us know in a Github issue. We'll try our best to correct it or document the problem here.
-
-We do not (yet) support:
-
-- Mixing different instruction types in the same tensor product. 
-- Instruction types besides "uvu" and "uvw".
-- Non-trainable instructions: all of your instructions must have weights associated. 
-
-If you have a use case for any of the unsupported features above, let us know.
-
-We have recently added beta support for symmetric
-contraction acceleration. Because this is a kernel 
-specific to MACE, we require e3nn as dependency
-to run it, and there is currently no support for
-compile / export (coming soon!), we 
-do not expose it in the package
-toplevel. You can test out our implementation by
-running 
-
-```python
-from openequivariance.implementations.symmetric_contraction import SymmetricContraction as OEQSymmetricContraction
-```
-
-## Multidevice / Stream Support
-To use OpenEquivariance on multiple GPUs of a single
-compute node, we currently require that all GPUs 
-share the same compute capability. This is because
-our kernels are compiled based on the shared memory
-capacity of the numerically first visible GPU card. 
-On heterogeneous systems, you can still 
-use OpenEquivariance on all GPUs that match the
-compute capability of the first visible device.
-
-We are working on support for CUDA streams!
-
 ## Citation and Acknowledgements
 If you find this code useful, please cite our paper:
 
diff --git a/docs/conf.py b/docs/conf.py
@@ -34,5 +34,5 @@
 
 sys.path.insert(0, str(Path("..").resolve()))
 
-autodoc_mock_imports = ["torch", "openequivariance.extlib", "jinja2"]
+autodoc_mock_imports = ["torch", "openequivariance.extlib", "jinja2", "numpy"]
 autodoc_typehints = "description"