cai4cai
diff --git a/README.md b/README.md
Lines changed: 9 additions & 5 deletions
diff --git a/docs/source/benchmarks.rst b/docs/source/benchmarks.rst
Lines changed: 1 addition & 1 deletion
diff --git a/docs/source/installation.rst b/docs/source/installation.rst
Lines changed: 17 additions & 8 deletions
diff --git a/paper/jats/paper.jats b/paper/jats/paper.jats
Lines changed: 112 additions & 79 deletions
diff --git a/paper/paper.bib b/paper/paper.bib
Lines changed: 1 addition & 2 deletions
@@ -91,17 +91,21 @@ pip install git+https://github.com/cai4cai/torchsparsegradutils
 For full functionality, install optional dependencies:
 
 ```bash
-# For CuPy sparse solver support (GPU acceleration)
-pip install cupy-cuda12x  # Replace with your CUDA version
+# For CuPy sparse solver support (GPU acceleration, requires CUDA 12.x)
+pip install torchsparsegradutils[cupy]
 
 # For JAX sparse solver support
-pip install "jax[cpu]"     # CPU version
-pip install "jax[cuda12]"  # GPU version (replace with your CUDA version)
+pip install torchsparsegradutils[jax]
+
+# Install all optional dependencies
+pip install torchsparsegradutils[all]
 
 # For benchmarking and testing
 pip install scipy matplotlib pandas tqdm pytest
 ```
 
+> **Note:** The CuPy extra installs `cupy-cuda12x>=13.0`. If you are using a different CUDA version, install the appropriate CuPy package manually (e.g. `pip install cupy-cuda11x`).
+
 ### Requirements
 
 - **Python**: ≥ 3.10
@@ -118,7 +122,7 @@ Our comprehensive benchmark suite demonstrates significant performance improveme
 
 ![Sparse Triangular Solve Suite Performance (int32/float32 COO)](torchsparsegradutils/benchmarks/benchmark_visualizations/triangular_solve_suitesparse_performance_int32_float32_coo.png)
 
-![Sparse Genertic Solve Suite Performance (int32/float32 COO)](torchsparsegradutils/benchmarks/benchmark_visualizations/sparse_solve_suite_performance_int32_float32_coo.png)
+![Sparse Generic Solve Suite Performance (int32/float32 COO)](torchsparsegradutils/benchmarks/benchmark_visualizations/sparse_solve_suite_performance_int32_float32_coo.png)
 
 ## 🚀 Quick Start
 
 
@@ -341,7 +341,7 @@ Matrix: ``Rothberg/cfd2`` (123,440 × 123,440, nnz 3,085,406). Right‑hand side
 
 **Conclusions:**
 
-1. The dense PyTorch solver ``torch.linalg.solve`` fails due to out-of-memory (OOM) errors before the foward pass due to failure of creating a dense tensor which would occupy 57GB of CUDA memory.
+1. The dense PyTorch solver ``torch.linalg.solve`` fails due to out-of-memory (OOM) errors before the forward pass due to failure of creating a dense tensor which would occupy 57GB of CUDA memory.
 2. ``torch.sparse_csr`` with ``float32`` and ``int32`` indices is the most memory efficient format for both forward and backward passes.
 3. Similar to ``tsgu.sparse_mm``, the ``int32`` indices for ``torch.sparse_coo`` format uses marginally less memory than ``int64`` despite ``A.indices()`` returning ``int64`` indices.
 4. All CuPy and JAX solvers use the same amount of memory on the forward and backward pass.
 
@@ -19,11 +19,20 @@ For additional functionality, you can install optional dependencies:
 
 .. code-block:: bash
 
-   # Install with JAX and CuPy support
-   pip install torchsparsegradutils[extras]
+   # Install with CuPy support (GPU acceleration, requires CUDA 12.x)
+   pip install torchsparsegradutils[cupy]
 
-   # Or install them separately
-   pip install jax cupy
+   # Install with JAX support
+   pip install torchsparsegradutils[jax]
+
+   # Install all optional dependencies
+   pip install torchsparsegradutils[all]
+
+.. note::
+
+   The CuPy extra installs ``cupy-cuda12x>=13.0``. If you are using a different
+   CUDA version, install the appropriate CuPy package manually
+   (e.g. ``pip install cupy-cuda11x``).
 
 Requirements
 ------------
@@ -37,8 +46,8 @@ Core Requirements
 Optional Requirements
 ~~~~~~~~~~~~~~~~~~~~~
 
-- JAX (for JAX backend integration)
-- CuPy (for CuPy backend integration)
+- JAX (for JAX backend integration): ``pip install torchsparsegradutils[jax]``
+- CuPy >= 13.0 (for CuPy backend integration): ``pip install torchsparsegradutils[cupy]``
 
 Development Installation
 ------------------------
@@ -55,7 +64,7 @@ To install development dependencies:
 
 .. code-block:: bash
 
-   pip install -e .[extras]
+   pip install -e .[all]
    pip install -r requirements-ci.txt
 
 Verification
@@ -90,7 +99,7 @@ You can also use the package in a Docker container. Here's a simple Dockerfile:
 
    FROM pytorch/pytorch:latest
 
-   RUN pip install torchsparsegradutils[extras]
+   RUN pip install torchsparsegradutils[all]
 
    # Your application code
    COPY . /app
 
@@ -59,14 +59,13 @@ @inproceedings{gpytorch
   booktitle = {Advances in Neural Information Processing Systems},
   volume    = {31},
   year      = {2018},
-  doi       = {10.5555/3327757.3327857},
   url       = {https://arxiv.org/abs/1809.11165}
 }
 
 @misc{flaport2020sparse,
   title        = {Solving sparse linear systems in PyTorch},
   author       = {Laporte, Floris},
   year         = {2020},
-  howpublished = {\url{https://blog.flaport.net/solving-sparse-linear-systems-in-pytorch.html}},
+  url          = {https://blog.flaport.net/solving-sparse-linear-systems-in-pytorch.html},
   note         = {Accessed: 2025-08-22}
 }