Documentation improvements (#502)

marcinz · manopapad · manopapad · commit eedb7e1fc6fa · 2024-11-16T18:21:51.000-05:00
* Initial pass

* todos.rst

* Address comments

* Fix warnings

* Update product positioning

* Add supported platform info

* Move all Jupyter instructions to Legate

* more warnings

* Remove todos
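
Most of the "Fix warnings" changes below lengthen section underlines so that they are at least as long as their titles. docutils reports `Title underline too short.` when a heading's underline is shorter than the heading text. The sketch below flags such headings before a docs build. It assumes underline-only headings drawn with `=`, `-`, or `~` (the characters used in the files touched here). `short_underlines` is a hypothetical helper, not part of cuPyNumeric or its docs tooling.

```python
import re

# Adornment characters used by the files in this diff (an assumption;
# docutils accepts other punctuation characters as well).
ADORNMENT = re.compile(r"^([=\-~])\1*$")


def short_underlines(text: str) -> list[tuple[int, str]]:
    """Return (1-based line number, title) for each too-short underline.

    Assumes underline-only headings (no overlines). Leading indentation is
    stripped so that indented docstring sections are checked as well.
    """
    problems = []
    lines = text.splitlines()
    for i in range(1, len(lines)):
        title = lines[i - 1].strip()
        underline = lines[i].strip()
        # A heading is a non-adornment line followed by an adornment line.
        if title and ADORNMENT.match(underline) and not ADORNMENT.match(title):
            if len(underline) < len(title):
                # lines[i - 1] is the title, i.e. line number i (1-based).
                problems.append((i, title))
    return problems


if __name__ == "__main__":
    before = "Availability\n--------\nSingle CPU\n"
    after = "Availability\n------------\nSingle CPU\n"
    print(short_underlines(before))
    print(short_underlines(after))
```

Running it on the old `Availability` docstring section reports line 1; the corrected 12-character underline reports nothing.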

---------

Co-authored-by: Manolis Papadakis <mpapadakis@nvidia.com>
diff --git a/cupynumeric/_array/array.py b/cupynumeric/_array/array.py
@@ -416,7 +416,7 @@ def flat(self) -> np.flatiter[npt.NDArray[Any]]:
         flatten : Return a copy of the array collapsed into one dimension.
 
         Availability
-        --------
+        ------------
         Single CPU
 
         """
diff --git a/cupynumeric/_ufunc/ufunc.py b/cupynumeric/_ufunc/ufunc.py
@@ -79,7 +79,7 @@
 numpy.{}
 
 Availability
---------
+------------
 Multiple GPUs, Multiple CPUs
 """
 
@@ -117,7 +117,7 @@
 numpy.{}
 
 Availability
---------
+------------
 Multiple GPUs, Multiple CPUs
 """
 
@@ -155,7 +155,7 @@
 numpy.{}
 
 Availability
---------
+------------
 Multiple GPUs, Multiple CPUs
 """
 
diff --git a/cupynumeric/fft/fft.py b/cupynumeric/fft/fft.py
@@ -104,7 +104,7 @@ def fft(
     numpy.fft.fft
 
     Availability
-    --------
+    ------------
     Multiple GPUs
     """
     s = (n,) if n is not None else None
diff --git a/cupynumeric/random/_bitgenerator.py b/cupynumeric/random/_bitgenerator.py
@@ -53,7 +53,7 @@ def __init__(
         numpy.random.BitGenerator
 
         Availability
-        --------
+        ------------
         Multiple GPUs, Multiple CPUs
         """
         if type(self) is BitGenerator:
diff --git a/cupynumeric/random/_generator.py b/cupynumeric/random/_generator.py
@@ -57,7 +57,7 @@ def __init__(self, bit_generator: BitGenerator) -> None:
         default_rng : Recommended constructor for `Generator`.
 
         Availability
-        --------
+        ------------
         Multiple GPUs, Multiple CPUs
 
         """
diff --git a/docs/cupynumeric/source/api/comparison.rst b/docs/cupynumeric/source/api/comparison.rst
@@ -7,6 +7,6 @@ A dot in the cupynumeric column denotes that cuPyNumeric implementation
 is not provided yet. We welcome contributions for these functions.
 
 NumPy vs cuPyNumeric APIs
------------------------
+-------------------------
 
 .. comparison-table::
diff --git a/docs/cupynumeric/source/examples/torchswe.ipynb b/docs/cupynumeric/source/examples/torchswe.ipynb
@@ -5,7 +5,7 @@
    "id": "5be6c57b-7cae-4fc1-b78f-899becabc6ee",
    "metadata": {},
    "source": [
-    "<h1>TorchSWE case study</h1>\n",
+    "# TorchSWE case study\n",
     "\n",
     "\n",
     "[TorchSWE](https://github.com/piyueh/TorchSWE) is a shallow-water solver created by Dr. Pi-Yueh Chuang and Prof. Lorena Barba that solves the vertically averaged Navier-Stokes equations using MPI and CuPy. It can simulate free-surface water flow in rivers, channels, and coastal areas, as well as model flood inundation. Given a topography, TorchSWE can predict flood-prone areas and the height of water inundation, making it a valuable tool for risk mapping.\n",
diff --git a/docs/cupynumeric/source/faqs.rst b/docs/cupynumeric/source/faqs.rst
@@ -11,7 +11,7 @@ Legate offers three different task variants: CPU, OMP, and GPU. A task variant
 determines the type of processor Legate chooses to perform the computations.
 
 What is the difference between Legate and cuPyNumeric?
-----------------------------------------------------
+------------------------------------------------------
 
 Legate is a task-based runtime software stack that enables development of
 scalable and composable libraries for distributed and accelerated computing.
@@ -101,14 +101,13 @@ How to handle Out-Of-Memory errors?
 
 .. code-block:: text
 
-    [0 - 7fb9fc426000]    0.985000 {5}{cupynumeric.mapper}: Mapper cupynumeric on Node 0 failed to allocate 144000000 bytes on memory 1e00000000000000 (of kind SYSTEM_MEM: Visible to all processors on a node) for region requirement 1 of Task cupynumeric::WhereTask[./script.py:90] (UID 39).
+    [0 - 7fda18f26000]    0.805182 {5}{cunumeric.mapper}: Failed to allocate 8388608 bytes on memory 1e00000000000000 (of kind SYSTEM_MEM) for region requirement(s) 1 of Task cupynumeric::BinaryOpTask[oom.py:24] (UID 18)
 
 The above error indicates that the application ran out of memory during
 execution. More granular details on the type of memory, the task that triggered
-the error are provided in the error message, but this usually indicates that
-resources (add more cores/threads/ GPUs, or increase the amount of system
-memory or framebuffer memory) or decrease the problem size and confirm that you
-are able to run the program to completion.
+the error, and what was using up the available memory are provided in the error
+message. If possible, try increasing the amount of system memory or framebuffer
+memory allocated to the program, or decrease the problem size.
 
 Reducing the ``--eager-alloc-percentage`` to, say, 10 or less can also help
 since this reduces the amount of memory available to the eager memory
@@ -151,7 +150,7 @@ Check out the :ref:`benchmarking` section for information on how to accurately
 measure cuPyNumeric execution.
 
 Why is cuPyNumeric slower than NumPy on my laptop?
-------------------------------------------------
+--------------------------------------------------
 
 For small problem sizes, cuPyNumeric might be slower than NumPy. We suggest you
 increase the problem size and correspondingly increase the resources needed
@@ -169,7 +168,7 @@ performance :ref:`practices`.
 How do I use Jupyter Notebooks?
 -------------------------------
 
-Notebooks are useful for experimentation and evaluation on a single node.
+See https://docs.nvidia.com/legate/latest/jupyter.html.
 
 How to pass Legion and Realm arguments?
 ---------------------------------------
@@ -191,19 +190,17 @@ What are the defaults?
 The default values for several input arguments to Legate are mentioned in
 Legate's documentation.
 
-Are there resources where I can read more about Legate?
--------------------------------------------------------
+Where can I read more about cuPyNumeric?
+----------------------------------------
 
 Check out this `blog post <https://developer.nvidia.com/blog/accelerating-python-applications-with-cupynumeric-and-legate/>`_
+or this `tutorial <https://github.com/NVIDIA/accelerated-computing-hub/blob/main/Accelerated_Python_User_Guide/notebooks/Chapter_X_Distributed_Computing_cuPyNumeric.ipynb>`_
 to learn more about cuPyNumeric.
 
-Technical questions?
---------------------
+Questions?
+----------
 
 For technical questions about cuPyNumeric and Legate-based tools, please visit
 the `community discussion forum <https://github.com/nv-legate/discussion>`_.
 
-Other questions?
-----------------
-
-Follow us on `GitHub <https://github.com/nv-legate>`_ or reach out to us there.
+If you have other questions, please contact us at *legate@nvidia.com*.
diff --git a/docs/cupynumeric/source/index.rst b/docs/cupynumeric/source/index.rst
@@ -1,15 +1,15 @@
 :html_theme.sidebar_secondary.remove:
 
 NVIDIA cuPyNumeric
-================
+==================
 
-cuPyNumeric is a `Legate`_ library that aims to provide a distributed and
-accelerated drop-in replacement for the `NumPy API`_ on top of the `Legion`_
-runtime.
+With cuPyNumeric you can write code productively in Python, using the familiar
+`NumPy API`_, and have your program scale with no code changes from single-CPU
+computers to multi-node, multi-GPU clusters.
 
-Using cuPyNumeric you do things like run the final example of the
-`Python CFD course`_ completely unmodified on 2048 A100 GPUs in a
-`DGX SuperPOD`_ and achieve good weak scaling.
+For example, you can run the final example of the `Python CFD course`_
+completely unmodified on 2048 A100 GPUs in a `DGX SuperPOD`_ and achieve
+good weak scaling.
 
 .. toctree::
   :maxdepth: 1
@@ -30,7 +30,5 @@ Indices and tables
 * :ref:`search`
 
 .. _DGX SuperPOD: https://www.nvidia.com/en-us/data-center/dgx-superpod/
-.. _Legate: https://github.com/nv-legate/legate.core
-.. _Legion: https://legion.stanford.edu/
 .. _Numpy API: https://numpy.org/doc/stable/reference/
 .. _Python CFD course: https://github.com/barbagroup/CFDPython/blob/master/lessons/15_Step_12.ipynb
diff --git a/docs/cupynumeric/source/installation.rst b/docs/cupynumeric/source/installation.rst
@@ -4,6 +4,9 @@ Installation
 Default conda install
 ---------------------
 
+cuPyNumeric supports the
+`same platforms as Legate <https://docs.nvidia.com/legate/latest/installation.html#support-matrix>`_.
+
 cuPyNumeric is available from
 `conda <https://docs.conda.io/projects/conda/en/latest/index.html>`_
 on the `legate channel <https://anaconda.org/legate/cupynumeric>`_.
@@ -33,7 +36,9 @@ environment, use environment variable ``CONDA_OVERRIDE_CUDA``:
       conda install -c conda-forge -c legate cupynumeric
 
 Once installed, you can verify the installation by running one of the examples
-from the cuPyNumeric repository, for instance:
+from the
+`cuPyNumeric repository <https://github.com/nv-legate/cunumeric/tree/HEAD/examples>`_,
+for instance:
 
 .. code-block:: sh
 
diff --git a/docs/cupynumeric/source/user/howtos/index.rst b/docs/cupynumeric/source/user/howtos/index.rst
@@ -6,5 +6,4 @@ Howtos
 
   measuring
   benchmarking
-  jupyter
   patching
diff --git a/docs/cupynumeric/source/user/howtos/jupyter.rst b/docs/cupynumeric/source/user/howtos/jupyter.rst
diff --git a/docs/cupynumeric/source/user/practices.rst b/docs/cupynumeric/source/user/practices.rst
@@ -17,10 +17,10 @@ etc.) is noted in the docstring of the API. This would be useful to know while
 designing the application since it can impact the scalability.
 
 Guidelines on using cuPyNumeric APIs
-----------------------------------
+------------------------------------
 
 Use cuPyNumeric or NumPy arrays, AVOID native lists
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Create a cuPyNumeric array from data structures native to Python like lists,
 tuples, etc., and operate on the cuPyNumeric array, as shown in the example
@@ -232,7 +232,7 @@ Faster I/O Routines
 As of 23.07, we recommend using `h5py <https://github.com/h5py/h5py>`_ to perform I/O.
 
 Guidelines on designing cuPyNumeric applications
-----------------------------------------------
+------------------------------------------------
 
 Use output arguments to reduce memory allocation
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~