.. _resource_management:

Resource Management
===================

Overview
--------

Efficient control of CPU and GPU memory is essential for successful model compilation,
especially when working with large models such as LLMs or diffusion models.
Uncontrolled memory growth can cause compilation failures or process termination.
This guide describes the symptoms of excessive memory usage and provides methods
to reduce both CPU and GPU memory consumption.

Memory Usage Control
--------------------

CPU Memory
^^^^^^^^^^

By default, Torch-TensorRT may consume up to **5×** the model size in CPU memory.
This can exceed system limits when compiling large models.

**Common symptoms of high CPU memory usage:**

- Program freeze
- Process terminated by the operating system

**Ways to lower CPU memory usage:**

1. **Enable memory trimming**

   Set the following environment variable:

   .. code-block:: bash

      export TRIM_CPU_MEMORY=1

   This eliminates roughly **2×** the model size in redundant model copies,
   limiting total CPU memory usage to about **3×** the model size.

2. **Disable CPU offloading**

   In the compilation settings, set:

   .. code-block:: python

      offload_module_to_cpu = False
   This removes another **1×** model copy, reducing peak CPU memory
   usage to about **2×** the model size. Both CPU-side settings are
   combined in the sketch after this list.
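
A minimal end-to-end sketch combining both CPU-side settings. The toy
``nn.Sequential`` model and the input shapes are placeholders, and the call
assumes a Torch-TensorRT build where ``offload_module_to_cpu`` is accepted as
a keyword argument by ``torch_tensorrt.compile``:

.. code-block:: python

   import os

   # Set before compilation so the trimming behavior is picked up.
   os.environ["TRIM_CPU_MEMORY"] = "1"

   import torch
   import torch_tensorrt

   # Placeholder model and inputs; substitute your real module.
   model = torch.nn.Sequential(torch.nn.Linear(64, 64)).eval().cuda()
   example_inputs = [torch.randn(8, 64, device="cuda")]

   # Skipping CPU offloading avoids one extra host-side copy of the weights.
   trt_module = torch_tensorrt.compile(
       model,
       inputs=example_inputs,
       offload_module_to_cpu=False,
   )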

GPU Memory
^^^^^^^^^^

By default, Torch-TensorRT may consume up to **2×** the model size in GPU memory.

**Common symptoms of high GPU memory usage:**

- CUDA out-of-memory errors
- TensorRT compilation errors

**Ways to lower GPU memory usage:**

1. **Enable offloading to CPU**

   In the compilation settings, set:

   .. code-block:: python

      offload_module_to_cpu = True

   This shifts one model copy from GPU to CPU memory.
   As a result, peak GPU memory usage decreases to about **1×**
   the model size, while CPU memory usage increases by roughly **1×**;
   see the sketch after this list.
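
A sketch of the same call with offloading enabled, plus a rough peak-memory
readout. The readout covers only allocations made through PyTorch's caching
allocator (TensorRT's own allocations are not tracked), and the tiny model is
again a placeholder:

.. code-block:: python

   import torch
   import torch_tensorrt

   model = torch.nn.Sequential(torch.nn.Linear(64, 64)).eval().cuda()
   example_inputs = [torch.randn(8, 64, device="cuda")]

   torch.cuda.reset_peak_memory_stats()

   # Keep only one copy of the module on the GPU during compilation;
   # the spare copy is held in CPU memory instead (GPU ~1x, CPU +1x).
   trt_module = torch_tensorrt.compile(
       model,
       inputs=example_inputs,
       offload_module_to_cpu=True,
   )

   peak_gib = torch.cuda.max_memory_allocated() / 2**30
   print(f"Peak GPU memory (PyTorch allocator): {peak_gib:.2f} GiB")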

Summary
-------

.. list-table::
   :header-rows: 1
   :widths: 32 40 28

   * - Setting
     - Effect
     - Approx. Memory Ratio
   * - Default
     - Baseline behavior
     - CPU: 5×, GPU: 2×
   * - ``export TRIM_CPU_MEMORY=1``
     - Reduces redundant CPU copies
     - CPU: ~3×
   * - ``offload_module_to_cpu=False``
     - Further reduces CPU copies
     - CPU: ~2×
   * - ``offload_module_to_cpu=True``
     - Reduces GPU usage, increases CPU usage
     - GPU: ~1×, CPU: +1×

Proper configuration ensures efficient resource use, stable compilation,
and predictable performance for large-scale models.