More improvements to launcher's doc

waltforme · waltforme · commit 002a1f8d0ebc · 2026-03-09T17:05:43.000Z
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
diff --git a/docs/launcher.md b/docs/launcher.md
@@ -577,15 +577,11 @@ You can set environment variables for each instance, useful for:
 
 ### Launcher Configuration
 
-#### Command-Line Parameters
+#### Command-Line Parameters and Env Vars
 
-```bash
-python launcher.py [OPTIONS]
-```
-
-**Parameters:**
-- `--mock-gpus`: Enable mock GPU mode for CPU-only environments (local dev, CI/CD, Kind clusters). Creates mock GPUs (GPU-0, GPU-1, etc.) and bypasses nvidia-ml-py.
-- `--mock-gpu-count <int>`: Number of mock GPUs to create (default: 8). Only used with `--mock-gpus` when ConfigMap discovery is unavailable.
+**Command-Line Parameters:**
+- `--mock-gpus`: Enable mock GPU mode for CPU-only environments (local dev, CI/CD, Kind clusters). Bypasses nvidia-ml-py. Creates mock GPUs either from a 'gpu-map' ConfigMap or by naive enumeration (GPU-0, GPU-1, etc.).
+- `--mock-gpu-count <int>`: Number of mock GPUs to create (default: 8). Only used when `--mock-gpus` is set and ConfigMap discovery is unavailable, in which case the launcher falls back to naive enumeration of mock GPUs.
 - `--host <string>`: Bind address (default: `0.0.0.0`)
 - `--port <int>`: API port (default: `8001`)
 - `--log-level <string>`: Logging level - `critical`, `error`, `warning`, `info`, `debug` (default: `info`)
@@ -595,12 +591,11 @@ python launcher.py [OPTIONS]
 - `NAMESPACE`: Kubernetes namespace for ConfigMap lookup. Required when using ConfigMap-based GPU discovery in mock mode.
 
 **Examples:**
-
 ```bash
 # Local development (no GPUs)
 python launcher.py --mock-gpus --mock-gpu-count 2 --log-level debug
 
-# Production (real GPUs, Kubernetes injects NODE_NAME)
+# Production (real GPUs, Kubernetes injects NODE_NAME and NAMESPACE via Downward API for ConfigMap-based GPU discovery)
 python launcher.py --port 8001 --log-level info
 
 # Using uvicorn directly
diff --git a/inference_server/launcher/gputranslator.py b/inference_server/launcher/gputranslator.py
@@ -43,9 +43,9 @@ def __init__(
         Args:
             mock_gpus: If True, skip pynvml and use mock mode for testing
             node_name: Kubernetes node name for ConfigMap-based mock GPU discovery.
-                Required when mock_gpus=True.
+                Required when mock_gpus=True and using ConfigMap-based mock.
             namespace: Kubernetes namespace for ConfigMap-based mock GPU discovery.
-                Required when mock_gpus=True.
+                Required when mock_gpus=True and using ConfigMap-based mock.
             mock_gpu_count: Number of mock GPUs to create when in mock mode and
                 ConfigMap-based mock is not available (default: 8).
         """
@@ -136,7 +136,8 @@ def _populate_mapping(self):
         """
         Creates mapping and reverse_mapping for the GPU Translator.
         Priority order:
-        1. ConfigMap 'gpu-map' based mock if mock mode enabled and node_name available
+        1. ConfigMap 'gpu-map' based mock if mock mode enabled and
+            both node_name and namespace are available
         2. Naive mock with GPU-0, GPU-1, etc. if mock mode is enabled
         3. Real GPUs via pynvml
         """