2 changes: 1 addition & 1 deletion docs/cli_options.md
@@ -735,7 +735,7 @@ Duration in seconds to ramp warmup request rate from a proportional minimum to t

#### `--gpu-telemetry` `<list>`

Enable GPU telemetry console display and optionally specify: (1) 'dashboard' for realtime dashboard mode, (2) custom DCGM exporter URLs (e.g., http://node1:9401/metrics), (3) custom metrics CSV file (e.g., custom_gpu_metrics.csv). Default endpoints localhost:9400 and localhost:9401 are always attempted. Example: --gpu-telemetry dashboard node1:9400 custom.csv.
Enable GPU telemetry console display and optionally specify: (1) 'pynvml' to use local pynvml library instead of DCGM HTTP endpoints, (2) 'dashboard' for realtime dashboard mode, (3) custom DCGM exporter URLs (e.g., http://node1:9401/metrics), (4) custom metrics CSV file (e.g., custom_gpu_metrics.csv). Default: DCGM mode with localhost:9400 and localhost:9401 endpoints. Examples: --gpu-telemetry pynvml | --gpu-telemetry dashboard node1:9400.

#### `--no-gpu-telemetry`

97 changes: 93 additions & 4 deletions docs/tutorials/gpu-telemetry.md
**Contributor** commented:

this is only useful when the server is being run locally, right?

should we display a warning if the server url is not localhost and user uses --gpu-telemetry pynvml?
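
A minimal sketch of the suggested check (the function name and the set of "local" hosts are illustrative, not AIPerf APIs):

```python
from urllib.parse import urlparse

LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}

def warn_if_remote_pynvml(server_url: str, use_pynvml: bool) -> str | None:
    """pynvml only sees GPUs on the benchmarking host, so flag a likely
    mismatch when the target server is remote."""
    if not use_pynvml:
        return None
    # urlparse needs a scheme to populate .hostname
    url = server_url if "://" in server_url else f"http://{server_url}"
    host = urlparse(url).hostname
    if host not in LOCAL_HOSTS:
        return (
            f"--gpu-telemetry pynvml collects from local GPUs only, but the "
            f"server URL points at '{host}'; telemetry may not describe the "
            "GPUs actually serving requests."
        )
    return None
```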

@@ -1,5 +1,5 @@
<!--
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
-->

@@ -9,14 +9,17 @@ This guide shows you how to collect GPU metrics (power, utilization, memory, tem

## Overview

This guide covers two setup paths depending on your inference backend:
This guide covers three setup paths depending on your inference backend and requirements:

### Path 1: Dynamo (Built-in DCGM)
If you're using **Dynamo**, it comes with DCGM pre-configured on port 9401. No additional setup needed! Just use the `--gpu-telemetry` flag to enable console display and optionally add extra DCGM URL endpoints. URLs can be specified with or without the `http://` prefix (e.g., `localhost:9400` or `http://localhost:9400`).

### Path 2: Other Inference Servers (Custom DCGM)
If you're using **any other inference backend**, you'll need to set up DCGM separately.

### Path 3: Local GPU Monitoring (pynvml)
If you want **simple local GPU monitoring without DCGM**, use `--gpu-telemetry pynvml`. This uses NVIDIA's nvidia-ml-py Python library (commonly known as pynvml) to collect metrics directly from the GPU driver. No HTTP endpoints or additional containers required.
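
Under the hood this path is plain NVML usage. Here's a minimal sketch of the kinds of calls involved (standard nvidia-ml-py API, shown for illustration; this is not AIPerf's collector code):

```python
import pynvml

pynvml.nvmlInit()
try:
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)      # .gpu / .memory, in %
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)             # bytes
    power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000  # reported in mW
    temp_c = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
    print(f"GPU0: {util.gpu}% util, {mem.used / 2**30:.1f} GiB used, "
          f"{power_w:.0f} W, {temp_c} °C")
finally:
    pynvml.nvmlShutdown()
```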

## Prerequisites

- NVIDIA GPU with CUDA support
@@ -36,14 +39,23 @@ AIPerf provides GPU telemetry collection with the `--gpu-telemetry` flag. Here's
| **Custom URLs** | `aiperf profile --model MODEL ... --gpu-telemetry node1:9400 http://node2:9400/metrics` | `http://localhost:9400/metrics` + `http://localhost:9401/metrics` + [custom URLs](#multi-node-gpu-telemetry-example) | ✅ Yes | ❌ No | ✅ Yes |
| **Dashboard + URLs** | `aiperf profile --model MODEL ... --gpu-telemetry dashboard localhost:9400` | `http://localhost:9400/metrics` + `http://localhost:9401/metrics` + [custom URLs](#multi-node-gpu-telemetry-example) | ✅ Yes | ✅ Yes ([see dashboard](#real-time-dashboard-view)) | ✅ Yes |
| **Custom metrics** | `aiperf profile --model MODEL ... --gpu-telemetry custom_gpu_metrics.csv` | `http://localhost:9400/metrics` + `http://localhost:9401/metrics` + [custom metrics from CSV](#customizing-displayed-metrics) | ✅ Yes | ❌ No | ✅ Yes |
| **pynvml mode** | `aiperf profile --model MODEL ... --gpu-telemetry pynvml` | Local GPUs via pynvml library ([see pynvml section](#3-using-pynvml-local-gpu-monitoring)) | ✅ Yes | ❌ No | ✅ Yes |
| **pynvml + dashboard** | `aiperf profile --model MODEL ... --gpu-telemetry pynvml dashboard` | Local GPUs via pynvml library | ✅ Yes | ✅ Yes ([see dashboard](#real-time-dashboard-view)) | ✅ Yes |
| **Disabled** | `aiperf profile --model MODEL ... --no-gpu-telemetry` | None | ❌ No | ❌ No | ❌ No |

> [!IMPORTANT]
> The default endpoints `http://localhost:9400/metrics` and `http://localhost:9401/metrics` are ALWAYS attempted for telemetry collection, regardless of whether the `--gpu-telemetry` flag is used. The flag primarily controls whether metrics are displayed on the console and allows you to specify additional custom DCGM exporter endpoints. To completely disable GPU telemetry collection, use `--no-gpu-telemetry`.
> **DCGM mode (default):** The default endpoints `http://localhost:9400/metrics` and `http://localhost:9401/metrics` are always attempted for telemetry collection, regardless of whether the `--gpu-telemetry` flag is used. The flag primarily controls whether metrics are displayed on the console and allows you to specify additional custom DCGM exporter endpoints.
>
> **pynvml mode:** When using `--gpu-telemetry pynvml`, DCGM endpoints are NOT used. Metrics are collected directly from local GPUs via the nvidia-ml-py library.
>
> To completely disable GPU telemetry collection, use `--no-gpu-telemetry`.
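
To see what a DCGM endpoint exposes (and what "attempted" means in practice), a standalone probe along these lines works. This is a sketch using only the Python standard library; `DCGM_FI_DEV_POWER_USAGE` is one of dcgm-exporter's default metric names:

```python
import urllib.request

# Probe the default dcgm-exporter endpoints; unreachable ones are
# simply reported as skipped, mirroring the behavior described above.
for url in ("http://localhost:9400/metrics", "http://localhost:9401/metrics"):
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            body = resp.read().decode()
        power = [ln for ln in body.splitlines()
                 if ln.startswith("DCGM_FI_DEV_POWER_USAGE")]
        print(f"{url}: reachable, {len(power)} power sample(s)")
    except OSError:
        print(f"{url}: not reachable, skipped")
```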

> [!NOTE]
> When specifying custom DCGM exporter URLs, the `http://` prefix is optional. URLs like `localhost:9400` will automatically be treated as `http://localhost:9400`. Both formats work identically.

> [!TIP]
> For simple local GPU monitoring without DCGM setup, use `--gpu-telemetry pynvml`. This collects metrics directly from the NVIDIA driver using the nvidia-ml-py library. See [Path 3: pynvml](#3-using-pynvml-local-gpu-monitoring) for details.

### Real-Time Dashboard View

Adding `dashboard` to the `--gpu-telemetry` flag enables a live terminal UI (TUI) that displays GPU metrics in real time during your benchmark runs:
@@ -300,8 +312,85 @@ aiperf profile \
> [!TIP]
> The `dashboard` keyword enables a live terminal UI for real-time GPU telemetry visualization. Press `5` to maximize the GPU Telemetry panel during the benchmark run.

---

# 3: Using pynvml (Local GPU Monitoring)

For simple local GPU monitoring without DCGM infrastructure, AIPerf supports direct GPU metrics collection using NVIDIA's nvidia-ml-py Python library (commonly known as pynvml). This approach requires no additional containers, HTTP endpoints, or DCGM setup.

## Prerequisites

- NVIDIA GPU with driver installed
- nvidia-ml-py package: `pip install nvidia-ml-py` (a quick import check is sketched below)
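
To confirm the package and driver are usable before a benchmark run, a quick standalone check like this is enough (plain nvidia-ml-py, nothing AIPerf-specific):

```python
import pynvml

pynvml.nvmlInit()
print("Driver:", pynvml.nvmlSystemGetDriverVersion())
print("GPUs:  ", pynvml.nvmlDeviceGetCount())
pynvml.nvmlShutdown()
```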

## When to Use pynvml

| Scenario | Recommended Approach |
|----------|---------------------|
| Local development/testing | pynvml |
| Single-node inference server | pynvml or DCGM |
| Multi-node distributed setup | DCGM (HTTP endpoints required) |
| Production with existing DCGM | DCGM |
| Quick GPU monitoring without setup | pynvml |

## Run AIPerf with pynvml

```bash
aiperf profile \
--model Qwen/Qwen3-0.6B \
--endpoint-type chat \
--endpoint /v1/chat/completions \
--streaming \
--url localhost:8000 \
--synthetic-input-tokens-mean 100 \
--synthetic-input-tokens-stddev 0 \
--output-tokens-mean 200 \
--output-tokens-stddev 0 \
--extra-inputs min_tokens:200 \
--extra-inputs ignore_eos:true \
--concurrency 4 \
--request-count 64 \
--warmup-request-count 1 \
--num-dataset-entries 8 \
--random-seed 100 \
--gpu-telemetry pynvml
```

> [!TIP]
> Add `dashboard` after `pynvml` to enable the live terminal UI: `--gpu-telemetry pynvml dashboard`. Press `5` to maximize the GPU Telemetry panel during the benchmark run.

## Metrics Collected via pynvml

The nvidia-ml-py library (pynvml) collects the following metrics directly from the NVIDIA driver:

| Metric | Description | Unit |
|--------|-------------|------|
| GPU Power Usage | Current power draw | W |
| Energy Consumption | Total energy consumed since the driver was last loaded | MJ |
| GPU Utilization | GPU compute utilization | % |
| Memory Utilization | Memory controller utilization | % |
| GPU Memory Used | Framebuffer memory in use | GB |
| GPU Temperature | GPU die temperature | °C |
| SM Utilization | Streaming multiprocessor utilization | % |
| Decoder Utilization | Video decoder utilization | % |
| Encoder Utilization | Video encoder utilization | % |
| JPEG Utilization | JPEG decoder utilization | % |
| Power Violation | Throttling duration due to power limits | µs |

> [!NOTE]
> Not all metrics are available on all GPU models. AIPerf gracefully handles missing metrics and reports only what the hardware supports.
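
The graceful handling mentioned above boils down to probing each metric and keeping whatever the driver returns. A sketch of that pattern (illustrative only; the dict keys are made up, not AIPerf's field names):

```python
import pynvml
from pynvml import NVMLError

def try_metric(fn, *args):
    """Return the metric value, or None when this GPU/driver doesn't support it."""
    try:
        return fn(*args)
    except NVMLError:
        return None

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

power = try_metric(pynvml.nvmlDeviceGetPowerUsage, handle)               # mW
energy = try_metric(pynvml.nvmlDeviceGetTotalEnergyConsumption, handle)  # mJ since driver load
decoder = try_metric(pynvml.nvmlDeviceGetDecoderUtilization, handle)     # (percent, sample period µs)

snapshot = {
    "gpu_power_usage_w": power / 1000 if power is not None else None,
    # NVML reports millijoules; the MJ unit in the table above implies a conversion
    "energy_consumption_mj_raw": energy,
    "decoder_utilization_pct": decoder[0] if decoder is not None else None,
}
print(snapshot)
pynvml.nvmlShutdown()
```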

## Comparing DCGM vs pynvml

| Feature | DCGM | pynvml |
|---------|------|--------|
| Setup complexity | Requires container/service | Just install nvidia-ml-py Python package |
| Multi-node support | Yes (via HTTP endpoints) | No (local only) |
| Metrics granularity | High (profiling-level metrics) | Standard (driver-level metrics) |
| Kubernetes integration | Native with dcgm-exporter | Not applicable |
| XID error reporting | Yes | No |

---

## Multi-Node GPU Telemetry Example

1 change: 1 addition & 0 deletions pyproject.toml
@@ -36,6 +36,7 @@ dependencies = [
"matplotlib>=3.10.0",
"msgspec>=0.19.0,<1.0.0",
"numpy~=1.26.4",
"nvidia-ml-py", # Note: No version specified to be most compatible with CUDA version
"orjson~=3.10.18",
"pandas~=2.3.3",
"pillow~=11.1.0",
55 changes: 44 additions & 11 deletions src/aiperf/common/config/user_config.py
@@ -23,7 +23,12 @@
from aiperf.common.config.loadgen_config import LoadGeneratorConfig
from aiperf.common.config.output_config import OutputConfig
from aiperf.common.config.tokenizer_config import TokenizerConfig
from aiperf.common.enums import CustomDatasetType, GPUTelemetryMode, ServerMetricsFormat
from aiperf.common.enums import (
CustomDatasetType,
GPUTelemetryCollectorType,
GPUTelemetryMode,
ServerMetricsFormat,
)
from aiperf.common.enums.plugin_enums import EndpointType
from aiperf.common.enums.timing_enums import ArrivalPattern, TimingMode
from aiperf.common.utils import load_json_str
@@ -414,11 +419,12 @@ def _count_dataset_entries(self) -> int:
Field(
description=(
"Enable GPU telemetry console display and optionally specify: "
"(1) 'dashboard' for realtime dashboard mode, "
"(2) custom DCGM exporter URLs (e.g., http://node1:9401/metrics), "
"(3) custom metrics CSV file (e.g., custom_gpu_metrics.csv). "
"Default endpoints localhost:9400 and localhost:9401 are always attempted. "
"Example: --gpu-telemetry dashboard node1:9400 custom.csv"
"(1) 'pynvml' to use local pynvml library instead of DCGM HTTP endpoints, "
"(2) 'dashboard' for realtime dashboard mode, "
"(3) custom DCGM exporter URLs (e.g., http://node1:9401/metrics), "
"(4) custom metrics CSV file (e.g., custom_gpu_metrics.csv). "
"Default: DCGM mode with localhost:9400 and localhost:9401 endpoints. "
"Examples: --gpu-telemetry pynvml | --gpu-telemetry dashboard node1:9400"
),
),
BeforeValidator(parse_str_or_list),
@@ -441,12 +447,15 @@ def _count_dataset_entries(self) -> int:
] = False

_gpu_telemetry_mode: GPUTelemetryMode = GPUTelemetryMode.SUMMARY
_gpu_telemetry_collector_type: GPUTelemetryCollectorType = (
GPUTelemetryCollectorType.DCGM
)
_gpu_telemetry_urls: list[str] = []
_gpu_telemetry_metrics_file: Path | None = None

@model_validator(mode="after")
def _parse_gpu_telemetry_config(self) -> Self:
"""Parse gpu_telemetry list into mode, URLs, and metrics file."""
"""Parse gpu_telemetry list into mode, collector type, URLs, and metrics file."""
if (
"no_gpu_telemetry" in self.model_fields_set
and "gpu_telemetry" in self.model_fields_set
@@ -460,6 +469,7 @@ def _parse_gpu_telemetry_config(self) -> Self:
return self

mode = GPUTelemetryMode.SUMMARY
collector_type = GPUTelemetryCollectorType.DCGM
urls = []
metrics_file = None

@@ -469,17 +479,35 @@
metrics_file = Path(item)
if not metrics_file.exists():
raise ValueError(f"GPU metrics file not found: {item}")
continue

# Check for pynvml collector type
elif item.lower() == "pynvml":
collector_type = GPUTelemetryCollectorType.PYNVML
try:
import pynvml # noqa: F401
except ImportError as e:
raise ValueError(
"pynvml package not installed. Install with: pip install nvidia-ml-py"
) from e
# Check for dashboard mode
if item in ["dashboard"]:
elif item in ["dashboard"]:
mode = GPUTelemetryMode.REALTIME_DASHBOARD
# Check for URLs
# Check for URLs (only applicable for DCGM collector)
elif item.startswith("http") or ":" in item:
normalized_url = item if item.startswith("http") else f"http://{item}"
urls.append(normalized_url)
else:
raise ValueError(
f"Invalid GPU telemetry item: {item}. Valid options are: 'pynvml', 'dashboard', '.csv' file, and URLs."
)

if collector_type == GPUTelemetryCollectorType.PYNVML and urls:
raise ValueError(
"Cannot use pynvml with DCGM URLs. Use either 'pynvml' for local "
"GPU monitoring or URLs for DCGM endpoints, not both."
)

self._gpu_telemetry_mode = mode
self._gpu_telemetry_collector_type = collector_type
self._gpu_telemetry_urls = urls
self._gpu_telemetry_metrics_file = metrics_file
return self
@@ -494,6 +522,11 @@ def gpu_telemetry_mode(self, value: GPUTelemetryMode) -> None:
"""Set the GPU telemetry display mode."""
self._gpu_telemetry_mode = value

@property
def gpu_telemetry_collector_type(self) -> GPUTelemetryCollectorType:
"""Get the GPU telemetry collector type (DCGM or PYNVML)."""
return self._gpu_telemetry_collector_type

@property
def gpu_telemetry_urls(self) -> list[str]:
"""Get the parsed GPU telemetry DCGM endpoint URLs."""
2 changes: 2 additions & 0 deletions src/aiperf/common/enums/__init__.py
@@ -109,6 +109,7 @@
SystemState,
)
from aiperf.common.enums.telemetry_enums import (
GPUTelemetryCollectorType,
GPUTelemetryMode,
)
from aiperf.common.enums.timing_enums import (
@@ -150,6 +151,7 @@
"ExportLevel",
"FrequencyMetricUnit",
"FrequencyMetricUnitInfo",
"GPUTelemetryCollectorType",
"GPUTelemetryMode",
"GenericMetricUnit",
"ImageFormat",
12 changes: 11 additions & 1 deletion src/aiperf/common/enums/telemetry_enums.py
@@ -1,9 +1,19 @@
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

from aiperf.common.enums.base_enums import CaseInsensitiveStrEnum


class GPUTelemetryCollectorType(CaseInsensitiveStrEnum):
"""GPU telemetry collector implementation type."""

DCGM = "dcgm"
"""Collects GPU telemetry metrics from DCGM Prometheus exporter."""

PYNVML = "pynvml"
"""Collects GPU telemetry metrics using the pynvml Python library."""


class GPUTelemetryMode(CaseInsensitiveStrEnum):
"""GPU telemetry display mode."""

4 changes: 2 additions & 2 deletions src/aiperf/common/mixins/base_metrics_collector_mixin.py
@@ -1,4 +1,4 @@
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

"""Base mixin for async HTTP metrics data collectors.
@@ -160,7 +160,7 @@ class BaseMetricsCollectorMixin(AIPerfLifecycleMixin, ABC, Generic[TRecord]):
- Precise HTTP timing capture for correlation analysis

Used by:
- GPUTelemetryDataCollector (DCGM metrics from GPU monitoring)
- DCGMTelemetryCollector (DCGM metrics from GPU monitoring)
- ServerMetricsDataCollector (Prometheus metrics from inference servers)

Example:
26 changes: 23 additions & 3 deletions src/aiperf/common/models/telemetry_models.py
@@ -1,4 +1,4 @@
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

import numpy as np
@@ -28,14 +28,34 @@ class TelemetryMetrics(AIPerfBaseModel):
default=None, description="Cumulative energy consumption in MJ"
)
gpu_utilization: float | None = Field(
default=None, description="GPU utilization percentage (0-100)"
default=None,
description="GPU utilization percentage (0-100). "
"Percent of time over the past sample period during which one or more kernels was executing on the GPU.",
)
gpu_memory_used: float | None = Field(
default=None, description="GPU memory used in GB"
)
gpu_temperature: float | None = Field(
default=None, description="GPU temperature in °C"
)
mem_utilization: float | None = Field(
default=None,
description="Memory bandwidth utilization percentage (0-100). "
"Percent of time over the past sample period during which global (device) memory was being read or written.",
)
sm_utilization: float | None = Field(
default=None,
description="Streaming multiprocessor utilization percentage (0-100)",
)
decoder_utilization: float | None = Field(
default=None, description="Video decoder (NVDEC) utilization percentage (0-100)"
)
encoder_utilization: float | None = Field(
default=None, description="Video encoder (NVENC) utilization percentage (0-100)"
)
jpg_utilization: float | None = Field(
default=None, description="JPEG decoder utilization percentage (0-100)"
)
xid_errors: float | None = Field(
default=None, description="Value of the last XID error encountered"
)
@@ -92,7 +112,7 @@ class TelemetryRecord(GpuMetadata):
description="Nanosecond wall-clock timestamp when telemetry was collected (time_ns)"
)
dcgm_url: str = Field(
**Contributor** commented:

should this identifier be renamed to something like telemetry_url?

description="Source DCGM endpoint URL (e.g., 'http://node1:9401/metrics')"
description="Source identifier (DCGM URL e.g., 'http://node1:9401/metrics' or 'pynvml://localhost')"
)
telemetry_data: TelemetryMetrics = Field(
description="GPU metrics snapshot collected at this timestamp"
1 change: 0 additions & 1 deletion src/aiperf/controller/system_controller.py
@@ -191,7 +191,6 @@ async def _start_services(self) -> None:

# Start optional services before waiting for registration so they can participate in configuration
if not self.user_config.gpu_telemetry_disabled:
self.debug("Starting optional TelemetryManager service")
await self.service_manager.run_service(ServiceType.GPU_TELEMETRY_MANAGER)
else:
self.info("GPU telemetry disabled via --no-gpu-telemetry")