
Commit 200a664

feat: AIPerf Interactive Dashboard (#502)
Signed-off-by: Ilana Nguyen <inguyen@nvidia.com>
1 parent b5f1c84 commit 200a664


57 files changed: +17705, -626 lines changed

.pre-commit-config.yaml

Lines changed: 0 additions & 3 deletions
@@ -1,8 +1,5 @@
 # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
-exclude: (?x)^(
-    tests/unit/plot/fixtures/.*\.(json|jsonl)$|
-  )
 repos:
   - repo: local
     hooks:

docs/cli_options.md

Lines changed: 19 additions & 4 deletions
@@ -8,7 +8,7 @@ SPDX-License-Identifier: Apache-2.0
 ## `aiperf` Commands
 
 - [`profile`](#aiperf-profile) - Run the Profile subcommand.
-- [`plot`](#aiperf-plot) - Generate PNG visualizations from AIPerf profiling data.
+- [`plot`](#aiperf-plot) - Generate visualizations from AIPerf profiling data.
 
 ## `aiperf profile`
 
@@ -484,11 +484,9 @@ Type of UI to use.
 <br>_Choices: [`none`, `simple`, `dashboard`]_
 <br>_Default: `dashboard`_
 
-<hr>
-
 ## `aiperf plot`
 
-## Parameters
+## Parameters Options
 
 #### `--paths`, `--empty-paths` `<list>`
 
@@ -502,3 +500,20 @@ Directory to save generated plots. Defaults to <first_path>/plots if not specifi
 
 Plot theme to use: 'light' (white background) or 'dark' (dark background). Defaults to 'light'.
 <br>_Default: `light`_
+
+#### `--config` `<str>`
+
+Path to custom plot configuration YAML file. If not specified, auto-creates and uses ~/.aiperf/plot_config.yaml.
+
+#### `--verbose`, `--no-verbose`
+
+Show detailed error tracebacks in console (errors are always logged to ~/.aiperf/plot.log).
+
+#### `--dashboard`, `--no-dashboard`
+
+Launch interactive dashboard server instead of generating static PNGs.
+
+#### `--port` `<int>`
+
+Port for dashboard server (only used with --dashboard). Defaults to 8050.
+<br>_Default: `8050`_
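
For orientation, the new `aiperf plot` options above might be combined as follows; a minimal sketch in which the artifact path is a placeholder, not something taken from this commit:

```bash
# Static PNGs with verbose error output and a custom plot config
# (~/.aiperf/plot_config.yaml is the documented default location)
aiperf plot artifacts/my_run --config ~/.aiperf/plot_config.yaml --theme dark --verbose

# Interactive dashboard on a non-default port instead of static PNGs
aiperf plot artifacts/my_run --dashboard --port 9000
```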
(3 binary image files changed — previews omitted)

docs/tutorials/plot.md

Lines changed: 54 additions & 10 deletions
@@ -35,6 +35,9 @@ aiperf plot <run1> <run2> <run3>
 # Specify custom output location
 aiperf plot <path> --output <output_directory>
 
+# Launch interactive dashboard for exploration
+aiperf plot <path> --dashboard
+
 # Use dark theme
 aiperf plot <path> --theme dark
 ```
@@ -78,15 +81,15 @@ artifacts/sweep_qwen/
 
 ![TTFT vs Throughput](../diagrams/plot_examples/multi_run/ttft_vs_throughput.png)
 
-Shows how time to first token varies with request throughput across concurrency levels, helping identify configurations that balance responsiveness with load.
+Shows how time to first token varies with request throughput across concurrency levels. **Potentially useful for finding the sweet spot between responsiveness and capacity**: ideal configurations maintain low TTFT even at high throughput. If TTFT increases sharply at certain throughput levels, this may indicate a prefill bottleneck (batch scheduler contention or compute limitations).
 
 ![Pareto Curve: Throughput per GPU vs Latency](../diagrams/plot_examples/multi_run/pareto_curve_throughput_per_gpu_vs_latency.png)
 
-Highlights optimal configurations on the Pareto frontier that maximize GPU efficiency while minimizing latency.
+Highlights optimal configurations on the Pareto frontier that maximize GPU efficiency while minimizing latency. **Points on the frontier are optimal; points below are suboptimal** configurations. Potentially useful for choosing GPU count and batch sizes to maximize hardware ROI. A steep curve may indicate opportunities to improve latency with minimal throughput loss, while a flat curve can suggest you're near the efficiency limit.
 
 ![Pareto Curve: Throughput per GPU vs Interactivity](../diagrams/plot_examples/multi_run/pareto_curve_throughput_per_gpu_vs_interactivity.png)
 
-Shows the trade-off between GPU efficiency and interactivity (TTFT).
+Shows the trade-off between GPU efficiency and interactivity (TTFT). **Potentially useful for determining max concurrency before user experience degrades**: flat regions show where adding concurrency maintains interactivity, while steep sections may indicate diminishing returns. The "knee" of the curve can help identify where throughput gains start to significantly hurt responsiveness.
 
 ### Single-Run Analysis Mode
 
@@ -115,15 +118,19 @@ artifacts/single_run/
 
 ![TTFT Over Time](../diagrams/plot_examples/single_run/time_series/ttft_over_time.png)
 
-Time to first token for each request, revealing prefill latency patterns and potential warm-up effects.
+Time to first token for each request, revealing prefill latency patterns and potential warm-up effects. **Initial spikes may indicate cold start; stable later values show steady-state performance**. Potentially useful for determining necessary warmup period or identifying warmup configuration issues. Unexpected spikes during steady-state can suggest resource contention, garbage collection pauses, or batch scheduler interference.
 
 ![Inter-Token Latency Over Time](../diagrams/plot_examples/single_run/time_series/itl_over_time.png)
 
-Inter-token latency per request, showing generation performance consistency.
+Inter-token latency per request, showing generation performance consistency. **Consistent ITL may indicate stable generation; variance can suggest batch scheduling issues**. Potentially useful for identifying decode-phase bottlenecks separate from prefill issues. If ITL increases over time, this may indicate KV cache memory pressure or growing batch sizes causing decode slowdown.
 
 ![Request Latency Over Time](../diagrams/plot_examples/single_run/time_series/latency_over_time.png)
 
-End-to-end latency progression throughout the run.
+End-to-end latency progression throughout the run. **Overall system health check**: ramp-up at the start is normal, but sustained increases may indicate performance degradation. Potentially useful for identifying if your system maintains performance or degrades over time. Sudden jumps may correlate with other requests completing or starting, potentially revealing batch scheduling patterns.
+
+![Request Timeline: TTFT](../diagrams/plot_examples/single_run/time_series/ttft_timeline.png)
+
+Individual requests plotted as lines spanning their duration from start to end. **Visualizes request scheduling and concurrency patterns**: overlapping lines show concurrent execution, while gaps may indicate scheduling delays. Dense packing can suggest efficient utilization; sparse patterns may suggest underutilized capacity or rate limiting effects.
 
 ### Dispersed Throughput
 
@@ -133,7 +140,9 @@ The **Dispersed Throughput Over Time** plot uses an event-based approach for acc
 
 This provides smooth, continuous representation that correlates better with server metrics like GPU utilization.
 
-![Dispersed Throughput Over Time](../diagrams/plot_examples/single_run/dispersed_throughput_over_time.png)
+![Dispersed Throughput Over Time](../diagrams/plot_examples/single_run/time_series/dispersed_throughput_over_time.png)
+
+**Smooth ramps may show healthy scaling; drops can indicate bottlenecks**. Potentially useful for correlating with GPU metrics to identify whether bottlenecks are GPU-bound, memory-bound, or CPU-bound. A plateau may indicate you've reached max sustainable throughput for your configuration. Sudden drops can potentially correlate with resource exhaustion or scheduler saturation.
 
 ## Customization Options
 
@@ -277,12 +286,45 @@ The dark theme uses a dark background optimized for presentations while maintain
 
 ![ITL Across Timeslices (Dark)](../diagrams/plot_examples/single_run/time_series/theme_dark_mode/timeslices_itl.png)
 
+## Interactive Dashboard Mode
+
+Launch an interactive localhost-hosted dashboard for real-time exploration of profiling data with dynamic metric selection, filtering, and visualization customization.
+
+```bash
+# Launch dashboard with default settings (localhost:8050)
+aiperf plot --dashboard
+
+# Specify custom port
+aiperf plot --dashboard --port 9000
+
+# Launch with dark theme
+aiperf plot --dashboard --theme dark
+
+# Specify data paths
+aiperf plot path/to/runs --dashboard
+```
+
+**Key Features:**
+- **Dynamic metric switching**: Toggle between avg, p50, p90, p95, p99 statistics in real-time
+- **Run filtering**: Select which runs to display via checkboxes
+- **Log scale toggles**: Per-plot X/Y axis log scale controls
+- **Config viewer**: Click on data points to view full run configuration
+- **Custom plots**: Add new plots with custom axis selections
+- **Plot management**: Hide/show plots dynamically
+- **Export**: Download visible plots as PNG bundle
+
+The dashboard automatically detects visualization mode (multi-run comparison or single-run analysis) and displays appropriate tabs and controls. Press Ctrl+C in the terminal to stop the server.
+
+> [!TIP]
+> The dashboard runs on localhost only and requires no authentication. For remote access via SSH, use port forwarding: `ssh -L 8080:localhost:8080 user@remote-host`
+
+> [!NOTE]
+> Dashboard mode and PNG mode are separate. To generate both static PNGs and launch the dashboard, run the commands separately.
+
 ## Advanced Features
 
 ### GPU Telemetry Integration
 
-When GPU telemetry is collected (via `--gpu-telemetry` flag during profiling), plots automatically include GPU metrics.
-
 **Multi-run plots** (when telemetry available):
 - Token Throughput per GPU vs Latency
 - Token Throughput per GPU vs Interactivity
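
To make the remote-access tip in the dashboard section concrete, here is a minimal sketch assuming the dashboard stays on its default port 8050 and that `user@remote-host` is a placeholder:

```bash
# On the remote host: launch the dashboard (default port 8050)
aiperf plot path/to/runs --dashboard

# On the local machine: forward the port, then browse to http://localhost:8050
ssh -L 8050:localhost:8050 user@remote-host
```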
@@ -293,6 +335,8 @@ When GPU telemetry is collected (via `--gpu-telemetry` flag during profiling), p
 
 ![GPU Utilization and Throughput Over Time](../diagrams/plot_examples/single_run/time_series/gpu_utilization_and_throughput_over_time.png)
 
+**Correlates compute resources with token generation performance**. High GPU utilization with low throughput may suggest compute-bound workloads (consider optimizing model/batch size). Low utilization with low throughput can indicate bottlenecks elsewhere (KV cache, memory bandwidth, CPU scheduling). Potentially useful for targeting >80% GPU utilization for efficient hardware usage.
+
 > [!TIP]
 > See the [GPU Telemetry Tutorial](gpu-telemetry.md) for setup and detailed analysis.
 
@@ -306,7 +350,7 @@ When timeslice data is available (via `--slice-duration` during profiling), plot
 - Throughput Across Timeslices
 - Latency Across Timeslices
 
-These help identify warm-up effects, performance degradation, and steady-state behavior.
+**Timeslices enable easy outlier identification and bucketing analysis**. Each time window (bucket) shows avg/p50/p95 statistics, making it simple to spot which periods have outlier performance. Slice 0 often shows cold-start overhead, while later slices may reveal degradation. Flat bars across slices may indicate stable performance; increasing trends can suggest resource exhaustion. Potentially useful for quickly isolating performance issues to specific phases (warmup, steady-state, or degradation).
 
 ![TTFT Across Timeslices](../diagrams/plot_examples/single_run/timeslices/timeslices_ttft.png)

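As a hedged end-to-end sketch of how timeslice plots are produced: the `--slice-duration` value below is illustrative only (consult `aiperf profile --help` for its exact units), and the remaining profiling arguments are elided:

```bash
# Profile with timeslice collection enabled (value shown is only illustrative)
aiperf profile ... --slice-duration 30

# Generate the timeslice plots from the resulting artifacts
aiperf plot artifacts/<run_dir>
```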
pyproject.toml

Lines changed: 2 additions & 0 deletions
@@ -37,6 +37,8 @@ dependencies = [
     "pandas~=2.3.3",
     "pillow~=11.1.0",
     "plotly~=6.4.0",
+    "dash~=2.18.0",
+    "dash-bootstrap-components~=1.6.0",
     "prometheus_client~=0.23.1",
     "psutil~=7.0.0",
     "pydantic>=2.10.0,<3.0.0",

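If the new dashboard dependencies ever need to be added to an existing environment by hand, a minimal sketch using the pins from this commit (installing the project itself, e.g. `pip install -e .` from a source checkout, resolves them automatically):

```bash
pip install "dash~=2.18.0" "dash-bootstrap-components~=1.6.0"
```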