 # Spark Dashboard

-Real-time hardware and LLM inference monitoring for the NVIDIA DGX Spark. A
-Rust backend collects GPU, CPU, memory, disk, and network metrics alongside
+Real-time hardware and LLM inference monitoring for Linux systems with NVIDIA
+GPUs. Developed and tested on the NVIDIA DGX Spark, but works on any Linux
+host with NVIDIA drivers — discrete-GPU workstations, DGX boxes, cloud VMs.
+A Rust backend collects GPU, CPU, memory, disk, and network metrics alongside
 vLLM engine statistics and streams them over WebSocket to a React frontend.

 ![Stack](https://img.shields.io/badge/Rust-Axum-orange) ![Stack](https://img.shields.io/badge/React_19-TypeScript-blue) ![Stack](https://img.shields.io/badge/Tailwind_CSS_4-06B6D4) ![Stack](https://img.shields.io/badge/Vite_8-646CFF) ![License](https://img.shields.io/badge/license-MIT-green)
@@ -10,25 +12,25 @@ vLLM engine statistics and streams them over WebSocket to a React frontend.

 ## Quick Start

-### Install on the Spark
+### Install on your Linux host

-Run as your normal user on the DGX Spark (requires Rust 1.75+):
+Run as your normal user on any Linux host with NVIDIA drivers (requires Rust 1.75+):

 ```bash
 cargo install spark-dashboard
 sudo spark-dashboard service install
 systemctl status spark-dashboard
 ```

-The dashboard is now served on port 3000. See [Install on the DGX Spark](#install-on-the-dgx-spark)
+The dashboard is now served on port 3000. See [Install on your Linux host](#install-on-your-linux-host-1)
 for the full guide, config overrides, and uninstall.

 ### Develop locally

 ```bash
 git clone https://github.com/niklasfrick/spark-dashboard.git
 cd spark-dashboard
-cp .env.example .env # edit with your Spark's user/host
+cp .env.example .env # edit with your remote host's user/host
 ./dev/dev.sh
 ```

@@ -41,7 +43,9 @@ for details on what each script does.
 - GPU utilization, temperature, power draw, clock frequencies, fan speed
 - GPU event detection — thermal throttling, hardware slowdown, power brake
 - CPU aggregate and per-core utilization with heatmap
-- Unified memory breakdown (GPU / CPU / cached / free)
+- Memory breakdown — CPU RAM and GPU VRAM separately on discrete-GPU hosts,
+  or a single unified pool on systems where CPU and GPU share memory
+  (e.g. DGX Spark GB10, GH200)
 - Disk and network I/O throughput

 **LLM Engine Monitoring** (vLLM via Prometheus metrics)
@@ -71,7 +75,7 @@ for details on what each script does.
 │ │ Static files (rust-embed) │ Recharts, Tailwind│
 │ Axum router │ ◀──── production only ───────── │ shadcn/ui │
 └──────────────────────┘ └────────────────────┘
-DGX Spark Browser
+Linux host (e.g. DGX Spark) Browser
 ```

 Two independent Tokio tasks run in parallel — one for hardware metrics (NVML,
@@ -92,23 +96,25 @@ cp .env.example .env

 | Variable           | Purpose                                                      |
 |--------------------|--------------------------------------------------------------|
-| `SPARK_USER`       | SSH user on the Spark (required)                             |
-| `SPARK_HOST`       | Hostname or IP of the Spark (required)                       |
-| `SPARK_DIR`        | Project path on the Spark, relative to remote home (default `spark-dashboard`) |
+| `DEPLOY_USER`      | SSH user on the remote host (required)                       |
+| `DEPLOY_HOST`      | Hostname or IP of the remote host (required)                 |
+| `DEPLOY_DIR`       | Project path on the remote host, relative to remote home (default `spark-dashboard`) |
 | `VITE_BACKEND_URL` | Where Vite proxies `/ws` and `/api` (default `http://localhost:3000`) |

-The scripts in `dev/` source this file; Vite picks up `VITE_*` variables
+Legacy `SPARK_USER` / `SPARK_HOST` / `SPARK_DIR` are still accepted as a
+fallback when `DEPLOY_*` are unset — `dev.sh` prints a one-line deprecation
+note. The scripts in `dev/` source this file; Vite picks up `VITE_*` variables
 automatically. `.env` is gitignored — never commit it.
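
The fallback order described above (prefer `DEPLOY_*`, then the legacy `SPARK_*` name, then a default) could be resolved roughly as follows. This is a sketch, not the actual `dev.sh` logic: the function name `resolve_deploy_var` and the wording of the deprecation note are assumptions.

```bash
# Hypothetical helper, not taken from dev.sh.
# $1 = variable suffix (USER, HOST or DIR); $2 = optional default value.
resolve_deploy_var() {
  # Read DEPLOY_<suffix> and SPARK_<suffix> indirectly; empty if unset.
  eval "_new=\${DEPLOY_$1-}"
  eval "_old=\${SPARK_$1-}"
  if [ -n "$_new" ]; then
    printf '%s\n' "$_new"
  elif [ -n "$_old" ]; then
    # Deprecation note goes to stderr so stdout carries only the value.
    echo "note: SPARK_$1 is deprecated; use DEPLOY_$1" >&2
    printf '%s\n' "$_old"
  else
    printf '%s\n' "${2-}"
  fi
}
```

A caller might then write `DEPLOY_DIR=$(resolve_deploy_var DIR spark-dashboard)` after sourcing `.env`.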

-## Install on the DGX Spark
+## Install on your Linux host

-The dashboard runs as a supervised `systemd` service on the Spark. Two install
-paths; both build from source on the Spark.
+The dashboard runs as a supervised `systemd` service. Two install paths; both
+build from source on the host.

 ### Option A — via cargo (recommended)

 ```bash
-# On the Spark. Requires Rust 1.75+ and internet access.
+# On the host. Requires Rust 1.75+, NVIDIA drivers, and internet access.
 cargo install spark-dashboard
 sudo spark-dashboard service install
 systemctl status spark-dashboard
@@ -126,7 +132,7 @@ Use this when you want to install without crates.io (audit the source,
 air-gapped install, or deploy an unreleased commit).

 ```bash
-# On the Spark. Run as your normal user — the script escalates to sudo
+# On the host. Run as your normal user — the script escalates to sudo
 # only for the systemd wiring step.
 git clone https://github.com/niklasfrick/spark-dashboard.git
 cd spark-dashboard
@@ -148,7 +154,8 @@ sudo spark-dashboard service status # same as `systemctl status`

 Optional overrides live in `/etc/spark-dashboard/config.env` — set
 `SPARK_DASHBOARD_PORT`, `SPARK_DASHBOARD_BIND`, `SPARK_DASHBOARD_POLL_INTERVAL`,
-or `RUST_LOG`, then `sudo systemctl restart spark-dashboard`.
+`SPARK_DASHBOARD_GPU_INDEX`, or `RUST_LOG`, then
+`sudo systemctl restart spark-dashboard`.
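
For illustration, an override file might look like the fragment below. It assumes `config.env` holds plain `KEY=value` lines, as a systemd `EnvironmentFile` would; that format and all values shown are assumptions, not recommendations. Comments sit on their own lines because systemd environment files do not support trailing comments.

```bash
# /etc/spark-dashboard/config.env (example values only)
SPARK_DASHBOARD_PORT=8080
SPARK_DASHBOARD_BIND=127.0.0.1
# Polling interval in milliseconds
SPARK_DASHBOARD_POLL_INTERVAL=2000
# NVML index of the GPU to monitor
SPARK_DASHBOARD_GPU_INDEX=1
RUST_LOG=info
```

Changes take effect after `sudo systemctl restart spark-dashboard`.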

 ### Upgrade

@@ -181,19 +188,21 @@ spark-dashboard service status
   -p, --port <PORT>         Listen port [default: 3000] [env: SPARK_DASHBOARD_PORT]
   -b, --bind <BIND>         Bind address [default: 0.0.0.0] [env: SPARK_DASHBOARD_BIND]
       --poll-interval <MS>  Polling interval ms [default: 1000] [env: SPARK_DASHBOARD_POLL_INTERVAL]
+      --gpu-index <IDX>     NVML GPU index to monitor [default: 0] [env: SPARK_DASHBOARD_GPU_INDEX]
       --engine <TYPE>       Manual engine type (e.g. vllm)
       --engine-url <URL>    Manual engine endpoint (requires --engine)
 ```

-Engines are auto-detected via process scan and Docker API. Use `--engine` and
-`--engine-url` to override when auto-detection doesn't work.
+On multi-GPU hosts use `--gpu-index` to select which device the dashboard
+monitors. Engines are auto-detected via process scan and Docker API. Use
+`--engine` and `--engine-url` to override when auto-detection doesn't work.

 ## Development

 ### Prerequisites

 - **Local machine** (macOS or Linux): Node.js 20+, npm, rsync, ssh
-- **DGX Spark**: Rust 1.75+, SSH access with key-based auth (no password prompts)
+- **Remote host**: Linux + NVIDIA drivers, Rust 1.75+, SSH access with key-based auth (no password prompts)
 - Optional: `brew install fswatch` for instant file-change detection (the
   watcher falls back to 2s polling without it)

@@ -205,41 +214,41 @@ Engines are auto-detected via process scan and Docker API. Use `--engine` and

 The script handles everything:

-1. **Syncs** the full project to the Spark via rsync
-2. **Builds** the Rust backend on the Spark (`cargo build --release`)
-3. **Starts** the backend on the Spark (port 3000)
+1. **Syncs** the full project to the remote host via rsync
+2. **Builds** the Rust backend on the remote host (`cargo build --release`)
+3. **Starts** the backend on the remote host (port 3000)
 4. **Starts** the Vite dev server locally (port 5173)
-5. **Watches** `src/` and `Cargo.toml` for Rust changes — auto-syncs and rebuilds on the Spark
+5. **Watches** `src/` and `Cargo.toml` for Rust changes — auto-syncs and rebuilds on the remote host

 | What you edit                      | What happens                                                      |
 |------------------------------------|-------------------------------------------------------------------|
 | Frontend files (`frontend/src/`)   | Vite hot-reloads instantly in the browser                         |
-| Backend files (`src/`, `Cargo.toml`) | Auto-detected → rsync to Spark → rebuild → restart (~compile time) |
+| Backend files (`src/`, `Cargo.toml`) | Auto-detected → rsync to remote host → rebuild → restart (~compile time) |

 Useful while `dev.sh` is running:

 ```bash
 # Watch backend logs in another terminal
-ssh "${SPARK_USER}@${SPARK_HOST}" tail -f /tmp/spark-dashboard.log
+ssh "${DEPLOY_USER}@${DEPLOY_HOST}" tail -f /tmp/spark-dashboard.log

 # Press Ctrl+C in the dev.sh terminal to stop everything (cleans up the remote process too)
 ```

 ### How the proxy works

 By default, Vite proxies `/ws` and `/api` to `localhost:3000` — this works out
-of the box with NVIDIA Sync port forwarding (or any SSH tunnel that maps
-the Spark's port 3000 to your local machine).
+of the box with any SSH tunnel that maps the remote host's port 3000 to your
+local machine.

 ```
-Browser → localhost:5173/ws  → Vite proxy → localhost:3000/ws  (forwarded to Spark)
-Browser → localhost:5173/api → Vite proxy → localhost:3000/api (forwarded to Spark)
+Browser → localhost:5173/ws  → Vite proxy → localhost:3000/ws  (forwarded to remote)
+Browser → localhost:5173/api → Vite proxy → localhost:3000/api (forwarded to remote)
 ```
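
One way to provide such a tunnel is plain OpenSSH local port forwarding. The sketch below assumes `DEPLOY_USER` and `DEPLOY_HOST` are already set in the shell (for example by sourcing `.env`); the helper name `open_tunnel` and the `DRY_RUN` switch are hypothetical and not part of the `dev/` scripts.

```bash
# Hypothetical helper, not part of the repository.
# Forwards local port 3000 to port 3000 on the remote host.
#   -N  do not run a remote command, only forward ports
#   -L  local_port:target_host:target_port (target is resolved on the remote side)
open_tunnel() {
  set -- ssh -N -L 3000:localhost:3000 "${DEPLOY_USER}@${DEPLOY_HOST}"
  if [ "${DRY_RUN-}" = 1 ]; then
    echo "$*"   # print the command instead of connecting
  else
    "$@"
  fi
}
```

Running `DRY_RUN=1 open_tunnel` prints the command for inspection; without `DRY_RUN` the call blocks while the tunnel is open, so it is best run in a separate terminal.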

 To connect directly over the network instead, set in `.env`:

 ```bash
-VITE_BACKEND_URL=http://${SPARK_HOST}:3000
+VITE_BACKEND_URL=http://${DEPLOY_HOST}:3000
 ```

 The frontend connects to the WebSocket using `window.location.host`, so the
@@ -258,7 +267,7 @@ the latest stable version; see [CHANGELOG.md](./CHANGELOG.md) for release notes.
 # Frontend
 cd frontend && npm test

-# Backend (on Linux / DGX Spark)
+# Backend (on Linux)
 cargo test
 ```

@@ -276,7 +285,7 @@ real NVML/procfs parsing on Linux, with compile-time stubs on other platforms.
 │ │ ├── mod.rs      MetricsSnapshot, collector loop
 │ │ ├── gpu.rs      NVML GPU metrics + event detection
 │ │ ├── cpu.rs      CPU aggregate + per-core
-│ │ ├── memory.rs   Unified memory via /proc/meminfo
+│ │ ├── memory.rs   System RAM + GPU VRAM + unified-memory detection
 │ │ ├── disk.rs     Disk I/O rates
 │ │ └── network.rs  Network I/O rates
 │ └── engines/
@@ -296,7 +305,6 @@ real NVML/procfs parsing on Linux, with compile-time stubs on other platforms.
 │ └── lib/          Circular buffer, formatting, theme
 ├── dev/
 │ ├── dev.sh        Dev loop (local frontend + remote backend)
-│ ├── deploy.sh     Production deploy to Spark
 │ └── README.md     Operator docs
 ├── .env.example    Configuration template
 ├── LICENSE         MIT