Commit 3b77d5a

Merge pull request #7 from niklasfrick/feat/hardware-host-agnostic
feat: make dashboard hardware- and host-agnostic
2 parents a2a7e6d + 19a8505 commit 3b77d5a

27 files changed

Lines changed: 398 additions & 146 deletions

.env.example

Lines changed: 13 additions & 10 deletions
@@ -1,27 +1,30 @@
 # Copy this file to `.env` and edit with your own values.
 # The dev/ scripts source this file; Vite auto-loads VITE_* variables.

-# --- Connection to the DGX Spark running the backend --------------------------
-# SSH user on the Spark. Must have key-based auth configured (no passwords).
-SPARK_USER=your-user
+# --- Connection to the remote Linux host running the backend -----------------
+# SSH user on the remote host. Must have key-based auth configured (no passwords).
+DEPLOY_USER=your-user

-# Hostname or IP address of the Spark.
-SPARK_HOST=192.168.1.100
+# Hostname or IP address of the remote host.
+DEPLOY_HOST=192.168.1.100

-# Path on the Spark where the project is synced.
+# Path on the remote host where the project is synced.
 # Relative paths (no leading slash) are resolved against the remote user's
 # home directory. Use an absolute path like /opt/spark-dashboard if you want
 # to sync somewhere else.
 # Do NOT use a leading `~/` — that would be expanded to your *local* home.
-SPARK_DIR=spark-dashboard
+DEPLOY_DIR=spark-dashboard
+
+# Legacy aliases: SPARK_USER / SPARK_HOST / SPARK_DIR still work as a fallback
+# if DEPLOY_* are unset. You'll see a one-line deprecation note at dev.sh startup.

 # --- Frontend (Vite dev server) ----------------------------------------------
 # Where the Vite dev server proxies `/ws` and `/api` calls.
 #
 # Default `http://localhost:3000` works when you use an SSH tunnel that maps
-# the Spark's port 3000 to your local machine (e.g. NVIDIA Sync port
-# forwarding or `ssh -L 3000:localhost:3000 ...`).
+# the remote host's port 3000 to your local machine
+# (e.g. `ssh -L 3000:localhost:3000 ...`).
 #
 # To connect directly over the network, set this to:
-# VITE_BACKEND_URL=http://${SPARK_HOST}:3000
+# VITE_BACKEND_URL=http://${DEPLOY_HOST}:3000
 VITE_BACKEND_URL=http://localhost:3000
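
Note on the legacy aliases: the `.env.example` comments above say that `SPARK_USER` / `SPARK_HOST` / `SPARK_DIR` still work when `DEPLOY_*` are unset and that `dev.sh` prints a one-line deprecation note. This page does not show the `dev.sh` change itself, so the following is only a minimal POSIX shell sketch of how such a fallback could be written. The function name `apply_legacy_fallback` and the wording of the note are assumptions, not taken from the repository.

```shell
# Hypothetical sketch: not the actual dev.sh code.
# Prefer DEPLOY_*; fall back to the legacy SPARK_* names when a DEPLOY_*
# variable is unset (an empty value is treated the same as unset here).
apply_legacy_fallback() {
  legacy_used=0
  if [ -z "${DEPLOY_USER:-}" ] && [ -n "${SPARK_USER:-}" ]; then
    DEPLOY_USER=$SPARK_USER
    legacy_used=1
  fi
  if [ -z "${DEPLOY_HOST:-}" ] && [ -n "${SPARK_HOST:-}" ]; then
    DEPLOY_HOST=$SPARK_HOST
    legacy_used=1
  fi
  if [ -z "${DEPLOY_DIR:-}" ] && [ -n "${SPARK_DIR:-}" ]; then
    DEPLOY_DIR=$SPARK_DIR
    legacy_used=1
  fi
  # Documented default for the sync path.
  DEPLOY_DIR=${DEPLOY_DIR:-spark-dashboard}
  if [ "$legacy_used" -eq 1 ]; then
    # Assumed wording; the real note may differ.
    echo "note: SPARK_* variables are deprecated, use DEPLOY_* instead" >&2
  fi
  return 0
}

# Example: only the legacy names are set.
unset DEPLOY_USER DEPLOY_HOST DEPLOY_DIR SPARK_DIR
SPARK_USER=alice
SPARK_HOST=192.168.1.100
apply_legacy_fallback
echo "deploying as ${DEPLOY_USER}@${DEPLOY_HOST}:${DEPLOY_DIR}"
```

Resolving the aliases once, right after `.env` is sourced, would let the rest of a script refer only to `DEPLOY_*`.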

CONTRIBUTING.md

Lines changed: 3 additions & 3 deletions
@@ -8,7 +8,7 @@ bug reports, clear reproductions, and targeted PRs are all welcome.
 ```bash
 git clone https://github.com/niklasfrick/spark-dashboard.git
 cd spark-dashboard
-cp .env.example .env # edit with your Spark's user/host
+cp .env.example .env # edit with your remote host's user/host
 ./dev/dev.sh
 ```

@@ -21,7 +21,7 @@ environment variables are required.
 # Frontend (runs on any OS)
 cd frontend && npm test

-# Backend (must run on Linux / the DGX Spark — depends on NVML, procfs)
+# Backend (must run on Linux with NVIDIA drivers — depends on NVML, procfs)
 cargo test
 ```

@@ -75,6 +75,6 @@ Do **not** hand-edit version numbers — `release-please` owns them.
 When filing a bug, please include:

 - What you expected vs. what happened
-- DGX Spark OS / driver / CUDA versions (`nvidia-smi`)
+- Host OS / NVIDIA driver / CUDA versions (`nvidia-smi`)
 - Which engine adapter was involved (vLLM, etc.), if any
 - A snippet from `/tmp/spark-dashboard.log` around the failure

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 name = "spark-dashboard"
 version = "0.2.0"
 edition = "2021"
-description = "Real-time hardware and LLM inference monitoring for the NVIDIA DGX Spark"
+description = "Real-time hardware and LLM inference monitoring for Linux hosts with NVIDIA GPUs"
 license = "MIT"
 repository = "https://github.com/niklasfrick/spark-dashboard"
 readme = "README.md"

README.md

Lines changed: 43 additions & 35 deletions
@@ -1,7 +1,9 @@
 # Spark Dashboard

-Real-time hardware and LLM inference monitoring for the NVIDIA DGX Spark. A
-Rust backend collects GPU, CPU, memory, disk, and network metrics alongside
+Real-time hardware and LLM inference monitoring for Linux systems with NVIDIA
+GPUs. Developed and tested on the NVIDIA DGX Spark, but works on any Linux
+host with NVIDIA drivers — discrete-GPU workstations, DGX boxes, cloud VMs.
+A Rust backend collects GPU, CPU, memory, disk, and network metrics alongside
 vLLM engine statistics and streams them over WebSocket to a React frontend.

 ![Stack](https://img.shields.io/badge/Rust-Axum-orange) ![Stack](https://img.shields.io/badge/React_19-TypeScript-blue) ![Stack](https://img.shields.io/badge/Tailwind_CSS_4-06B6D4) ![Stack](https://img.shields.io/badge/Vite_8-646CFF) ![License](https://img.shields.io/badge/license-MIT-green)
@@ -10,25 +12,25 @@ vLLM engine statistics and streams them over WebSocket to a React frontend.

 ## Quick Start

-### Install on the Spark
+### Install on your Linux host

-Run as your normal user on the DGX Spark (requires Rust 1.75+):
+Run as your normal user on any Linux host with NVIDIA drivers (requires Rust 1.75+):

 ```bash
 cargo install spark-dashboard
 sudo spark-dashboard service install
 systemctl status spark-dashboard
 ```

-The dashboard is now served on port 3000. See [Install on the DGX Spark](#install-on-the-dgx-spark)
+The dashboard is now served on port 3000. See [Install on your Linux host](#install-on-your-linux-host-1)
 for the full guide, config overrides, and uninstall.

 ### Develop locally

 ```bash
 git clone https://github.com/niklasfrick/spark-dashboard.git
 cd spark-dashboard
-cp .env.example .env # edit with your Spark's user/host
+cp .env.example .env # edit with your remote host's user/host
 ./dev/dev.sh
 ```

@@ -41,7 +43,9 @@ for details on what each script does.
 - GPU utilization, temperature, power draw, clock frequencies, fan speed
 - GPU event detection — thermal throttling, hardware slowdown, power brake
 - CPU aggregate and per-core utilization with heatmap
-- Unified memory breakdown (GPU / CPU / cached / free)
+- Memory breakdown — CPU RAM and GPU VRAM separately on discrete-GPU hosts,
+or a single unified pool on systems where CPU and GPU share memory
+(e.g. DGX Spark GB10, GH200)
 - Disk and network I/O throughput

 **LLM Engine Monitoring** (vLLM via Prometheus metrics)
@@ -71,7 +75,7 @@ for details on what each script does.
 │ │ Static files (rust-embed) │ Recharts, Tailwind│
 │ Axum router │ ◀──── production only ───────── │ shadcn/ui │
 └──────────────────────┘ └────────────────────┘
-DGX Spark Browser
+Linux host (e.g. DGX Spark) Browser
 ```

 Two independent Tokio tasks run in parallel — one for hardware metrics (NVML,
@@ -92,23 +96,25 @@ cp .env.example .env

 | Variable | Purpose |
 |--------------------|--------------------------------------------------------------|
-| `SPARK_USER` | SSH user on the Spark (required) |
-| `SPARK_HOST` | Hostname or IP of the Spark (required) |
-| `SPARK_DIR` | Project path on the Spark, relative to remote home (default `spark-dashboard`) |
+| `DEPLOY_USER` | SSH user on the remote host (required) |
+| `DEPLOY_HOST` | Hostname or IP of the remote host (required) |
+| `DEPLOY_DIR` | Project path on the remote host, relative to remote home (default `spark-dashboard`) |
 | `VITE_BACKEND_URL` | Where Vite proxies `/ws` and `/api` (default `http://localhost:3000`) |

-The scripts in `dev/` source this file; Vite picks up `VITE_*` variables
+Legacy `SPARK_USER` / `SPARK_HOST` / `SPARK_DIR` are still accepted as a
+fallback when `DEPLOY_*` are unset — `dev.sh` prints a one-line deprecation
+note. The scripts in `dev/` source this file; Vite picks up `VITE_*` variables
 automatically. `.env` is gitignored — never commit it.

-## Install on the DGX Spark
+## Install on your Linux host

-The dashboard runs as a supervised `systemd` service on the Spark. Two install
-paths; both build from source on the Spark.
+The dashboard runs as a supervised `systemd` service. Two install paths; both
+build from source on the host.

 ### Option A — via cargo (recommended)

 ```bash
-# On the Spark. Requires Rust 1.75+ and internet access.
+# On the host. Requires Rust 1.75+, NVIDIA drivers, and internet access.
 cargo install spark-dashboard
 sudo spark-dashboard service install
 systemctl status spark-dashboard
@@ -126,7 +132,7 @@ Use this when you want to install without crates.io (audit the source,
 air-gapped install, or deploy an unreleased commit).

 ```bash
-# On the Spark. Run as your normal user — the script escalates to sudo
+# On the host. Run as your normal user — the script escalates to sudo
 # only for the systemd wiring step.
 git clone https://github.com/niklasfrick/spark-dashboard.git
 cd spark-dashboard
@@ -148,7 +154,8 @@ sudo spark-dashboard service status # same as `systemctl status`

 Optional overrides live in `/etc/spark-dashboard/config.env` — set
 `SPARK_DASHBOARD_PORT`, `SPARK_DASHBOARD_BIND`, `SPARK_DASHBOARD_POLL_INTERVAL`,
-or `RUST_LOG`, then `sudo systemctl restart spark-dashboard`.
+`SPARK_DASHBOARD_GPU_INDEX`, or `RUST_LOG`, then
+`sudo systemctl restart spark-dashboard`.

 ### Upgrade

@@ -181,19 +188,21 @@ spark-dashboard service status
 -p, --port <PORT> Listen port [default: 3000] [env: SPARK_DASHBOARD_PORT]
 -b, --bind <BIND> Bind address [default: 0.0.0.0] [env: SPARK_DASHBOARD_BIND]
 --poll-interval <MS> Polling interval ms [default: 1000] [env: SPARK_DASHBOARD_POLL_INTERVAL]
+--gpu-index <IDX> NVML GPU index to monitor [default: 0] [env: SPARK_DASHBOARD_GPU_INDEX]
 --engine <TYPE> Manual engine type (e.g. vllm)
 --engine-url <URL> Manual engine endpoint (requires --engine)
 ```

-Engines are auto-detected via process scan and Docker API. Use `--engine` and
-`--engine-url` to override when auto-detection doesn't work.
+On multi-GPU hosts use `--gpu-index` to select which device the dashboard
+monitors. Engines are auto-detected via process scan and Docker API. Use
+`--engine` and `--engine-url` to override when auto-detection doesn't work.

 ## Development

 ### Prerequisites

 - **Local machine** (macOS or Linux): Node.js 20+, npm, rsync, ssh
-- **DGX Spark**: Rust 1.75+, SSH access with key-based auth (no password prompts)
+- **Remote host**: Linux + NVIDIA drivers, Rust 1.75+, SSH access with key-based auth (no password prompts)
 - Optional: `brew install fswatch` for instant file-change detection (the
 watcher falls back to 2s polling without it)

@@ -205,41 +214,41 @@ Engines are auto-detected via process scan and Docker API. Use `--engine` and

 The script handles everything:

-1. **Syncs** the full project to the Spark via rsync
-2. **Builds** the Rust backend on the Spark (`cargo build --release`)
-3. **Starts** the backend on the Spark (port 3000)
+1. **Syncs** the full project to the remote host via rsync
+2. **Builds** the Rust backend on the remote host (`cargo build --release`)
+3. **Starts** the backend on the remote host (port 3000)
 4. **Starts** the Vite dev server locally (port 5173)
-5. **Watches** `src/` and `Cargo.toml` for Rust changes — auto-syncs and rebuilds on the Spark
+5. **Watches** `src/` and `Cargo.toml` for Rust changes — auto-syncs and rebuilds on the remote host

 | What you edit | What happens |
 |------------------------------------|-------------------------------------------------------------------|
 | Frontend files (`frontend/src/`) | Vite hot-reloads instantly in the browser |
-| Backend files (`src/`, `Cargo.toml`) | Auto-detected → rsync to Spark → rebuild → restart (~compile time) |
+| Backend files (`src/`, `Cargo.toml`) | Auto-detected → rsync to remote host → rebuild → restart (~compile time) |

 Useful while `dev.sh` is running:

 ```bash
 # Watch backend logs in another terminal
-ssh "${SPARK_USER}@${SPARK_HOST}" tail -f /tmp/spark-dashboard.log
+ssh "${DEPLOY_USER}@${DEPLOY_HOST}" tail -f /tmp/spark-dashboard.log

 # Press Ctrl+C in the dev.sh terminal to stop everything (cleans up the remote process too)
 ```

 ### How the proxy works

 By default, Vite proxies `/ws` and `/api` to `localhost:3000` — this works out
-of the box with NVIDIA Sync port forwarding (or any SSH tunnel that maps
-the Spark's port 3000 to your local machine).
+of the box with any SSH tunnel that maps the remote host's port 3000 to your
+local machine.

 ```
-Browser → localhost:5173/ws → Vite proxy → localhost:3000/ws (forwarded to Spark)
-Browser → localhost:5173/api → Vite proxy → localhost:3000/api (forwarded to Spark)
+Browser → localhost:5173/ws → Vite proxy → localhost:3000/ws (forwarded to remote)
+Browser → localhost:5173/api → Vite proxy → localhost:3000/api (forwarded to remote)
 ```

 To connect directly over the network instead, set in `.env`:

 ```bash
-VITE_BACKEND_URL=http://${SPARK_HOST}:3000
+VITE_BACKEND_URL=http://${DEPLOY_HOST}:3000
 ```

 The frontend connects to the WebSocket using `window.location.host`, so the
@@ -258,7 +267,7 @@ the latest stable version; see [CHANGELOG.md](./CHANGELOG.md) for release notes.
 # Frontend
 cd frontend && npm test

-# Backend (on Linux / DGX Spark)
+# Backend (on Linux)
 cargo test
 ```

@@ -276,7 +285,7 @@ real NVML/procfs parsing on Linux, with compile-time stubs on other platforms.
 │ │ ├── mod.rs MetricsSnapshot, collector loop
 │ │ ├── gpu.rs NVML GPU metrics + event detection
 │ │ ├── cpu.rs CPU aggregate + per-core
-│ │ ├── memory.rs Unified memory via /proc/meminfo
+│ │ ├── memory.rs System RAM + GPU VRAM + unified-memory detection
 │ │ ├── disk.rs Disk I/O rates
 │ │ └── network.rs Network I/O rates
 │ └── engines/
@@ -296,7 +305,6 @@ real NVML/procfs parsing on Linux, with compile-time stubs on other platforms.
 │ └── lib/ Circular buffer, formatting, theme
 ├── dev/
 │ ├── dev.sh Dev loop (local frontend + remote backend)
-│ ├── deploy.sh Production deploy to Spark
 │ └── README.md Operator docs
 ├── .env.example Configuration template
 ├── LICENSE MIT
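
Note on GPU selection: the README changes above add `SPARK_DASHBOARD_GPU_INDEX` (and the matching `--gpu-index` flag, default 0) for multi-GPU hosts. As an illustration only, a `/etc/spark-dashboard/config.env` that points the dashboard at the second GPU might look like the fragment below. The values are examples, and the plain `KEY=value` layout is an assumption, since this page does not show the file's format. Running `nvidia-smi -L` lists the installed GPUs with their indices, which should help in picking a value.

```shell
# /etc/spark-dashboard/config.env
# Illustrative values only; plain KEY=value lines are an assumed format.

# Monitor the GPU at NVML index 1 instead of the default 0.
SPARK_DASHBOARD_GPU_INDEX=1

# Default listen port, shown for completeness.
SPARK_DASHBOARD_PORT=3000
```

As the README states, the service picks up changes to this file after `sudo systemctl restart spark-dashboard`.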

dev/README.md

Lines changed: 14 additions & 10 deletions
@@ -4,39 +4,43 @@ Development-only scripts for spark-dashboard. Configuration is read from a
 repo-root `.env` file — copy `.env.example` to `.env` and edit before running.

 For **production installs**, use `cargo install spark-dashboard` or
-`packaging/install.sh`. See the repo [README](../README.md#install-on-the-dgx-spark).
+`packaging/install.sh`. See the repo [README](../README.md#install-on-your-linux-host).

 ## Scripts

 ### `./dev/dev.sh` — development loop

 Runs the full dev environment:

-1. rsyncs the project to `${SPARK_USER}@${SPARK_HOST}:${SPARK_DIR}`
-2. builds and starts the Rust backend on the Spark (`cargo build --release`)
+1. rsyncs the project to `${DEPLOY_USER}@${DEPLOY_HOST}:${DEPLOY_DIR}`
+2. builds and starts the Rust backend on the remote host (`cargo build --release`)
 3. starts the Vite dev server locally on port 5173 with a proxy to the backend
 4. streams remote backend logs from `/tmp/spark-dashboard.log`
 5. watches `src/` and `Cargo.toml` — on change, re-syncs and rebuilds the backend

 Frontend edits hot-reload in the browser via Vite. Backend edits trigger a
 remote rebuild (takes about as long as `cargo build --release` does on your
-Spark).
+remote host).

 ## Required environment variables

 | Variable | Purpose |
 |--------------------|--------------------------------------------------------------|
-| `SPARK_USER` | SSH user on the Spark (required) |
-| `SPARK_HOST` | Hostname or IP of the Spark (required) |
-| `SPARK_DIR` | Project path on the Spark, relative to remote home (default `spark-dashboard`) |
+| `DEPLOY_USER` | SSH user on the remote host (required) |
+| `DEPLOY_HOST` | Hostname or IP of the remote host (required) |
+| `DEPLOY_DIR` | Project path on the remote host, relative to remote home (default `spark-dashboard`) |
 | `VITE_BACKEND_URL` | Where Vite proxies `/ws` and `/api` (default `http://localhost:3000`) |

-Missing `SPARK_USER` or `SPARK_HOST` causes the script to exit immediately with
-a clear message.
+Missing `DEPLOY_USER` or `DEPLOY_HOST` causes the script to exit immediately
+with a clear message.
+
+Legacy `SPARK_USER` / `SPARK_HOST` / `SPARK_DIR` are still accepted as a
+fallback when `DEPLOY_*` are unset; you'll see a one-line deprecation note on
+startup.

 ## Prerequisites

 - **Local machine**: Node.js 20+, npm, rsync, ssh
-- **DGX Spark**: Rust 1.75+ in `~/.cargo/env`, reachable over SSH
+- **Remote host**: Linux with NVIDIA drivers, Rust 1.75+ in `~/.cargo/env`, reachable over SSH
 - **SSH key auth** configured — the scripts make many non-interactive SSH calls and will block on password prompts
 - Optional: `brew install fswatch` for instant change detection (otherwise the watcher polls every 2s)
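
Note on the required variables: `dev/README.md` above says that a missing `DEPLOY_USER` or `DEPLOY_HOST` makes the script exit immediately with a clear message. A minimal POSIX shell sketch of such a guard follows. The function name `require_deploy_vars` and the message text are hypothetical, and the real `dev.sh` may implement the check differently.

```shell
# Hypothetical sketch: not the actual dev.sh code.
# Collect the names of required variables that are unset or empty,
# report them on stderr, and signal failure through the return status.
require_deploy_vars() {
  missing=""
  [ -n "${DEPLOY_USER:-}" ] || missing="$missing DEPLOY_USER"
  [ -n "${DEPLOY_HOST:-}" ] || missing="$missing DEPLOY_HOST"
  if [ -n "$missing" ]; then
    echo "error: missing required variable(s):$missing (set them in .env)" >&2
    return 1
  fi
  return 0
}

# Example with both variables set. A script would typically run
# `require_deploy_vars || exit 1` right after sourcing .env.
DEPLOY_USER=alice
DEPLOY_HOST=192.168.1.100
if require_deploy_vars; then
  echo "ok: ${DEPLOY_USER}@${DEPLOY_HOST}"
fi
```

Reporting every missing name in one message avoids a fix-one, rerun, fix-the-next cycle when both are absent.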
