feat: make dashboard hardware- and host-agnostic #7
Merged
Project was built targeting the NVIDIA DGX Spark (Grace+Blackwell GB10) and baked a few Spark-specific assumptions into the metrics layer that produced wrong data on any other Linux + NVIDIA GPU host. Fix the load-bearing cases, add multi-GPU awareness, and generalize the framing without renaming the crate.

Load-bearing fixes
- metrics/memory: drop hardcoded is_unified=true. Read NVML memory_info for real GPU VRAM total/used and auto-detect unified memory via a pure detect_unified_memory helper (GPU VRAM within 10% of system RAM). Added gpu_memory_total_bytes and gpu_memory_used_bytes to MemoryMetrics.
- metrics/mod: add --gpu-index (SPARK_DASHBOARD_GPU_INDEX, default 0) with graceful out-of-range handling; log NVML device count at startup.

Frontend
- MemoryCard branches on is_unified: unified hosts keep the existing stacked bar; discrete-GPU hosts render separate system RAM and GPU VRAM sections.
- Added tests covering the discrete path and the missing-VRAM fallback.

Dev / docs
- Rename dev env vars to DEPLOY_USER / DEPLOY_HOST / DEPLOY_DIR. SPARK_* are still accepted as a fallback with a one-line deprecation note so existing .env files keep working.
- Generalize README, CONTRIBUTING, dev/README, install.sh, systemd unit, Cargo description, and internal comments away from "DGX Spark only" framing. DGX Spark kept as the original reference point.
- Drop the aarch64 warning from packaging/install.sh.
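For readers skimming the description, the unified-memory heuristic ("GPU VRAM within 10% of system RAM") could look roughly like the sketch below. This is not the code from the PR: the signature, the `Option` argument, the choice to measure the 10% relative to system RAM, and the decision to treat a missing or zero VRAM reading as "not unified" are all assumptions made for illustration. Only the helper name and the 10% figure come from the description.

```rust
/// Fraction of system RAM within which GPU VRAM must fall to count as unified.
/// The 10% figure is from the PR description; measuring it against system RAM
/// (rather than against VRAM) is an assumption of this sketch.
const UNIFIED_TOLERANCE: f64 = 0.10;

/// Pure helper: no NVML or /proc access, so it is trivial to unit-test.
/// `gpu_total_bytes` is `None` when the VRAM total could not be read
/// (assumed fallback: report "not unified").
pub fn detect_unified_memory(system_total_bytes: u64, gpu_total_bytes: Option<u64>) -> bool {
    let gpu = match gpu_total_bytes {
        Some(bytes) if bytes > 0 => bytes,
        _ => return false,
    };
    if system_total_bytes == 0 {
        return false;
    }
    // Absolute difference between the two totals, compared to 10% of system RAM.
    let diff = system_total_bytes.abs_diff(gpu) as f64;
    diff <= system_total_bytes as f64 * UNIFIED_TOLERANCE
}
```

Under these assumptions, a host reporting 128 GiB of system RAM and 120 GiB of GPU memory would be classified as unified, while a 64 GiB host with a 24 GiB discrete card would not. Keeping the helper pure means the threshold can be tested without a GPU present.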
Increase contrast on engine and hardware section titles, metric labels, gauge labels, and stacked bar legends. Inline the HwCard subtitle next to the title with a middle-dot separator, and expand "GPU Util" to "GPU Utilization".
Thread DeploymentMode through the detector, engine state, and snapshot pipeline so the UI can distinguish a native vLLM process from a containerized one. The engine tab now shows the vLLM logo (plus a Docker logo when applicable) alongside a primary/secondary title pair styled to match the hardware cards.
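As a rough illustration of what threading a deployment mode from the detector to the UI can involve, the sketch below defines a small enum and two helpers. Only the name `DeploymentMode` comes from the commit message; the variant names, the `classify_deployment` and `engine_logos` functions, and the cgroup-substring heuristic are hypothetical and may differ from what the project actually does.

```rust
/// How the detected engine process is deployed.
/// Variant names are assumed for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DeploymentMode {
    Native,
    Container,
}

/// Hypothetical detector step: classify from the text of /proc/<pid>/cgroup.
/// Substring matching on common runtime names is a heuristic, not a guarantee.
pub fn classify_deployment(cgroup_text: &str) -> DeploymentMode {
    const MARKERS: [&str; 3] = ["docker", "containerd", "kubepods"];
    if MARKERS.iter().any(|marker| cgroup_text.contains(marker)) {
        DeploymentMode::Container
    } else {
        DeploymentMode::Native
    }
}

/// Hypothetical UI-side mapping: which logos the engine tab would show.
pub fn engine_logos(mode: DeploymentMode) -> &'static [&'static str] {
    match mode {
        DeploymentMode::Native => &["vllm"],
        DeploymentMode::Container => &["vllm", "docker"],
    }
}
```

Carrying a small `Copy` enum through the engine state and snapshot keeps the detection logic in one place and leaves the frontend with a simple match on the mode.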