Commit 2537ea8

Skip DCGM exporter on non-NVIDIA nodes
1 parent 466339c commit 2537ea8

1 file changed

Lines changed: 3 additions & 3 deletions

src/swiss_ai_model_launch/assets/template.jinja

@@ -223,12 +223,12 @@ fi
 # Launch DCGM exporter on the batch node to expose GPU metrics on port 9400.
 # vmagent scrape config should include a job targeting localhost:9400 to collect
 # these metrics alongside the framework metrics.
-if [ -x "$DCGM_EXPORTER_BIN" ]; then
+if [ -e /dev/nvidia0 ] && [ -x "$DCGM_EXPORTER_BIN" ]; then
 "$DCGM_EXPORTER_BIN" \
---address 0.0.0.0:9400 \
+--address 0.0.0.0:9400 -f /capstor/store/cscs/swissai/infra01/ocf-share/default-counters.csv \
 > /tmp/dcgm-exporter-${SLURM_JOB_ID}.log 2>&1 &
 else
-echo "dcgm-exporter: $DCGM_EXPORTER_BIN not found, skipping" >&2
+echo "dcgm-exporter: no NVIDIA GPU or binary not found, skipping" >&2
 fi
 
 # Optional router launch
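
The guard in this change tests only for /dev/nvidia0. As a minimal sketch, and not part of this commit, the same check could be factored into helpers that accept any NVIDIA device node, in case a node exposes GPUs under other indices. The function names `has_nvidia_gpu` and `should_launch_dcgm` and the optional device-directory argument are hypothetical, introduced here for illustration only:

```shell
# Hypothetical helper (not in the commit): succeeds if at least one
# NVIDIA device node (nvidia0, nvidia1, ...) exists in the given directory.
has_nvidia_gpu() {
    dev_dir="${1:-/dev}"
    for node in "$dev_dir"/nvidia[0-9]*; do
        # If the glob matches nothing, $node is the literal pattern and -e fails.
        [ -e "$node" ] && return 0
    done
    return 1
}

# Hypothetical helper (not in the commit): mirrors the guard in the diff.
# $1: path to the exporter binary, $2: device directory (defaults to /dev).
should_launch_dcgm() {
    has_nvidia_gpu "${2:-/dev}" && [ -x "$1" ]
}
```

With these helpers, the condition in the template could read `if should_launch_dcgm "$DCGM_EXPORTER_BIN"; then`. The device-directory argument exists only so the check can be exercised against a temporary directory.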
