-
Notifications
You must be signed in to change notification settings - Fork 356
FEAT: (NVIDIA) support multiple gpus status individually #369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughNVIDIA-specific output formatting in three GPU monitoring scripts was changed to emit pipe-delimited strings (leading and trailing pipes) via awk; several normalization/printf steps were removed while control flow and non‑NVIDIA branches remain unchanged. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: Organization UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used🧬 Code graph analysis (1)scripts/gpu_ram_info.sh (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
scripts/gpu_usage.sh (1)
46-46: Verify compatibility withnormalize_percent_lenfor pipe-delimited format.The change from a plain percentage to pipe-delimited format
|<percentage>%|is appropriate for multi-GPU support. Confirm that thenormalize_percent_lenfunction (line 54) correctly handles the new format with embedded pipes for multiple values.Additionally, consider adding error handling if
nvidia-smifails or returns no output, as the script currently has no guard against malformed input.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
scripts/gpu_power.sh(1 hunks)scripts/gpu_ram_info.sh(1 hunks)scripts/gpu_usage.sh(1 hunks)
🔇 Additional comments (2)
scripts/gpu_power.sh (1)
47-47: Clarify output format:echo "Power"inclusion in usage variable may create malformed output.Lines 47 and 49 include
echo "Power"in the command chain that sets theusagevariable. This produces a multi-line output where "Power" appears on the first line and the pipe-delimited metrics on subsequent lines. When echoed inmain()(line 74), this produces:GPU Power\n|<values>|, which differs from the pipe-delimited format in other scripts.Is the "Power" label intentional as part of the usage output, or should it be handled separately? Please verify this matches the intended output format shown in the PR screenshots.
Also applies to: 49-49
scripts/gpu_ram_info.sh (1)
45-45: Add explicit field separator for CSV parsing in awk to improve robustness and clarity.Line 45 uses awk without specifying a field separator when parsing CSV output from nvidia-smi. While the implicit numeric coercion of
$0currently works, explicitly setting-F ', *'and using$1instead of$0follows CSV parsing best practices and prevents fragility if input format varies. Apply the suggested diff to improve code maintainability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (1)
scripts/gpu_power.sh (1)
47-47: Fix awk field reference in power percentage calculation.Line 47 uses
$0 / $2 * 100in the awk expression, but with-F ', *'field separator,$0is the entire input line. The percentage should divide the first field (power draw) by the second field (power limit). Change$0 / $2 * 100to$1 / $2 * 100.This issue was flagged in a prior review but the
$0reference was not corrected.- usage=$(echo "Power" && nvidia-smi --query-gpu=power.draw,power.limit --format=csv,noheader,nounits | awk -F ', *' '{ printf("|%d%%", $0 / $2 * 100) }' && echo "|") + usage=$(echo "Power" && nvidia-smi --query-gpu=power.draw,power.limit --format=csv,noheader,nounits | awk -F ', *' '{ printf("|%d%%", $1 / $2 * 100) }' && echo "|")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
scripts/gpu_power.sh(1 hunks)scripts/gpu_ram_info.sh(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
scripts/gpu_ram_info.sh (1)
scripts/utils.sh (2)
normalize_percent_len(26-35)get_tmux_option(3-12)
🪛 Shellcheck (0.11.0)
scripts/gpu_ram_info.sh
[warning] 50-50: used_accuracy appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 51-51: total_accuracy appears unused. Verify use (or export if used externally).
(SC2034)
🔇 Additional comments (2)
scripts/gpu_power.sh (1)
49-49: Field references in wattage format are correct.Line 49 correctly uses
$1and$2in the printf format string. No changes needed here.scripts/gpu_ram_info.sh (1)
50-51: Verify variable usage forused_accuracyandtotal_accuracy.Lines 50–51 assign
used_accuracyandtotal_accuracyfrom tmux options. Confirm whether these variables are referenced after these lines (particularly in line 52's printf format). If unused, remove the assignments; if intended for formatting, apply them to the output.
Relevant PRs
|
Theoreticallyhugo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you very much for your contribution.
looking good in general, with some kinks that need smoothing.
dracula#369 (comment) Co-authored-by: Hugo/ Theo Meinhof <[email protected]>
Co-authored-by: Hugo/ Theo Meinhof <[email protected]>
Co-authored-by: Hugo/ Theo Meinhof <[email protected]>
dracula#369 (comment) Co-authored-by: Hugo/ Theo Meinhof <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (2)
scripts/gpu_ram_info.sh (2)
44-47: Fix critical awk field reference and control flow bugs.Line 45 has a critical arithmetic error:
$0 / $2divides the entire line (e.g., "1024, 2048") by the second field, which will produce incorrect results. With the-F ', *'separator,$0is the whole line while$1is the used memory and$2is the total. Change to$1 / $2 * 100.Line 47's
exit 0terminates the entire script, preventingmain()from executing line 75 where the label and usage are echoed to the user. Replace withreturn 0to exit only the function.🔎 Apply this diff to fix both issues:
if $gpu_vram_percent; then - usage=$(nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits | awk -F ', *' '{ printf("|%d%%", $0 / $2 * 100) }' && echo "|") + usage=$(nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits | awk -F ', *' '{ printf("|%d%%", $1 / $2 * 100) }' && echo "|") echo $usage - exit 0 + return 0
52-52: Fix awk field reference in GB formatting.Line 52 uses
$0 / 1024which divides the entire line (e.g., "1024, 2048") by 1024 instead of just the first field. With-F ', *', change to$1 / 1024to correctly convert the used memory value.🔎 Apply this diff to fix the field reference:
- usage=$(nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits | awk -F ', *' '{ printf("|%dGB/%dGB", $0 / 1024, $2 / 1024) }' && echo "|") + usage=$(nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits | awk -F ', *' '{ printf("|%dGB/%dGB", $1 / 1024, $2 / 1024) }' && echo "|")
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
scripts/gpu_power.sh(1 hunks)scripts/gpu_ram_info.sh(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
scripts/gpu_ram_info.sh (1)
scripts/utils.sh (1)
get_tmux_option(3-12)
🪛 Shellcheck (0.11.0)
scripts/gpu_ram_info.sh
[warning] 50-50: used_accuracy appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 51-51: total_accuracy appears unused. Verify use (or export if used externally).
(SC2034)
🔇 Additional comments (1)
scripts/gpu_power.sh (1)
49-49: LGTM!Line 49 correctly uses
$1and$2to format the power draw and limit values for each GPU.
Hello!
I want to suggest to show gpus status independently from the experience that each gpu runs different jobs, if there are multiple gpus. (especially when users share the machine)
Please refer the attached image below.
It still support a one gpu as well.
Any additional suggestions are welcome.
Thank you.