@dev-onejun
Contributor

Hello!

I'd like to suggest showing each GPU's status separately when there are multiple GPUs. In my experience, each GPU often runs a different job, especially when users share the machine.

Please refer to the attached image below.

Screenshot from 2025-12-09 12-11-24

It still supports a single GPU as well.

Screenshot from 2025-12-09 12-23-38

Any additional suggestions are welcome.

Thank you.

@coderabbitai

coderabbitai bot commented Dec 9, 2025

📝 Walkthrough

NVIDIA-specific output formatting in three GPU monitoring scripts was changed to emit pipe-delimited strings (leading and trailing pipes) via awk; several normalization/printf steps were removed while control flow and non‑NVIDIA branches remain unchanged.
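The pipe-delimited format described in the walkthrough can be sketched as follows. This is a minimal illustration, not the PR's code: `format_gpu_percent` is a hypothetical helper, and the sample values on stdin stand in for what `nvidia-smi` would print (one line per GPU).

```shell
#!/usr/bin/env bash
# Hypothetical helper: turns one utilization value per line into "|12%|87%|".
# In the real scripts the input would come from nvidia-smi; here it is stdin.
format_gpu_percent() {
  awk -F ', *' '
    { printf("|%d%%", $1) }            # one "|NN%" segment per GPU
    END { if (NR > 0) printf("|\n") }  # closing pipe only if there was input
  '
}

# Sample input standing in for two GPUs, then for a single GPU
printf '12\n87\n' | format_gpu_percent
printf '5\n' | format_gpu_percent
```

Emitting the trailing pipe from awk's `END` block, instead of a separate `echo "|"`, keeps the whole segment string on one line and avoids printing a lone pipe when there is no input.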

Changes

Cohort / File(s) Summary
GPU power formatting — NVIDIA
scripts/gpu_power.sh
NVIDIA branch now uses awk -F ', *' to emit pipe-delimited output. When gpu_power_percent is enabled it prints `
GPU RAM formatting — NVIDIA
scripts/gpu_ram_info.sh
NVIDIA path now emits pipe-delimited strings. With gpu_vram_percent enabled it prints `
GPU utilization formatting — NVIDIA
scripts/gpu_usage.sh
NVIDIA utilization now emits per-GPU segments using `awk -F ', *' '{ printf("

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Verify downstream consumers/parsers handle the new leading and trailing pipe format.
  • Check awk field selections and percent/division logic for division-by-zero or malformed input.
  • Confirm trailing-pipe vs newline behavior consistency across scripts.
  • Re-evaluate removal of normalize_percent_len/printf for alignment or display regressions.
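The division-by-zero check in the list above could look roughly like this. It is a sketch under assumptions: `safe_percent` is a hypothetical helper, the `n/a` placeholder is an arbitrary choice, and the `[N/A]` sample line assumes that is how an unsupported field shows up in the CSV output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "value, limit" lines and prints a percentage per
# line, falling back to "n/a" when the limit is zero or not numeric.
safe_percent() {
  awk -F ', *' '
    {
      if ($2 + 0 > 0) printf("|%d%%", $1 / $2 * 100)
      else            printf("|n/a")   # zero or non-numeric limit
    }
    END { if (NR > 0) printf("|\n") }
  '
}

# Sample input: a normal GPU, a zero limit, and an unsupported field
printf '50, 200\n10, 0\n[N/A], [N/A]\n' | safe_percent
```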

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main change: adding support for displaying multiple NVIDIA GPUs' status individually, which aligns with the changes to gpu_power.sh, gpu_ram_info.sh, and gpu_usage.sh that now output per-GPU metrics in pipe-delimited format. |
| Description check | ✅ Passed | The description is related to the changeset and explains the motivation: showing GPU status independently for multiple GPUs on shared machines, with visual examples demonstrating the feature for both multi-GPU and single-GPU scenarios. |

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 781cdb5 and 9c4f3d4.

📒 Files selected for processing (1)
  • scripts/gpu_ram_info.sh (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
scripts/gpu_ram_info.sh (1)
scripts/utils.sh (1)
  • get_tmux_option (3-12)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
scripts/gpu_usage.sh (1)

46-46: Verify compatibility with normalize_percent_len for pipe-delimited format.

The change from a plain percentage to pipe-delimited format |<percentage>%| is appropriate for multi-GPU support. Confirm that the normalize_percent_len function (line 54) correctly handles the new format with embedded pipes for multiple values.

Additionally, consider adding error handling if nvidia-smi fails or returns no output, as the script currently has no guard against malformed input.
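One way to add the suggested guard is to wrap the query so that a failing command or empty output yields a placeholder instead of malformed text. This is only a sketch: `query_or_placeholder` and the `|n/a|` placeholder are hypothetical, and the demo calls use `printf` and `false` in place of `nvidia-smi`.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: runs the given command and formats its output, or
# prints a placeholder when the command fails or prints nothing.
query_or_placeholder() {
  local out
  if ! out=$("$@" 2>/dev/null) || [ -z "$out" ]; then
    printf '|n/a|\n'
    return 0
  fi
  printf '%s\n' "$out" |
    awk -F ', *' '{ printf("|%d%%", $1) } END { printf("|\n") }'
}

# Real usage might be:
#   query_or_placeholder nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits
query_or_placeholder printf '30\n70\n'   # stand-in for a working query
query_or_placeholder false               # stand-in for a failing query
```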

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2b762b3 and 57539be.

📒 Files selected for processing (3)
  • scripts/gpu_power.sh (1 hunks)
  • scripts/gpu_ram_info.sh (1 hunks)
  • scripts/gpu_usage.sh (1 hunks)
🔇 Additional comments (2)
scripts/gpu_power.sh (1)

47-47: Clarify output format: echo "Power" inclusion in usage variable may create malformed output.

Lines 47 and 49 include echo "Power" in the command chain that sets the usage variable. This produces a multi-line output where "Power" appears on the first line and the pipe-delimited metrics on subsequent lines. When echoed in main() (line 74), this produces: GPU Power\n|<values>|, which differs from the pipe-delimited format in other scripts.

Is the "Power" label intentional as part of the usage output, or should it be handled separately? Please verify this matches the intended output format shown in the PR screenshots.

Also applies to: 49-49
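The effect of putting `echo "Power"` inside the command substitution can be reproduced without a GPU. In this sketch `fake_smi` is a hypothetical stand-in for the `nvidia-smi` query and its values are made up; the rest mirrors the shape of the command chain discussed above.

```shell
#!/usr/bin/env bash
# Stand-in for nvidia-smi: "draw, limit" per GPU (made-up sample values).
fake_smi() { printf '50.00, 200.00\n75.00, 300.00\n'; }

usage=$(echo "Power" && fake_smi | awk -F ', *' '{ printf("|%d%%", $1 / $2 * 100) }' && echo "|")

printf '%s\n' "$usage"   # quoted: "Power" and the values stay on separate lines
echo $usage              # unquoted: word splitting joins them with a space
```

Whether the label ends up on its own line therefore depends on how the variable is quoted when it is printed later, which is worth checking against the screenshots.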

scripts/gpu_ram_info.sh (1)

45-45: Add explicit field separator for CSV parsing in awk to improve robustness and clarity.

Line 45 uses awk without specifying a field separator when parsing CSV output from nvidia-smi. While the implicit numeric coercion of $0 currently works, explicitly setting -F ', *' and using $1 instead of $0 follows CSV parsing best practices and prevents fragility if input format varies. Apply the suggested diff to improve code maintainability.
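The coercion mentioned in this comment can be checked directly. The sketch below uses a hypothetical `compare_fields` helper and a made-up input line; it prints the percentage computed from `$0` next to the one computed from `$1`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "<percent via $0> <percent via $1>" for each
# "used, total" line. awk converts the string $0 to a number using its
# leading numeric prefix, so "1024, 2048" is treated as 1024 in arithmetic.
compare_fields() {
  awk -F ', *' '{ printf("%d %d\n", $0 / $2 * 100, $1 / $2 * 100) }'
}

printf '1024, 2048\n' | compare_fields
```

Both columns agree for this input, which is why the `$0` form appears to work. Using `$1` states the intent and does not depend on that string-to-number conversion.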

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (1)
scripts/gpu_power.sh (1)

47-47: Fix awk field reference in power percentage calculation.

Line 47 uses $0 / $2 * 100 in the awk expression, but with -F ', *' field separator, $0 is the entire input line. The percentage should divide the first field (power draw) by the second field (power limit). Change $0 / $2 * 100 to $1 / $2 * 100.

This issue was flagged in a prior review but the $0 reference was not corrected.

-      usage=$(echo "Power" && nvidia-smi --query-gpu=power.draw,power.limit --format=csv,noheader,nounits | awk -F ', *' '{ printf("|%d%%", $0 / $2 * 100) }' && echo "|")
+      usage=$(echo "Power" && nvidia-smi --query-gpu=power.draw,power.limit --format=csv,noheader,nounits | awk -F ', *' '{ printf("|%d%%", $1 / $2 * 100) }' && echo "|")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 57539be and 306debc.

📒 Files selected for processing (2)
  • scripts/gpu_power.sh (1 hunks)
  • scripts/gpu_ram_info.sh (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
scripts/gpu_ram_info.sh (1)
scripts/utils.sh (2)
  • normalize_percent_len (26-35)
  • get_tmux_option (3-12)
🪛 Shellcheck (0.11.0)
scripts/gpu_ram_info.sh

[warning] 50-50: used_accuracy appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 51-51: total_accuracy appears unused. Verify use (or export if used externally).

(SC2034)

🔇 Additional comments (2)
scripts/gpu_power.sh (1)

49-49: Field references in wattage format are correct.

Line 49 correctly uses $1 and $2 in the printf format string. No changes needed here.

scripts/gpu_ram_info.sh (1)

50-51: Verify variable usage for used_accuracy and total_accuracy.

Lines 50–51 assign used_accuracy and total_accuracy from tmux options. Confirm whether these variables are referenced after these lines (particularly in line 52's printf format). If unused, remove the assignments; if intended for formatting, apply them to the output.
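If the two variables are meant to control number formatting, one possible way to apply them is to pass a combined format string into awk. This is a sketch only: `format_vram` is a hypothetical helper, and the `%.1f` and `%d` values are assumed examples of what the tmux options might contain.

```shell
#!/usr/bin/env bash
# Hypothetical helper: $1 and $2 are printf-style formats for the used and
# total values (assumed to come from tmux options in the real script).
# Reads "used_MiB, total_MiB" lines on stdin.
format_vram() {
  local used_accuracy=$1 total_accuracy=$2
  awk -F ', *' -v fmt="|${used_accuracy}GB/${total_accuracy}GB" '
    { printf(fmt, $1 / 1024, $2 / 1024) }
    END { if (NR > 0) printf("|\n") }
  '
}

# Made-up sample: 1536 MiB used of 8192 MiB
printf '1536, 8192\n' | format_vram '%.1f' '%d'
```

Passing the format with `-v` keeps the awk program in single quotes, so no shell variables are interpolated into the program text itself.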

@dev-onejun
Contributor Author

Relevant PRs

Collaborator

@Theoreticallyhugo Theoreticallyhugo left a comment

thank you very much for your contribution.
looking good in general, with some kinks that need smoothing.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (2)
scripts/gpu_ram_info.sh (2)

44-47: Fix critical awk field reference and control flow bugs.

Line 45 has an awk field-reference problem: $0 / $2 divides the entire line (e.g., "1024, 2048") by the second field. awk converts the line to its leading number in arithmetic, so the result happens to match for this input, but the expression is fragile and hides the intent. With the -F ', *' separator, $0 is the whole line while $1 is the used memory and $2 is the total. Change to $1 / $2 * 100.

Line 47's exit 0 terminates the entire script, preventing main() from executing line 75 where the label and usage are echoed to the user. Replace with return 0 to exit only the function.

🔎 Apply this diff to fix both issues:
    if $gpu_vram_percent; then
-      usage=$(nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits | awk -F ', *' '{ printf("|%d%%", $0 / $2 * 100) }' && echo "|")
+      usage=$(nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits | awk -F ', *' '{ printf("|%d%%", $1 / $2 * 100) }' && echo "|")
    echo $usage
-    exit 0
+    return 0
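The difference between `exit 0` and `return 0` inside a function can be seen in isolation. In this sketch each demo function body is a subshell that stands in for a whole script; the function names and the echoed strings are hypothetical.

```shell
#!/usr/bin/env bash
# Each demo runs in a subshell "( ... )" that stands in for a whole script.
with_exit() (
  print_usage() { echo "|50%|"; exit 0; }    # exit ends the whole "script"
  print_usage
  echo "reached the caller's echo"           # never runs
)

with_return() (
  print_usage() { echo "|50%|"; return 0; }  # return ends only the function
  print_usage
  echo "reached the caller's echo"           # runs
)

with_exit
with_return
```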

52-52: Fix awk field reference in GB formatting.

Line 52 uses $0 / 1024 which divides the entire line (e.g., "1024, 2048") by 1024 instead of just the first field. With -F ', *', change to $1 / 1024 to correctly convert the used memory value.

🔎 Apply this diff to fix the field reference:
-      usage=$(nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits | awk -F ', *' '{ printf("|%dGB/%dGB", $0 / 1024, $2 / 1024) }' && echo "|")
+      usage=$(nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits | awk -F ', *' '{ printf("|%dGB/%dGB", $1 / 1024, $2 / 1024) }' && echo "|")
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bc7262f and 5621ddd.

📒 Files selected for processing (2)
  • scripts/gpu_power.sh (1 hunks)
  • scripts/gpu_ram_info.sh (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
scripts/gpu_ram_info.sh (1)
scripts/utils.sh (1)
  • get_tmux_option (3-12)
🪛 Shellcheck (0.11.0)
scripts/gpu_ram_info.sh

[warning] 50-50: used_accuracy appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 51-51: total_accuracy appears unused. Verify use (or export if used externally).

(SC2034)

🔇 Additional comments (1)
scripts/gpu_power.sh (1)

49-49: LGTM!

Line 49 correctly uses $1 and $2 to format the power draw and limit values for each GPU.
