Fix metric display for data sizes exceeding TB (#14078) #14079

Merged
winningsix merged 1 commit into NVIDIA:main from winningsix:fix-metric-sizeinbytes
Jan 5, 2026

Conversation

@winningsix winningsix commented Jan 4, 2026

Collaborator

Add PB (Petabyte) and EB (Exabyte) units to SizeInBytes.SizeUnitNames to correctly format very large data sizes in metrics.

Previously, data sizes of 1024 TB or more were displayed as large TB values (e.g., 1024.00TB) instead of in the proper larger unit (e.g., 1.00PB).

Fixes #14078

greptile-apps Bot commented Jan 4, 2026

Contributor

Greptile Summary

Extended the SizeInBytes.SizeUnitNames array to include "PB" (Petabyte) and "EB" (Exabyte) units, enabling proper formatting of GPU task metrics for data sizes of 1024 TB or more. Previously, such values were displayed in TB (e.g., "1024.00TB") instead of in a larger unit (e.g., "1.00PB"). The change is a one-line fix that extends the array used by the existing formatting logic in the toString() method.

Confidence Score: 5/5

  • This PR is safe to merge with no risk
  • The change is a straightforward extension of an existing string array used for metric display formatting. The existing logic already handles variable-length arrays correctly through the loop condition unitIndex < SizeUnitNames.length. No algorithm changes, no new code paths, and the fix directly addresses the reported issue.
  • No files require special attention
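A rough check of why two extra entries are enough, assuming the byte count is held in a signed 64-bit Long (the diagram below only says "raw bytes count", so the type is an assumption here). `UnitRangeSketch` and `shiftsNeeded` are hypothetical names for illustration, not part of the plugin.

```scala
// Illustrative arithmetic only; not code from GpuTaskMetrics.scala.
object UnitRangeSketch {
  val Units: Array[String] = Array("B", "KB", "MB", "GB", "TB", "PB", "EB")

  // Counts how many times a non-negative Long can be shifted right by
  // 10 bits (divided by 1024) before it drops below 1024.
  def shiftsNeeded(value: Long): Int = {
    var v = value
    var n = 0
    while (v >= 1024) {
      v >>= 10
      n += 1
    }
    n
  }

  def main(args: Array[String]): Unit = {
    println(shiftsNeeded(Long.MaxValue))        // 6
    println(Units(shiftsNeeded(Long.MaxValue))) // EB
    println(Long.MaxValue >> 60)                // 7, i.e. just under 8 EB
  }
}
```

Under that assumption the largest representable value needs six shifts, which lands on index 6 ("EB"), so no unit beyond EB can be reached.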

Important Files Changed

Filename | Overview
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/GpuTaskMetrics.scala | Added PB and EB units to support formatting data sizes exceeding TB

Sequence Diagram

sequenceDiagram
    participant Metric as GPU Task Metric
    participant Accumulator as SizeInBytesAccumulator
    participant SIB as SizeInBytes
    participant Display as Metric Display

    Accumulator->>SIB: Create SizeInBytes(value)
    Note over SIB: value = raw bytes count
    Display->>SIB: toString()
    
    SIB->>SIB: unitVal = value, unitIndex = 0
    
    loop while unitIndex < SizeUnitNames.length && unitVal >= 1024
        SIB->>SIB: nextUnitVal = unitVal >> 10
        SIB->>SIB: remainVal = unitVal - (nextUnitVal << 10)
        SIB->>SIB: unitVal = nextUnitVal
        SIB->>SIB: unitIndex += 1
    end
    
    Note over SIB: OLD: Array("B", "KB", "MB", "GB", "TB")<br/>NEW: Array("B", "KB", "MB", "GB", "TB", "PB", "EB")
    
    SIB->>SIB: finalVal = (unitVal + remainVal/1024).formatted("%.2f")
    SIB->>SIB: result = finalVal + SizeUnitNames(unitIndex) + " (" + value + " bytes)"
    
    SIB-->>Display: Return formatted string
    
    Note over Display: OLD: 1024.00TB for 1PB<br/>NEW: 1.00PB for 1PB
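
The steps in the diagram can be sketched in Scala roughly as follows. This is a standalone illustration, not the code from GpuTaskMetrics.scala: `SizeFormatSketch`, `OldUnits`, `NewUnits` and `format` are hypothetical names, the unit table is passed in as a parameter so both variants can be compared, and the loop stops at the last unit (`length - 1`), which is an assumption made here to keep the index in bounds.

```scala
import java.util.Locale

// Illustrative sketch only; names and the loop bound are assumptions.
object SizeFormatSketch {
  val OldUnits: Array[String] = Array("B", "KB", "MB", "GB", "TB")
  val NewUnits: Array[String] = OldUnits ++ Array("PB", "EB")

  def format(value: Long, units: Array[String]): String = {
    var unitVal = value
    var remainVal = 0L
    var unitIndex = 0
    // Move up one unit per iteration until the value fits or units run out.
    while (unitIndex < units.length - 1 && unitVal >= 1024) {
      val nextUnitVal = unitVal >> 10            // divide by 1024
      remainVal = unitVal - (nextUnitVal << 10)  // remainder in the current unit
      unitVal = nextUnitVal
      unitIndex += 1
    }
    val finalVal = unitVal + remainVal / 1024.0
    // Locale.ROOT keeps the decimal separator a '.' regardless of JVM locale.
    "%.2f%s (%d bytes)".formatLocal(Locale.ROOT, finalVal, units(unitIndex), value)
  }

  def main(args: Array[String]): Unit = {
    val onePiB = 1L << 50
    println(format(1536L, NewUnits))   // 1.50KB (1536 bytes)
    println(format(onePiB, OldUnits))  // 1024.00TB (1125899906842624 bytes)
    println(format(onePiB, NewUnits))  // 1.00PB (1125899906842624 bytes)
  }
}
```

With the five-entry table the loop runs out of units at TB, so 2^50 bytes stays at 1024.00TB; with the seven-entry table the same loop takes one more step and reports 1.00PB.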

Add PB (Petabyte) and EB (Exabyte) units to SizeInBytes.SizeUnitNames
to correctly format very large data sizes in metrics.

Previously, data sizes over 1TB would display with large TB values
(e.g., 1024.00TB) instead of proper larger units (e.g., 1.00PB).

Closes NVIDIA#14078

Signed-off-by: Ferdinand Xu <ferdinandx@nvidia.com>

greptile-apps Bot commented Jan 4, 2026

Contributor

Greptile found no issues!

@winningsix winningsix force-pushed the fix-metric-sizeinbytes branch from ebb44ac to edfa025 January 4, 2026 04:05
@winningsix winningsix requested a review from sperlingxx January 4, 2026 05:07
@winningsix

Collaborator Author

build

@sperlingxx sperlingxx left a comment

Collaborator

LGTM

@winningsix winningsix merged commit 3af8279 into NVIDIA:main Jan 5, 2026
45 of 47 checks passed
@winningsix winningsix deleted the fix-metric-sizeinbytes branch January 5, 2026 08:06
@sameerz sameerz added the bug Something isn't working label Jan 17, 2026
Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

[BUG] SizeInBytes metric display incorrect for data sizes exceeding TB

4 participants