Commit 78def66

log if maxMem is called twice in a task (#13454)
### Description

This change logs an error if `GpuTaskMetrics.updateMaxMemory` is called more than once for a given task. This will help debug #13336.

### Checklists

No behavioral change.

- [ ] This PR has added documentation for new or modified features or behaviors.
- [ ] This PR has added new tests or modified existing tests to cover new code paths. (Please explain in the PR description how the new code paths are tested, such as names of the new/existing tests that cover them.)
- [ ] Performance testing has been performed and its results are added in the PR description. Or, an issue has been filed with a link in the PR description.

Signed-off-by: Zach Puller <zpuller@nvidia.com>
1 parent 08c59f7 commit 78def66

1 file changed

Lines changed: 4 additions & 1 deletion

sql-plugin/src/main/scala/org/apache/spark/sql/rapids/GpuTaskMetrics.scala

@@ -207,7 +207,7 @@ class AvgLongAccumulator extends AccumulatorV2[jl.Long, jl.Double] {
   } else 0;
 }
 
-class GpuTaskMetrics extends Serializable {
+class GpuTaskMetrics extends Serializable with Logging {
   private val semaphoreHoldingTime = new NanoSecondAccumulator
   private val semWaitTimeNs = new NanoSecondAccumulator
   private val retryCount = new LongAccumulator
@@ -404,6 +404,9 @@ class GpuTaskMetrics extends Serializable {
       // once on task completion, whereas the actual logic tracking of the max value during memory
       // allocations lives in the JNI. Therefore, we can stick the convention here of calling the
       // add method instead of adding a dedicated max method to the accumulator.
+      if (maxDeviceMemoryBytes.value.value > 0) {
+        logError(s"updateMaxMemory called twice for task $taskAttemptId with maxMem $maxMem")
+      }
       maxDeviceMemoryBytes.add(maxMem)
     }
     if (maxHostBytesAllocated > 0) {
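
The guard added in the second hunk can be illustrated outside of Spark with a minimal sketch. Everything below is a hypothetical stand-in, not the plugin's code: `SimpleLongAccumulator` and `TaskMetricsSketch` are invented names, the task ID and byte counts are made-up values, and errors are collected in a buffer where the real change calls `logError`. It shows why a second call matters: because the code uses `add` in place of a dedicated max operation, a repeated call sums the two values instead of keeping the larger one.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a Spark long accumulator; not the plugin's class.
final class SimpleLongAccumulator {
  private var total = 0L
  def add(v: Long): Unit = total += v
  def value: Long = total
}

// Hypothetical stand-in for GpuTaskMetrics, reduced to the one guarded update.
final class TaskMetricsSketch {
  val maxDeviceMemoryBytes = new SimpleLongAccumulator
  // Collected in memory so the behavior can be inspected; the commit uses logError.
  val errors: ArrayBuffer[String] = ArrayBuffer.empty[String]

  def updateMaxMemory(taskAttemptId: Long, maxMem: Long): Unit = {
    // A nonzero value means an earlier call already recorded a maximum.
    if (maxDeviceMemoryBytes.value > 0) {
      errors += s"updateMaxMemory called twice for task $taskAttemptId with maxMem $maxMem"
    }
    // add only behaves like "set max" while it is called once per task.
    maxDeviceMemoryBytes.add(maxMem)
  }
}

object SketchDemo {
  def main(args: Array[String]): Unit = {
    val metrics = new TaskMetricsSketch
    metrics.updateMaxMemory(7L, 100L) // first call: no error recorded
    metrics.updateMaxMemory(7L, 250L) // second call: error recorded
    println(metrics.errors.size)                // prints 1
    println(metrics.maxDeviceMemoryBytes.value) // prints 350, not the true max of 250
  }
}
```

Because the check is based on the accumulated value, it only detects a repeat after an earlier call recorded a nonzero value.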
