
[BUG] Integer overflow / truncation risks across the codebase #14471

@gerashegalov

Description


This report was generated by claude-4.6-opus-high

Context

PR #14466 fixed integer overflow in KudoGpuTableOperator.concat where:

  1. .map(...).sum (Int) was replaced with .foldLeft(0L) (Long) for byte-size computation
  2. 8 * (n + 1) (Int multiply) was changed to 8L * (n + 1)
  3. var currentOffset = 0 (Int) was changed to 0L (Long)

A codebase-wide audit found additional locations susceptible to the same class of bug. All involve Int arithmetic on values that represent byte sizes, byte offsets, or row counts that can exceed Int.MaxValue (~2 GB / ~2 billion rows).


HIGH Severity

1. GpuShuffleCoalesceExec.scala — .map(getNumRows).sum as Int (3 locations)

Lines 351, 400, 428 — the zero-column branch of all three concat methods:

// Line 351 — JCudfTableOperator.concat
val totalRowsNum = tables.map(getNumRows).sum
cudf_utils.HostConcatResultUtil.rowsOnlyHostConcatResult(totalRowsNum)

// Line 400 — KudoTableOperator.concat
val totalRowsNum = columns.map(getNumRows).sum
RowCountOnlyMergeResult(totalRowsNum)

// Line 428 — KudoGpuTableOperator.concat (same method fixed by #14466, but the numCols==0 branch)
val totalRowsNum = columns.map(getNumRows).sum
new ColumnarBatch(Array.empty, totalRowsNum)

getNumRows returns Int, so .sum accumulates in Int and silently wraps to a wrong row count when many small batches are concatenated. The canAddToBatch guard (line 615) limits numRowsInBatch, but the sum in concat does not benefit from that guard directly.
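The wrap and the fix from PR #14466 can be seen in a standalone sketch (illustration only, not the spark-rapids code; the row counts are assumed values):

```scala
// Three batches of 1 billion rows each: 3e9 total, which exceeds Int.MaxValue (~2.1e9).
val rowCounts = Seq(1000000000, 1000000000, 1000000000)

val intSum  = rowCounts.sum                               // Int accumulation: wraps negative
val longSum = rowCounts.foldLeft(0L)((acc, n) => acc + n) // Long accumulation: correct

println(s"Int sum:  $intSum")   // -1294967296
println(s"Long sum: $longSum")  // 3000000000
```

The same foldLeft(0L) pattern applies to all three locations above.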

2. GpuBroadcastExchangeExec.scala — broadcast row-count sum

// ~Line 582
numRows = withResource(buffers) { _ =>
  ...
  buffers.map(_.header.getNumRows).sum
}

Same .map(...).sum on Int row counts in the broadcast path.

3. MultithreadedShuffleBufferCatalog.scala — size().toInt truncation

// Line 255-263
override def size(): Long = segments.map(_.length).sum

override def nioByteBuffer(): ByteBuffer = {
  ...
  val totalSize = size().toInt   // <-- truncation
  val buffer = ByteBuffer.allocate(totalSize)

size() returns Long, but nioByteBuffer() calls .toInt — a shuffle block larger than 2 GB gets a truncated or negative allocation size.
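A minimal sketch of this failure mode (the 3 GB size is an assumed value): the truncated Int is negative, so ByteBuffer.allocate would throw rather than allocate the intended buffer, and other sizes would silently allocate too little.

```scala
val totalSize: Long = 3L * 1024 * 1024 * 1024 // 3221225472 bytes, > Int.MaxValue
val truncated = totalSize.toInt               // wraps to a negative value

println(s"truncated = $truncated")            // -1073741824
// java.lang.Math.toIntExact(totalSize) would instead throw ArithmeticException,
// failing fast at the conversion rather than at (or after) the allocation.
```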

4. GpuTextBasedPartitionReader.scala — Int multiply for offsets + .toInt truncation

// ~Line 175-213
private var offsetsBuffer = HostMemoryBuffer.allocate(
  (rowsAllocated + 1) * DType.INT32.getSizeInBytes)  // <-- Int * Int overflow
...
offsetsBuffer.setInt(
  (numRows + 1) * DType.INT32.getSizeInBytes.toLong,
  dataLocation.toInt)  // <-- truncation for data > 2GB

(rowsAllocated + 1) * 4 is pure Int multiplication — overflows for large row counts. dataLocation.toInt truncates when data exceeds 2 GB.
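A minimal sketch with a hypothetical row count: the pure Int multiply wraps, while promoting one operand to Long before multiplying yields the intended byte size.

```scala
val rowsAllocated = 600000000                  // 600M rows (assumed for illustration)
val intBytes  = (rowsAllocated + 1) * 4        // Int multiply: wraps negative
val longBytes = (rowsAllocated + 1).toLong * 4 // Long multiply: 2400000004 bytes

println(s"Int: $intBytes, Long: $longBytes")
```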

5. GpuParquetScan.scala — footer size .toInt before buffer allocation

// ~Line 597-598
val hmbLength = (fileLen - footerIndex).toInt  // <-- truncation
closeOnExcept(HostMemoryBuffer.allocate(hmbLength + MAGIC.length, false)) { outBuffer =>

If the footer span exceeds 2 GB, .toInt truncates, allocating a wrong-sized buffer.

6. GpuPartitioning.scala — getLength.toInt / getLong(...).toInt for large buffers

// ~Line 236-252
val idx = offsetsHost.getLong((i) * elemSize).toInt  // <-- truncation
...
new SlicedSerializedColumnVector(dataHost, start, dataHost.getLength.toInt)  // <-- truncation

Truncation of Long buffer positions/lengths to Int when serialized data exceeds 2 GB.


MEDIUM Severity

7. GpuParquetScan.scala — calculateExtraMemoryForParquetFooter pure Int arithmetic

// ~Line 1615-1617
def calculateExtraMemoryForParquetFooter(numCols: Int, numBlocks: Int): Int = {
  val numColumnChunks = numCols * numBlocks   // <-- Int overflow
  numColumnChunks * 2 * 8                     // <-- Int overflow

numCols * numBlocks * 16 — all Int — overflows with very wide tables and many row groups.
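A hedged sketch of a Long-returning variant (the name and shape are assumed from the snippet above, not the actual fix): promoting before the first multiply keeps the whole computation in Long.

```scala
// Hypothetical Long-returning replacement for calculateExtraMemoryForParquetFooter.
def calculateExtraMemoryForParquetFooterLong(numCols: Int, numBlocks: Int): Long = {
  val numColumnChunks = numCols.toLong * numBlocks // Long multiply, cannot wrap
  numColumnChunks * 2 * 8                          // stays in Long
}
```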

8. RapidsHostColumnBuilder.java — Int bit-shift for offset indexing

// ~Line 615
data.setLong(currentIndex++ << bitShiftBySize, value);

currentIndex << bitShiftBySize is 32-bit int shift; wraps if the logical byte offset exceeds Integer.MAX_VALUE.
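A sketch of the wrap with assumed values: for 8-byte elements (shift of 3), the 32-bit shift wraps once the logical byte offset passes Integer.MAX_VALUE, while widening the index first does not.

```scala
val currentIndex = 300000000                           // 300M long-sized entries (assumed)
val bitShiftBySize = 3                                 // log2(8 bytes)
val intOffset  = currentIndex << bitShiftBySize        // 32-bit shift: wraps negative
val longOffset = currentIndex.toLong << bitShiftBySize // 64-bit shift: 2400000000

println(s"Int: $intOffset, Long: $longOffset")
```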

9. ParquetCachedBatchSerializer.scala — var pos = 0 (Int) exposed as getPos: Long

// ~Line 141-161
new DelegatingPositionOutputStream(stream) {
  var pos = 0                          // <-- Int
  override def getPos: Long = pos      // <-- returned as Long
  override def write(b: Int): Unit = {
    super.write(b)
    pos += Integer.BYTES               // <-- wraps past 2GB
  }

pos tracks byte position as Int but is returned as Long via getPos. Wraps for output exceeding 2 GB.

10. GpuOrcScan.scala — ORC footer cache hmb.getLength.toInt

// ~Line 1796-1800
val bb = ByteBuffer.allocate(hmb.getLength.toInt)        // <-- truncation
hmb.getBytes(bb.array(), 0, 0, hmb.getLength.toInt)      // <-- truncation

Footer cache HMB length truncated to Int.

11. UCXConnection.scala (shuffle-plugin) — rkeys.map(_.capacity).sum as Int

// ~Line 411-414
val size = java.lang.Long.BYTES + java.lang.Integer.BYTES +
    (java.lang.Integer.BYTES * rkeys.size) +  // <-- Int * Int
    rkeys.map(_.capacity).sum                 // <-- Int .sum

Pure Int arithmetic for computing handshake buffer size.


LOW Severity (bounded by design or unlikely to trigger)

Location — Notes

GpuParquetScan.scala:3030,3042,3598 — .map(_.getRowCount).sum.toInt: getRowCount is Long and .sum is Long, but .toInt truncates; row count per block is typically bounded
GpuMultiFileReader.scala:759-831 — .map(_.bytes).sum: bytes is Long, .sum is Long (safe)
GpuShuffleCoalesceExec.scala:504 — numRowsInBatch: Int, guarded by the canAddToBatch check at line 615
GpuAggregateExec.scala:263,1080 — .map(_.sizeInBytes).sum: sizeInBytes is Long (safe)
GpuColumnarBatchSerializer.scala:126 — HostMemoryBuffer.allocate(header.getDataLen): getDataLen is Long (safe)
RapidsDeletionVectorStore.scala:130 — size: Int passed from the Delta API (API limitation, not an arithmetic bug)
UCX size.toInt calls — UCXShuffleTransport.scala:103, UCXConnection.scala:169, UCX.scala:870,1030; metadata payloads are typically small

Suggested fix patterns

  • .map(f).sum → .foldLeft(0L)((acc, x) => acc + f(x)) or .map(x => f(x).toLong).sum to force Long accumulation
  • Int * Int for sizes → use Long literal: 8L * n or n.toLong * m
  • var offset = 0 for byte tracking → var offset = 0L
  • .toInt on buffer sizes → add bounds check / throw on overflow, or redesign API to use Long
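The last pattern can be sketched as a bounds-checked conversion that fails loudly instead of truncating (toIntChecked is a hypothetical helper name, not an existing API):

```scala
// Throws instead of silently truncating values outside the Int range.
def toIntChecked(v: Long): Int = {
  require(v >= Int.MinValue && v <= Int.MaxValue, s"$v does not fit in an Int")
  v.toInt
}
// java.lang.Math.toIntExact does the same and throws ArithmeticException on overflow.
```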
