Handle non-contiguous Tensors based GPU transfer #52548

Open · wants to merge 39 commits into master

Conversation

srinathk10 (Contributor)

Why are these changes needed?

Handle GPU transfer for non-contiguous tensors. This removes the overhead of combining Arrow chunked arrays during the Arrow -> Numpy -> Tensor conversion.
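
The motivation can be illustrated with a dependency-free sketch. Plain Numpy arrays stand in for Arrow chunks; the names `chunks`, `combined`, and `dest` are illustrative, not code from this PR. The eager path materializes the combined column on the host before any transfer, while the chunk-wise path copies each chunk straight into the destination buffer (on a GPU, a preallocated device tensor):

```python
import numpy as np

# Two chunks standing in for an Arrow chunked column: separate buffers,
# so the logical column is non-contiguous in memory.
chunks = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]

# Eager path: combine the chunks on the host (an extra host-side copy),
# then transfer the combined buffer in one shot.
combined = np.concatenate(chunks)

# Chunk-wise path: skip the host-side combine and copy each chunk
# directly into a preallocated destination buffer (on a GPU this would
# be a device tensor; a host array stands in for it here).
dest = np.empty(sum(len(c) for c in chunks))
offset = 0
for chunk in chunks:
    dest[offset:offset + len(chunk)] = chunk
    offset += len(chunk)

assert np.array_equal(combined, dest)
```

Both paths produce the same column; the chunk-wise one avoids the intermediate host-side combine, which is the overhead this PR targets.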

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

srinathk10 and others added 10 commits April 23, 2025 01:12
Signed-off-by: Srinath Krishnamachari <[email protected]>
@justinvyu justinvyu self-assigned this Apr 23, 2025
batch = batch.to(device=device)
return batch

collate_fn = DefaultNumpyCollateFn(
Contributor:

The default should be arrow for better performance.
And we don't need the default numpy / pandas.

Contributor Author:

Yes, fixed now. Changed to Numpy to chase down the issue of null->nan conversion.

elif isinstance(collate_fn, PandasBatchCollateFn):
batch_format = "pandas"
elif callable(collate_fn):
batch_format = "numpy"
Contributor:

Let's add a warning that raw collate_fn will be deprecated, and suggest using ArrowBatchCollateFn for the best performance.

Contributor:

I think we should still keep the raw collate fn? (and just default to the numpy version).

Contributor:

Yeah, we should keep it, but I'd like to emit a warning to ask users to migrate to the new API.
WDYT?
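
A minimal sketch of the compromise discussed here: keep accepting a raw callable, default it to the numpy batch format, and emit a deprecation warning. The stub classes and the `resolve_batch_format` name are illustrative, not Ray's actual code; only `ArrowBatchCollateFn` and `PandasBatchCollateFn` appear in the diff.

```python
import warnings

# Stub stand-ins for the collate classes referenced in the diff.
class ArrowBatchCollateFn: ...
class PandasBatchCollateFn: ...

def resolve_batch_format(collate_fn):
    """Map a collate_fn to the batch format it should receive."""
    if isinstance(collate_fn, ArrowBatchCollateFn):
        return "pyarrow"
    elif isinstance(collate_fn, PandasBatchCollateFn):
        return "pandas"
    elif callable(collate_fn):
        # Raw callables keep working, but users are steered to the new API.
        warnings.warn(
            "Passing a raw callable as collate_fn is deprecated; "
            "subclass ArrowBatchCollateFn for the best performance.",
            DeprecationWarning,
        )
        return "numpy"
    raise TypeError(f"Unsupported collate_fn: {collate_fn!r}")
```

This preserves backward compatibility while nudging users toward the Arrow path that this PR makes fast.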

@@ -301,17 +585,17 @@ def iter_torch_batches(
Dataset is passed to Ray Train and ``collate_fn`` is not provided.
Otherwise, defaults to CPU. You can't use this parameter with
``collate_fn``.
collate_fn: A function to convert a Numpy batch to a PyTorch tensor batch.
collate_fn: A function to convert a PyArrow Table or Numpy batch to PyTorch tensors.
Contributor:

I'd like to rewrite this doc-string and make it more descriptive and easy to read.
E.g.

collate_fn: A function that collates data batches before feeding them to the model.  Potential use cases include ...
    If not specified, `iter_torch_batches` converts the data to torch.Tensors and transfers them to the given device. 
    If specified, you can customize the collation logic. The input batch type can be one of .... Arrow is the most recommended for best perf.  And the output can be any type. If the output type is one of ..., the data will be automatically transferred to the given device. Otherwise, you need to transfer the batches in your training loop. 
	Note, this collate_fn will be called in a multi-threaded manner. 

cc @justinvyu for more suggestions

Contributor:

And the above example needs to be updated as well.

Contributor:

Also need to mention how to choose the batch type - subclassing one of ...

Contributor:

"""
            collate_fn: [Alpha] A function to customize how data batches are collated before
                being passed to the model. This is useful for last-mile data formatting
                such as padding, masking, or packaging tensors into custom data
                structures. If not provided, `iter_torch_batches` automatically converts
                batches to `torch.Tensor`s and moves them to the device assigned to the
                current worker. The input to `collate_fn` may be:
                    (1) dict of np.ndarray, where you should provide a function
                        that takes in a dict of Numpy arrays
                    (2) pd.DataFrame, where you should provide a callable class
                        that subclasses `PandasCollateFn`
                    (3) pyarrow.Table, where you should provide a callable class
                        that subclasses `ArrowCollateFn` (recommended for best performance)
                The output can be any type. If the output is a `torch.Tensor`,
                `dict[str, torch.Tensor]`, or `list/tuple[torch.Tensor]`, it will be
                automatically moved to the current worker's device. For other types,
                you must handle device transfer manually in your training loop.
                Note: This function is called in a multi-threaded context; avoid using
                thread-unsafe code.
"""

@srinathk10 srinathk10 changed the base branch from master to srinathk10-to_numpy-null-types April 24, 2025 04:16
srinathk10 and others added 2 commits April 23, 2025 21:18
Signed-off-by: Srinath Krishnamachari <[email protected]>
@srinathk10 srinathk10 added the go add ONLY when ready to merge, run all tests label Apr 24, 2025
Base automatically changed from srinathk10-to_numpy-null-types to master April 24, 2025 19:11
Signed-off-by: Srinath Krishnamachari <[email protected]>
Signed-off-by: Srinath Krishnamachari <[email protected]>
@srinathk10 srinathk10 changed the base branch from master to srinathk10-train-release-test-fixes April 25, 2025 06:45
@srinathk10 srinathk10 changed the base branch from srinathk10-train-release-test-fixes to master April 25, 2025 23:16
@srinathk10 srinathk10 changed the title WIP: Handle non-contiguous Tensors based GPU transfer Handle non-contiguous Tensors based GPU transfer Apr 28, 2025
@srinathk10 srinathk10 marked this pull request as ready for review April 28, 2025 20:27
@srinathk10 srinathk10 requested a review from a team as a code owner April 28, 2025 20:27
Labels
go add ONLY when ready to merge, run all tests

3 participants