Avoid unnecessary copy in TensorSource #8849

Open

lsy323 wants to merge 6 commits into master from lsiyuan/avoid-blocking-copy-tensorsource

Conversation

lsy323 (Collaborator) commented Mar 18, 2025

Avoid the at::Tensor copy in TensorSource when it is not necessary.

The copy operation is needed in two cases:

  1. On the XLA:GPU path, if the tensor is on a CUDA device, it needs to be copied to CPU and then passed to the PJRT runtime, which transfers it to the GPU. @ysiraichi found that passing a CUDA tensor directly to the PJRT runtime doesn't work, so the CPU roundtrip is used as a workaround.
  2. On the XLA:TPU path, if the tensor is not contiguous, the copy is needed to make the memory contiguous, because PJRT takes a raw data pointer and expects the data to be contiguous.

When a copy is made, it needs to be blocking, since the transfer operation depends on the copied tensor. A sketch of this gating logic is given below.
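
A minimal sketch (editor's illustration, not the PR's actual code) of the gating described above, assuming an at::Tensor input and a target at::ScalarType; the function name is hypothetical:

    #include <ATen/ATen.h>

    // Illustrative only: copy exactly when one of the cases above applies
    // (CUDA source tensor, non-contiguous memory, or a dtype mismatch).
    at::Tensor PrepareForPjrtTransfer(at::Tensor tensor,
                                      at::ScalarType target_type) {
      const bool needs_copy = tensor.device().is_cuda() ||   // case 1
                              !tensor.is_contiguous() ||     // case 2
                              tensor.dtype() != target_type; // dtype cast
      if (!needs_copy) {
        return tensor;  // zero-copy fast path
      }
      // Blocking copy: the PJRT transfer reads from this buffer immediately.
      return tensor.to(
          at::TensorOptions().device(at::kCPU).dtype(target_type),
          /*non_blocking=*/false,
          /*copy=*/true, at::MemoryFormat::Contiguous);
    }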

lsy323 changed the title from "avoid unnecessary copy in tensorsource" to "Avoid unnecessary copy in TensorSource" on Mar 19, 2025
lsy323 force-pushed the lsiyuan/avoid-blocking-copy-tensorsource branch 2 times, most recently from 2b3b31f to 636a787, on March 19, 2025 at 17:24
lsy323 force-pushed the lsiyuan/avoid-blocking-copy-tensorsource branch from 636a787 to e483f51 on March 19, 2025 at 17:25
lsy323 marked this pull request as ready for review on March 19, 2025 at 17:26
lsy323 requested review from ysiraichi and yaochengji on March 19, 2025 at 17:26
lsy323 (Collaborator, Author) commented Mar 19, 2025

Hi @ysiraichi, just following up on our offline discussion about the copy operation. PTAL at the PR, thanks!

ysiraichi (Collaborator) left a comment

As a side note, we can use the DLPack machinery for doing the CUDA to XLA:CUDA transfer (that wasn't implemented at the time I worked on this). I will open an issue for this.

Comment on lines 57 to 73

    // The purposes of the copy are:
    // 1. Ensure the memory is contiguous, which is expected by PJRT.
    // 2. Move CUDA tensor to CPU since we cannot pass CUDA memory to PJRT now.
    // 3. Cast data type.
    // We can avoid the copy if it is not needed.
    if (tensor.device() == at::kCPU && tensor.is_contiguous() &&
        tensor.dtype() == target_torch_type) {
      tensor_ = std::move(tensor);
    } else {
      // TODO(ysiraichi): check, first, if tensor lives in a device that the
      // current PjRt client has access. If so, we don't need to go through the
      // CPU.
      tensor_ = std::move(tensor.to(
          at::TensorOptions().device(at::kCPU).dtype(target_torch_type),
          /*non_blocking=*/false,
          /*copy=*/true, at::MemoryFormat::Contiguous));
    }
ysiraichi (Collaborator) commented:

As far as I understand it, tensor.to(...) (without the copy argument) already checks whether it should actually copy or not. So, what do you think of reverting to the old tensor.to(...) usage, but removing the copy argument, instead?
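
For reference, a minimal editor's sketch of this suggestion, assuming the ATen C++ overload Tensor::to(TensorOptions, bool non_blocking = false, bool copy = false, optional<MemoryFormat>); the helper name is hypothetical:

    #include <ATen/ATen.h>

    // Illustrative only: with copy left at its default of false, .to() returns
    // the input tensor unchanged when the device, dtype, and memory format
    // already match, and performs the copy otherwise.
    at::Tensor ToCpuContiguous(const at::Tensor& tensor,
                               at::ScalarType target_type) {
      return tensor.to(
          at::TensorOptions().device(at::kCPU).dtype(target_type),
          /*non_blocking=*/false,
          /*copy=*/false, at::MemoryFormat::Contiguous);
    }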

lsy323 (Collaborator, Author) replied:

Hi @ysiraichi, I couldn't find a tensor.to(...) call without the copy arg in C++. Is it only in Python?
