Add support for tensor transfers in eager to allow for multi-device execution#1157

Open
sogartar wants to merge 1 commit into main from users/sogartar/eager-tensor-transfer

@sogartar

Currently, device transfer ops are issued only when exporting. This change introduces a new export-device-affinity-to-torch-device configuration map that lets us perform actual transfers in eager mode.

```
t = torch.tensor([1, 2], device="cuda:2")
with IreeDeviceAffinityToTorchDevice({
    DeviceAffinity(0): torch.device("cuda:2"),
    DeviceAffinity(1): torch.device("cuda:3")
}):
    t2 = transfer_to_logical_device("1", t) # transfer to cuda:3
    t3 = transfer_to_logical_device("0", t2) # transfer back to cuda:2
```
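
For readers unfamiliar with the pattern, the scoped map in the example above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the PR's implementation: `affinity_to_device`, `transfer_eager`, and `_active_map` are hypothetical names, integer ordinals stand in for `DeviceAffinity`, and the no-op fallback when no map is installed is assumed behavior.

```python
import contextvars
from contextlib import contextmanager
from typing import Any, Iterator, Mapping

# Hypothetical holder for the active ordinal -> device map.
# None means "no map installed"; transfers are then treated as no-ops.
_active_map = contextvars.ContextVar("active_affinity_map", default=None)


@contextmanager
def affinity_to_device(mapping: Mapping[int, Any]) -> Iterator[None]:
    """Install `mapping` (logical ordinal -> device) for the enclosed block."""
    token = _active_map.set(dict(mapping))
    try:
        yield
    finally:
        # Restore whatever map (if any) was active before entering.
        _active_map.reset(token)


def transfer_eager(ordinal: int, tensor: Any) -> Any:
    """Move `tensor` to the device mapped to `ordinal`, if a map is active."""
    mapping = _active_map.get()
    if mapping is None:
        return tensor  # assumed fallback: no map, no data movement
    if ordinal not in mapping:
        raise KeyError(f"no device configured for logical device {ordinal}")
    # Relies only on a `.to(device)` method, which torch.Tensor provides.
    return tensor.to(mapping[ordinal])
```

Using a context variable keeps the map scoped to the `with` block and restores the previous state on exit, so nested or sequential configurations do not leak into each other.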

Signed-off-by: Boian Petkantchin <boian.petkantchin@amd.com>
Comment thread iree/turbine/ops/iree.py
Comment on lines +170 to +172
################################################################################
# IREE device affinity to torch device map
################################################################################

Author

I am not sure whether I should move this section somewhere else, for example to iree/turbine/runtime/device.py.

@rsuderman rsuderman left a comment

Contributor

This looks like a solution in search of a problem. Device affinity is predominantly used when tracing and will essentially be disconnected from device-level tracing. E.g., when tracing for 8 devices, this could be done on a single CPU torch instance, as devices are not required at tracing time.

Requiring that device affinity and tensor placement be aligned and correct will likely just generate more upkeep for a feature that is not needed.

]


@dataclass(frozen=True)

Contributor

No, device affinity is not a dataclass. You should only use this annotation when the type is struct-like.

Author

Why is DeviceTensorTrait below a dataclass?

@dataclass
class DeviceTensorTrait:

It is pretty much the same thing.
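
One practical detail behind this exchange is that the map in the PR description uses `DeviceAffinity(...)` instances as dictionary keys, which requires them to be hashable and to compare equal by value. The sketch below shows how `@dataclass(frozen=True)` provides that in standard Python; `FrozenAffinity` and `MutableAffinity` are hypothetical stand-ins, not the classes in the repository, and a hand-written class defining `__eq__` and `__hash__` would serve equally well, so this does not settle the style question.

```python
from dataclasses import dataclass


@dataclass
class MutableAffinity:  # hypothetical; eq=True without frozen sets __hash__ to None
    ordinal: int


@dataclass(frozen=True)
class FrozenAffinity:  # hypothetical; frozen + eq generates __hash__ from the fields
    ordinal: int


# Equal frozen instances hash alike, so a fresh instance finds the same entry.
devices = {FrozenAffinity(0): "cuda:2", FrozenAffinity(1): "cuda:3"}
lookup = devices[FrozenAffinity(1)]

# A plain (non-frozen) dataclass cannot be used as a dictionary key.
try:
    {MutableAffinity(0): "cuda:2"}
    mutable_is_hashable = True
except TypeError:
    mutable_is_hashable = False
```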

@sogartar

sogartar commented Sep 30, 2025

Author

> This looks like a solution in search of a problem.

@rsuderman, it is true that the Llama 405b f4 model currently fits on a single MI355 instance, but we previously had models that we wanted to run eagerly and could not. We may also want to run the f16 variant, which would not fit. When the next big model arrives, it will likely not fit on a single GPU either. This feature is not about tracing; it is about running eagerly.

How do you suggest we enable running our models on multiple devices eagerly?

@sogartar sogartar requested a review from rsuderman September 30, 2025 14:46