Add support for tensor transfers in eager to allow for multi-device execution #1157
sogartar wants to merge 1 commit into
Currently, device transfer ops are only issued when exporting.
This change introduces a new export-device-affinity-to-torch-device configuration map
that allows actual tensor transfers in eager mode.
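```python
import torch

# DeviceAffinity and transfer_to_logical_device are iree.turbine names;
# IreeDeviceAffinityToTorchDevice is the configuration map introduced by this change.
t = torch.tensor([1, 2], device="cuda:2")
with IreeDeviceAffinityToTorchDevice({
    DeviceAffinity(0): torch.device("cuda:2"),
    DeviceAffinity(1): torch.device("cuda:3"),
}):
    t2 = transfer_to_logical_device("1", t)  # transfer to cuda:3
    t3 = transfer_to_logical_device("0", t2)  # transfer back to cuda:2
```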
Signed-off-by: Boian Petkantchin <boian.petkantchin@amd.com>
```python
################################################################################
# IREE device affinity to torch device map
################################################################################
```
I am not sure whether I should move this section somewhere else, for example to iree/turbine/runtime/device.py.
rsuderman left a comment
This looks like a solution in search of a problem. Device affinity is predominantly used when tracing and is essentially disconnected from the physical devices: for example, tracing for 8 devices can be done on a single CPU torch instance, since devices are not required at tracing time.
Requiring that device affinities and tensor placements be aligned and correct will likely just generate more upkeep for a feature that is not needed.
```python
@dataclass(frozen=True)
```
No, device affinity is not a dataclass. You should only use this annotation when the type is struct-like.
Why is `DeviceTensorTrait` below a dataclass?
```python
@dataclass
class DeviceTensorTrait:
    ...
```
It is pretty much the same thing.
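Both positions in this thread turn on what the annotation actually provides. A minimal sketch, using hypothetical class names that do not mirror the real `DeviceAffinity` or `DeviceTensorTrait`: `@dataclass(frozen=True)` generates `__eq__` and `__hash__` and blocks attribute assignment, so instances can serve as keys in an affinity-to-device map; a plain `@dataclass` with the default `eq=True` sets `__hash__` to `None`, so it is mutable and unhashable; and a regular class can get the same key behavior with hand-written methods.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class FrozenAffinity:
    """Hypothetical: frozen dataclass with generated __eq__ and __hash__."""

    ordinal: int


@dataclass
class MutableTrait:
    """Hypothetical: plain @dataclass; eq=True without frozen sets __hash__ to None."""

    ordinal: int


class PlainAffinity:
    """Hypothetical: non-dataclass type with hand-written equality and hashing."""

    __slots__ = ("_ordinal",)

    def __init__(self, ordinal: int) -> None:
        self._ordinal = ordinal

    @property
    def ordinal(self) -> int:
        return self._ordinal

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, PlainAffinity):
            return NotImplemented
        return self._ordinal == other._ordinal

    def __hash__(self) -> int:
        return hash(self._ordinal)

    def __repr__(self) -> str:
        return f"PlainAffinity({self._ordinal})"


# Both hashable variants work as keys of an affinity -> device map.
frozen_map = {FrozenAffinity(0): "cuda:2", FrozenAffinity(1): "cuda:3"}
plain_map = {PlainAffinity(0): "cuda:2", PlainAffinity(1): "cuda:3"}
```

The trade-off is boilerplate versus intent: the frozen dataclass gets value semantics for free, while the hand-written class keeps those semantics without presenting the type as a plain record.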
@rsuderman, it is true that the Llama 405B f4 model currently fits on a single MI355 instance, but previously we had models that we wanted to run eagerly and could not. We may also want to run the f16 variant, which would not fit. When the next large model arrives, it will likely not fit on a single GPU either. This feature is not about tracing; it is about running eagerly. How do you suggest we enable running our models eagerly on multiple devices?