[Feature Request] CUDA builds are distributed for x86 but not for aarch64 Linux

### Describe the feature request

I've seen that onnxruntime is distributed with CUDA support for x86 but not for aarch64. CUDA builds for aarch64 would be useful on devices like Spark, Thor, etc.

<img width="1433" height="734" alt="Image" src="https://github.com/user-attachments/assets/a99ab01f-14b8-4276-af58-2516b15b544f" />

### Describe scenario use case

Running onnxruntime with CUDA on aarch64 Linux systems: DGX Spark, Thor, GB200, GH200, etc.