### Describe the feature request

I've seen that onnxruntime is distributed with CUDA support for x86, but not for aarch64 CUDA platforms such as Spark, Thor, etc.

<img width="1433" height="734" alt="Image" src="https://github.com/user-attachments/assets/a99ab01f-14b8-4276-af58-2516b15b544f" />

### Describe scenario use case

DGX Spark, Thor, GB200, GH200, etc.