Description
Hi there, you've made a fantastic framework for LLMs. But what I find very confusing is how to run it on CUDA and DirectML. I simply don't know how to do it in C#.
Is there any example? My second question: do I have to provide a different model for each of CUDA, CPU, and DirectML, or can one model run seamlessly on all of them? Or is there a way to convert a model to support all providers, or some combination of them? As far as I know, ONNX itself provides seamless support across providers, which is why this is a bit confusing.
My use case is to deploy a model to the user's device and, based on the device's capabilities, choose the provider that gives the best performance. I don't want it to work the other way around (with the user choosing), because I expect my users to know nothing about ML itself.
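To make it concrete, here is a minimal sketch of the kind of fallback logic I have in mind, written against the base `Microsoft.ML.OnnxRuntime` C# API (I'm assuming that's the relevant layer here; `AppendExecutionProvider_CUDA` and `AppendExecutionProvider_DML` only work with the corresponding CUDA/DirectML builds of the runtime, and the model path is just a placeholder):

```csharp
using System;
using Microsoft.ML.OnnxRuntime;

class ProviderSelection
{
    // Try the fastest available execution provider first, then fall back.
    // CPU is the implicit default when no other provider is appended.
    static InferenceSession CreateSession(string modelPath)
    {
        var options = new SessionOptions();
        try
        {
            // Requires a CUDA-enabled ONNX Runtime package; may throw
            // if CUDA isn't available on this machine.
            options.AppendExecutionProvider_CUDA(0);
        }
        catch (Exception)
        {
            try
            {
                // Requires the DirectML build (e.g. Microsoft.ML.OnnxRuntime.DirectML);
                // DirectML typically needs memory pattern optimization disabled.
                options.EnableMemoryPattern = false;
                options.AppendExecutionProvider_DML(0);
            }
            catch (Exception)
            {
                // Neither GPU provider is available; ONNX Runtime falls back to CPU.
            }
        }
        return new InferenceSession(modelPath, options);
    }

    static void Main()
    {
        // "model.onnx" is a hypothetical path; the same .onnx file is used
        // regardless of which provider ends up running it.
        using var session = CreateSession("model.onnx");
        Console.WriteLine("Session created with the best available provider.");
    }
}
```

Is this roughly the right approach with your framework, or does it expose its own way to pick a provider?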
Thank you ✌️