Hi @saudet,
For me, AOTIModelPackageLoader, AOT CUDA, and the profiler are the critical pieces for future inference acceleration. I am confident that javacpp-pytorch could be used to build a large-model inference engine that rivals the performance of vLLM, SGLang, and DeepSpeed. A Java-based large-model inference engine could one day stand alongside Apache Spark and Flink as a standard tool.

You might consider this ahead of its time, especially since many Java programmers are still beginners, stuck at the level of basic tensor and matrix operations. Even so, from the perspective of ecosystem completeness, AOTIModelPackageLoader, AOT CUDA, and the profiler are indispensable for javacpp-pytorch. If you can implement this, I am willing to pay 200 US dollars. This is not just for my own benefit; it would genuinely benefit the Java, Scala, and Kotlin ecosystems, and I believe more major companies will come to care about these capabilities in the future. With AOTIModelPackageLoader, AOT CUDA, and the profiler in place, I feel javacpp-pytorch would not need updates for the next five years, because it would be truly complete, with no need for further breakthrough features.
For most people, these topics are quite advanced; many may never encounter them, or may never have considered them at all. From the perspective of accelerating inference, however, they are essential.