Closed
Description
Describe the bug
The model loads correctly in Unity play mode, but in a compiled build it reverts to the CPU and is too slow to load. I've tried cleaning the project, re-importing LLMUnity, and copying the .dlls from the build's StreamingAssets folders next to the compiled .exe, but nothing seems to work.
It may be worth noting that I've enabled 'extras', since it's an iquant model (Llama 3 based).
There are no errors or warnings in the build log.
LLMUnity version
v2.4.1
Operating System
Windows
Specs:
Windows 11 LTSC
Unity 6
4070, 12 GB VRAM (the GGUF model is only 2 GB)
32 GB DDR4
Note: I also tried installing the specific version of CUDA locally, but I don't think this is the cause, and it did not resolve the issue with the Windows build.