sync: sync with latest ggml #670
@zhouwg Thanks for this and for your work on QNN. Did you try running inference on this backend? How does the performance compare to CPU so far? I'm working on something similar for Android, Local-Diffusion, and I'm looking forward to adding the QNN backend once it is available.
Thanks for your interest in ggml-hexagon (ggml-qnn) for llama.cpp. Currently, Stable Diffusion inference via the Hexagon cDSP is not supported on Android phones. Your Local-Diffusion project looks very interesting and powerful. I recently integrated stable-diffusion.cpp into my on-device AI research project in order to fix an open issue: kantv-ai/kantv#301. All the work on that issue will be visible in the next commit to that research project. I submitted the ggml-hexagon/ggml-qnn PR to the upstream llama.cpp community on 03/11/2025; unfortunately, that PR has not received any positive feedback, and I don't know why. I also hope the ggml-hexagon backend can become available in upstream llama.cpp.
@zhouwg I saw two WIP QNN backends on the llama.cpp repo and didn't understand why there were two. As for stable-diffusion support on the cDSP, what is the current limitation that prevents inference? I saw that you already have a matmul kernel, so theoretically it should be possible to run at least those ops. Is it due to the 2GB maximum memory pool?
This is a really good question. The other WIP QNN backend is a hard-forked candidate PR from a Chinese C++ programmer, based on my original PR from 04/26/2024. Please refer to the tech docs and posts in the ggml-hexagon project for more technical details about ggml-hexagon: https://github.com/zhouwg/ggml-hexagon/discussions
Supporting stable-diffusion on the cDSP is not difficult; it has not been done yet only because I have been busy with that open issue recently. I'll add stable-diffusion support on the cDSP later (I expect the performance to be poorer than the default ggml backend because of some technical and non-technical factors), after I merge the PR that integrates stable-diffusion.cpp for realtime text-2-image in an online-TV scenario on Android phones in that research project.
Thanks for your review and correction! I'll refine it accordingly.
You are correct! I added stable-diffusion inference on the cDSP in this PR: kantv-ai/kantv#307
Sync with the latest ggml and integrate the amazing stable-diffusion.cpp into a standard Android app for text-2-image on Android phones.
Validated on x86 Linux and on Android phones equipped with Snapdragon 8 Gen 3 and 8 Elite.
By the way, I suggest adding the prefix "sd_" to all internal and public non-static functions.
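A minimal sketch of what the "sd_" prefix suggestion could look like. The function names below are hypothetical and are not taken from stable-diffusion.cpp; the point is only that functions with external linkage share one namespace with the rest of the app (for example other ggml-based code linked into the same Android binary), while `static` helpers are file-local and do not need the prefix.

```cpp
#include <cassert>
#include <string>

// NOTE: all names here are hypothetical illustrations, not real
// stable-diffusion.cpp functions.

// File-local helper: `static` gives it internal linkage, so it cannot
// collide with symbols from other translation units and needs no prefix.
static int clamp_int(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

// Non-static functions have external linkage. The "sd_" prefix keeps a
// generic name such as clamp_steps() from clashing with a same-named
// function in another library linked into the same binary.
int sd_clamp_steps(int steps) {
    return clamp_int(steps, 1, 100);
}

// Another non-static function following the same convention.
std::string sd_format_size(int width, int height) {
    assert(width > 0 && height > 0);
    return std::to_string(width) + "x" + std::to_string(height);
}
```

The 1 to 100 range for steps is an arbitrary value chosen for the example, not a limit of the library.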