
sync: sync with latest ggml #670


Open
wants to merge 2 commits into master

Conversation

zhouwg

@zhouwg zhouwg commented May 5, 2025

Sync with the latest ggml and integrate the amazing stable-diffusion.cpp into a standard Android app, for the purpose of text-to-image generation on Android phones.

Validated on x86 Linux and on Android phones equipped with Snapdragon 8 Gen 3 and 8 Elite SoCs.

713992135

Btw, I suggest that all internal and public non-static functions be given the prefix "sd_".

@rmatif

rmatif commented May 5, 2025

@zhouwg Thanks for this and for your work on QNN. Did you try running inference on this backend? And how is the performance so far compared to the CPU?

I'm working on something similar for Android, Local-Diffusion, and I'm looking forward to adding the QNN backend once it's available.

@zhouwg
Author

zhouwg commented May 5, 2025

Thanks for your interest in ggml-hexagon (ggml-qnn) for llama.cpp. Currently, Stable Diffusion inference via the Hexagon cDSP is not supported on Android phones:

666295660.

Your Local-Diffusion project looks very interesting and powerful. My purpose in recently integrating stable-diffusion.cpp into my on-device AI research project was to try to fix an open issue: kantv-ai/kantv#301. All efforts on that issue can be seen in the next commit in that research project.

I submitted the ggml-hexagon/ggml-qnn PR to the upstream llama.cpp community on 03/11/2025. Unfortunately, there has been no positive feedback on that PR, and I don't know why. I also hope the ggml-hexagon backend can become available in upstream llama.cpp.

@rmatif

rmatif commented May 5, 2025

@zhouwg I saw two WIP QNN backends on the llama.cpp repo and didn't understand why that was the case.

And for stable-diffusion support on the cDSP, what is the current limitation that makes inference impossible? I saw that you already have a matmul kernel, so theoretically it should be possible to at least run those ops. Is it due to the 2 GB memory pool maximum?

@zhouwg
Author

zhouwg commented May 5, 2025

@zhouwg I saw two WIP QNN backends on the llama.cpp repo and didn't understand why it was like that.

That's a really good question. The other WIP QNN backend is a hard-forked candidate PR from a Chinese C++ programmer, based on my original PR from 04/26/2024.

Please refer to the tech docs/posts in the ggml-hexagon project to learn more technical details about ggml-hexagon: https://github.com/zhouwg/ggml-hexagon/discussions

And for stable-diffusion support on cDSP, what is the current limitation that doesn't make inference possible? I saw that you already have a matmul kernel so theoretically it should be possible to at least run these ops. Is it due to the 2GB memory pool max?

Adding Stable Diffusion support on the cDSP is not difficult, but I have been busy working on that open issue recently. I'll add Stable Diffusion support on the cDSP later (I guess the performance will be poorer than the default ggml backend because of some technical and non-technical factors), after I merge the PR that integrates stable-diffusion.cpp for real-time text-to-image in an online-TV scenario on Android phones in that research project.

@idostyle
Contributor

idostyle commented May 5, 2025

Isn't this missing the actual ggml sync? Something like the screenshot should show up in the diff, right?
Screen Shot 2025-05-05 at 15 25 14

@zhouwg
Author

zhouwg commented May 6, 2025

Thanks for your review and correction! I'll refine it accordingly.

@zhouwg
Author

zhouwg commented May 6, 2025

And for stable-diffusion support on cDSP, what is the current limitation that doesn't make inference possible? I saw that you already have a matmul kernel so theoretically it should be possible to at least run these ops. Is it due to the 2GB memory pool max?

You are correct!

I added Stable Diffusion inference on the cDSP in this PR: kantv-ai/kantv#307
Unfortunately, Stable Diffusion inference on the cDSP can't work correctly as expected, because the backend-scheduler feature from llama.cpp is not currently supported in stable-diffusion.cpp. Please refer to #671.
