
sync: sync with latest ggml #670


Open
wants to merge 2 commits into master

Conversation

zhouwg

@zhouwg zhouwg commented May 5, 2025

Sync with the latest ggml and integrate the amazing stable-diffusion.cpp into a standard Android app, for the purpose of text-to-image generation on Android phones.

Validated on x86 Linux and on Android phones equipped with Snapdragon 8 Gen 3 and 8 Elite SoCs.

713992135

Btw, I suggest that all internal and public non-static functions be given the prefix "sd_".

@rmatif

rmatif commented May 5, 2025

@zhouwg Thanks for this and for your work on QNN. Did you try running inference on this backend? And how is the performance so far compared to the CPU?

I'm working on something similar for Android, Local-Diffusion, and I'm looking forward to adding the QNN backend once it's available.

@zhouwg
Author

zhouwg commented May 5, 2025

Thanks for your interest in ggml-hexagon (ggml-qnn) for llama.cpp. Currently, Stable Diffusion inference via the Hexagon cDSP is not supported on Android phones:

666295660.

Your Local-Diffusion project looks very interesting and powerful. My purpose in recently integrating stable-diffusion.cpp into my on-device AI research project was to try to fix an open issue: kantv-ai/kantv#301. All efforts on that issue can be seen in the next commit in that research project.

I submitted the ggml-hexagon/ggml-qnn PR to the upstream llama.cpp community on 03/11/2025. Unfortunately, there has been no positive feedback on that PR, and I don't know why. I also hope the ggml-hexagon backend can become available in upstream llama.cpp.

@rmatif

rmatif commented May 5, 2025

@zhouwg I saw two WIP QNN backends on the llama.cpp repo and didn't understand why that was the case.

And for stable-diffusion support on the cDSP, what is the current limitation that makes inference impossible? I saw that you already have a matmul kernel, so theoretically it should be possible to at least run those ops. Is it due to the 2 GB memory pool maximum?

@zhouwg
Author

zhouwg commented May 5, 2025

@zhouwg I saw two WIP QNN backends on the llama.cpp repo and didn't understand why it was like that.

That's a really good question. The other WIP QNN backend is a hard-forked candidate PR from a Chinese C++ programmer, based on my original PR from 04/26/2024.

Please refer to the tech docs/posts in the ggml-hexagon project to learn more technical details about ggml-hexagon: https://github.com/zhouwg/ggml-hexagon/discussions

And for stable-diffusion support on cDSP, what is the current limitation that doesn't make inference possible? I saw that you already have a matmul kernel so theoretically it should be possible to at least run these ops. Is it due to the 2GB memory pool max?

Adding Stable Diffusion support on the cDSP is not difficult, but I have been busy working on that open issue recently. I'll add Stable Diffusion support on the cDSP later (I guess the performance will be poorer than the default ggml backend because of some technical and non-technical factors), after I merge the PR that integrates stable-diffusion.cpp for real-time text-to-image in an online-TV scenario on Android phones in that research project.

@idostyle
Contributor

idostyle commented May 5, 2025

Isn't this missing the actual ggml sync? Something like the screenshot should show up in the diff, right?
Screen Shot 2025-05-05 at 15 25 14

@zhouwg
Author

zhouwg commented May 6, 2025

Thanks for your review and correction! I'll refine it accordingly.

@zhouwg
Author

zhouwg commented May 6, 2025

And for stable-diffusion support on cDSP, what is the current limitation that doesn't make inference possible? I saw that you already have a matmul kernel so theoretically it should be possible to at least run these ops. Is it due to the 2GB memory pool max?

You are correct!

I added Stable Diffusion inference on the cDSP in this PR: kantv-ai/kantv#307
Unfortunately, Stable Diffusion inference on the cDSP can't work correctly as expected, because the backend-scheduler feature from llama.cpp is not currently supported in stable-diffusion.cpp. Please refer to #671.
