ggml: new backend for Virglrenderer API Remoting acceleration (v2) #18718
base: master
Conversation
This flag allows disabling the ggml-vulkan backend at runtime. This is necessary for the API Remoting support, as the API Remoting frontend (`ggml-remotingfrontend`) relies on the same device file as `ggml-vulkan`, when running inside a Virtual Machine. This runtime disable flag allows enabling the compilation of both `ggml-vulkan` and `ggml-remotingfrontend`, while selecting at runtime which one should be activated.
I'll review this in a while. If we were to merge this, we will need a named maintainer for the backend, for maintainability reasons. Will it be you? :)
- Spacing across the PR is very inconsistent. Please follow 4 spaces and make it consistent.
- The vendor files within `ggml-remotingfrontend/include` - can they be discovered/downloaded separately from the codebase? See Line 65 in 9ac2693:
  > Avoid adding third-party dependencies, extra files, extra headers, etc.
- Inconsistent styling:
  ```cpp
  __attribute__((unused))
  static inline const char *apir_command_name(ApirCommandType type)
  {
  ```
  vs.
  ```cpp
  static ggml_status ggml_backend_remoting_graph_compute(ggml_backend_t backend, ggml_cgraph * cgraph) {
  ```
  Please follow CONTRIBUTING.md: https://github.com/ggml-org/llama.cpp/blob/master/CONTRIBUTING.md
```cpp
struct timer_data graph_compute_timer = {0, 0, 0, "compute_timer"};

uint32_t
backend_backend_graph_compute(struct apir_encoder *enc, struct apir_decoder *dec, struct virgl_apir_context *ctx) {
```
```diff
- backend_backend_graph_compute(struct apir_encoder *enc, struct apir_decoder *dec, struct virgl_apir_context *ctx) {
+ backend_backend_graph_compute(apir_encoder * enc, apir_decoder * dec, virgl_apir_context * ctx) {
```
See Line 69 in 9ac2693:
> Clean-up any trailing whitespaces, use 4 spaces for indentation, brackets on the same line, `void * ptr`, `int & a`
Likewise for the rest of the codebase.
done
Sorry, I may not have been clear. This change should only affect C++ files, while C source and header files can continue to use `struct ...`
Lines 71 to 83 in e047f9e:
> - Declare structs with `struct foo {}` instead of `typedef struct foo {} foo`
> - In C++ code omit optional `struct` and `enum` keyword whenever they are not necessary
>   ```cpp
>   // OK
>   llama_context * ctx;
>   const llama_rope_type rope_type;
>
>   // not OK
>   struct llama_context * ctx;
>   const enum llama_rope_type rope_type;
>   ```
> _(NOTE: this guideline is yet to be applied to the `llama.cpp` codebase. New code should follow this guideline.)_
I'm not sure if omitting them in C source and header files would break anything for consumers using your backend... If it doesn't break anything then feel free to ignore this comment :)
I think all the files are C++, and indeed, nothing broke, so I think it's safe as is :)
thanks for the review @taronaeo, I think I followed and fixed all the suggestions
yes, it would be me indeed :)
Looks a lot better now, thank you for cleaning the code.
- I'm still wondering, are the 3rd-party vendor files required to be part of GGML/llama.cpp? (Can they be downloaded separately during development time via a script?)
- I'm not sure if I missed it, but I don't see the required `GGML_BACKEND_DL_IMPL` macro call in this PR. Did GGML register your backend correctly? - #18718 (comment)
I'm also interested in testing this PR out on my MacBook. Do you have any guides/steps for me to follow to test it?
```cpp
#include <ostream>
#include <thread>

int ggml_backend_remoting_get_device_count();
```
Was this file supposed to be a header file? Looks like the implementation is missing.
it was a stale file, removed
```cpp
#include "virtgpu-forward-impl.h"
#include "virtgpu-shm.h"

int apir_device_get_count(virtgpu * gpu) {
```
I believe this function is public-facing, right?

Line 70 in e047f9e:
> Use sized integer types such as `int32_t` in the public API, e.g. `size_t` may also be appropriate for allocation sizes or byte offsets
no, it's not public-facing, it's only consumed internally.
Overall, nothing is public-facing apart from the GGML entrypoints and function tables. So AFAIU, the prototypes of all the functions called externally are imposed by the function tables.
The only one not imposed by the tables might be this one:
```cpp
GGML_BACKEND_API ggml_backend_reg_t ggml_backend_remoting_frontend_reg();
```
sure :) the blog post has the steps to reproduce it with pre-compiled binaries (I'll try to regenerate the binaries before the end of the week), and this document has the steps to rebuild the different sources; you can request access. Happy to discuss it on the IBM-RH Slack if you need help
For information, I'll be at FOSDEM at the end of the month to present the work behind this PR.
indeed, I'm not using it at the moment (and everything works fine); I'll review tomorrow how it should be used
This is a follow up of #17072
The API Remoting backend/frontend allow escaping the VM isolation, with the help of the `virt-gpu` paravirtualization (and the `virglrenderer` library on the host side).
- `ggml-remotingfrontend` is a GGML API implementation which intercepts the GGML API calls and forwards them to the `virt-gpu` virtual device
- `ggml-remotingbackend` is a library loaded by `virglrenderer` (PR will be opened soon for discussion), which opens a GGML library and forwards the calls received from `virglrenderer`

Here is the context behind this PR:
See the Virglrenderer PR which enables the API Remoting trampoline required in Virglrenderer:
https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/1590
this work focused on macOS, where the in-VM/container inference performance is tied to the remoting stack.
the code works on Linux, but I didn't evaluate the performance thoroughly.
Add support for the APIR capset containers/libkrun#508 --> libkrun VMM patch that allows the routing of the APIR capset to Virglrenderer
Disclaimer: I got help from Claude Code to finalize this PR, mostly through pre-submit reviews (no automated C code generation involved). Claude Code did generate the Python code generator (see the `*.gen.h` and `*.gen.c` files) used for the backend/frontend RPC (it was generated based on the C/H files I had manually written).