
Conversation


@kpouget kpouget commented Jan 9, 2026

This is a follow-up to #17072

The API Remoting backend/frontend pair allows escaping the VM isolation, with the help of virt-gpu paravirtualization (and the virglrenderer library on the host side).

  • ggml-remotingfrontend is a GGML API implementation, which intercepts the GGML API calls and forwards them to the virt-gpu virtual device (see the sketch below)
  • ggml-remotingbackend is a library loaded by virglrenderer (a PR will be opened soon for discussion), which opens a GGML library and forwards the calls received from virglrenderer.
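
The forwarding pattern looks roughly like this (a sketch only, not the PR's actual code: the helper names and the command value are hypothetical, while the `virtgpu`/`apir_*` type names do appear in this PR):

```cpp
#include <cstdint>

// Opaque types from the PR (forward declarations only).
struct virtgpu;
struct apir_encoder;
struct apir_decoder;

// Hypothetical helpers standing in for the PR's generated RPC stubs.
apir_encoder * begin_command(virtgpu * gpu, uint32_t cmd);
apir_decoder * submit_and_wait(virtgpu * gpu, apir_encoder * enc);
int32_t        decode_int32(apir_decoder * dec);

constexpr uint32_t APIR_COMMAND_DEVICE_GET_COUNT = 1;  // value hypothetical

// Shape of a forwarded call: serialize the command, submit it to the
// virt-gpu device (virglrenderer dispatches it to ggml-remotingbackend on
// the host, which invokes the real GGML backend), then decode the reply.
int32_t frontend_device_get_count(virtgpu * gpu) {
    apir_encoder * enc = begin_command(gpu, APIR_COMMAND_DEVICE_GET_COUNT);
    apir_decoder * dec = submit_and_wait(gpu, enc);
    return decode_int32(dec);
}
```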

Here is the context behind this PR:

[image: API Remoting architecture overview]

See the Virglrenderer PR which enables the API Remoting trampoline required in Virglrenderer:
https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/1590

  • this work focused on macOS, where in-VM/container inference performance is tied to the remoting stack

  • the code works on Linux, but I didn't thoroughly evaluate the performance there.

  • containers/libkrun#508 (Add support for the APIR capset) --> the libkrun VMM patch that routes the APIR capset to Virglrenderer

Disclaimer: I got help from Claude Code to finalize this PR, mostly through pre-submit reviews (no automated C code generation involved). Claude Code did generate the Python code generator (see the *.gen.h and *.gen.c files) used for the backend/frontend RPC; it was generated based on the C/H files I had manually written.

@kpouget kpouget requested a review from ggerganov as a code owner January 9, 2026 13:29
@kpouget kpouget changed the title from "ggml: new backend for Virglrenderer API Remoting" to "ggml: new backend for Virglrenderer API Remoting (v2)" Jan 9, 2026
@kpouget kpouget changed the title from "ggml: new backend for Virglrenderer API Remoting (v2)" to "ggml: new backend for Virglrenderer API Remoting acceleration (v2)" Jan 9, 2026
@github-actions github-actions bot added the build (Compilation issues), python (python script changes), and ggml (changes relating to the ggml tensor library for machine learning) labels Jan 9, 2026
This flag allows disabling the ggml-vulkan backend at runtime.

This is necessary for the API Remoting support, as the API Remoting
frontend (`ggml-remotingfrontend`) relies on the same device file as
`ggml-vulkan` when running inside a Virtual Machine.

This runtime disable flag allows compiling both `ggml-vulkan` and
`ggml-remotingfrontend`, while selecting at runtime which one is
activated.
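
A minimal sketch of the runtime-disable pattern described above (the environment-variable name is an assumption, not necessarily the flag this commit introduces):

```cpp
#include <cstdlib>

// When the flag is set, the Vulkan backend can report zero devices,
// leaving the virtio-gpu device file to ggml-remotingfrontend.
static bool ggml_vk_runtime_disabled() {
    const char * v = std::getenv("GGML_VK_DISABLE");  // variable name hypothetical
    return v != nullptr && v[0] != '\0' && v[0] != '0';
}
```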
@taronaeo
Collaborator

I'll review this in a while. If we were to merge this, we will need a named maintainer for the backend for maintainability reasons. Will it be you? :)


@taronaeo taronaeo left a comment


  1. Spacing across the PR is very inconsistent. Please use 4-space indentation and keep it consistent.
  2. The vendor files within ggml-remotingfrontend/include - can they be discovered/downloaded separately from the codebase? See:
    - Avoid adding third-party dependencies, extra files, extra headers, etc.
  3. Inconsistent styling:
__attribute__((unused))
static inline const char *apir_command_name(ApirCommandType type)
{

vs.

static ggml_status ggml_backend_remoting_graph_compute(ggml_backend_t backend, ggml_cgraph * cgraph) {

Please follow CONTRIBUTING.md: https://github.com/ggml-org/llama.cpp/blob/master/CONTRIBUTING.md

struct timer_data graph_compute_timer = {0, 0, 0, "compute_timer"};

uint32_t
backend_backend_graph_compute(struct apir_encoder *enc, struct apir_decoder *dec, struct virgl_apir_context *ctx) {
Collaborator


Suggested change
backend_backend_graph_compute(struct apir_encoder *enc, struct apir_decoder *dec, struct virgl_apir_context *ctx) {
backend_backend_graph_compute(apir_encoder * enc, apir_decoder * dec, virgl_apir_context * ctx) {

See:

- Clean-up any trailing whitespaces, use 4 spaces for indentation, brackets on the same line, `void * ptr`, `int & a`

Likewise for the rest of the codebase.

Author


done

Collaborator


Sorry, I may not have been clear. This change should only affect C++ files, while C source and header files can continue to use struct ...

- Declare structs with `struct foo {}` instead of `typedef struct foo {} foo`
- In C++ code omit optional `struct` and `enum` keyword whenever they are not necessary
```cpp
// OK
llama_context * ctx;
const llama_rope_type rope_type;
// not OK
struct llama_context * ctx;
const enum llama_rope_type rope_type;
```
_(NOTE: this guideline is yet to be applied to the `llama.cpp` codebase. New code should follow this guideline.)_

I'm not sure if omitting them in C source and header files would break anything for consumers using your backend... If it doesn't break anything then feel free to ignore this comment :)


@kpouget kpouget Jan 13, 2026


I think all the files are C++, and indeed nothing broke, so I think it's safe as is :)


kpouget commented Jan 12, 2026

thanks for the review @taronaeo, I think I followed and fixed all the suggestions

If we were to merge this, we will need a named maintainer for the backend for maintainability reasons. Will it be you? :)

yes, would be me indeed :)


@taronaeo taronaeo left a comment


Looks a lot better now, thank you for cleaning the code.

  1. I'm still wondering: are the 3rd-party vendor files required to be part of GGML/llama.cpp? (Can they be downloaded separately at development time via a script?)
  2. I'm not sure if I missed it, but I don't see the required GGML_BACKEND_DL_IMPL macro call in this PR. Did GGML register your backend correctly?
  3. #18718 (comment)

I'm also interested in testing this PR out on my MacBook. Do you have any guides/steps for me to follow to test it?

#include <ostream>
#include <thread>

int ggml_backend_remoting_get_device_count();
Collaborator


Was this file supposed to be a header file? Looks like the implementation is missing.

Author


was a stale file, removed


#include "virtgpu-forward-impl.h"
#include "virtgpu-shm.h"

int apir_device_get_count(virtgpu * gpu) {
Collaborator


I believe this function is public-facing, right?

- Use sized integer types such as `int32_t` in the public API, e.g. `size_t` may also be appropriate for allocation sizes or byte offsets

Author


no, it's not public-facing; it's only consumed internally.

overall, nothing is public-facing, apart from the GGML entry points and function tables. So AFAIU, the prototypes of all the functions called externally are imposed by the function tables.

The only one not imposed by the tables might be this one:

GGML_BACKEND_API ggml_backend_reg_t ggml_backend_remoting_frontend_reg();
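
To illustrate what "imposed by the function tables" means, here is a generic sketch (hypothetical names, not ggml's actual interface): the backend fills a struct of function pointers, so each implementation must match the table's signatures exactly.

```cpp
#include <cstdint>

// Hypothetical function table: the struct fixes every signature, so a
// backend filling it has no freedom over the prototypes of its
// externally called functions.
struct backend_device_iface {
    const char * (*get_name)(void * dev);
    int32_t      (*get_device_count)(void * ctx);
};

static const char * remoting_get_name(void * /*dev*/)         { return "RemotingFrontend"; }
static int32_t      remoting_get_device_count(void * /*ctx*/) { return 1; }

// Only the registration entry point is exported; everything else is
// reached through the table.
static const backend_device_iface remoting_iface = {
    /* .get_name         = */ remoting_get_name,
    /* .get_device_count = */ remoting_get_device_count,
};
```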


kpouget commented Jan 13, 2026

I'm also interested in testing this PR out on my MacBook. Do you have any guides/steps for me to follow to test it?

sure :)

the blog post has the steps to reproduce it with pre-compiled binaries:
https://developers.redhat.com/articles/2025/09/18/reach-native-speed-macos-llamacpp-container-inference#try_api_remoting_with_ramalama

actually, you should be able to follow the INSTALL steps from my release page:
https://github.com/crc-org/llama.cpp/releases/tag/b7356-remoting-0.3.0

(I'll try to regenerate the binaries before the end of the week)

and this document has the steps to rebuild the different sources; you can request access

happy to discuss it on IBM-RH slack if you need help


kpouget commented Jan 14, 2026

For information, I'll be at FOSDEM at the end of the month to present the work behind this PR:
https://fosdem.org/2026/schedule/event/C9NF8K-api_remoting_for_llama_cpp_near-native_gpu_speed_in_macos_containers/


kpouget commented Jan 14, 2026

I'm not sure if I missed it, but I don't see the required GGML_BACKEND_DL_IMPL macro call in this PR. Did GGML register your backend correctly?

indeed, I'm not using it at the moment (and everything works fine); I'll review tomorrow how it should be used
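
For reference, dynamically loadable ggml backends typically invoke the macro once at the end of their source file; a sketch of that pattern, using the reg function quoted earlier in this thread (its exact placement in this PR is an assumption):

```cpp
// At the end of the backend source file:
#include "ggml-backend-impl.h"

// Exports the ggml_backend_init() entry point that ggml's backend
// registry resolves when it dlopens the backend as a dynamic library.
GGML_BACKEND_DL_IMPL(ggml_backend_remoting_frontend_reg)
```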
