ci : discuss optimization strategies #20446

ggerganov · 2026-03-12T08:20:14Z

ggerganov
Mar 12, 2026
Maintainer

CISC · 2026-03-12T08:21:55Z

CISC
Mar 12, 2026
Collaborator

I think we also need to look into minimizing (however possible without compromising test coverage) the amount of builds running.

1 reply

ggerganov Mar 12, 2026
Maintainer Author

Yes, the total runtime also keeps increasing, though I am less concerned about it. The queue time is what we have much less control over.

ggerganov · 2026-03-12T08:22:01Z

ggerganov
Mar 12, 2026
Maintainer Author

How do we feel about disabling all automatic GitHub-hosted workflows for pull requests and delegating to maintainers the decision of which workflows to run, and when, for a given PR? The master branch would continue to run all workflows by default.

6 replies

danbev Mar 12, 2026
Maintainer

I think this sounds reasonable and worth doing 👍

ggerganov Mar 12, 2026
Maintainer Author

We'll need to separate the workflow definitions so that self-hosted workflows are in separate .yml files from the non-self-hosted.

aldehir Mar 12, 2026
Collaborator

Maybe a minimal workflow for Ubuntu/macOS/Windows CPU-only builds to run tests? Then, when the PR is refined enough, manually trigger the entire suite prior to merging.

JohannesGaessler Mar 13, 2026
Collaborator

Is it possible to create lists of workflows that are specific to e.g. the CUDA backend so that backend maintainers can simply press a button to test only the changed backend?

CISC Mar 13, 2026
Collaborator

No simple button, I think. To do that we'd have to separate the workflows, and you would have to manually specify which branch they should run on (but the PR branch refs don't show up in the list, so I'm not sure that's possible).
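As a rough sketch of such a "button" from the command line, assuming the backend-specific jobs were split into their own workflow files with a `workflow_dispatch` trigger, a small wrapper around `gh workflow run` could map a backend name to its workflows. The mapping is hypothetical (only build-sycl.yml appears later in this thread), the helper only prints the commands, and it does not address the concern above about PR branch refs not being selectable.

```bash
#!/usr/bin/env bash
# Dry-run helper: print the `gh workflow run` commands for one backend.
# The backend -> workflow mapping is HYPOTHETICAL; adjust it to the real
# workflow file names.
declare -A BACKEND_WORKFLOWS=(
    [cuda]="build-cuda.yml"
    [vulkan]="build-vulkan.yml"
    [sycl]="build-sycl.yml"
    [hip]="build-hip.yml"
)

dispatch_backend() {
    local backend=$1 ref=$2
    local workflows=${BACKEND_WORKFLOWS[$backend]:-}
    if [[ -z $workflows ]]; then
        echo "unknown backend: $backend" >&2
        return 1
    fi
    local wf
    for wf in $workflows; do
        # Only works for workflows that declare a workflow_dispatch trigger.
        echo gh workflow run "$wf" --ref "$ref"
    done
}
```

Piping the output of `dispatch_backend cuda my-branch` to `sh` would actually trigger the runs; printing first keeps it easy to review.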

CISC · 2026-03-12T11:54:07Z

CISC
Mar 12, 2026
Collaborator

I wonder if our main issue is that we have many long-running Ubuntu runners (ubuntu-24-cmake-vulkan in particular) saturating their availability and causing a long backlog...

2 replies

ggerganov Mar 12, 2026
Maintainer Author

Yes, you might be right. This workflow seems to take too long:

https://github.com/ggml-org/llama.cpp/actions/runs/22963567444/job/66660728706

CISC Mar 12, 2026
Collaborator

Oh, BTW, we need to update our ccache-action:
https://github.com/ggml-org/llama.cpp/actions/runs/22963567444/job/66660728706#annotation:19:2

netrunnereve · 2026-03-12T17:24:49Z

netrunnereve
Mar 12, 2026
Collaborator

Well I was the one who set up the ubuntu-24-cmake-vulkan runner, it's super slow since it basically emulates a GPU on CPU to run the Vulkan tests. I've optimized it already but ultimately it's not going to go any faster because of the slow machines and the growing number of tests. Maybe it's possible to get it running on ARM but I've never gotten that to work. While it has its uses (mainly in finding interesting bugs that don't pop up in a real GPU, and also so that it can be run in forks) considering we have the CI machines with real GPUs now we can get rid of it if you feel it's too slow.

We also have way too many jobs in general. Aside from removing jobs you can also try spreading out the load between ARM and x86 machines if possible, some stuff like those cross compiles or webgpu runs can probably be done on ARM. There's also the new ubuntu-slim machine which they hopefully have more of and which we can use for simple jobs that only need 1 core and 5 gigs of memory.

1 reply

CISC Mar 12, 2026
Collaborator

> Well I was the one who set up the ubuntu-24-cmake-vulkan runner, it's super slow since it basically emulates a GPU on CPU to run the Vulkan tests. I've optimized it already but ultimately it's not going to go any faster because of the slow machines and the growing number of tests. Maybe it's possible to get it running on ARM but I've never gotten that to work. While it has its uses (mainly in finding interesting bugs that don't pop up in a real GPU, and also so that it can be run in forks) considering we have the CI machines with real GPUs now we can get rid of it if you feel it's too slow.

We can keep it, but I think this is one we will have to either only run on Vulkan changes and/or manually.

> We also have way too many jobs in general. Aside from removing jobs you can also try spreading out the load between ARM and x86 machines if possible, some stuff like those cross compiles or webgpu runs can probably be done on ARM.

Indeed, spreading them across different arches may help.

> There's also the new ubuntu-slim machine which they hopefully have more of and which we can use for simple jobs that only need 1 core and 5 gigs of memory.

Yep, already transitioned a few jobs to that. :)

ggerganov · 2026-03-13T10:25:50Z

ggerganov
Mar 13, 2026
Maintainer Author

I changed the setting to require GitHub Actions approval for all external contributors.

I'm just not sure who will be able to make the approvals: is it only org members, or also collaborators with write access? Hopefully it is the latter; let's see.
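Approvals are normally done from the PR page. For clearing a backlog in bulk, a sketch along these lines could list runs waiting for approval with `gh run list --status action_required` and approve them through the REST endpoint for fork pull request runs (`POST /repos/{owner}/{repo}/actions/runs/{run_id}/approve`). The `approve_commands` helper is hypothetical, it only prints the commands, and whoever runs them still needs the approval permission discussed in this thread.

```bash
#!/usr/bin/env bash
# Turn "RUN_ID<TAB>WORKFLOW<TAB>BRANCH" lines into dry-run approval commands.
# approve_commands is a hypothetical helper; REPO is "owner/name".
approve_commands() {
    local repo=$1 id wf branch
    while IFS=$'\t' read -r id wf branch; do
        [[ -n $id ]] || continue
        printf '# %s on %s\n' "$wf" "$branch"
        printf 'gh api --method POST repos/%s/actions/runs/%s/approve\n' "$repo" "$id"
    done
}

# Example (needs an authenticated gh CLI):
#   gh run list --repo ggml-org/llama.cpp --status action_required --limit 100 \
#       --json databaseId,workflowName,headBranch \
#       --jq '.[] | [(.databaseId|tostring), .workflowName, .headBranch] | @tsv' |
#       approve_commands ggml-org/llama.cpp
```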

Also, I am wondering if we should reduce the "Artifact and log retention" down to 30 days?

10 replies

CISC Mar 13, 2026
Collaborator

Nope. :(

ggerganov Mar 13, 2026
Maintainer Author

Most likely it will appear on new PRs - let's see

CISC Mar 13, 2026
Collaborator

Ah, yes, it did, at least on Contributor PRs.

ggerganov Mar 13, 2026
Maintainer Author

Ok, so I think this can work:

- We now have a new team: @ggml-org/maintainers
- Need to send invites to all existing collaborators with write access to join that team
- This way, they become members of https://github.com/ggml-org and will be able to approve workflows

ggerganov Mar 13, 2026
Maintainer Author

> Even though 30 days is probably fine, is there a downside to having it at 90 days?

Not sure. Will leave it at 90 days for now.

ServeurpersoCom · 2026-03-13T13:30:45Z

ServeurpersoCom
Mar 13, 2026
Collaborator

I can set up a dedicated Podman container with GPU access that starts automatically with my AI server (Ryzen 9 9950X3D, 96GB DDR5). It won't interfere with my other workloads and can run CUDA and Vulkan workflows at full speed on a real GPU (RTX PRO 6000). Would that be useful?

It would be a clean pod with minimal Debian Containerfile / yaml, with the latest CUDA/Vulkan, that anyone in our group could download and instantiate to run the pipeline.

2 replies

ggerganov Mar 14, 2026
Maintainer Author

Ideally, the runners should run only the workflows and nothing else to avoid interference. So it's better to enroll fully-dedicated machines as self-hosted runners.

ServeurpersoCom Mar 14, 2026
Collaborator

We could get a dedicated server with the smallest NVIDIA GPU from a professional hosting provider, the team would administer it via SSH. Something like https://www.hetzner.com/dedicated-rootserver/gex44/ (RTX 4000 Ada, 20GB, €184/mo)

taronaeo · 2026-03-14T13:26:56Z

taronaeo
Mar 14, 2026
Collaborator

I have a dedicated server with AMD Ryzen 7 2700X (8C 16T), 32 GB DDR4, 4 TB storage and an NVIDIA RTX 2060 GPU doing nothing currently.

I believe it can run both CUDA and Vulkan CI workloads. Let me know if my configuration is feasible and we can onboard my server as part of the self-hosted runners.

11 replies

netrunnereve Mar 18, 2026
Collaborator

> Yeah I've been monitoring and agree; it's underutilized right now. I was thinking maybe I could spin up 8 Docker containers (or do it in Docker Swarm/k8s) where each container has 2 vCPUs (1 physical core). I don't know if that will be too slow for CI purposes.

> I don't think parallel runners would work OK because there will be contention for resources and unpredictable loads.

The default runners we currently use (I think they're AMD 7763s) all run as VMs on shared servers, which is why the runtimes are all over the place. If we aren't comparing performance between runs, I don't see a problem with that, though, and we can be smart about it and allocate enough resources to each VM to make overloading impossible. For example, I think we can split the 16-thread, 32 GB computer here into four VMs with 4 threads and 7 GB each, and that should consistently run faster than the GitHub machines while letting us do 4 builds at a time.
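The split proposed above is plain integer arithmetic. A sketch with a hypothetical helper; the 4 GB kept back for the host is an assumption inferred from 32 GB minus 4 × 7 GB:

```bash
#!/usr/bin/env bash
# split_host THREADS MEM_GB VMS HOST_RESERVE_GB   (hypothetical helper)
# Evenly divide a host's threads and memory across VMS VMs, keeping
# HOST_RESERVE_GB of memory for the host itself (integer division).
split_host() {
    local threads=$1 mem_gb=$2 vms=$3 host_reserve_gb=$4
    local per_vm_threads=$(( threads / vms ))
    local per_vm_mem=$(( (mem_gb - host_reserve_gb) / vms ))
    echo "$vms VMs x $per_vm_threads threads, $per_vm_mem GB"
}
```

`split_host 16 32 4 4` prints `4 VMs x 4 threads, 7 GB`. For comparison, the 8-container idea above, with nothing reserved for the host, gives `split_host 16 32 8 0`, which prints `8 VMs x 2 threads, 4 GB`.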

taronaeo Mar 20, 2026
Collaborator

Here are some statistics from the last 24 hours. Yellow highlighted background indicates that a job is running.

Looking at the CPU utilisation, I suppose we could trial adding another parallel runner specifically for CPU-only CIs, since there are still pockets of time where the server is idle. We can also pin the physical CPU cores like this:

- GPU runner: cores 0-3
- CPU runner: cores 4-7

That way there shouldn't be much contention, since each runner has full access to the bare-metal hardware. Memory-wise, we can allocate approximately 10 GiB to both runners, since they use only about that much memory.
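One practical detail with pinning: Linux numbers logical CPUs, and on an SMT machine such as the 2700X (8 cores, 16 threads) the two threads of a physical core are not necessarily numbered next to each other; this varies between machines. A sketch, with a hypothetical helper name, that derives the logical-CPU list for a physical core range from `lscpu -p=CPU,CORE` output so it can be passed to `docker run --cpuset-cpus` or `taskset -c`. The image name in the example is a placeholder and the 10g cap mirrors the figure above.

```bash
#!/usr/bin/env bash
# cpus_for_cores FIRST LAST   (hypothetical helper)
# Reads "CPU,CORE" lines (as printed by `lscpu -p=CPU,CORE`) on stdin and
# prints the logical CPUs belonging to physical cores FIRST..LAST as a
# comma-separated list.
cpus_for_cores() {
    local first=$1 last=$2
    awk -F, -v lo="$first" -v hi="$last" '
        /^#/ { next }                                # skip lscpu header comments
        $2 + 0 >= lo + 0 && $2 + 0 <= hi + 0 { print $1 }
    ' | sort -n | paste -sd, -
}

# Example (placeholder image name):
#   docker run --cpuset-cpus="$(lscpu -p=CPU,CORE | cpus_for_cores 0 3)" \
#       --memory=10g my-gpu-runner-image
```

Pinning whole physical cores (both SMT threads) to one runner avoids the two runners competing for the same core.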

The new parallel runner can also be the first candidate to take up the generic CPU workflows.

Let me know what you think, or if I should scrap the idea completely haha

ggerganov Mar 20, 2026
Maintainer Author

I think it's worth the experiment.

> The new parallel runner can also be the first candidate to take up the generic CPU workflows.

Actually, I'd rather first move the CPU workflows to the existing runners, as I consider them to be more stable long-term. I don't want to put too much pressure on the sg-hl1-ci-nvidia-vulkan-cm yet; first, let's see if it is stable enough for a few days/weeks.

taronaeo Apr 5, 2026
Collaborator

Regarding

> Move some of the ggml-ci--cpu- workflows to self-hosted runners to reduce some of the GH cache

and

> Move the CPU server workflows to self-hosted runners as they are quite important and currently have a long queue (at least the linux-based ones)

I think I am ready to deploy 2x x86 self-hosted runners soon and 2-4x ARM64 (Ampere A1) runners at a later date as they are still being tested. Both architectures will deploy ephemeral runners managed by GitHub Self-Hosted ARC, which requires a GitHub App to be installed, similar to our IBM Power and Z Runners GitHub App.

But I am wondering how viable it is for us to actually move those CPU workflows to self-hosted runners. From what I am reading, GitHub supplies at least 20 runners (docs), and if we were to move the workflows to self-hosted runners, we would bottleneck ourselves since we only have a handful of self-hosted runners available. Also, it appears that we cannot mix GitHub-hosted runners with self-hosted ones to reduce the queue time.

Are we still considering the shift to self-hosted runners? By any chance am I misunderstanding something?

ggerganov Apr 6, 2026
Maintainer Author

> But I am wondering how viable it is for us to actually move those CPU workflows to self-hosted runners.

The idea was to offload some of the jobs to self-hosted runners in order to free space in the GitHub cache. My understanding is that this will reduce the cache thrashing and thus improve the overall speed of the CI.

On my end, I've de-prioritized this because I first have to learn how to sandbox my Macs and DGX Spark before adding them back as self-hosted runners.

> Also, it appears that we cannot mix GitHub-hosted runners with self-hosted ones to reduce the queue time.

How do we know that?

> Are we still considering the shift to self-hosted runners? By any chance am I misunderstanding something?

I think we managed to significantly improve the state of the CI compared to when this discussion was opened. I guess the biggest impact came from switching to manual approval of the workflows.

I'm up for trying the 2x x86 runners that you've prepared to see how this goes. My understanding is that it should reduce the queue time: almost any self-hosted machine would be faster than the GitHub runners, and it will have unlimited ccache.

CISC · 2026-03-15T13:20:21Z

CISC
Mar 15, 2026
Collaborator

* [x]  Update `ccache-action`: [ci : discuss optimization strategies #20446 (reply in thread)](https://github.com/ggml-org/llama.cpp/discussions/20446#discussioncomment-16097521)

What needs to be updated is this:
https://github.com/ggml-org/ccache-action/blob/main/action.yml#L45

Not sure if this will break anything, though. This is not yet done upstream, so there is no use syncing our fork yet either.

0 replies

CISC · 2026-03-16T22:48:25Z

CISC
Mar 16, 2026
Collaborator

@ggerganov ccache just got updated, can you sync our fork?
https://github.com/hendrikmuhs/ccache-action/releases/tag/v1.2.21

2 replies

ggerganov Mar 17, 2026
Maintainer Author

Done

CISC Mar 17, 2026
Collaborator

Thanks, I'll update CIs.

IMbackK · 2026-03-18T19:49:31Z

IMbackK
Mar 18, 2026
Collaborator

I guess I am currently on my way to making this worse with #20430; I have quite a few more checks I would like to add to this workflow in the future. I would be happy to have this restricted to running when I press a button on PRs that affect the HIP backend.

1 reply

ggerganov Mar 19, 2026
Maintainer Author

I think it's OK to add. If we get overloaded at some point, we can consider only running it on master automatically and for the PRs to be manual.

When we provision some AMD hardware in the future, we will offload these workflows there.

arthw · 2026-04-23T05:45:17Z

arthw
Apr 23, 2026
Collaborator

The PR for the SYCL backend, #20446, is merged. It separates the SYCL backend CI from build.yml into build-sycl.yml and uses a cache to skip downloading and installing the oneAPI package.

Now, if a change does not touch the SYCL backend (ggml/src/ggml-sycl/*), the CI won't run build-sycl.yml, which reduces the workload.

build.yml is mandatory for every code change. If we move more backend builds out of it into separate build-xxx.yml files, build.yml will get smaller, which will reduce CI time for all backends.
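In GitHub Actions this kind of gating is usually expressed with a `paths:` filter on the `pull_request` trigger. As a local approximation of the same rule, for example to check whether a branch would trigger the SYCL workflow, a sketch follows; the helper name is hypothetical and only the ggml/src/ggml-sycl/ prefix comes from the text above.

```bash
#!/usr/bin/env bash
# touches_sycl   (hypothetical helper)
# Succeeds if any changed path (one per line on stdin) is under
# ggml/src/ggml-sycl/, mirroring the rule described above.
touches_sycl() {
    local path
    while IFS= read -r path; do
        case $path in
            ggml/src/ggml-sycl/*) return 0 ;;
        esac
    done
    return 1
}

# Example:
#   git diff --name-only origin/master...HEAD | touches_sycl && echo "would run build-sycl.yml"
```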

0 replies

ggerganov · 2026-05-21T18:24:32Z

ggerganov
May 21, 2026
Maintainer Author

I feel the CI got really slow again over the last few days (i.e. very long queue times).

5 replies

CISC May 21, 2026
Collaborator

Yes, though mainly due to the sheer amount of runs on PRs/merges.

It seems we're hitting some kind of quota and the queues stall for a long time when that happens.

ggerganov May 21, 2026
Maintainer Author

(not super related, but was just going through some of the repo metrics and saw this 👀 )

CISC May 21, 2026
Collaborator

😆 Directly correlates with the amount of agent spam I'm guessing...

ggerganov May 21, 2026
Maintainer Author

I think I was able to prototype a manual approval process (a.k.a. "gatekeeper") using GitHub Environments. The idea is to require manual approval for most workflows not running on the master branch: ggml-org#33

Will see if this works as expected tomorrow.


Edit: failed attempt: #23526

OPS-NeoRetro May 31, 2026

How did that git clone spike really happen?

mikhail-shevtsov-wiregate · 2026-05-23T21:49:47Z

mikhail-shevtsov-wiregate
May 23, 2026

@ggerganov Is there any specific reason that ccache is not enabled for Docker builds?

I did a local experiment with cuda.Dockerfile:

```dockerfile
# Assumes ccache is installed in this build stage and that the .ccache
# directory restored by the script below ends up at /app/.ccache.
RUN if [ "${CUDA_DOCKER_ARCH}" != "default" ]; then \
    export CMAKE_ARGS="-DCMAKE_CUDA_ARCHITECTURES=${CUDA_DOCKER_ARCH}"; \
    fi && \
    export CCACHE_DIR=/app/.ccache && \
    cmake -B build \
        -DCMAKE_C_COMPILER_LAUNCHER=ccache \
        -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
        -DCMAKE_CUDA_COMPILER_LAUNCHER=ccache \
        -DGGML_NATIVE=OFF \
        -DGGML_CUDA=ON \
        -DGGML_BACKEND_DL=ON \
        -DGGML_CPU_ALL_VARIANTS=ON \
        -DLLAMA_BUILD_TESTS=OFF \
        ${CMAKE_ARGS} \
        -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
    cmake --build build --config Release -j$(nproc) && \
    ccache -s
```

and ccache extraction:

```bash
#!/bin/bash
set -euo pipefail

# Image tag of the previous build stage; its /app/.ccache holds the ccache state.
CACHE="llama-cuda:build"

if docker image inspect "$CACHE" >/dev/null 2>&1; then
    tmp_container=$(docker create "$CACHE")
    rm -rf .ccache  # otherwise docker cp would nest it as .ccache/.ccache
    docker cp "${tmp_container}:/app/.ccache" ".ccache"
    docker rm "$tmp_container" >/dev/null
    echo "Restored ccache from $CACHE"
fi

docker build -f .devops/cuda.Dockerfile --target build -t llama-cuda:build . --progress=plain
```

And got pretty decent results.

First run:

```text
#9 512.7 [100%] Built target llama-app
#9 512.7 Cacheable calls:    564 / 564 (100.0%)
#9 512.7   Hits:               0 / 564 ( 0.00%)
#9 512.7     Direct:           0
#9 512.7     Preprocessed:     0
#9 512.7   Misses:           564 / 564 (100.0%)
#9 512.7 Local storage:
#9 512.7   Cache size (GiB): 0.2 / 5.0 ( 3.47%)
#9 512.7   Hits:               0 / 564 ( 0.00%)
#9 512.7   Misses:           564 / 564 (100.0%)
#9 DONE 512.7s
```

And the second run, after a git pull:

```text
#9 36.63 [100%] Built target llama-app
#9 36.64 Cacheable calls:    1129 / 1129 (100.0%)
#9 36.64   Hits:              496 / 1129 (43.93%)
#9 36.64     Direct:          486 /  496 (97.98%)
#9 36.64     Preprocessed:     10 /  496 ( 2.02%)
#9 36.64   Misses:            633 / 1129 (56.07%)
#9 36.64 Local storage:
#9 36.64   Cache size (GiB):  0.2 /  5.0 ( 3.61%)
#9 36.64   Hits:              496 / 1129 (43.93%)
#9 36.64   Misses:            633 / 1129 (56.07%)
#9 DONE 36.7s
```

Note: hardware is a GB10 (Asus Ascent GX10).

The cache will be stored in the Docker image itself (inline cache).

If this is something that is needed, I can create a PR.

2 replies

CISC May 24, 2026
Collaborator

#21563

mikhail-shevtsov-wiregate May 27, 2026

@CISC The provided PR stores ccache outside of Docker, which is suboptimal. It is worth offloading any caching for Docker builds directly to the Docker inline cache, which bypasses GitHub's 10 GB per-repository cache limit.

ggerganov · 2026-05-25T14:44:13Z

ggerganov
May 25, 2026
Maintainer Author

Something that I realized today is that we actually have a limit of 20 hosted runners for the organization at any given time.

So we can have at most 20 jobs running at the same time on hosted runners.
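A back-of-the-envelope way to see why sharing those 20 slots matters: with identical, independent jobs (an idealization that ignores job dependencies and runtime variance), N jobs finish in ceil(N / free slots) waves. The helper and the numbers in the example are illustrative only; the 20-slot limit is the one figure from the text.

```bash
#!/usr/bin/env bash
# estimate_minutes JOBS FREE_SLOTS MINUTES_PER_JOB   (hypothetical helper)
# Idealized wall-clock time for JOBS equal-length, independent jobs when only
# FREE_SLOTS hosted runners are available: ceil(JOBS / FREE_SLOTS) waves.
estimate_minutes() {
    local jobs=$1 slots=$2 minutes=$3
    local waves=$(( (jobs + slots - 1) / slots ))
    echo $(( waves * minutes ))
}
```

For example, `estimate_minutes 40 20 5` prints 10, while `estimate_minutes 40 10 5` (half of the 20 slots busy with another repository's jobs) prints 20.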

6 replies

netrunnereve May 27, 2026
Collaborator

I think they just don't have enough machines available sometimes.

ggerganov May 27, 2026
Maintainer Author

> I think they just don't have enough machines available sometimes.

That's what I thought as well, but during the past few days I've been monitoring closely how the CI works and I'm pretty sure the stalls are simply due to bad management on our end and nothing to do with Github runner availability (excluding the outage yesterday).

The first thing that I discovered is this organization dashboard:

https://github.com/organizations/ggml-org/settings/actions/hosted-runners

From here, I realized that we have at most 20 hosted runners at any given moment for all the repos in the organization. This was something I didn't know before; I was thinking that our jobs are picked from some global pool of runners for all of GitHub.

Monitoring this dashboard gives a very good understanding of what is happening and explains why we get the stalls. Merge a PR in whisper.cpp -> runners get allocated there -> llama.cpp jobs wait.

That's all fine, until the repository cache gets filled incorrectly. We have 10GB of cache per repository for ccache and other artifacts. If we mismanage the cache, some jobs can go from 5 mins up to 3h! These are mostly Windows/CUDA/ROCm builds that rely heavily on having a hot ccache. When the caches are trashed, a lot of the runners get stalled doing full rebuilds from scratch, and this results in long queue times.

So the conclusion is that we have to be super strict about what goes inside the cache. This is the most important factor for improving the responsiveness of the CI. We can monitor it here: https://github.com/ggml-org/llama.cpp/actions/caches. Every cache entry has to have a good reason to be there, and we should be trying to reuse the ccaches across jobs that can share them.
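Besides the web page, the cache contents can be audited from the command line. A sketch that groups entries by the part of the key before the first `-` and prints a total, assuming the GitHub CLI's `gh cache list` with `--json key,sizeInBytes`; the helper name and the grouping rule are hypothetical and would need to match the actual key naming.

```bash
#!/usr/bin/env bash
# summarize_caches   (hypothetical helper)
# Reads "KEY<TAB>SIZE_IN_BYTES" lines and prints "<prefix> <bytes>" per key
# prefix (text before the first "-") plus a TOTAL line, largest first.
summarize_caches() {
    awk -F'\t' '
        NF >= 2 {
            split($1, parts, "-")
            by[parts[1]] += $2
            total += $2
        }
        END {
            for (p in by) printf "%s %.0f\n", p, by[p]
            printf "TOTAL %.0f\n", total
        }' | sort -k2,2nr
}

# Example (needs an authenticated gh CLI):
#   gh cache list --repo ggml-org/llama.cpp --limit 1000 \
#       --json key,sizeInBytes \
#       --jq '.[] | [.key, (.sizeInBytes|tostring)] | @tsv' | summarize_caches
```

Comparing the TOTAL line against the 10 GB budget, and checking which prefixes dominate, gives a quick view of which jobs are likely to evict each other's ccache.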

Offloading jobs to self-hosted or 3rd-party-hosted runners also helps a lot because then the runners use their own cache. I think the next big improvement for the CI will come when we provision Windows machine(s) and also AMD machines for the ROCm builds.

In any case, I hope that after the last few days of CI optimizations, things will improve noticeably. We still need to cycle through some old PRs that do not have the workflow changes and will result in inefficient caches until merged/closed, but I expect things to normalize in a few days.

CISC May 27, 2026
Collaborator

> Monitoring this dashboard gives a very good understanding of what is happening and explains why we get the stalls. Merge a PR in whisper.cpp -> runners get allocated there -> llama.cpp jobs wait.

This tiny detail explains a lot, did not think about that.

netrunnereve May 28, 2026
Collaborator

Okay, this makes sense!

As for the number of jobs in general, I think we'll need to remove as much overlap as we can. For example, if we're already running the cpu high perf job, there's no reason to run the low perf one as well. Or if we're doing a CUDA build and test on the CI machines, there's no reason to do that on a GitHub machine at the same time.

ggerganov May 29, 2026
Maintainer Author

Yes, probably the cpu-any-low-perf job can just be removed.

ggerganov · 2026-05-26T11:13:27Z

ggerganov
May 26, 2026
Maintainer Author

Hm, either Github Actions is having issues, or the ggml-org just got suspended.

6 replies

ggerganov May 26, 2026
Maintainer Author

Yes, it's a global outage.

CISC May 26, 2026
Collaborator

We finally took down all of GHA? :)

ggerganov May 26, 2026
Maintainer Author

Yeah, those extra workflows finally pushed it over the edge :D

mikhail-shevtsov-wiregate May 26, 2026

You always have a plan B of Self-Hosted GitLab instance for CI/CD

ServeurpersoCom May 27, 2026
Collaborator

😅🥵

OPS-NeoRetro · 2026-05-31T18:23:32Z

OPS-NeoRetro
May 31, 2026

> Optimize ubuntu-24-cmake-vulkan - currently very slow

These lines from the log show that the slowness of the entire job was caused by the Vulkan device being the software renderer (llvmpipe), which means that the runner the job ran on had no GPU or no GPU drivers.

So ubuntu-24-cmake-vulkan must be moved to one of the GPU runners.
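A quick way to tell whether a runner has a real GPU or only the software fallback is to check which device types `vulkaninfo --summary` reports; a CPU-type device such as llvmpipe means Vulkan is rendering in software. A sketch, assuming the `deviceType = PHYSICAL_DEVICE_TYPE_...` lines of that summary output; the helper name is hypothetical.

```bash
#!/usr/bin/env bash
# only_software_vulkan   (hypothetical helper)
# Reads `vulkaninfo --summary` output on stdin. Exits 0 if at least one device
# is listed and every listed device is CPU-type (e.g. llvmpipe); exits 1 if a
# non-CPU device is present or no device is listed at all.
only_software_vulkan() {
    awk '
        /deviceType/ {
            seen = 1
            if ($0 !~ /PHYSICAL_DEVICE_TYPE_CPU/) gpu = 1
        }
        END { exit !(seen && !gpu) }'
}

# Example:
#   if vulkaninfo --summary 2>/dev/null | only_software_vulkan; then
#       echo "Vulkan is using a software renderer only"
#   fi
```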

1 reply

netrunnereve Jun 1, 2026
Collaborator

Yeah, that job runs on CPU. We've turned off test-backend-ops there, so it's much quicker now. We run the full CI on GPU.

JohannesGaessler · 2026-06-01T10:41:34Z

JohannesGaessler
Jun 1, 2026
Collaborator

@ggerganov did you consider switching to nightly builds for releases? We currently have dozens of commits to master each day which means a lot of release builds. But we could feasibly trigger a single release build every 24 hours instead; for the vast majority of users that should be sufficient. The only concern I have would be trying to nail down the introduction of a bug to a single commit. But my experience has been that it is quite difficult to get this information from the kind of user that does not know how to compile the project in the first place.

To make the whole thing a bit fancier we could use a language model to summarize and link PRs merged into master for each release. To my understanding @am17an has already built something similar for his personal use.
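For the release-notes idea, the list of PRs merged between two releases can be collected from commit subjects, assuming squash merges whose titles end in GitHub's default `(#NNNN)` suffix. A sketch of that first step; the helper name and the tag placeholders are hypothetical.

```bash
#!/usr/bin/env bash
# pr_numbers   (hypothetical helper)
# Reads commit subjects on stdin and prints each referenced PR number once,
# in order of first appearance, e.g. "ggml : fix foo (#123)" -> 123.
pr_numbers() {
    grep -oE '\(#[0-9]+\)' | tr -d '()#' | awk '!seen[$0]++'
}

# Example (PREV_TAG and NEW_TAG are placeholders for two release tags):
#   git log --format=%s PREV_TAG..NEW_TAG | pr_numbers
```

The resulting numbers could then be turned into links or fed to a summarization step.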

1 reply

ggerganov Jun 1, 2026
Maintainer Author

I was thinking about this, but after optimizing the release process, I think it is fine to remain per commit. Before the recent changes, we were thrashing the cache constantly and running multiple release workflows at the same time. Now the releases are sequential and properly utilize the cache, so it's much smoother.

> The only concern I have would be trying to nail down the introduction of a bug to a single commit.

Yeah, this is useful - some users know how to bisect only via releases (i.e. don't know how / can't build from source).

