Commit 6c9d341

Update llama.cpp to b908baf1825b1a89afef87b09e22c32af2ca6548 (#888)
* Update llama.cpp submodule to b908baf1825b1a89afef87b09e22c32af2ca6548

  Updates patches and integration code for new llama.cpp version:
  - Regenerated all patches for updated upstream code
  - Added common_ngram-mod.cpp.patch (adds #include <algorithm>)
  - Added vendor_cpp-httplib_httplib.cpp.patch (XNU futex workaround moved from .h)
  - Added common/license.cpp stub for LICENSES symbol
  - Removed obsolete vendor_minja_minja.hpp.patch (jinja now built-in)
  - Removed obsolete vendor_cpp-httplib_httplib.h.patch (code moved to .cpp)
  - Updated chatbot.h/cpp for common_chat_syntax -> common_chat_parser_params rename
  - Removed minja test from tests/BUILD.mk

* Updated license.cpp with the one generated by cmake in upstream llama.cpp
* Updated info about license.cpp in patches' README
* Remove minja from tests
* Updated refs to minja in docs
1 parent dc04618 commit 6c9d341

28 files changed

Lines changed: 196 additions & 597 deletions

docs/skills/llamafile/testing.md

Lines changed: 3 additions & 6 deletions
@@ -82,8 +82,7 @@ Tests in llamafile use the `.runs` suffix convention:
 # In tests/BUILD.mk
 .PHONY: o/$(MODE)/tests
 o/$(MODE)/tests: \
-o/$(MODE)/tests/extract_data_uris_test.runs \
-o/$(MODE)/tests/minja/minja_integration_test.runs
+o/$(MODE)/tests/extract_data_uris_test.runs
 ```

 The `.runs` file is a timestamp marker indicating the test passed. The build system:
@@ -125,8 +124,6 @@ Currently in the `new_build_wip` branch, these tests are saved in:

 ```
 tests/
-└── minja
-└── *_test.c # Jinja template parsing tests
 └── sgemm
 └── *_test.c # Optimized CPU kernels tests
 ...
@@ -188,10 +185,10 @@ the `tests/BUILD.mk` file, thus they need to be manually compiled and run.

 ```sh
 # Build the test
-.cosmocc/4.0.2/bin/make o//tests/minja/minja_integration_test
+.cosmocc/4.0.2/bin/make o//tests/extract_data_uris_test

 # Run directly
-.o/tests/minja/minja_integration_test
+./o/tests/extract_data_uris_test
 ```

 ### Debug Build

llama.cpp

llama.cpp.patches/README.md

Lines changed: 5 additions & 3 deletions
@@ -11,7 +11,9 @@ llama.cpp.patches/
 ├── renames.sh # Script for file renames/moves (if any)
 ├── llamafile-files/ # Additional files to copy into llama.cpp
 │ ├── BUILD.mk # Makefile for building llama.cpp with cosmocc
-│ └── README.llamafile # License and modification notes
+│ ├── README.llamafile # License and modification notes
+│ └── common/
+│ └── license.cpp # Llama.cpp's license file (cmake creates this at build time)
 └── patches/ # Patch files for upstream sources
 ```

@@ -40,6 +42,7 @@ These patches address compatibility issues when building with Cosmopolitan libc
 | `common_arg.cpp.patch` | Adds `COSMOCC` platform detection for `PATH_MAX` (includes `linux/limits.h`) |
 | `common_common.cpp.patch` | Adds platform-aware cache directory detection for Cosmopolitan (checks `LOCALAPPDATA`, `XDG_CACHE_HOME`, falls back to `~/.cache/`) |
 | `common_download.cpp.patch` | Adds `COSMOCC` platform detection for `PATH_MAX` |
+| `common_ngram-mod.cpp.patch` | Adds missing `#include <algorithm>` for `std::fill` |

 ### Threading and Signal Handling

@@ -49,7 +52,7 @@ Cosmopolitan libc has specific behaviors with condition variables and signals th
 |-------|-------------|
 | `common_log.cpp.patch` | Blocks `SIGINT`/`SIGTERM` on logger thread to prevent `EINTR` exceptions; uses `wait_for()` instead of `wait()` to work around XNU futex timeout bug (~72 minute expiry) |
 | `tools_server_server-queue.cpp.patch` | Same threading fixes for server queue: signal masking and `wait_for()` timeouts |
-| `vendor_cpp-httplib_httplib.h.patch` | Fixes httplib thread pool with `wait_for()` instead of `wait()` for XNU futex compatibility |
+| `vendor_cpp-httplib_httplib.cpp.patch` | Fixes httplib thread pool with `wait_for()` instead of `wait()` for XNU futex compatibility |

 ### Cross-Module Memory Management

@@ -93,7 +96,6 @@ These patches integrate llamafile's file handling APIs for loading models from b
 | Patch | Description |
 |-------|-------------|
 | `vendor_miniaudio_miniaudio.h.patch` | Removes `__COSMOPOLITAN__` from Windows platform detection (Cosmopolitan handles this at runtime) |
-| `vendor_minja_minja.hpp.patch` | Replaces regex-based Jinja comment parsing with manual parsing to prevent stack overflow on large templates |

 ### Miscellaneous
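
Note on the threading rows in the README diff above: the workaround they describe replaces an unbounded `wait()` with a bounded `wait_for()` that is retried in a loop. The following is only a sketch of that general pattern, not code from the patches; the helper name `wait_until_ready` and the 1000 ms default interval are assumptions made for illustration.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper (not taken from the patches): block until `ready`
// becomes true, but wake up at least once per `interval` rather than
// sleeping indefinitely inside wait().
inline void wait_until_ready(std::mutex & mtx,
                             std::condition_variable & cv,
                             const bool & ready,
                             std::chrono::milliseconds interval = std::chrono::milliseconds(1000)) {
    std::unique_lock<std::mutex> lock(mtx);
    // wait_for() with a predicate returns the predicate's value: false means
    // the interval elapsed while `ready` was still false, so wait again.
    while (!cv.wait_for(lock, interval, [&] { return ready; })) {
    }
}
```

A producer would set `ready = true` while holding `mtx` and then call `cv.notify_one()`. Because each sleep is bounded by `interval`, a single wait never lasts long enough to reach the expiry described in the table.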

llama.cpp.patches/llamafile-files/BUILD.mk

Lines changed: 26 additions & 2 deletions
@@ -23,6 +23,7 @@ GGML_SRCS_C := \
 llama.cpp/ggml/src/ggml-cpu/quants.c

 GGML_SRCS_CPP := \
+llama.cpp/ggml/src/ggml-backend-dl.cpp \
 llama.cpp/ggml/src/ggml-backend-reg.cpp \
 llama.cpp/ggml/src/ggml-backend.cpp \
 llama.cpp/ggml/src/ggml-opt.cpp \
@@ -71,12 +72,14 @@ LLAMA_SRCS_CPP := \
 llama.cpp/src/models/deci.cpp \
 llama.cpp/src/models/deepseek.cpp \
 llama.cpp/src/models/deepseek2.cpp \
+llama.cpp/src/models/delta-net-base.cpp \
 llama.cpp/src/models/dots1.cpp \
 llama.cpp/src/models/dream.cpp \
 llama.cpp/src/models/ernie4-5-moe.cpp \
 llama.cpp/src/models/ernie4-5.cpp \
 llama.cpp/src/models/exaone.cpp \
 llama.cpp/src/models/exaone4.cpp \
+llama.cpp/src/models/exaone-moe.cpp \
 llama.cpp/src/models/falcon-h1.cpp \
 llama.cpp/src/models/falcon.cpp \
 llama.cpp/src/models/gemma-embedding.cpp \
@@ -90,14 +93,16 @@ LLAMA_SRCS_CPP := \
 llama.cpp/src/models/gptneox.cpp \
 llama.cpp/src/models/granite-hybrid.cpp \
 llama.cpp/src/models/granite.cpp \
-llama.cpp/src/models/graph-context-mamba.cpp \
+llama.cpp/src/models/mamba-base.cpp \
 llama.cpp/src/models/grok.cpp \
 llama.cpp/src/models/grovemoe.cpp \
 llama.cpp/src/models/hunyuan-dense.cpp \
 llama.cpp/src/models/hunyuan-moe.cpp \
 llama.cpp/src/models/internlm2.cpp \
 llama.cpp/src/models/jais.cpp \
+llama.cpp/src/models/jais2.cpp \
 llama.cpp/src/models/jamba.cpp \
+llama.cpp/src/models/kimi-linear.cpp \
 llama.cpp/src/models/lfm2.cpp \
 llama.cpp/src/models/llada-moe.cpp \
 llama.cpp/src/models/llada.cpp \
@@ -120,6 +125,7 @@ LLAMA_SRCS_CPP := \
 llama.cpp/src/models/openai-moe-iswa.cpp \
 llama.cpp/src/models/openelm.cpp \
 llama.cpp/src/models/orion.cpp \
+llama.cpp/src/models/paddleocr.cpp \
 llama.cpp/src/models/pangu-embedded.cpp \
 llama.cpp/src/models/phi2.cpp \
 llama.cpp/src/models/phi3.cpp \
@@ -134,6 +140,8 @@ LLAMA_SRCS_CPP := \
 llama.cpp/src/models/qwen3.cpp \
 llama.cpp/src/models/qwen3moe.cpp \
 llama.cpp/src/models/qwen3next.cpp \
+llama.cpp/src/models/qwen35.cpp \
+llama.cpp/src/models/qwen35moe.cpp \
 llama.cpp/src/models/qwen3vl-moe.cpp \
 llama.cpp/src/models/qwen3vl.cpp \
 llama.cpp/src/models/refact.cpp \
@@ -148,6 +156,7 @@ LLAMA_SRCS_CPP := \
 llama.cpp/src/models/smollm3.cpp \
 llama.cpp/src/models/stablelm.cpp \
 llama.cpp/src/models/starcoder.cpp \
+llama.cpp/src/models/step35-iswa.cpp \
 llama.cpp/src/models/starcoder2.cpp \
 llama.cpp/src/models/t5-dec.cpp \
 llama.cpp/src/models/t5-enc.cpp \
@@ -167,14 +176,15 @@ LLAMA_SRCS_CPP := \
 llama.cpp/src/llama-kv-cache-iswa.cpp \
 llama.cpp/src/llama-kv-cache.cpp \
 llama.cpp/src/llama-memory-hybrid.cpp \
+llama.cpp/src/llama-memory-hybrid-iswa.cpp \
 llama.cpp/src/llama-memory-recurrent.cpp \
 llama.cpp/src/llama-memory.cpp \
 llama.cpp/src/llama-mmap.cpp \
 llama.cpp/src/llama-model-loader.cpp \
 llama.cpp/src/llama-model-saver.cpp \
 llama.cpp/src/llama-model.cpp \
 llama.cpp/src/llama-quant.cpp \
-llama.cpp/src/llama-sampling.cpp \
+llama.cpp/src/llama-sampler.cpp \
 llama.cpp/src/llama-vocab.cpp \
 llama.cpp/src/unicode-data.cpp \
 llama.cpp/src/unicode.cpp
@@ -193,12 +203,22 @@ COMMON_SRCS_CPP := \
 llama.cpp/common/chat.cpp \
 llama.cpp/common/common.cpp \
 llama.cpp/common/console.cpp \
+llama.cpp/common/debug.cpp \
 llama.cpp/common/download.cpp \
+llama.cpp/common/jinja/caps.cpp \
+llama.cpp/common/jinja/lexer.cpp \
+llama.cpp/common/jinja/parser.cpp \
+llama.cpp/common/jinja/runtime.cpp \
+llama.cpp/common/jinja/string.cpp \
+llama.cpp/common/jinja/value.cpp \
 llama.cpp/common/json-partial.cpp \
 llama.cpp/common/json-schema-to-grammar.cpp \
+llama.cpp/common/license.cpp \
 llama.cpp/common/llguidance.cpp \
 llama.cpp/common/log.cpp \
 llama.cpp/common/ngram-cache.cpp \
+llama.cpp/common/ngram-map.cpp \
+llama.cpp/common/ngram-mod.cpp \
 llama.cpp/common/peg-parser.cpp \
 llama.cpp/common/preset.cpp \
 llama.cpp/common/regex-partial.cpp \
@@ -256,10 +276,14 @@ MTMD_SRCS_CPP := \
 llama.cpp/tools/mtmd/models/conformer.cpp \
 llama.cpp/tools/mtmd/models/glm4v.cpp \
 llama.cpp/tools/mtmd/models/internvl.cpp \
+llama.cpp/tools/mtmd/models/kimik25.cpp \
 llama.cpp/tools/mtmd/models/kimivl.cpp \
 llama.cpp/tools/mtmd/models/llama4.cpp \
 llama.cpp/tools/mtmd/models/llava.cpp \
 llama.cpp/tools/mtmd/models/minicpmv.cpp \
+llama.cpp/tools/mtmd/models/mobilenetv5.cpp \
+llama.cpp/tools/mtmd/models/nemotron-v2-vl.cpp \
+llama.cpp/tools/mtmd/models/paddleocr.cpp \
 llama.cpp/tools/mtmd/models/pixtral.cpp \
 llama.cpp/tools/mtmd/models/qwen2vl.cpp \
 llama.cpp/tools/mtmd/models/qwen3vl.cpp \

llama.cpp.patches/llamafile-files/common/license.cpp

Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
+// Generated by CMake
+
+const char* LICENSES[] = {
+R"=L=(License for llama.cpp
+=====================
+
+MIT License
+
+Copyright (c) 2023-2026 The ggml authors
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+)=L=",
+R"=L=(License for cpp-httplib
+=======================
+
+The MIT License (MIT)
+
+Copyright (c) 2017 yhirose
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
+)=L=",
+R"=L=(License for jsonhpp
+===================
+
+MIT License
+
+Copyright (c) 2013-2025 Niels Lohmann
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+)=L=",
+nullptr
+};
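
Note on the `LICENSES` table above: it is an array of raw string literals closed by a `nullptr` sentinel, so a consumer can walk it without knowing its length. The sketch below shows one way to do that. It uses a stand-in array `EXAMPLE_LICENSES` with placeholder text, and the helpers `count_licenses` and `join_licenses` are hypothetical names, not functions from llama.cpp or llamafile.

```cpp
#include <cstddef>
#include <string>

// Stand-in for the generated table: same shape as LICENSES (raw string
// literals followed by a nullptr sentinel), with placeholder text.
static const char * const EXAMPLE_LICENSES[] = {
    R"=L=(License for project-a
)=L=",
    R"=L=(License for project-b
)=L=",
    nullptr
};

// Hypothetical helper: count entries by walking until the nullptr sentinel.
inline std::size_t count_licenses(const char * const * table) {
    std::size_t n = 0;
    while (table[n] != nullptr) {
        ++n;
    }
    return n;
}

// Hypothetical helper: concatenate all entries into one string for printing.
inline std::string join_licenses(const char * const * table) {
    std::string out;
    for (const char * const * p = table; *p != nullptr; ++p) {
        out += *p;
    }
    return out;
}
```

An array declared as `const char* LICENSES[]` converts implicitly to `const char * const *`, so it could be passed to either helper unchanged.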

llama.cpp.patches/patches/common_arg.cpp.patch

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 diff --git a/common/arg.cpp b/common/arg.cpp
 --- a/llama.cpp/common/arg.cpp
 +++ b/llama.cpp/common/arg.cpp
-@@ -34,6 +34,8 @@
+@@ -36,6 +36,8 @@
 #ifndef __EMSCRIPTEN__
 #ifdef __linux__
 #include <linux/limits.h>

llama.cpp.patches/patches/common_chat.cpp.patch

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 diff --git a/common/chat.cpp b/common/chat.cpp
 --- a/llama.cpp/common/chat.cpp
 +++ b/llama.cpp/common/chat.cpp
-@@ -1698,7 +1698,7 @@ static common_chat_params common_chat_params_init_deepseek_v3_1(const common_cha
+@@ -1791,7 +1791,7 @@ static common_chat_params common_chat_params_init_deepseek_v3_1(const common_cha
 };

 auto prompt = apply(tmpl, inputs,

llama.cpp.patches/patches/common_common.cpp.patch

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 diff --git a/common/common.cpp b/common/common.cpp
 --- a/llama.cpp/common/common.cpp
 +++ b/llama.cpp/common/common.cpp
-@@ -920,6 +920,16 @@ std::string fs_get_cache_directory() {
+@@ -874,6 +874,16 @@ std::string fs_get_cache_directory() {
 cache_directory = std::getenv("HOME") + std::string("/Library/Caches/");
 #elif defined(_WIN32)
 cache_directory = std::getenv("LOCALAPPDATA");
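
Note on this patch: the README describes it as checking `LOCALAPPDATA`, then `XDG_CACHE_HOME`, then falling back to `~/.cache/`. The patch body is not shown in this diff, so the following is only a sketch of that lookup order. The helper names `pick_cache_base` and `pick_cache_base_from_env` are hypothetical, and treating an empty variable as unset is an assumption.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical pure helper: environment values are passed in so the lookup
// order can be exercised without touching the real environment.
// Assumption: a null or empty value counts as "not set".
inline std::string pick_cache_base(const char * localappdata,
                                   const char * xdg_cache_home,
                                   const char * home) {
    auto is_set = [](const char * v) { return v != nullptr && *v != '\0'; };
    if (is_set(localappdata)) {
        return localappdata;                    // Windows-style location
    }
    if (is_set(xdg_cache_home)) {
        return xdg_cache_home;                  // XDG location
    }
    if (is_set(home)) {
        return std::string(home) + "/.cache/";  // fallback under the home directory
    }
    return std::string();                       // nothing usable was set
}

// Convenience wrapper that reads the real environment.
inline std::string pick_cache_base_from_env() {
    return pick_cache_base(std::getenv("LOCALAPPDATA"),
                           std::getenv("XDG_CACHE_HOME"),
                           std::getenv("HOME"));
}
```

Resolving the directory at runtime from environment variables, rather than through `#ifdef` branches, fits a binary that runs on several operating systems; the patch itself may order or normalize the paths differently.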

llama.cpp.patches/patches/common_download.cpp.patch

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 diff --git a/common/download.cpp b/common/download.cpp
 --- a/llama.cpp/common/download.cpp
 +++ b/llama.cpp/common/download.cpp
-@@ -29,6 +29,8 @@
+@@ -24,6 +24,8 @@
 #ifndef __EMSCRIPTEN__
 #ifdef __linux__
 #include <linux/limits.h>

llama.cpp.patches/patches/common_ngram-mod.cpp.patch

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+diff --git a/common/ngram-mod.cpp b/common/ngram-mod.cpp
+--- a/llama.cpp/common/ngram-mod.cpp
++++ b/llama.cpp/common/ngram-mod.cpp
+@@ -1,5 +1,7 @@
+#include "ngram-mod.h"
+
++#include <algorithm>
++
+//
+// common_ngram_mod
+//
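
Note on this patch: the README says the added `#include <algorithm>` is for `std::fill`. That function is declared in `<algorithm>`, and whether a file compiles without the include depends on which other headers happen to pull it in, which varies between standard library implementations. A minimal sketch of the explicit form follows; the helper `reset_table` is a hypothetical name and is not taken from `ngram-mod.cpp`.

```cpp
#include <algorithm>  // declares std::fill
#include <vector>

// Hypothetical helper: overwrite every slot of a table with a sentinel value.
inline void reset_table(std::vector<int> & table, int sentinel) {
    std::fill(table.begin(), table.end(), sentinel);
}
```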
