
Commit ea67fea

aittalam and claude authored
Update llama.cpp submodule to 5e9c63546 (#941)
* Update llama.cpp submodule to 5e9c63546

* Update llama.cpp patches for 5e9c63546

- Remove obsolete common_chat.cpp.patch (deepseek v3.1 function was deleted upstream in chat template refactoring)
- Regenerate all patches to match new upstream line numbers
- Fix gguf.cpp patch for gguf_init_from_file_impl -> gguf_init_from_file_ptr rename

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Remove common_chat.cpp.patch from patch README

The patch was deleted as the upstream function it targeted (common_chat_params_init_deepseek_v3_1) was removed in a refactoring.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Update BUILD.mk and fix API breakage for llama.cpp 5e9c63546

BUILD.mk changes (llama.cpp and llamafile):
- Add new common/ sources: chat-auto-parser-generator, chat-auto-parser-helpers, chat-diff-analyzer, hf-cache, reasoning-budget
- Remove deleted common/ sources: chat-parser-xml-toolcall, chat-parser
- Add src/models/gemma4-iswa.cpp
- Add tools/server/server-tools.cpp (both BUILD.mk files)
- Add new mtmd models: deepseekocr, gemma4v, hunyuanocr, step3vl, mtmd-image

API fix:
- Replace thinking_forced_open (removed upstream in chat template refactoring) with generation_prompt in chatbot_cli.cpp and chatbot_main.cpp

Test fix:
- Add jinja library objects to extract_data_uris_test deps (jinja types now have separate compilation units)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Adding Q1 to tinyblas

* Fix TinyBLAS Q1_0 dispatch for upstream block format (QK1_0=128)

Upstream llama.cpp's Q1_0 uses 128-element blocks (QK1_0=128), which matches what was previously called Q1_0_g128 in the add-prismml branch. The TinyBLAS code was using the 32-element Q0 handlers for Q1_0, causing it to fall back to the slow generic ggml path.

- Route Q1_0 sgemm/mixmul to the g128 handlers (128-element blocks)
- Remove Q1_0_g128 sgemm case (type doesn't exist in upstream)
- Replace block_q1_0_g128 references with block_q1_0 in tinyblas_cpu.h

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix server web UI: update assets for upstream's new bundled format

Upstream llama.cpp changed from a single gzipped index.html to separate index.html + bundle.js + bundle.css files, gated behind LLAMA_BUILD_WEBUI.

- Update SERVER_ASSETS to generate all 4 .hpp files
- Add -DLLAMA_BUILD_WEBUI to server compilation flags
- Remove reference to index.html.gz (no longer exists)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add server-models.cpp patch for XNU futex timeout/EINTR crash

New upstream file tools/server/server-models.cpp has 3 unprotected cv.wait() calls that crash on macOS with Cosmopolitan libc (ETIMEDOUT after ~72 min idle, or EINTR from signal interruption).

Fix follows the same pattern as existing server-queue.cpp and log.cpp patches: replace cv.wait() with wait_for(30s) loops, and block SIGINT/SIGTERM on the stopping_thread via pthread_sigmask.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Block SIGINT/SIGTERM on httplib thread pool workers

The previous httplib patch only addressed ETIMEDOUT (converting wait() to wait_for() loops). But wait_for() also throws on EINTR when a signal interrupts the futex syscall. Add pthread_sigmask to block signals on pool worker threads, matching the pattern used in server-queue.cpp and log.cpp.

This fixes the "condition_variable timed_wait failed: Interrupted system call" crash that occurs when the server is idle and receives a signal.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Honor --reasoning flag in chat mode

* Disable thinking mode except when thinking tests are running

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
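The Q1_0 dispatch fix comes down to block-size arithmetic: with QK1_0 = 128, a 4096-element row holds 4096 / 128 = 32 blocks, so a handler built around 32-element blocks (the Q0 handlers the message mentions) is the wrong fit and the call ends up on the generic ggml path. The sketch below models that routing decision only. `QType`, `block_size`, and `blocks_for_kernel` are hypothetical names; they do not reproduce TinyBLAS's real dispatch code or the `block_q1_0` memory layout.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-ins for ggml's quantization type tags.
enum class QType { Q4_0, Q8_0, Q1_0 };

// Elements per quantization block. QK1_0 = 128 is stated in the commit
// message; 32 is the block size of ggml's Q4_0 and Q8_0 formats.
constexpr int64_t block_size(QType t) {
    switch (t) {
        case QType::Q1_0: return 128;
        case QType::Q4_0:
        case QType::Q8_0: return 32;
    }
    return 0;
}

// Hypothetical kernel gate: a handler written for `kernel_block`-element
// blocks accepts a row of `k` elements only if the type uses that block size
// and the row splits into whole blocks. Returns the block count, or
// std::nullopt to mean "decline; fall back to the generic path".
std::optional<int64_t> blocks_for_kernel(int64_t kernel_block, QType t, int64_t k) {
    assert(kernel_block > 0 && k >= 0);
    if (block_size(t) != kernel_block || k % kernel_block != 0) {
        return std::nullopt;
    }
    return k / kernel_block;
}
```

In this model, routing Q1_0 to the 32-wide handler yields `std::nullopt` (the fallback), while routing it to the 128-wide handler reports 32 blocks for a 4096-element row, which mirrors the "route Q1_0 to the g128 handlers" change.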
1 parent b355cee commit ea67fea

30 files changed

Lines changed: 794 additions & 305 deletions

llama.cpp

llama.cpp.patches/README.md

Lines changed: 1 addition & 1 deletion
@@ -83,6 +83,7 @@ Cosmopolitan libc has specific behaviors with condition variables and signals th
 | Patch | Description |
 |-------|-------------|
 | `common_log.cpp.patch` | Adds `#include <csignal>`; blocks `SIGINT`/`SIGTERM` on logger thread via `pthread_sigmask` to prevent `EINTR` exceptions; replaces `cv.wait()` with `wait_for(30s)` loop to work around XNU futex timeout bug (~72 minute expiry) |
+| `tools_server_server-models.cpp.patch` | Adds `#include <csignal>`; blocks `SIGINT`/`SIGTERM` on stopping thread; replaces `cv.wait()` with `wait_for(30s)` loops in `unload_lru`, `stopping_thread`, and `wait_until_loading_finished` |
 | `tools_server_server-queue.cpp.patch` | Adds missing includes (`<cerrno>`, `<system_error>`, `<csignal>`); blocks `SIGINT`/`SIGTERM` on queue thread; replaces `wait()` with `wait_for()` loops in three locations (`wait_until_no_sleep`, main loop, `recv`) |
 | `vendor_cpp-httplib_httplib.cpp.patch` | Fixes httplib thread pool with `wait_for()` instead of `wait()` for XNU futex compatibility |

@@ -116,7 +117,6 @@ These patches integrate llamafile's file handling APIs for loading models from b

 | Patch | Description |
 |-------|-------------|
-| `common_chat.cpp.patch` | Fixes C++ type conversion: explicitly wraps `inputs.messages` in `std::optional<json>()` for Deepseek v3.1 template |
 | `ggml_src_ggml-backend-reg.cpp.patch` | Suppresses debug log noise for non-existent backend search paths (irrelevant for llamafile's DSO loading approach) |
 | `ggml_src_ggml-vulkan_ggml-vulkan.cpp.patch` | Fixes unsigned integer underflow in `ggml_backend_vk_get_device_memory` where Vulkan's `heapUsage` can exceed `heapBudget` (clamps to zero instead of wrapping) |

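The two workarounds listed in the patch table, bounded `wait_for(30s)` loops against the ~72-minute XNU futex expiry and `pthread_sigmask` so that `SIGINT`/`SIGTERM` cannot interrupt a waiting thread with `EINTR`, can be sketched in portable C++. This is a minimal illustration assuming a POSIX system; `block_termination_signals`, `wait_in_slices`, and `Worker` are hypothetical names, not the code inside the actual `.patch` files.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <csignal>
#include <deque>
#include <mutex>
#include <pthread.h>
#include <thread>

// Block SIGINT/SIGTERM on the calling thread only. The process still receives
// these signals, but the kernel delivers them to a thread that has not blocked
// them, so this thread's futex waits are not interrupted with EINTR.
inline int block_termination_signals() {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    sigaddset(&set, SIGTERM);
    return pthread_sigmask(SIG_BLOCK, &set, nullptr);  // 0 on success
}

// Same contract as cv.wait(lock, pred), but no single wait lasts longer than
// `slice` (the patches use 30 s), so no unbounded futex wait is ever issued.
template <class Pred>
void wait_in_slices(std::condition_variable & cv, std::unique_lock<std::mutex> & lock,
                    Pred pred, std::chrono::milliseconds slice = std::chrono::seconds(30)) {
    while (!cv.wait_for(lock, slice, pred)) {
        // Timed out with pred() still false: start another bounded wait.
    }
}

// Tiny consumer thread combining both halves of the pattern.
class Worker {
public:
    explicit Worker(std::chrono::milliseconds slice) : slice_(slice) {
        thread_ = std::thread([this] { run(); });
    }
    ~Worker() { finish(); }

    void post(int v) {
        { std::lock_guard<std::mutex> g(mu_); queue_.push_back(v); }
        cv_.notify_one();
    }

    // Stop after draining queued items; returns the sum of everything consumed.
    long finish() {
        { std::lock_guard<std::mutex> g(mu_); stop_ = true; }
        cv_.notify_one();
        if (thread_.joinable()) thread_.join();
        return total_;
    }

private:
    void run() {
        block_termination_signals();  // first thing the worker thread does
        std::unique_lock<std::mutex> lock(mu_);
        for (;;) {
            wait_in_slices(cv_, lock, [this] { return stop_ || !queue_.empty(); }, slice_);
            while (!queue_.empty()) { total_ += queue_.front(); queue_.pop_front(); }
            if (stop_) return;
        }
    }

    std::chrono::milliseconds slice_;
    std::mutex mu_;
    std::condition_variable cv_;
    std::deque<int> queue_;
    bool stop_ = false;
    long total_ = 0;
    std::thread thread_;
};
```

The slice length only caps how long one kernel wait may run; correctness still rests on the predicate, which is re-checked after every timeout, notification, or spurious wakeup.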
llama.cpp.patches/llamafile-files/BUILD.mk

Lines changed: 18 additions & 5 deletions
@@ -88,6 +88,7 @@ LLAMA_SRCS_CPP := \
 llama.cpp/src/models/gemma2-iswa.cpp \
 llama.cpp/src/models/gemma3.cpp \
 llama.cpp/src/models/gemma3n-iswa.cpp \
+llama.cpp/src/models/gemma4-iswa.cpp \
 llama.cpp/src/models/glm4-moe.cpp \
 llama.cpp/src/models/glm4.cpp \
 llama.cpp/src/models/gpt2.cpp \
@@ -198,14 +199,16 @@ LLAMA_OBJS := $(LLAMA_SRCS_CPP:%.cpp=o/$(MODE)/%.cpp.o)

 COMMON_SRCS_CPP := \
 llama.cpp/common/arg.cpp \
-llama.cpp/common/chat-parser-xml-toolcall.cpp \
-llama.cpp/common/chat-parser.cpp \
+llama.cpp/common/chat-auto-parser-generator.cpp \
+llama.cpp/common/chat-auto-parser-helpers.cpp \
+llama.cpp/common/chat-diff-analyzer.cpp \
 llama.cpp/common/chat-peg-parser.cpp \
 llama.cpp/common/chat.cpp \
 llama.cpp/common/common.cpp \
 llama.cpp/common/console.cpp \
 llama.cpp/common/debug.cpp \
 llama.cpp/common/download.cpp \
+llama.cpp/common/hf-cache.cpp \
 llama.cpp/common/jinja/caps.cpp \
 llama.cpp/common/jinja/lexer.cpp \
 llama.cpp/common/jinja/parser.cpp \
@@ -222,6 +225,7 @@ COMMON_SRCS_CPP := \
 llama.cpp/common/ngram-mod.cpp \
 llama.cpp/common/peg-parser.cpp \
 llama.cpp/common/preset.cpp \
+llama.cpp/common/reasoning-budget.cpp \
 llama.cpp/common/regex-partial.cpp \
 llama.cpp/common/sampling.cpp \
 llama.cpp/common/speculative.cpp \
@@ -273,9 +277,13 @@ MTMD_SRCS_CPP := \
 llama.cpp/tools/mtmd/mtmd.cpp \
 llama.cpp/tools/mtmd/mtmd-helper.cpp \
 llama.cpp/tools/mtmd/mtmd-audio.cpp \
+llama.cpp/tools/mtmd/mtmd-image.cpp \
 llama.cpp/tools/mtmd/models/cogvlm.cpp \
+llama.cpp/tools/mtmd/models/deepseekocr.cpp \
 llama.cpp/tools/mtmd/models/conformer.cpp \
+llama.cpp/tools/mtmd/models/gemma4v.cpp \
 llama.cpp/tools/mtmd/models/glm4v.cpp \
+llama.cpp/tools/mtmd/models/hunyuanocr.cpp \
 llama.cpp/tools/mtmd/models/internvl.cpp \
 llama.cpp/tools/mtmd/models/kimik25.cpp \
 llama.cpp/tools/mtmd/models/kimivl.cpp \
@@ -289,6 +297,7 @@ MTMD_SRCS_CPP := \
 llama.cpp/tools/mtmd/models/qwen2vl.cpp \
 llama.cpp/tools/mtmd/models/qwen3vl.cpp \
 llama.cpp/tools/mtmd/models/siglip.cpp \
+llama.cpp/tools/mtmd/models/step3vl.cpp \
 llama.cpp/tools/mtmd/models/whisper-enc.cpp \
 llama.cpp/tools/mtmd/models/youtuvl.cpp

@@ -316,7 +325,9 @@ o/$(MODE)/llama.cpp/tools/server/%.hpp: llama.cpp/tools/server/public/%
 @echo 'unsigned int $(VARNAME)_len = sizeof($(VARNAME));' >> $@

 SERVER_ASSETS := \
-o/$(MODE)/llama.cpp/tools/server/index.html.gz.hpp \
+o/$(MODE)/llama.cpp/tools/server/index.html.hpp \
+o/$(MODE)/llama.cpp/tools/server/bundle.js.hpp \
+o/$(MODE)/llama.cpp/tools/server/bundle.css.hpp \
 o/$(MODE)/llama.cpp/tools/server/loading.html.hpp

 # ==============================================================================
@@ -336,7 +347,8 @@ TOOL_SERVER_SRCS := \
 llama.cpp/tools/server/server-http.cpp \
 llama.cpp/tools/server/server-models.cpp \
 llama.cpp/tools/server/server-queue.cpp \
-llama.cpp/tools/server/server-task.cpp
+llama.cpp/tools/server/server-task.cpp \
+llama.cpp/tools/server/server-tools.cpp

 # Tool object files
 TOOL_QUANTIZE_OBJS := $(TOOL_QUANTIZE_SRCS:%.cpp=o/$(MODE)/%.cpp.o)
@@ -373,8 +385,9 @@ $(TOOL_PERPLEXITY_OBJS) $(TOOL_BENCH_OBJS) $(TOOL_SERVER_OBJS) $(MTMD_OBJS): \
 -iquote o/$(MODE)/llama.cpp/tools/server \
 -isystem llama.cpp/vendor

-# Server needs llamafile headers for Metal support
+# Server needs llamafile headers for Metal support and web UI
 $(TOOL_SERVER_OBJS): private CPPFLAGS += -iquote llamafile
+$(TOOL_SERVER_OBJS): private CCFLAGS += -DLLAMA_BUILD_WEBUI

 # Version definitions
 $(GGML_OBJS): private CCFLAGS += \

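The `%.hpp` rule turns each file under `llama.cpp/tools/server/public/` into a C header, and `SERVER_ASSETS` now lists four of them. Only the last recipe line (`unsigned int $(VARNAME)_len = sizeof($(VARNAME));`) is visible in the hunk, so the sketch below assumes the earlier recipe lines emit an `xxd -i`-style byte array. The function names and the identifier scheme (`bundle.js` becomes `bundle_js`) are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical identifier scheme, modeled on `xxd -i`: every character that
// is not a letter or digit becomes '_' ("bundle.js" -> "bundle_js").
std::string asset_var_name(const std::string & file) {
    std::string out;
    for (unsigned char c : file) {
        out += std::isalnum(c) ? static_cast<char>(c) : '_';
    }
    return out;
}

// Render `data` as a header: a byte array (assumed) plus the `_len` variable
// that the visible recipe line emits. `data` must be non-empty, because an
// empty initializer list cannot size an array.
std::string embed_as_header(const std::string & file, const std::string & data) {
    assert(!data.empty());
    const std::string name = asset_var_name(file);
    std::string out = "unsigned char " + name + "[] = {";
    char hex[8];
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (i % 12 == 0) out += "\n  ";  // 12 bytes per output line
        std::snprintf(hex, sizeof hex, "0x%02x,", static_cast<unsigned char>(data[i]));
        out += hex;
    }
    out += "\n};\n";  // a trailing comma before '}' is valid C and C++
    out += "unsigned int " + name + "_len = sizeof(" + name + ");\n";
    return out;
}
```

Because no NUL terminator is appended, `sizeof` on the generated array equals the file's byte count, which is what the `_len` line depends on. Per the commit message, upstream compiles the bundled web UI only when `LLAMA_BUILD_WEBUI` is defined, hence the new `-DLLAMA_BUILD_WEBUI` on the server objects.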
llama.cpp.patches/patches/common_arg.cpp.patch

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 diff --git a/common/arg.cpp b/common/arg.cpp
 --- a/llama.cpp/common/arg.cpp
 +++ b/llama.cpp/common/arg.cpp
-@@ -36,6 +36,8 @@
+@@ -37,6 +37,8 @@
 #ifndef __EMSCRIPTEN__
 #ifdef __linux__
 #include <linux/limits.h>

llama.cpp.patches/patches/common_chat.cpp.patch

Lines changed: 0 additions & 12 deletions
This file was deleted.

llama.cpp.patches/patches/common_common.cpp.patch

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 diff --git a/common/common.cpp b/common/common.cpp
 --- a/llama.cpp/common/common.cpp
 +++ b/llama.cpp/common/common.cpp
-@@ -874,6 +874,16 @@ std::string fs_get_cache_directory() {
+@@ -970,6 +970,16 @@ std::string fs_get_cache_directory() {
 cache_directory = std::getenv("HOME") + std::string("/Library/Caches/");
 #elif defined(_WIN32)
 cache_directory = std::getenv("LOCALAPPDATA");
@@ -18,7 +18,7 @@ diff --git a/common/common.cpp b/common/common.cpp
 #elif defined(__EMSCRIPTEN__)
 GGML_ABORT("not implemented on this platform");
 #else
-@@ -1050,10 +1060,31 @@ common_init_result::common_init_result(common_params & params) :
+@@ -1146,10 +1156,31 @@ common_init_result::common_init_result(common_params & params) :

 if (params.fit_params) {
 LOG_INF("%s: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on\n", __func__);

llama.cpp.patches/patches/common_download.cpp.patch

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 diff --git a/common/download.cpp b/common/download.cpp
 --- a/llama.cpp/common/download.cpp
 +++ b/llama.cpp/common/download.cpp
-@@ -24,6 +24,8 @@
+@@ -25,6 +25,8 @@
 #ifndef __EMSCRIPTEN__
 #ifdef __linux__
 #include <linux/limits.h>

llama.cpp.patches/patches/ggml_src_ggml-backend-reg.cpp.patch

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 diff --git a/ggml/src/ggml-backend-reg.cpp b/ggml/src/ggml-backend-reg.cpp
 --- a/llama.cpp/ggml/src/ggml-backend-reg.cpp
 +++ b/llama.cpp/ggml/src/ggml-backend-reg.cpp
-@@ -478,7 +478,7 @@ static ggml_backend_reg_t ggml_backend_load_best(const char * name, bool silent,
+@@ -485,7 +485,7 @@ static ggml_backend_reg_t ggml_backend_load_best(const char * name, bool silent,
 if (ec) {
 GGML_LOG_DEBUG("%s: posix_stat(%s) failure, error-message: %s\n", __func__, path_str(search_path).c_str(), ec.message().c_str());
 } else {

llama.cpp.patches/patches/ggml_src_ggml-cpu_repack.cpp.patch

Lines changed: 3 additions & 3 deletions
@@ -1,7 +1,7 @@
 diff --git a/ggml/src/ggml-cpu/repack.cpp b/ggml/src/ggml-cpu/repack.cpp
 --- a/llama.cpp/ggml/src/ggml-cpu/repack.cpp
 +++ b/llama.cpp/ggml/src/ggml-cpu/repack.cpp
-@@ -3521,14 +3521,14 @@ static const ggml::cpu::tensor_traits * ggml_repack_get_optimal_repack_type(cons
+@@ -4723,14 +4723,14 @@ static const ggml::cpu::tensor_traits * ggml_repack_get_optimal_repack_type(cons
 return nullptr;
 }

@@ -18,7 +18,7 @@ diff --git a/ggml/src/ggml-cpu/repack.cpp b/ggml/src/ggml-cpu/repack.cpp
 const void * data, size_t offset, size_t size) {
 GGML_ASSERT(offset == 0);
 GGML_ASSERT(size == ggml_nbytes(tensor));
-@@ -3540,13 +3540,13 @@ static void ggml_backend_cpu_repack_buffer_set_tensor(ggml_backend_buffer_t buff
+@@ -4742,13 +4742,13 @@ static void ggml_backend_cpu_repack_buffer_set_tensor(ggml_backend_buffer_t buff
 GGML_UNUSED(buffer);
 }

@@ -34,7 +34,7 @@ diff --git a/ggml/src/ggml-cpu/repack.cpp b/ggml/src/ggml-cpu/repack.cpp
 ggml_backend_buffer_t buffer = ggml_backend_buft_alloc_buffer(ggml_backend_cpu_buffer_type(), size);

 if (buffer == nullptr) {
-@@ -3561,7 +3561,7 @@ static ggml_backend_buffer_t ggml_backend_cpu_repack_buffer_type_alloc_buffer(gg
+@@ -4763,7 +4763,7 @@ static ggml_backend_buffer_t ggml_backend_cpu_repack_buffer_type_alloc_buffer(gg
 return buffer;
 }

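Each regenerated patch shown here differs from its predecessor only in its `@@` hunk headers: `common_arg.cpp.patch` and `common_download.cpp.patch` move by 1 line, `ggml_src_ggml-backend-reg.cpp.patch` by 7, `common_common.cpp.patch` by 96, and `ggml_src_ggml-cpu_repack.cpp.patch` by 1202. A hypothetical review helper that confirms such a change is a pure relocation (same hunk sizes, same offset on the old and new sides) could look like the following; it is a sketch, not part of the llamafile tooling.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Parsed form of a unified-diff hunk header "@@ -a,b +c,d @@ context".
struct HunkHeader {
    long old_start, old_count, new_start, new_count;
};

// In the unified format an omitted ",count" means a count of 1.
std::optional<HunkHeader> parse_hunk_header(const std::string & line) {
    static const std::regex re(R"(^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@)");
    std::smatch m;
    if (!std::regex_search(line, m, re)) {
        return std::nullopt;
    }
    auto field = [&m](int i) { return m[i].matched ? std::stol(m[i].str()) : 1L; };
    return HunkHeader{field(1), field(2), field(3), field(4)};
}

// Offset between two versions of the same hunk when the change is a pure
// relocation: both sides move by the same amount and no hunk size changes.
// Returns std::nullopt otherwise, or if either line is not a hunk header.
std::optional<long> pure_shift(const std::string & before, const std::string & after) {
    const auto a = parse_hunk_header(before);
    const auto b = parse_hunk_header(after);
    if (!a || !b) return std::nullopt;
    if (a->old_count != b->old_count || a->new_count != b->new_count) return std::nullopt;
    const long shift = b->old_start - a->old_start;
    if (b->new_start - a->new_start != shift) return std::nullopt;
    return shift;
}
```

For example, `pure_shift("@@ -3521,14 +3521,14 @@", "@@ -4723,14 +4723,14 @@")` yields 1202, while a pair whose line counts differ yields `std::nullopt` and would deserve a closer look.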