Sync master with upstream release b8933 by jan-service-account · Pull Request #497 · janhq/llama.cpp

jan-service-account · 2026-04-26T01:01:58Z

Updates dev branch with latest release (b8933) from ggml-org/llama.cpp

Change the default `ftype` in `llama_model_quantize_params` from `LLAMA_FTYPE_MOSTLY_Q5_1` to `LLAMA_FTYPE_MOSTLY_Q8_0`. In case some external program naively uses the default quantization params, we should probably default to a known-good type like Q8_0 rather than Q5_1, which is rather old.
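
To illustrate the concern about callers that rely on the defaults, here is a minimal, self-contained sketch. It does not use the real llama.cpp headers: the `sketch_*` enum, struct, and functions are hypothetical stand-ins that only mimic the pattern of a "default params" constructor whose `ftype` now defaults to Q8_0. The real struct has more fields, and the real enum values may differ.

```c
/* Stand-ins for illustration only; the real definitions live in llama.h. */
enum sketch_ftype {
    SKETCH_FTYPE_MOSTLY_Q8_0,
    SKETCH_FTYPE_MOSTLY_Q5_1
};

struct sketch_quantize_params {
    int               nthread; /* 0 = let the library decide (assumed meaning) */
    enum sketch_ftype ftype;   /* target quantization type */
};

/* Mimics a default-params constructor after this change:
 * ftype defaults to Q8_0 instead of Q5_1. */
struct sketch_quantize_params sketch_quantize_default_params(void) {
    struct sketch_quantize_params p;
    p.nthread = 0;
    p.ftype   = SKETCH_FTYPE_MOSTLY_Q8_0;
    return p;
}

/* A caller that needs a specific type sets it explicitly,
 * so its behavior does not change when the library default does. */
struct sketch_quantize_params make_params(int want_explicit, enum sketch_ftype wanted) {
    struct sketch_quantize_params p = sketch_quantize_default_params();
    if (want_explicit) {
        p.ftype = wanted;
    }
    return p;
}
```

With these stand-ins, `make_params(0, ...)` yields the Q8_0 default, while `make_params(1, SKETCH_FTYPE_MOSTLY_Q5_1)` keeps the older type for a caller that really wants it. Setting the field explicitly is the safer habit for external programs either way.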

…#20962)

* Optimize Metal Tensor API usage for matmul2d

  Separates the Metal Tensor API (matmul2d) path in kernel_mul_mm into its own standalone kernel, gated by GGML_METAL_HAS_TENSOR. The legacy simdgroup_matrix kernel is preserved under #else.

  Previously both paths were interleaved via #ifdef blocks within a single kernel, forcing the tensor path to share the legacy kernel's data layout and threadgroup memory scheme. Splitting the kernel enabled memory and dispatch optimizations that weren't possible when the two paths shared code structure.

* cont : cleanup
* cont : cleanup
* cont : cleanup

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

reeselevine and others added 8 commits April 25, 2026 09:18

ggml-webgpu: support for SSM_SCAN and disable set_rows error checking (…

dd2914d

…ggml-org#22327) * Implement ssm_scan * Remove blocking in graph_compute and check for set rows * Fix bindings * Update op support

[SYCL] Optimize Q4_0 mul_mat for Arc770, add scripts (ggml-org#22291)

eddd7a1

* opt arc770 for Q4_0 * add for Q4_0 * update the script * add help script for windows * update guide * fix format issue * convert from dos to unix for format issue * fix missed -sm parameter

gitignore : add .pi + personal SYSTEM.md (ggml-org#22316)

8ea8fee

* gitignore : add .pi + personal SYSTEM.md * cont : fix requirements heading in PR template * cont : shorten line

CUDA: reduce MMQ stream-k overhead (ggml-org#22298)

9725a31

* CUDA: reduce MMQ stream-k overhead * use 32 bit integers for kbc

spec : fix vocab compat checks (ggml-org#22358)

98dc141

chat: fix handling of space in reasoning markers (ggml-org#22353)

dcad77c

* chat: fix handling of space in reasoning markers * fix tests * whitespace

jan-service-account merged commit fa7c133 into dev Apr 26, 2026
9 checks passed

jan-service-account deleted the update-dev-from-master-2026-04-26-01-01 branch April 26, 2026 01:29

jan-service-account merged 8 commits into dev from update-dev-from-master-2026-04-26-01-01
