Sync master with upstream release b8495 by jan-service-account · Pull Request #463 · janhq/llama.cpp

jan-service-account · 2026-03-24T00:43:19Z

Updates dev branch with latest release (b8495) from ggml-org/llama.cpp

…del (ggml-org#20847)
* added support for internvl's dynamic high-resolution (Qianfan-OCR needed)
* add min/max dynamic patch to gguf meta
* clean up
* simplified handling min/max dynamic patch
* reuse llava_uhd logic for slice images
* provide default values for older models
* flake8
* prevent writing 0 value to gguf
* remove duplicated resolution candidates with a better algorithm
* fix indentation
* format
* add protection from divide by zero
* change to 0 to be safe
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

ACL graph capture disallows host-to-device memcpy and device memory malloc/free on the captured stream. Pre-load the RoPE cache before capture so that:
- Host-to-device copies and allocations run on the non-captured stream
- Cache metadata is populated and memory pool is warmed up
- During capture, only on-device computations are recorded; host-side and allocation branches are skipped
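The warm-up-before-capture pattern described above can be sketched as a toy model. This is only an illustration, not the CANN backend code: `Stream`, `RopeCache`, `CaptureError`, and `capture_rope` are hypothetical names, and a single stream with a `capturing` flag stands in for the separate captured and non-captured streams.

```python
class CaptureError(RuntimeError):
    """Raised when a forbidden operation is issued while capturing."""


class Stream:
    """Toy stand-in for a device stream; not the real ACL API."""

    def __init__(self):
        self.capturing = False
        self.recorded = []  # kernels recorded into the captured graph

    def alloc_and_upload(self, name):
        # Models device malloc + host-to-device memcpy, both of which
        # are disallowed while the stream is capturing.
        if self.capturing:
            raise CaptureError(f"{name}: alloc/H2D copy during capture")
        return f"device:{name}"

    def launch(self, kernel):
        # On-device work is the only thing a capture may record.
        if self.capturing:
            self.recorded.append(kernel)


class RopeCache:
    def __init__(self):
        self.entries = {}

    def get(self, stream, key):
        # A cache miss takes the host-side/allocation branch;
        # a hit skips it entirely.
        if key not in self.entries:
            self.entries[key] = stream.alloc_and_upload(key)
        return self.entries[key]


def capture_rope(stream, cache, key, preload):
    if preload:
        cache.get(stream, key)  # warm the cache before capture starts
    stream.capturing = True
    try:
        cache.get(stream, key)  # hit when preloaded: no alloc, no H2D copy
        stream.launch("rope")
    finally:
        stream.capturing = False
    return stream.recorded
```

With `preload=True` the capture records only the `rope` kernel; with `preload=False` the first lookup inside the capture raises `CaptureError`, which is the failure the pre-load avoids.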

* Apply suggestions from code review
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* metal:add conv_3d backend
  Rebased with master and resolved conflicts.
* Resolved issues related to changes in variable names
* kernel void kernel_upscale_bilinear_f32 was missing in my branch, added back, should pass all tests now
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* opencl: add q6_K noshuffle kernels, initial q6_K gemv, some host code
* opencl: add q6_K transpose
* opencl: fix cvt kernel name
* opencl: add call to q6_K gemv
* opencl: fix q6_K scale transpose
* opencl: fix loading for gemv q6_K, refactor
* opencl: fix transpose_8_buf kernel assignment, refactor
* opencl: refactor q6_K transpose
* opencl: add gemm_noshuffle_q6_k_f32
* opencl: fix qh loading
* opencl: refactor q6_K gemv host side, release bufs and imgs
* opencl: refactor
* opencl: fix q6_K dequant and scale selection
* opencl: workaround compiler bug, fix dump_tensor
* opencl: refactor q6_K convert kernels
* opencl: unpack transformed q6_K in get_tensor
* opencl: refactor, handle non-uniform workgroups
* opencl: support non-vector subgroup bcast

…20918)
* hex-dma: make chained dma the default to handle newer models
  This also includes some new instrumentation that we can remove later.
* hexagon: add uint32 dump helper
* hexagon: use single-page VTCM allocation to avoid issues with large gather ops in ssm-conv
  ssm-conv uses the HVX gather instruction, and that instruction cannot handle cases where the base+offset spans page boundaries.
* hexagon: update ssm-conv to make base-addr compute a bit easier to read
* hex-dma: use 1d mode for reshaping, it supports sizes up to 24-bits (>16MB)
* hex-bin: fix incorrect stride logic
* hexagon: make sure repack buffs are dumped for verbose > 2
* hex-bin: consistently use dma_queue_push even for dummy dst transactions
* hex-dma: start using 2d-wide mode on v75 and up
  This removes the need to deal with the 16-bit limitation for the strides.
* hex-bin: cleanup kernel selection logic
* hex-bin: cleanup binary op core and fix transposed tensor handling
* snapdragon: update run-bench to use larger ubatch and fa-on
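The page-boundary constraint on gather mentioned in the ssm-conv item can be illustrated with a small check. This is a sketch under assumptions, not the Hexagon backend code: `gather_stays_in_page` is a hypothetical helper, and the 4096-byte `PAGE_SIZE` is chosen only for illustration (the actual VTCM page size is not stated in the commit message).

```python
PAGE_SIZE = 4096  # hypothetical page size, for illustration only


def gather_stays_in_page(base, offsets, page_size=PAGE_SIZE):
    """Return True if every base+offset address lies in the same page as base."""
    page = base // page_size
    return all((base + off) // page_size == page for off in offsets)
```

For example, `gather_stays_in_page(4000, [0, 200])` is false because 4200 falls in the next page. Allocating the buffer inside a single page, as the commit describes, makes this condition hold for every in-bounds offset.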

DorianRudolph and others added 20 commits March 23, 2026 01:04

mtmd : fix LightOnOCR image preprocessing (ggml-org#20877)

d3ac030

opencl: add flattened Q4_K mv and general Q4_K mm (ggml-org#20773)

84ffd0c

fix(openvino): explicit memset in buffer_context allocation (ggml-org…

cc18f96

…#20857)
* fix(openvino): explicit memset in buffer_context allocation
* minor
---------
Co-authored-by: Dan Hoffman <dhoffman@cyket.net>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

common/autoparser : detect reasoning markers when enable_thinking cha…

7a0b6a6

…nges system prompt (ggml-org#20859)

webui: fix --webui-config-file settings not applied on load (ggml-org…

c44a932

…#20823)
* webui: fix --webui-config-file settings not applied on load
* chore: update webui build output

ai : update gh permissions (ggml-org#20895)

e32d243

server: use httplib dynamic threads (ggml-org#20817)

31a5cf4

* server: use httplib dynamic threads * change to n_threads_http + 1024

docs : rerun llama-gen-docs to include new CLI args (ggml-org#20892)

841bc20

memory : fix seq_id bounds in llama_memory_recurrent::state_read_meta…

f93c09e

…() (ggml-org#20887)

docs: Fix typo in reasoning flag documentation (ggml-org#20780)

35b662b

Tested to verify - the typo is just in the docs, not the actual flag.

webui: Improve chat form positioning (ggml-org#20901)

11fb11b

devops: upgraded default oneAPI version (ggml-org#20731)

fd18364

contrib: add "Requirements" section to PR template (ggml-org#20841)

bd69921

* contrib: add "Requirements" section to PR template
* typo [no ci]
* use h2, add "Additional information"
---------
Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>

rpc : RCE patch (ggml-org#20908)

39bf0d3

Add codeowners for scripts/snapdragon and docs/snapdragon (ggml-org#2…

1fb2290

…0915)
* Add codeowners for scripts/snapdragon
* Also add docs/backends/snapdragon

jan-service-account merged commit b61c2f5 into dev Mar 24, 2026
3 checks passed

jan-service-account deleted the update-dev-from-master-2026-03-24-00-43 branch March 24, 2026 00:45

jan-service-account merged 20 commits into dev from update-dev-from-master-2026-03-24-00-43

18 participants