Skip to content

Sync master with upstream release b8495#463

Merged
jan-service-account merged 20 commits into
devfrom
update-dev-from-master-2026-03-24-00-43
Mar 24, 2026
Merged

Sync master with upstream release b8495#463
jan-service-account merged 20 commits into
devfrom
update-dev-from-master-2026-03-24-00-43

Conversation

@jan-service-account
Copy link
Copy Markdown

Updates dev branch with latest release (b8495) from ggml-org/llama.cpp

DorianRudolph and others added 20 commits March 23, 2026 01:04
…del (ggml-org#20847)

* added support for internvl's dynamic high-resolution (Qianfan-OCR needed)

* add min/max dynamic patch to gguf meta

* clean up

* simplified handling min/max dynamic patch

* reuse llava_uhd logic for slice images

* provide default values for older models

* flake8

* prevent writing 0 value to gguf

* remove duplicated resolution candidates with a better algorithm

* fix indentation

* format

* add protection from divide by zero

* change to 0 to be safe

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
…#20857)

* fix(openvino): explicit memset in buffer_context allocation

* minor

---------

Co-authored-by: Dan Hoffman <dhoffman@cyket.net>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
ACL graph capture disallows host-to-device memcpy and device memory
malloc/free on the captured stream. Pre-load the RoPE cache before
capture so that:
- Host-to-device copies and allocations run on the non-captured stream
- Cache metadata is populated and memory pool is warmed up
- During capture, only on-device computations are recorded; host-side
  and allocation branches are skipped
* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* metal:add conv_3d backend

Rebased with master and resolved conflicts.

* Resolved issues related to changes in variable names

* kernel void kernel_upscale_bilinear_f32 was missing in my branch, added back, should pass all tests now

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
…#20823)

* webui: fix --webui-config-file settings not applied on load

* chore: update webui build output
* server: use httplib dynamic threads

* change to n_threads_http + 1024
Tested to verify - the typo is just in the docs, not the actual flag.
* contrib: add "Requirements" section to PR template

* typo [no ci]

* use h2, add "Additional information"

---------

Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
* opencl: add q6_K noshuffle kernels, initial q6_K gemv, some host code

* opencl: add q6_K transpose

* opencl: fix cvt kernel name

* opencl: add call to q6_K gemv

* opencl: fix q6_K scale transpose

* opencl: fix loading for gemv q6_K, refactor

* opencl: fix transpose_8_buf kernel assignment, refactor

* opencl: refactor q6_K transpose

* opencl: add gemm_noshuffle_q6_k_f32

* opencl: fix qh loading

* opencl: refactor q6_K gemv host side, release bufs and imgs

* opencl: refactor

* opencl: fix q6_K dequant and scale selection

* opencl: workaround compiler bug, fix dump_tensor

* opencl: refactor q6_K convert kernels

* opencl: unpack transformed q6_K in get_tensor

* opencl: refactor, handle non-uniform workgroups

* opencl: support non-vector subgroup bcast
…0915)

* Add codeowners for scripts/snapdragon

* Also add docs/backends/snapdragon
…20918)

* hex-dma: make chained dma the default to handle newer models

This also includes some new instrumentation that we can remove later.

* hexagon: add uint32 dump helper

* hexagon: use single-page VTCM allocation to avoid issues with large gather ops in ssm-conv

ssm-conv uses HVX gather instruction and that instruction cannot handle cases where the base+offset
spans page boundaries.

* hexagon: update ssm-conv to make base-addr compute a bit easier to read

* hex-dma: use 1d mode for reshaping, it supports sizes up to 24-bits (>16MB)

* hex-bin: fix incorrect stride logic

* hexagon: make sure repack buffs are dumped for verbose > 2

* hex-bin: consistently use dma_queue_push even for dummy dst transactions

* hex-dma: start using 2d-wide mode on v75 and up

The removes the need to deal with the 16-bit limitaion for the strides.

* hex-bin: cleanup kernel selection logic

* hex-bin: cleanup binary op core and fix transposed tensor handling

* snapdragon: update run-bench to use larger ubatch and fa-on
@jan-service-account jan-service-account merged commit b61c2f5 into dev Mar 24, 2026
3 checks passed
@jan-service-account jan-service-account deleted the update-dev-from-master-2026-03-24-00-43 branch March 24, 2026 00:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.