Releases · ggml-org/llama.cpp
b4694
ggml : fix multi-threaded clamp_f32 (#11824)
* Bug fix for clamp_f32: when using tensors larger than 1-D, the clamp operation does not work because the kernel returns early whenever ith is not 0.
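For context: ggml compute kernels are typically parallelized by splitting rows across worker threads via a thread index (ith) and thread count (nth), so a kernel that bails out for every thread except ith == 0 silently processes only part of a multi-row tensor. Below is a standalone sketch of that partitioning pattern with simplified names; it is not the actual ggml_compute_forward_clamp code.

```cpp
// Standalone illustration of the row-partitioning pattern ggml ops use.
// NOT the real ggml_compute_forward_clamp; names and layout are simplified.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <functional>
#include <thread>
#include <vector>

// Clamp the rows assigned to thread `ith` out of `nth` threads.
static void clamp_rows(std::vector<float> & data, int64_t ne0, int64_t nrows,
                       float lo, float hi, int ith, int nth) {
    // Each thread takes every nth row starting at its own index,
    // instead of returning early when ith != 0.
    for (int64_t r = ith; r < nrows; r += nth) {
        float * row = data.data() + r * ne0;
        for (int64_t i = 0; i < ne0; ++i) {
            row[i] = std::min(std::max(row[i], lo), hi);
        }
    }
}

int main() {
    const int64_t ne0 = 4, nrows = 8;
    std::vector<float> x(ne0 * nrows);
    for (size_t i = 0; i < x.size(); ++i) x[i] = float(i) - 16.0f;

    const int nth = 4;
    std::vector<std::thread> pool;
    for (int ith = 0; ith < nth; ++ith) {
        pool.emplace_back(clamp_rows, std::ref(x), ne0, nrows, -5.0f, 5.0f, ith, nth);
    }
    for (auto & t : pool) t.join();

    for (float v : x) printf("%g ", v);
    printf("\n");
}
```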
b4692
CUDA: fix CUDART_VERSION checks (#11821)
b4689
Fix #11802: Compile bug - RegQueryValueExA changed to RegQueryValueEx…
b4688
server : use common_token_to_piece instead of common_detokenize (#11740)
This commit replaces the call to common_detokenize with common_token_to_piece in populate_token_probs, and uses common_token_to_piece for post_sampling_probs as well. The motivation for this change is to avoid an issue where common_detokenize would remove the word-boundary character of a token, which caused a regression in the server-generated token probabilities.
Resolves: https://github.com/ggerganov/llama.cpp/issues/11728
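The distinction matters because a detokenizer renders user-facing text and may drop a token's leading word-boundary marker, while the token's raw vocabulary piece keeps it. Below is a toy, self-contained illustration of that difference, using a hypothetical SentencePiece-style marker rather than the real llama.cpp API:

```cpp
// Toy illustration of the word-boundary issue (not the llama.cpp API).
// In SentencePiece-style vocabularies a token's piece may start with a
// boundary marker ("▁" here); a text-oriented detokenizer drops it, which
// is wrong when the exact per-token piece is what should be reported.
#include <iostream>
#include <string>

static const std::string MARKER = "\xE2\x96\x81"; // UTF-8 for "▁"

// Returns the raw vocabulary piece for a token (hypothetical vocab lookup).
std::string token_to_piece(const std::string & piece) {
    return piece;
}

// Renders readable text: strips the boundary marker of a lone token,
// so the boundary information is lost.
std::string detokenize(const std::string & piece) {
    std::string out = piece;
    if (out.compare(0, MARKER.size(), MARKER) == 0) {
        out.erase(0, MARKER.size()); // word boundary lost here
    }
    return out;
}

int main() {
    const std::string piece = MARKER + "world";
    std::cout << "piece      : '" << token_to_piece(piece) << "'\n"; // '▁world'
    std::cout << "detokenized: '" << detokenize(piece)     << "'\n"; // 'world'
}
```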
b4686
fix: typos in documentation files (#11791)
* Update ggml.c
* Update arg.cpp
* Update speculative.h
b4683
llama-mmap: fix missing include (#11796)
Technically, the fixed-width types come only from the iostream and cstdint/stdint.h headers; the memory and vector headers should not provide them. In GCC 15 the headers are cleaned up, so the proper header, cstdint, is required.
src/llama-mmap.h:26:5: error: ‘uint32_t’ does not name a type
   26 |     uint32_t read_u32() const;
      |     ^~~~~~~~
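In other words, the header relied on <memory>/<vector> pulling in the fixed-width integer types transitively, which GCC 15 no longer guarantees. A minimal sketch of the pattern (illustrative, not the actual llama-mmap.h contents):

```cpp
// my_mmap.h (illustrative header, not the actual llama-mmap.h)
#pragma once

#include <cstdint>   // required directly: uint32_t/uint64_t are declared here
#include <memory>    // GCC 15 no longer pulls <cstdint> in transitively
#include <vector>

struct my_file {
    uint32_t read_u32() const;   // fails to compile without <cstdint> on GCC 15
    uint64_t size = 0;
    std::vector<uint8_t> buf;
};
```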
b4682
server : correct signal handler (#11795)
b4681
sync: minja (https://github.com/google/minja/commit/a72057e5190de2c61…
b4679
vulkan: Make Vulkan optional at runtime (#11493). (#11494)
Co-authored-by: Jeff Bolz <[email protected]>
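Making a native backend optional at runtime generally means probing for the loader library instead of requiring it at link/start time. Below is a generic POSIX dlopen-based sketch of that idea, not necessarily how #11494 implements it (link with -ldl):

```cpp
// Generic sketch of runtime-optional native library loading (POSIX dlopen);
// not taken from the llama.cpp Vulkan backend.
#include <dlfcn.h>
#include <cstdio>

// Returns true if a Vulkan loader is present and exposes vkGetInstanceProcAddr.
static bool vulkan_available() {
    void * lib = dlopen("libvulkan.so.1", RTLD_NOW | RTLD_LOCAL);
    if (!lib) {
        return false; // no loader installed: fall back to other backends
    }
    const bool ok = dlsym(lib, "vkGetInstanceProcAddr") != nullptr;
    dlclose(lib);
    return ok;
}

int main() {
    std::printf("Vulkan backend %s\n", vulkan_available() ? "available" : "unavailable");
}
```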
b4678
vulkan: add environment variable GGML_VK_PREFER_HOST_MEMORY to avoid …
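A minimal sketch of the usual getenv pattern for such an opt-in toggle; the variable name comes from the release title, but its exact semantics in the Vulkan backend are assumed here:

```cpp
// Minimal sketch of reading an opt-in environment toggle; the actual
// Vulkan backend logic for GGML_VK_PREFER_HOST_MEMORY may differ.
#include <cstdlib>
#include <cstdio>

int main() {
    const char * v = std::getenv("GGML_VK_PREFER_HOST_MEMORY");
    const bool prefer_host_memory = (v != nullptr); // assumption: presence enables it
    std::printf("prefer host memory: %s\n", prefer_host_memory ? "yes" : "no");
}
```

Typical usage would then look like GGML_VK_PREFER_HOST_MEMORY=1 ./llama-cli ..., with the backend favoring host-visible allocations when the toggle is set.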