Releases: ggml-org/llama.cpp

b4694

12 Feb 14:52
748ee9f
ggml : fix multi-threaded clamp_f32 (#11824)

* Bug fix for clamp_f32

For tensors with more than one dimension, the clamp operation did not work because the function returned early whenever ith was not 0.
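
The kind of work split described above can be sketched as follows. This is a minimal illustration only, not the ggml implementation: the names clamp_rows and clamp_parallel, the row-major layout, and the use of std::thread are all assumptions made for the example. Each thread ith of nth handles rows ith, ith + nth, ith + 2*nth, and so on, so no thread may return early.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not ggml code): thread `ith` of `nth` clamps
// rows ith, ith + nth, ith + 2*nth, ... of a row-major matrix.
static void clamp_rows(std::vector<float> & data, std::size_t n_rows, std::size_t n_cols,
                       float lo, float hi, std::size_t ith, std::size_t nth) {
    // No early return for ith != 0: every thread must process its own rows,
    // otherwise only every nth-th row would be clamped.
    for (std::size_t r = ith; r < n_rows; r += nth) {
        float * row = data.data() + r * n_cols;
        for (std::size_t c = 0; c < n_cols; ++c) {
            row[c] = std::min(std::max(row[c], lo), hi);
        }
    }
}

// Hypothetical driver: starts `nth` threads and waits for all of them.
static void clamp_parallel(std::vector<float> & data, std::size_t n_rows, std::size_t n_cols,
                           float lo, float hi, std::size_t nth) {
    assert(nth > 0);
    std::vector<std::thread> workers;
    for (std::size_t ith = 0; ith < nth; ++ith) {
        workers.emplace_back([&data, n_rows, n_cols, lo, hi, ith, nth] {
            clamp_rows(data, n_rows, n_cols, lo, hi, ith, nth);
        });
    }
    for (auto & w : workers) {
        w.join();
    }
}
```

Because the rows are interleaved across threads, no two threads write to the same element, so the sketch needs no locking.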

b4692

12 Feb 13:38
c3d6af7
CUDA: fix CUDART_VERSION checks (#11821)

b4689

11 Feb 16:40
90e4dba
Fix #11802: Compile bug - RegQueryValueExA changed to RegQueryValueEx…

b4688

11 Feb 13:49
a18f481
server : use common_token_to_piece instead of common_detokenize (#11740)

* server : use common_token_to_piece instead of common_detokenize

This commit replaces the call to common_detokenize with
common_token_to_piece in populate_token_probs.

The motivation for this change is to avoid an issue where
common_detokenize removed the word-boundary character from tokens,
which caused a regression in the server-generated token probabilities.

Resolves: https://github.com/ggerganov/llama.cpp/issues/11728

* squash! server : use common_token_to_piece instead of common_detokenize

Use common_token_to_piece for post_sampling_probs as well.

b4686

10 Feb 22:49
7b891bd
fix: typos in documentation files (#11791)

* Update ggml.c

* Update arg.cpp

* Update speculative.h

b4683

10 Feb 19:41
19b392d
llama-mmap: fix missing include (#11796)

Strictly speaking, the fixed-width integer types are guaranteed only by
the cstdint/stdint.h headers; the memory and vector headers are not
required to provide them. GCC 15 cleaned up its standard library headers,
so cstdint must now be included explicitly.

src/llama-mmap.h:26:5: error: ‘uint32_t’ does not name a type
   26 |     uint32_t read_u32() const;
      |     ^~~~~~~~
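
A minimal sketch of the pattern behind this fix: a type that uses a fixed-width integer includes cstdint itself rather than relying on another header to pull it in. The byte_reader struct below is hypothetical and exists only for illustration; it is not the class declared in llama-mmap.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>  // explicit include: declares std::uint32_t
#include <cstring>

// Hypothetical reader over an in-memory buffer (illustration only).
struct byte_reader {
    const unsigned char * data;
    std::size_t           size;
    std::size_t           pos;

    // Reads a 32-bit unsigned value in host byte order and advances the cursor.
    std::uint32_t read_u32() {
        std::uint32_t v = 0;
        assert(pos + sizeof v <= size);
        std::memcpy(&v, data + pos, sizeof v);
        pos += sizeof v;
        return v;
    }
};
```

With the explicit include in place, the sketch does not depend on which other standard headers happen to declare the fixed-width types.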

b4682

10 Feb 17:51
0893e01
server : correct signal handler (#11795)

b4681

10 Feb 10:11
d7b31a9
sync: minja (https://github.com/google/minja/commit/a72057e5190de2c61…

b4679

10 Feb 06:50
c2a67ef
vulkan: Make Vulkan optional at runtime (#11493). (#11494)

Co-authored-by: Jeff Bolz

b4678

10 Feb 06:50
b044a0f
vulkan: add environment variable GGML_VK_PREFER_HOST_MEMORY to avoid …