Releases: ggml-org/llama.cpp

b3188 · 20 Jun 16:44 · d50f889

CUDA: stream-k decomposition for MMQ (#8018)

* CUDA: stream-k decomposition for MMQ

* fix undefined memory reads for small matrices
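Stream-k decomposition replaces the usual one-tile-per-block ("data-parallel") split with an even split of the global k-iteration space across the available SMs, which keeps all SMs busy when the number of tiles does not divide evenly by the hardware. Below is a minimal host-side sketch of that partitioning, with all sizes and names invented for illustration; it is not the actual llama.cpp MMQ kernel.

```cpp
#include <cstdio>

// Sketch: the output is covered by `ntiles` tiles, each needing
// `iters_per_tile` iterations over the shared dimension. Stream-k splits
// the *global* iteration space evenly across `nblocks` workers, so a
// worker may start or stop in the middle of a tile; partially processed
// tiles are combined in a fix-up pass (not shown here).
int main() {
    const int ntiles         = 7; // output tiles (e.g. a small matrix)
    const int iters_per_tile = 4; // k-iterations needed per tile
    const int nblocks        = 3; // e.g. number of streaming multiprocessors

    const int total = ntiles * iters_per_tile; // global iteration space

    for (int b = 0; b < nblocks; ++b) {
        // Each worker takes a contiguous slice of the global iteration space.
        const int lo = (total *  b     ) / nblocks;
        const int hi = (total * (b + 1)) / nblocks;
        for (int it = lo; it < hi; ++it) {
            const int tile = it / iters_per_tile; // which output tile
            const int k    = it % iters_per_tile; // which k-step inside it
            printf("block %d: tile %d, k-iter %d\n", b, tile, k);
            // A real kernel accumulates partial sums here and writes a
            // partial result whenever its slice ends mid-tile.
        }
    }
}
```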

b3187 · 20 Jun 06:10 · 2075a66

metal : fix `ggml_metal_supports_op` for BF16 (#8021)

The Metal backend does not currently support BF16, but `ggml_metal_supports_op` was returning true for such ops, leading to a crash with models converted with `--leave-output-tensor`. This commit checks whether any of the first few source types are BF16 and returns false if so.
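A minimal sketch of that guard, assuming ggml's tensor layout (the `src` array and `type` field); the real check lives inside `ggml_metal_supports_op` and may differ in detail:

```cpp
#include "ggml.h" // for struct ggml_tensor and GGML_TYPE_BF16

// Reject ops whose first few sources are BF16, since the Metal backend
// has no BF16 kernels at this point. Illustrative sketch only.
static bool metal_supports_op_sketch(const struct ggml_tensor * op) {
    for (int i = 0; i < 3; ++i) {
        if (op->src[i] != nullptr && op->src[i]->type == GGML_TYPE_BF16) {
            return false;
        }
    }
    return true;
}
```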

b3186 · 20 Jun 00:42 · ba58993

server : fix smart slot selection (#8020)

b3184 · 19 Jun 13:51 · 9c77ec1

ggml : synchronize threads using barriers (#7993)
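For illustration, here is the general technique of barrier-based thread synchronization, sketched with C++20 `std::barrier`: every worker finishes its slice of graph node i before any worker starts node i + 1. ggml uses its own barrier primitive inside the CPU compute loop, so this is a sketch of the idea, not the actual code.

```cpp
#include <barrier>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    const int n_threads = 4;
    const int n_nodes   = 3; // stand-in for compute-graph nodes
    std::barrier sync(n_threads);

    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&, t] {
            for (int node = 0; node < n_nodes; ++node) {
                printf("thread %d computes its slice of node %d\n", t, node);
                sync.arrive_and_wait(); // no thread proceeds to the next
                                        // node until all have arrived
            }
        });
    }
    for (auto & w : workers) {
        w.join();
    }
}
```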

b3183 · 19 Jun 10:58 · a04a953

codecov : remove (#8004)

b3182 · 19 Jun 02:28 · 623494a

[SYCL] refactor (#6408)

* Separate the lower-precision GEMM from the main files

* Fix the hardcoded workgroup size (see the sketch below)
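A one-function sketch of what replacing a hardcoded workgroup size with a queried device limit might look like in SYCL; the exact clamping logic used in #6408 is an assumption here:

```cpp
#include <sycl/sycl.hpp>

// Instead of hardcoding a workgroup size, ask the device for its limit
// and clamp the preferred size to it. Illustrative sketch only.
size_t pick_workgroup_size(sycl::queue & q, size_t preferred) {
    const size_t max_wg =
        q.get_device().get_info<sycl::info::device::max_work_group_size>();
    return preferred <= max_wg ? preferred : max_wg;
}
```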

b3181 · 18 Jun 17:30 · 37bef89

tokenizer : BPE fixes (#7530)

* Random test: add_bos_token, add_eos_token
* Random test: add BPE models for testing
* Custom regex split fails with codepoint 0
* Fix falcon punctuation regex
* Refactor llm_tokenizer_bpe: move code to constructor
* Move 'add_special_bos/eos' logic to llm_tokenizer_bpe
* Move tokenizer flags to the vocab structure (see the sketch after this list)
* Default values for special_add_bos/eos
* Build vocab.special_tokens_cache using vocab token types
* Generalize 'jina-v2' per token attributes
* Fix unicode whitespaces (deepseek-coder, deepseek-llm)
* Skip missing byte tokens (falcon)
* Better unicode data generation
* Replace char32_t with uint32_t
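As referenced in the list above, a hedged sketch of what moving the tokenizer flags into the vocab structure amounts to: the flags travel with the model's vocab instead of being passed per tokenizer call. The field names follow the bullet wording, but the exact llama.cpp definitions are assumptions.

```cpp
// Illustrative sketch, not the actual llama.cpp vocab definition.
struct vocab_sketch {
    // ... token data, special-token cache, etc. ...
    bool special_add_bos = true;  // default: prepend BOS when tokenizing
    bool special_add_eos = false; // default: do not append EOS
};
```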

b3180 · 18 Jun 15:10 · 91c188d

Only use FIM middle token if it exists (#7648)

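A minimal sketch of the guard, assuming the `llama_token_middle(model)` accessor from llama.h of this era and treating a negative id as "token absent"; both of those specifics are assumptions of this sketch, since some models define FIM prefix/suffix tokens but no middle token.

```cpp
#include <vector>
#include "llama.h"

// Only emit the FIM middle token when the model actually defines one.
static void push_fim_middle_sketch(const struct llama_model * model,
                                   std::vector<llama_token> & tokens) {
    const llama_token middle = llama_token_middle(model);
    if (middle >= 0) {
        tokens.push_back(middle);
    }
}
```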

b3179 · 18 Jun 15:02 · 84f6de1

Fix GCC pragma usage on Windows (#7751)
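The usual pattern for this class of fix is to hide GCC-specific pragmas from compilers that do not understand them (MSVC warns on unknown pragmas); whether #7751 uses exactly this shape is an assumption.

```cpp
// Only emit GCC pragmas when compiling with a GCC-compatible compiler.
#if defined(__GNUC__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-function"
#endif

static void unused_helper(void) {} // would otherwise warn under -Wunused-function

#if defined(__GNUC__)
#pragma GCC diagnostic pop
#endif
```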

b3178 · 18 Jun 15:02 · 6166527

Allow compiling with CUDA without the CUDA runtime installed (#7989)

On hosts that are not set up to run CUDA code, it is still possible to compile llama.cpp with CUDA support by installing just the development packages. However, the runtime libraries such as /usr/lib64/libcuda.so* are missing, so the link step currently fails.

The development environment is prepared for exactly this situation: stub libraries for all the CUDA libraries are available in the $(CUDA_PATH)/lib64/stubs directory. Adding this directory to the end of the linker search path changes nothing for environments that already work, but makes it possible to compile llama.cpp even when the runtime libraries are not available.
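A sketch of what that change amounts to in a Makefile; the conditional and variable names are assumptions, only the stubs path comes from the text above.

```make
# Append the CUDA stub directory to the end of the linker search path,
# so real runtime libraries still win when they are present.
ifdef LLAMA_CUDA
    LDFLAGS += -L$(CUDA_PATH)/lib64/stubs
endif
```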