Releases · ggml-org/llama.cpp
b3188
CUDA: stream-k decomposition for MMQ (#8018)
* CUDA: stream-k decomposition for MMQ
* fix undefined memory reads for small matrices
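Stream-k decomposition changes how output tiles are assigned to thread blocks: instead of giving each block a whole number of tiles (which leaves SMs idle when the tile count does not divide evenly), the total inner-loop iteration space is split evenly across a fixed set of persistent workers, and tiles that straddle two workers are combined in a fix-up pass. Below is a minimal host-side C++ sketch of that partitioning arithmetic, with hypothetical tile and worker counts; it is not the MMQ kernel itself.

```cpp
// Minimal sketch of stream-k partitioning, assuming hypothetical sizes:
// ntiles output tiles, each needing iters_per_tile inner-loop iterations,
// divided evenly across nworkers persistent thread blocks.
#include <cstdio>

int main() {
    const int ntiles         = 10; // output tiles (hypothetical)
    const int iters_per_tile =  8; // K-loop iterations per tile
    const int nworkers       =  4; // persistent blocks, e.g. one per SM

    const int total = ntiles * iters_per_tile;
    for (int w = 0; w < nworkers; ++w) {
        // each worker takes a contiguous slice of the global iteration space
        const int beg = (total *  w     ) / nworkers;
        const int end = (total * (w + 1)) / nworkers;
        std::printf("worker %d: iters [%3d, %3d) -> tiles %d..%d\n",
                    w, beg, end, beg / iters_per_tile, (end - 1) / iters_per_tile);
        // tiles whose iterations span two workers need a fix-up pass that
        // combines the partial sums (not shown)
    }
    return 0;
}
```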
b3187
metal : fix `ggml_metal_supports_op` for BF16 (#8021)
The Metal backend does not currently support BF16, but `ggml_metal_supports_op` was returning true for it, leading to a crash with models converted with `--leave-output-tensor`. This commit checks whether the first few source types are BF16 and returns false if that's the case.
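A hedged sketch of the kind of check described above, with stand-in types mirroring ggml's (the real definitions live in ggml.h; this is not the verbatim patch):

```cpp
// stand-in types mirroring ggml's; real definitions live in ggml.h
enum ggml_type { GGML_TYPE_F32, GGML_TYPE_F16, GGML_TYPE_BF16 };
constexpr int GGML_MAX_SRC = 10;
struct ggml_tensor { ggml_type type; ggml_tensor * src[GGML_MAX_SRC]; };

// reject any op with a BF16 source, since the Metal backend has no BF16
// kernels (per the commit message above); remaining per-op checks omitted
bool metal_supports_op(const ggml_tensor & op) {
    for (int i = 0; i < GGML_MAX_SRC; ++i) {
        if (op.src[i] != nullptr && op.src[i]->type == GGML_TYPE_BF16) {
            return false;
        }
    }
    return true;
}

int main() {
    ggml_tensor bf16_src = { GGML_TYPE_BF16, {} };
    ggml_tensor op       = { GGML_TYPE_F32,  {} };
    op.src[0] = &bf16_src;
    return metal_supports_op(op) ? 1 : 0; // 0: the op is correctly rejected
}
```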
b3186
server : fix smart slot selection (#8020)
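The "smart" selection idea is to prefer the idle slot whose cached prompt shares the longest prefix with the incoming request, so its KV cache can be reused. A self-contained C++ sketch of that heuristic follows; the names are illustrative, not server.cpp verbatim.

```cpp
#include <vector>

struct slot {
    bool             idle;
    std::vector<int> cached_tokens; // prompt tokens still in this slot's KV cache
};

static size_t common_prefix(const std::vector<int> & a, const std::vector<int> & b) {
    size_t n = 0;
    while (n < a.size() && n < b.size() && a[n] == b[n]) {
        n++;
    }
    return n;
}

// pick the idle slot whose cache shares the longest prefix with the prompt
static int select_slot(const std::vector<slot> & slots, const std::vector<int> & prompt) {
    int    best     = -1;
    size_t best_len = 0;
    for (int i = 0; i < (int) slots.size(); ++i) {
        if (!slots[i].idle) {
            continue;
        }
        const size_t len = common_prefix(slots[i].cached_tokens, prompt);
        if (best == -1 || len > best_len) {
            best     = i;
            best_len = len;
        }
    }
    return best; // -1 if no idle slot is available
}

int main() {
    std::vector<slot> slots = { { true, { 1, 2, 3 } }, { true, { 1, 9 } } };
    return select_slot(slots, { 1, 2, 3, 4 }); // 0: longest shared prefix
}
```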
b3184
ggml : synchronize threads using barriers (#7993)
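The pattern being described: worker threads compute their chunk of a graph node, then all wait at a barrier before any of them reads results produced by the others. A minimal sketch using C++20 std::barrier as a stand-in for ggml's internal barrier:

```cpp
#include <barrier>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    const int    nthreads = 4;
    std::barrier sync(nthreads);

    std::vector<std::thread> workers;
    for (int t = 0; t < nthreads; ++t) {
        workers.emplace_back([&sync, t] {
            // phase 1: each thread computes its chunk of the current node
            std::printf("thread %d: compute\n", t);
            sync.arrive_and_wait(); // all chunks finished before anyone proceeds
            // phase 2: safe to read results produced by the other threads
            std::printf("thread %d: next node\n", t);
        });
    }
    for (auto & w : workers) {
        w.join();
    }
    return 0;
}
```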
b3183
codecov : remove (#8004)
b3182
[SYCL] refactor (#6408)
* separate lower-precision GEMM from the main files
* fix hardcoded workgroup size (see the sketch below)
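On the workgroup-size point, the usual fix is to query the device limit instead of hardcoding one. A small sketch assuming a SYCL toolchain such as oneAPI DPC++; this is illustrative, not the llama.cpp SYCL backend code:

```cpp
#include <sycl/sycl.hpp>
#include <algorithm>
#include <cstdio>

int main() {
    sycl::queue q; // default device

    // query the device limit instead of hardcoding a workgroup size
    const size_t max_wg = q.get_device().get_info<sycl::info::device::max_work_group_size>();
    const size_t wg     = std::min<size_t>(256, max_wg); // preferred size, clamped

    std::printf("using work-group size %zu\n", wg);

    q.parallel_for(sycl::nd_range<1>(sycl::range<1>(wg * 4), sycl::range<1>(wg)),
                   [=](sycl::nd_item<1>) { /* kernel body */ }).wait();
    return 0;
}
```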
b3181
tokenizer : BPE fixes (#7530)
* Random test: add_bos_token, add_eos_token
* Random test: add BPE models for testing
* Custom regex split fails with codepoint 0
* Fix falcon punctuation regex
* Refactor llm_tokenizer_bpe: move code to constructor
* Move 'add_special_bos/eos' logic to llm_tokenizer_bpe
* Move tokenizer flags to vocab structure
* Default values for special_add_bos/eos (see the sketch below)
* Build vocab.special_tokens_cache using vocab token types
* Generalize 'jina-v2' per-token attributes
* Fix unicode whitespaces (deepseek-coder, deepseek-llm)
* Skip missing byte tokens (falcon)
* Better unicode data generation
* Replace char32_t with uint32_t
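One of the structural changes above moves the add-BOS/EOS decision into the vocab itself, with defaults resolved at model load time. A hypothetical sketch of that shape, with illustrative names rather than llama.cpp's exact API:

```cpp
#include <vector>

// illustrative vocab carrying the tokenizer flags, with defaults that the
// model loader would override from the model's metadata
struct vocab {
    int  bos_id  = 1;
    int  eos_id  = 2;
    bool add_bos = true;  // resolved at model load time
    bool add_eos = false;
};

// tokenization consults the vocab's flags instead of per-call guesses
std::vector<int> tokenize(const vocab & v, const std::vector<int> & body, bool add_special) {
    std::vector<int> out;
    if (add_special && v.add_bos) {
        out.push_back(v.bos_id);
    }
    out.insert(out.end(), body.begin(), body.end());
    if (add_special && v.add_eos) {
        out.push_back(v.eos_id);
    }
    return out;
}

int main() {
    vocab v;
    return (int) tokenize(v, { 10, 11 }, /*add_special=*/true).size(); // 3: BOS + body
}
```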
b3180
Only use FIM middle token if it exists (#7648)
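A sketch of the guard this implies: only append the FIM middle token when the model's vocab actually defines one. Using -1 for "missing" is an assumption here, mirroring the common convention of negative ids for absent special tokens:

```cpp
#include <vector>

// append prefix/suffix in infill order, emitting the middle token only when
// the vocab defines one (fim_mid < 0 means "missing" in this sketch)
void append_infill_prompt(std::vector<int> & tokens,
                          int fim_pre, int fim_suf, int fim_mid,
                          const std::vector<int> & prefix,
                          const std::vector<int> & suffix) {
    tokens.push_back(fim_pre);
    tokens.insert(tokens.end(), prefix.begin(), prefix.end());
    tokens.push_back(fim_suf);
    tokens.insert(tokens.end(), suffix.begin(), suffix.end());
    if (fim_mid >= 0) { // only use FIM middle if it exists
        tokens.push_back(fim_mid);
    }
}

int main() {
    std::vector<int> tokens;
    append_infill_prompt(tokens, 3, 4, -1, { 7, 8 }, { 9 }); // no FIM middle token
    return (int) tokens.size(); // 5: the middle token was skipped
}
```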
b3179
Fix GCC-only pragma usage on Windows (#7751)
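The usual shape of such a fix is to guard GCC-only pragmas so other compilers (e.g. MSVC on Windows) do not emit unknown-pragma warnings. A self-contained sketch of the pattern, not the exact patch:

```cpp
// guard GCC-only pragmas so compilers without them (e.g. MSVC on Windows)
// do not emit unknown-pragma warnings
#if defined(__GNUC__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdouble-promotion"
#endif

static float halve(float x) {
    return x * 0.5; // the double literal would trip -Wdouble-promotion on GCC
}

#if defined(__GNUC__)
#pragma GCC diagnostic pop
#endif

int main() {
    return halve(2.0f) == 1.0f ? 0 : 1;
}
```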
b3178
Allow compiling with CUDA without the CUDA runtime installed (#7989)
On hosts that are not prepared or dedicated to executing CUDA code, it is still possible to compile llama.cpp with CUDA support by installing only the development packages. However, the runtime libraries such as /usr/lib64/libcuda.so* are then missing, and the link step currently fails. The development environment is prepared for exactly this situation: stub libraries for all the CUDA libraries are available in the $(CUDA_PATH)/lib64/stubs directory. Adding this directory to the end of the library search path changes nothing for environments that already work, but additionally enables compiling llama.cpp when the runtime code is not available.