Releases · ggml-org/llama.cpp
b4419
CUDA: add BF16 support (#11093)
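For context, one minimal way to exercise the new CUDA BF16 kernels is to convert a model to a BF16 GGUF first. A hedged sketch using the repo's convert_hf_to_gguf.py; the model path and output file name are placeholders:

```python
# Sketch: produce a BF16 GGUF so the new CUDA BF16 path can be exercised.
# "path/to/hf-model" and "model-bf16.gguf" are hypothetical placeholders.
import subprocess

subprocess.run(
    [
        "python", "convert_hf_to_gguf.py",
        "path/to/hf-model",
        "--outtype", "bf16",           # emit brain-float-16 tensors
        "--outfile", "model-bf16.gguf",
    ],
    check=True,
)
```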
b4418
Vulkan: Add device-specific blacklist for coopmat for the AMD proprietary driver
b4417
llama : Add support for DeepSeek V3 (#11049)
* convert : extend DEEPSEEK2 model architecture to support DeepseekV3ForCausalLM by adding EXPERT_WEIGHTS_NORM and EXPERT_GATING_FUNC model parameters and the FFN_EXP_PROBS_B tensor type
* vocab : add DeepSeek V3 pre-tokenizer regexes
* unicode : handle ACCENT_MARK and SYMBOL categories in regex
* llama : add DeepSeek V3 chat template, handle new model parameters and tensor types

Co-authored-by: Stanisław Szymczyk <[email protected]>
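One way to sanity-check a converted DeepSeek V3 GGUF is to read back the new expert-routing metadata with the gguf Python package. The exact key names below ("deepseek2.expert_weights_norm", "deepseek2.expert_gating_func") are an assumption derived from the parameter names in the commit message:

```python
# Hedged sketch: dump DeepSeek V3 expert-routing metadata from a GGUF file.
# Key names are assumptions based on EXPERT_WEIGHTS_NORM / EXPERT_GATING_FUNC;
# "deepseek-v3.gguf" is a hypothetical file name.
from gguf import GGUFReader

reader = GGUFReader("deepseek-v3.gguf")
for name, field in reader.fields.items():
    if name.startswith("deepseek2.expert"):
        # field.data holds indices into field.parts for the decoded value(s)
        print(name, field.parts[field.data[0]])
```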
b4416
[GGML][RPC] Support for models with non-512-aligned tensors over RPC.…
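A quick way to exercise the RPC backend end to end, assuming a build with GGML_RPC enabled and an rpc-server already listening; the host, port, and model file below are placeholders:

```python
# Sketch: offload inference to a remote rpc-server. Before this release,
# models whose tensors were not 512-byte aligned could fail over RPC.
import subprocess

subprocess.run(
    [
        "./llama-cli",
        "-m", "model.gguf",             # placeholder model file
        "--rpc", "192.168.1.10:50052",  # placeholder rpc-server address
        "-ngl", "99",                   # offload all layers to the RPC backend
        "-p", "Hello",
    ],
    check=True,
)
```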
b4415
llama : add support for the cohere2 model architecture (#10900)
b4414
sync : ggml
b4411
fix: Vulkan shader gen binary path (#11037)
b4409
metal : avoid uint (#11019)
b4406
server : allow using LoRA adapters per-request (#10994)
* slot.can_batch_with
* lora per request
* test: force disable cache prompt
* move can_batch_with check
* fix condition
* add slow test with llama 8b
* update docs
* move lora change task to queue
* Apply suggestions from code review
* lora_base
* remove redundant check

Co-authored-by: Georgi Gerganov <[email protected]>
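The per-request control works by passing a "lora" array in the request body. A minimal sketch against a server started with one adapter (e.g. `llama-server -m base.gguf --lora adapter.gguf`); the host, port, and prompt are placeholders:

```python
# Sketch: select a LoRA adapter and scale for a single /completion request.
# Adapter id 0 refers to the first --lora argument; scale 0.0 disables it.
import requests

resp = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "Write a haiku about llamas.",
        "n_predict": 64,
        "lora": [{"id": 0, "scale": 0.5}],
    },
)
print(resp.json()["content"])
```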
b4404
ggml : fixes for AVXVNNI instruction set with MSVC and Clang (#11027)
* Fixes for Clang AVX VNNI
* enable AVX VNNI and Alder Lake build for MSVC
* Apply suggestions from code review

Co-authored-by: slaren <[email protected]>
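A hedged sketch of configuring a build that exercises this path; the GGML_AVX_VNNI option name is an assumption based on ggml's usual GGML_&lt;ISA&gt; cmake naming:

```python
# Sketch: configure and build with AVX-VNNI enabled (option name assumed).
import subprocess

subprocess.run(["cmake", "-B", "build", "-DGGML_AVX_VNNI=ON"], check=True)
subprocess.run(["cmake", "--build", "build", "--config", "Release"], check=True)
```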