Releases · ggml-org/llama.cpp
b3204
Refactor Vulkan backend to allow multiple contexts (#7961)
* Refactor Vulkan backend to allow multiple contexts
* Fix "too many shader groups called" validation error in llama3 on AMD and Intel GPUs
* Fix Vulkan debug build error
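For context, this refactor means the Vulkan backend no longer assumes a single global context, so several llama contexts can share one model. A minimal sketch using the llama.cpp C API as it existed around this release; the model path is a placeholder and error handling is reduced to the essentials:

```cpp
#include "llama.h"
#include <cstdio>

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    // "model.gguf" is a placeholder path, not something from the release notes.
    llama_model * model = llama_load_model_from_file("model.gguf", mparams);
    if (!model) { fprintf(stderr, "failed to load model\n"); return 1; }

    llama_context_params cparams = llama_context_default_params();
    // Two independent contexts over one model; with this release the Vulkan
    // backend can serve both instead of a single context.
    llama_context * ctx_a = llama_new_context_with_model(model, cparams);
    llama_context * ctx_b = llama_new_context_with_model(model, cparams);

    /* ... decode independently on ctx_a and ctx_b ... */

    llama_free(ctx_a);
    llama_free(ctx_b);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```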
b3202
cvector: fix CI + correct help message (#8064)
* cvector: fix CI + correct help message
* also correct --pca-iter
b3201
cvector-generator: Moe Moe Fixie-Fixie for Lots of Formats~! ♡(ᐢ ᴥ ᐢ)…
b3199
Update llama-quantize ppl/file size output from LLaMA-v1 to Llama-3 v…
b3197
vulkan: detect multiple devices by deviceUUID instead of deviceID (#8…
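For reference, deviceUUID is exposed through VkPhysicalDeviceIDProperties (core since Vulkan 1.1) and is unique per physical device, whereas deviceID in VkPhysicalDeviceProperties is identical for two cards of the same model. A standalone sketch of reading it (not the backend's actual code):

```cpp
#include <vulkan/vulkan.h>
#include <cstdio>
#include <vector>

int main() {
    // Request Vulkan 1.1 so vkGetPhysicalDeviceProperties2 is core.
    VkApplicationInfo app = {};
    app.sType      = VK_STRUCTURE_TYPE_APPLICATION_INFO;
    app.apiVersion = VK_API_VERSION_1_1;

    VkInstanceCreateInfo ici = {};
    ici.sType            = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
    ici.pApplicationInfo = &app;

    VkInstance instance;
    if (vkCreateInstance(&ici, nullptr, &instance) != VK_SUCCESS) return 1;

    uint32_t count = 0;
    vkEnumeratePhysicalDevices(instance, &count, nullptr);
    std::vector<VkPhysicalDevice> devices(count);
    vkEnumeratePhysicalDevices(instance, &count, devices.data());

    for (VkPhysicalDevice dev : devices) {
        VkPhysicalDeviceIDProperties id_props = {};
        id_props.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_ID_PROPERTIES;
        VkPhysicalDeviceProperties2 props2 = {};
        props2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2;
        props2.pNext = &id_props;
        vkGetPhysicalDeviceProperties2(dev, &props2);

        // deviceUUID distinguishes two otherwise identical GPUs.
        printf("%s uuid=", props2.properties.deviceName);
        for (int i = 0; i < VK_UUID_SIZE; i++) printf("%02x", id_props.deviceUUID[i]);
        printf("\n");
    }
    vkDestroyInstance(instance, nullptr);
    return 0;
}
```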
b3195
llama : optimize long word tokenization with WPM (#8034) ggml-ci
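WPM here is the WordPiece tokenizer used by BERT-style vocabularies. Purely as an illustration (not the llama.cpp implementation), greedy longest-match-first WordPiece with a cap on candidate piece length, which is the kind of bound that keeps very long words from triggering quadratic scans:

```cpp
#include <algorithm>
#include <iostream>
#include <string>
#include <unordered_set>
#include <vector>

// Greedy longest-match-first WordPiece: repeatedly take the longest vocab
// entry that prefixes the remaining word; continuation pieces start with "##".
std::vector<std::string> wordpiece(const std::string & word,
                                   const std::unordered_set<std::string> & vocab,
                                   size_t max_piece_len = 32) {  // cap bounds work per position
    std::vector<std::string> pieces;
    size_t pos = 0;
    while (pos < word.size()) {
        size_t end = std::min(word.size(), pos + max_piece_len);
        std::string match;
        for (size_t len = end - pos; len > 0; --len) {
            std::string cand = (pos ? "##" : "") + word.substr(pos, len);
            if (vocab.count(cand)) { match = cand; pos += len; break; }
        }
        if (match.empty()) return {"[UNK]"};  // no piece matched at this position
        pieces.push_back(match);
    }
    return pieces;
}

int main() {
    std::unordered_set<std::string> vocab = {"token", "##ization", "##ize"};
    for (const auto & p : wordpiece("tokenization", vocab)) std::cout << p << ' ';
    std::cout << '\n';  // prints: token ##ization
}
```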
b3194
llama : allow pooled embeddings on any model (#7477)
* create append_pooling operation; allow to specify attention_type; add last token pooling; update examples
* find result_norm/result_embd tensors properly; update output allocation logic
* only use embd output for pooling_type NONE
* get rid of old causal_attn accessor
* take out attention_type; add in llama_set_embeddings
* bypass logits when doing non-NONE pooling
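The key addition here is llama_set_embeddings, which toggles embedding output on a live context rather than fixing it at context creation. A minimal sketch of how it might be used, assuming a context and batch set up elsewhere and omitting checks on llama_decode's return value:

```cpp
#include "llama.h"

// Sketch: switch one context between pooled-embedding output and logits.
// `ctx` and `batch` are assumed to be created elsewhere.
void mixed_usage(llama_context * ctx, llama_batch batch) {
    llama_set_embeddings(ctx, true);    // subsequent decodes produce embeddings
    llama_decode(ctx, batch);
    const float * embd = llama_get_embeddings_seq(ctx, 0);  // pooled embedding for seq 0

    llama_set_embeddings(ctx, false);   // back to logits for generation
    llama_decode(ctx, batch);
    const float * logits = llama_get_logits(ctx);

    (void) embd; (void) logits;
}
```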
b3193
swiftui : enable stream updating (#7754)
b3190
common: fix warning (#8036)
* common: fix warning
* Update common/common.cpp
Co-authored-by: slaren <[email protected]>
b3189
[SYCL] Fix windows build and inference (#8003)
* add sycl preset
* fix debug link error, fix windows crash
* update README