Releases: ggml-org/llama.cpp

b3204

23 Jun 09:01
45c0e2e
Refactor Vulkan backend to allow multiple contexts (#7961)

* Refactor Vulkan backend to allow multiple contexts

* Fix "too many shader groups called" validation error in llama3 on AMD and Intel GPUs

* Fix Vulkan debug build error

b3202

22 Jun 18:30
3e58b0e
cvector: fix CI + correct help message (#8064)

* cvector: fix CI + correct help message

* also correct --pca-iter

b3201

22 Jun 16:53
adf480c
cvector-generator: Moe Moe Fixie-Fixie for Lots of Formats~! ♡(ᐢ ᴥ ᐢ)…

b3199

22 Jun 14:52
5b48cd5
Update llama-quantize ppl/file size output from LLaMA-v1 to Llama-3 v…

b3197

21 Jun 09:18
557b653
vulkan: detect multiple devices by deviceUUID instead of deviceID (#8…

b3195

21 Jun 07:38
a927b0f
llama : optimize long word tokenization with WPM (#8034)

ggml-ci

b3194

21 Jun 07:37
80ea089
llama : allow pooled embeddings on any model (#7477)

* create append_pooling operation; allow to specify attention_type; add last token pooling; update examples

* find result_norm/result_embd tensors properly; update output allocation logic

* only use embd output for pooling_type NONE

* get rid of old causal_attn accessor

* take out attention_type; add in llama_set_embeddings

* bypass logits when doing non-NONE pooling
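
The pooling types mentioned above reduce per-token embeddings to a single vector per sequence. The following is a minimal, self-contained sketch of what mean pooling and last-token pooling compute. It does not use the llama.cpp API; `Pooling` and `pool_embeddings` are hypothetical names chosen for illustration only.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical illustration, not llama.cpp code.
enum class Pooling { Mean, Last };

// token_embd holds one row per token; every row has the same length (n_embd).
// Returns one pooled vector of length n_embd, or an empty vector for no tokens.
std::vector<float> pool_embeddings(const std::vector<std::vector<float>>& token_embd,
                                   Pooling type) {
    if (token_embd.empty()) {
        return {};
    }
    if (type == Pooling::Last) {
        // Last-token pooling: the embedding of the final token represents the sequence.
        return token_embd.back();
    }
    // Mean pooling: average each embedding dimension over all tokens.
    const std::size_t n_embd = token_embd[0].size();
    std::vector<float> out(n_embd, 0.0f);
    for (const auto& row : token_embd) {
        for (std::size_t i = 0; i < n_embd; ++i) {
            out[i] += row[i];
        }
    }
    for (float& v : out) {
        v /= static_cast<float>(token_embd.size());
    }
    return out;
}
```

For two tokens with embeddings {1, 2} and {3, 6}, mean pooling yields {2, 4} and last-token pooling yields {3, 6}.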

b3193

21 Jun 07:31
0e64591
swiftui : enable stream updating (#7754)

b3190

20 Jun 17:47
abd894a
common: fix warning (#8036)

* common: fix warning

* Update common/common.cpp

Co-authored-by: slaren

b3189

20 Jun 17:25
de391e4
[SYCL] Fix windows build and inference (#8003)

* add sycl preset

* fix debug link error. fix windows crash

* update README