Releases · ggml-org/llama.cpp
b3204
Refactor Vulkan backend to allow multiple contexts (#7961)
* Refactor Vulkan backend to allow multiple contexts
* Fix "too many shader groups called" validation error in llama3 on AMD and Intel GPUs
* Fix Vulkan debug build error
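For context, this refactor means the Vulkan backend no longer assumes a single global context, so several llama contexts can share one model. A minimal sketch using the llama.cpp C API as it existed around this release; the model path is a placeholder and error handling is reduced to the essentials:

```cpp
#include "llama.h"
#include <cstdio>

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    // "model.gguf" is a placeholder path, not something from the release notes.
    llama_model * model = llama_load_model_from_file("model.gguf", mparams);
    if (!model) { fprintf(stderr, "failed to load model\n"); return 1; }

    llama_context_params cparams = llama_context_default_params();
    // Two independent contexts over one model; with this release the Vulkan
    // backend can serve both instead of a single context.
    llama_context * ctx_a = llama_new_context_with_model(model, cparams);
    llama_context * ctx_b = llama_new_context_with_model(model, cparams);

    /* ... decode independently on ctx_a and ctx_b ... */

    llama_free(ctx_a);
    llama_free(ctx_b);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```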
b3202
cvector: fix CI + correct help message (#8064)
* cvector: fix CI + correct help message
* also correct --pca-iter
b3201
cvector-generator: Moe Moe Fixie-Fixie for Lots of Formats~! ♡(ᐢ ᴥ ᐢ)…
b3199
Update llama-quantize ppl/file size output from LLaMA-v1 to Llama-3 v…
b3197
vulkan: detect multiple devices by deviceUUID instead of deviceID (#8…
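For reference, deviceUUID is exposed through VkPhysicalDeviceIDProperties (core since Vulkan 1.1) and is unique per physical device, whereas deviceID in VkPhysicalDeviceProperties is identical for two cards of the same model. A standalone sketch of reading it (not the backend's actual code):

```cpp
#include <vulkan/vulkan.h>
#include <cstdio>
#include <vector>

int main() {
    // Request Vulkan 1.1 so vkGetPhysicalDeviceProperties2 is core.
    VkApplicationInfo app = {};
    app.sType      = VK_STRUCTURE_TYPE_APPLICATION_INFO;
    app.apiVersion = VK_API_VERSION_1_1;

    VkInstanceCreateInfo ici = {};
    ici.sType            = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
    ici.pApplicationInfo = &app;

    VkInstance instance;
    if (vkCreateInstance(&ici, nullptr, &instance) != VK_SUCCESS) return 1;

    uint32_t count = 0;
    vkEnumeratePhysicalDevices(instance, &count, nullptr);
    std::vector<VkPhysicalDevice> devices(count);
    vkEnumeratePhysicalDevices(instance, &count, devices.data());

    for (VkPhysicalDevice dev : devices) {
        VkPhysicalDeviceIDProperties id_props = {};
        id_props.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_ID_PROPERTIES;
        VkPhysicalDeviceProperties2 props2 = {};
        props2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2;
        props2.pNext = &id_props;
        vkGetPhysicalDeviceProperties2(dev, &props2);

        // deviceUUID distinguishes two otherwise identical GPUs.
        printf("%s uuid=", props2.properties.deviceName);
        for (int i = 0; i < VK_UUID_SIZE; i++) printf("%02x", id_props.deviceUUID[i]);
        printf("\n");
    }
    vkDestroyInstance(instance, nullptr);
    return 0;
}
```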
b3195
llama : optimize long word tokenization with WPM (#8034) ggml-ci
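WPM here is the WordPiece tokenizer used by BERT-style vocabularies. Purely as an illustration (not the llama.cpp implementation), greedy longest-match-first WordPiece with a cap on candidate piece length, which is the kind of bound that keeps very long words from triggering quadratic scans:

```cpp
#include <algorithm>
#include <iostream>
#include <string>
#include <unordered_set>
#include <vector>

// Greedy longest-match-first WordPiece: repeatedly take the longest vocab
// entry that prefixes the remaining word; continuation pieces start with "##".
std::vector<std::string> wordpiece(const std::string & word,
                                   const std::unordered_set<std::string> & vocab,
                                   size_t max_piece_len = 32) {  // cap bounds work per position
    std::vector<std::string> pieces;
    size_t pos = 0;
    while (pos < word.size()) {
        size_t end = std::min(word.size(), pos + max_piece_len);
        std::string match;
        for (size_t len = end - pos; len > 0; --len) {
            std::string cand = (pos ? "##" : "") + word.substr(pos, len);
            if (vocab.count(cand)) { match = cand; pos += len; break; }
        }
        if (match.empty()) return {"[UNK]"};  // no piece matched at this position
        pieces.push_back(match);
    }
    return pieces;
}

int main() {
    std::unordered_set<std::string> vocab = {"token", "##ization", "##ize"};
    for (const auto & p : wordpiece("tokenization", vocab)) std::cout << p << ' ';
    std::cout << '\n';  // prints: token ##ization
}
```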
b3194
llama : allow pooled embeddings on any model (#7477)
* create append_pooling operation; allow to specify attention_type; add last token pooling; update examples
* find result_norm/result_embd tensors properly; update output allocation logic
* only use embd output for pooling_type NONE
* get rid of old causal_attn accessor
* take out attention_type; add in llama_set_embeddings
* bypass logits when doing non-NONE pooling
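The key addition here is llama_set_embeddings, which toggles embedding output on a live context rather than fixing it at context creation. A minimal sketch of how it might be used, assuming a context and batch set up elsewhere and omitting checks on llama_decode's return value:

```cpp
#include "llama.h"

// Sketch: switch one context between pooled-embedding output and logits.
// `ctx` and `batch` are assumed to be created elsewhere.
void mixed_usage(llama_context * ctx, llama_batch batch) {
    llama_set_embeddings(ctx, true);    // subsequent decodes produce embeddings
    llama_decode(ctx, batch);
    const float * embd = llama_get_embeddings_seq(ctx, 0);  // pooled embedding for seq 0

    llama_set_embeddings(ctx, false);   // back to logits for generation
    llama_decode(ctx, batch);
    const float * logits = llama_get_logits(ctx);

    (void) embd; (void) logits;
}
```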
b3193
swiftui : enable stream updating (#7754)
b3190
common: fix warning (#8036)
* common: fix warning
* Update common/common.cpp
Co-authored-by: slaren <[email protected]>
b3189
[SYCL] Fix windows build and inference (#8003)
* add sycl preset
* fix debug link error, fix windows crash
* update README