Releases: ggml-org/llama.cpp

b4735

17 Feb 13:49
73e2ed3
CUDA: use async data loading for FlashAttention (#11894)

* CUDA: use async data loading for FlashAttention

---------

Co-authored-by: Diego Devesa

b4734

17 Feb 11:54
f7b1116
update release requirements (#11897)

b4733

17 Feb 10:53
c4d29ba
server : fix divide-by-zero in metrics reporting (#11915)

b4732

17 Feb 07:26
2eea03d
vulkan: implement several ops relevant for ggml_opt (#11769)

* vulkan: support memset_tensor

* vulkan: support GGML_OP_SUM

* vulkan: implement GGML_OP_ARGMAX

* vulkan: implement GGML_OP_SUB

* vulkan: implement GGML_OP_COUNT_EQUAL

* vulkan: implement GGML_OP_OPT_STEP_ADAMW

* vulkan: fix check_results RWKV_WKV6 crash and memory leaks

* vulkan: implement GGML_OP_REPEAT_BACK

* tests: remove invalid test-backend-ops REPEAT_BACK tests

* vulkan: fix COUNT_EQUAL memset using a fillBuffer command

b4731

16 Feb 17:39
0f2bbe6
server : bump httplib to 0.19.0 (#11908)

b4730

16 Feb 10:18
fe163d5
common : Fix a typo in help (#11899)

This patch fixes a typo in the command help:
prefx -> prefix

Signed-off-by: Masanari Iida

b4728

16 Feb 08:18
bf42a23
vulkan: support multi/vision rope, and noncontiguous rope (#11902)

b4727

16 Feb 07:20
c2ea16f
metal : fix the crash caused by the lack of residency set support on …

b4724

15 Feb 19:09
2288510
metal : optimize dequant q6_K kernel (#11892)

b4722

15 Feb 15:14
68ff663
repo : update links to new url (#11886)

* repo : update links to new url

ggml-ci

* cont : more urls

ggml-ci