Releases · ggml-org/llama.cpp
b4735
CUDA: use async data loading for FlashAttention (#11894)
Co-authored-by: Diego Devesa <[email protected]>
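As background for this entry, a minimal C++ sketch of the general idea of asynchronous data loading: staging the next chunk of data while the current one is being processed, so copies overlap compute. It uses only standard CUDA runtime calls (cudaMemcpyAsync, streams, events); launch_attention_chunk and the double-buffer setup are hypothetical, and the actual PR works inside the FlashAttention kernels rather than at this host-API level.

```cpp
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical kernel launcher standing in for the real FlashAttention work.
extern void launch_attention_chunk(const float * dev_buf, size_t n, cudaStream_t s);

// Double-buffered pipeline: while compute runs on dev[i % 2], the copy
// stream already stages chunk i + 1 into the other buffer.
void run_pipelined(const float * host /* pinned memory */, float * dev[2],
                   size_t chunk_elems, int n_chunks) {
    cudaStream_t copy_s, compute_s;
    cudaStreamCreate(&copy_s);
    cudaStreamCreate(&compute_s);
    cudaEvent_t copied[2], consumed[2];
    for (int b = 0; b < 2; ++b) {
        cudaEventCreate(&copied[b]);
        cudaEventCreate(&consumed[b]);
    }

    for (int i = 0; i < n_chunks; ++i) {
        const int buf = i & 1;
        if (i >= 2) {
            // do not overwrite a buffer before compute on it has finished
            cudaStreamWaitEvent(copy_s, consumed[buf], 0);
        }
        cudaMemcpyAsync(dev[buf], host + (size_t) i * chunk_elems,
                        chunk_elems * sizeof(float), cudaMemcpyHostToDevice, copy_s);
        cudaEventRecord(copied[buf], copy_s);
        // compute waits only for its own chunk, so copy(i+1) overlaps compute(i)
        cudaStreamWaitEvent(compute_s, copied[buf], 0);
        launch_attention_chunk(dev[buf], chunk_elems, compute_s);
        cudaEventRecord(consumed[buf], compute_s);
    }
    cudaStreamSynchronize(compute_s);
}
```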
b4734
update release requirements (#11897)
b4733
server : fix divide-by-zero in metrics reporting (#11915)
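The class of bug fixed here is easy to illustrate: a rate metric computed as a quotient must handle a zero denominator. A minimal sketch; the function and parameter names are hypothetical, not the server's actual metrics code.

```cpp
#include <cstdint>

// Hypothetical metric helper: report 0 instead of dividing by zero when no
// time has elapsed yet (e.g. metrics scraped before the first request).
double tokens_per_second(uint64_t n_tokens, double elapsed_s) {
    return elapsed_s > 0.0 ? (double) n_tokens / elapsed_s : 0.0;
}
```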
b4732
vulkan: implement several ops relevant for ggml_opt (#11769)
* vulkan: support memset_tensor
* vulkan: support GGML_OP_SUM
* vulkan: implement GGML_OP_ARGMAX
* vulkan: implement GGML_OP_SUB
* vulkan: implement GGML_OP_COUNT_EQUAL
* vulkan: implement GGML_OP_OPT_STEP_ADAMW
* vulkan: fix check_results RWKV_WKV6 crash and memory leaks
* vulkan: implement GGML_OP_REPEAT_BACK
* tests: remove invalid test-backend-ops REPEAT_BACK tests
* vulkan: fix COUNT_EQUAL memset using a fillBuffer command
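For context on two of the ops listed above, scalar C++ reference semantics (not the Vulkan shaders; the signatures are illustrative): ARGMAX yields the index of the largest element per row, and COUNT_EQUAL counts positions where two tensors match, as used when tracking training accuracy.

```cpp
#include <cstddef>
#include <cstdint>

// Reference semantics of GGML_OP_ARGMAX for one row: index of the maximum.
size_t argmax_row(const float * row, size_t n) {
    size_t best = 0;
    for (size_t i = 1; i < n; ++i) {
        if (row[i] > row[best]) best = i;
    }
    return best;
}

// Reference semantics of GGML_OP_COUNT_EQUAL: number of matching elements.
int64_t count_equal(const int32_t * a, const int32_t * b, size_t n) {
    int64_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        count += (a[i] == b[i]);
    }
    return count;
}
```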
b4731
server : bump httplib to 0.19.0 (#11908)
b4730
common : Fix a typo in help (#11899)
This patch fixes a typo in the command help: prefx -> prefix.
Signed-off-by: Masanari Iida <[email protected]>
b4728
vulkan: support multi/vision rope, and noncontiguous rope (#11902)
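As background to this entry, the rotation at the core of every RoPE variant, in scalar C++ (the Vulkan shader handles the multi/vision modes and noncontiguous layouts; this sketch shows only the common adjacent-pair case, with theta_base = 10000 assumed as the usual default):

```cpp
#include <cmath>
#include <cstddef>

// Rotary position embedding: rotate each adjacent pair (x[i], x[i+1]) by an
// angle that depends on the token position and the dimension index.
void rope_rotate(float * x, size_t n_dims, int pos, float theta_base = 10000.0f) {
    for (size_t i = 0; i + 1 < n_dims; i += 2) {
        const float theta = pos * std::pow(theta_base, -(float) i / (float) n_dims);
        const float c = std::cos(theta);
        const float s = std::sin(theta);
        const float x0 = x[i];
        const float x1 = x[i + 1];
        x[i]     = x0 * c - x1 * s;
        x[i + 1] = x0 * s + x1 * c;
    }
}
```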
b4727
metal : fix the crash caused by the lack of residency set support on …
b4724
metal : optimize dequant q6_K kernel (#11892)
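For background: q6_K stores 6-bit quantized weights split into four low bits (ql) and two high bits (qh), dequantized as d * scale * (q - 32). A simplified scalar sketch of that scheme follows; the bit packing below is hypothetical and much simpler than ggml's actual q6_K superblock layout, and it is plain C++, not the Metal kernel.

```cpp
#include <cstdint>

// Hypothetical simplified 6-bit dequant: reassemble each value from 4 low
// bits in ql and 2 high bits in qh, shift to signed [-32, 31], then scale.
void dequant_6bit(const uint8_t * ql, const uint8_t * qh,
                  float d, float scale, float * out, int n) {
    for (int i = 0; i < n; ++i) {
        const int lo = (i & 1) ? (ql[i / 2] >> 4) : (ql[i / 2] & 0xF); // 4 low bits
        const int hi = (qh[i / 4] >> (2 * (i & 3))) & 0x3;             // 2 high bits
        const int q  = (lo | (hi << 4)) - 32;                          // signed 6-bit
        out[i] = d * scale * (float) q;
    }
}
```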
b4722
repo : update links to new url (#11886)
* cont : more urls