Commit 595e448
docs(llama.cpp): note tensor split now works with quantized KV cache (#10135)
The split_mode: tensor description claimed tensor parallelism requires
KV-cache quantization to be disabled. ggml-org/llama.cpp#23792 lifts that
restriction by extending the meta backend to preserve shape information
through KV-cache flatten/reshape, so cache_type_k/cache_type_v
quantization can be combined with -sm tensor on builds that include it.
Documentation only: no changes to backend code, the grpc-server.cpp
comment, or the llama.cpp pin.
Assisted-by: Claude Code:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
1 parent 860f9d6 commit 595e448
1 file changed
Lines changed: 1 addition & 1 deletion
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 516 | 516 | |
| 517 | 517 | |
| 518 | 518 | |
| 519 | | - |
| | 519 | + |
| 520 | 520 | |
| 521 | 521 | |
| 522 | 522 | |