Releases · mostlygeek/llama-swap
v0.0.9
v0.0.8
v0.0.7
v0.0.6
v0.0.5
This release adds support for multi-line configuration of `cmd`.
This:
models:
  "qwen-32b":
    cmd: llama-server --host 127.0.0.1 --port 8999 -ngl 99 --flash-attn -sm row --metrics --cache-type-k q8_0 --cache-type-v q8_0 --ctx-size 80000 --model /mnt/models/Qwen2.5-32B-Instruct-Q8_0.gguf
    proxy: "http://127.0.0.1:8999"
Can now be written like this:
models:
  "qwen-32b":
    cmd: >
      /mnt/nvme/models/llama-server-66c2c9
      --host 127.0.0.1 --port 8999
      -ngl 99
      --flash-attn
      -sm row
      --cache-type-k q8_0 --cache-type-v q8_0
      --metrics
      --ctx-size 80000
      --model /mnt/nvme/models/Qwen2.5-32B-Instruct-Q8_0.gguf
    proxy: "http://127.0.0.1:8999"
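For context, YAML's folded block scalar (`>`) joins the indented lines with spaces, so the multi-line form resolves to the same single-line command string as before. A minimal sketch:

```yaml
# Folded scalar: newlines inside the block become spaces, so this...
cmd: >
  llama-server
  --port 8999
# ...is read as the single value "llama-server --port 8999"
```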
v0.0.4
This release adds support for configuring a custom endpoint to check when the upstream server is ready. The llama.cpp server's /health endpoint is no longer hardcoded as a dependency, so llama-swap should now work with anything that provides an OpenAI-compatible API.
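As an illustration of what this could look like (the key name `checkEndpoint` is an assumption here, not confirmed by these notes; check the project's README for the exact schema), a model entry might point the readiness probe at a custom path:

```yaml
models:
  "qwen-32b":
    cmd: llama-server --host 127.0.0.1 --port 8999 --model /mnt/models/Qwen2.5-32B-Instruct-Q8_0.gguf
    proxy: "http://127.0.0.1:8999"
    # Hypothetical key: path polled on the upstream server until it responds,
    # replacing the previously hardcoded /health check
    checkEndpoint: /v1/models
```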
v0.0.3
v0.0.2
Changelog
- cc94425 update README
- ef05c05 renaming to llama-swap
- ef8d002 release works?
- 5a4a41c add release thing
- f992f7f Create go.yml
- 85743ad remove the v1/models endpoint, needs improvement
- 3e90f83 add /v1/models endpoint and proxy everything to llama-server
- e0103d1 build simple-responder with make all
- d682589 support environment variables
- 43119e8 add README
- 844615b rename to llamagate
- aaca9d8 add Makefile
- bfdba43 improve error handling
- 2d387cf rename proxy.go to manager.go
- d061819 moved config into proxy package
- 7475bf0 .
- 4c2cc1c add license
- 8341543 move proxy logic into the proxy package
- f44faf5 move config to its own package
- cb576fb replace io.Copy to improve performance sending data to client
- b63b81b first commit