Releases: mostlygeek/llama-swap

v0.0.9

02 Nov 17:44
63d4a7d

Changelog

  • 63d4a7d Improve LogMonitor to handle empty writes and ensure buffer immutability

v0.0.8

01 Nov 22:29
f45469f

Changelog

  • f45469f Merge pull request #8 from mostlygeek/improve-upstream-monitoring-issue5
  • 34f9fd7 Improve timeout and exit handling of child processes. fix #3 and #5
  • 8448efa revise health check logic to not error on 5 second timeout

v0.0.7

31 Oct 19:23
8cf2a38

Changelog

  • 8cf2a38 Refactor log implementation
  • 0f133f5 Add /logs endpoint to monitor upstream processes
  • 1510b3f clean up README
  • 0f8a8e7 add header image

v0.0.6

21 Oct 22:52
6c38190

This release adds a /v1/models endpoint that lists the models in the configuration. It is useful for web-based interfaces that support picking a model to use.
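
As a sketch (the second model entry and its paths are illustrative, not taken from this project's documentation), every model key in the configuration becomes a model ID returned by GET /v1/models:

models:
  "qwen-32b":
    cmd: llama-server --host 127.0.0.1 --port 8999 --model /mnt/models/Qwen2.5-32B-Instruct-Q8_0.gguf
    proxy: "http://127.0.0.1:8999"
  "llama-8b":
    cmd: llama-server --host 127.0.0.1 --port 9000 --model /mnt/models/Llama-3.1-8B-Instruct-Q8_0.gguf
    proxy: "http://127.0.0.1:9000"

With this configuration, a client that queries the OpenAI-compatible /v1/models endpoint would see "qwen-32b" and "llama-8b" in its model list.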

Changelog

  • 6c38190 Add compatibility with OpenAI /v1/models endpoint to list models

v0.0.5

20 Oct 03:10
8580f0f

This release adds support for multi-line cmd configuration.

This:

models:
  "qwen-32b":
    cmd: llama-server --host 127.0.0.1 --port 8999 -ngl 99 --flash-attn -sm row --metrics --cache-type-k q8_0 --cache-type-v q8_0 --ctx-size 80000 --model /mnt/models/Qwen2.5-32B-Instruct-Q8_0.gguf
    proxy: "http://127.0.0.1:8999"

Can now be written like this:

models:
  "qwen-32b":
    cmd: >
      /mnt/nvme/models/llama-server-66c2c9
      --host 127.0.0.1 --port 8999
      -ngl 99
      --flash-attn 
      -sm row 
      --cache-type-k q8_0 --cache-type-v q8_0
      --metrics 
      --ctx-size 80000
      --model /mnt/nvme/models/Qwen2.5-32B-Instruct-Q8_0.gguf
    proxy: "http://127.0.0.1:8999"

Changelog

  • 8580f0f Merge pull request #6 from mostlygeek/multiline-config
  • be82d1a Support multiline cmds in YAML configuration

v0.0.4

12 Oct 05:13
6cf0962

This release adds support for configuring a custom endpoint to check when the upstream server is ready. The llama.cpp server's /health endpoint is no longer hardcoded as a dependency, so llama-swap should now work with anything that provides an OpenAI-compatible API.
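
A minimal sketch of what this could look like (the checkEndpoint field name and value are assumptions for illustration, not confirmed by these release notes):

models:
  "qwen-32b":
    cmd: llama-server --host 127.0.0.1 --port 8999 --model /mnt/models/Qwen2.5-32B-Instruct-Q8_0.gguf
    proxy: "http://127.0.0.1:8999"
    # assumed option: path polled on the upstream until it responds, marking the server ready
    checkEndpoint: "/v1/models"

Any path the upstream serves can be used, so servers that lack llama.cpp's /health endpoint can still be proxied.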

Changelog

  • 6cf0962 Add custom check endpoint
  • 8eb5b7b Add custom check endpoint
  • 5a57688 add .vscode to .gitignore
  • b79b7ef add goreleaser config to limit GOOS and GOARCH builds

v0.0.3

05 Oct 04:45
476086c

Changelog

  • 476086c Add Cmd.Wait() to prevent creation of zombie child processes see: #1
  • 4fae7cf update docs

v0.0.2

05 Oct 03:48
cc94425

Changelog