Releases: mostlygeek/llama-swap

v0.0.9

02 Nov 17:44
63d4a7d

Changelog

  • 63d4a7d Improve LogMonitor to handle empty writes and ensure buffer immutability

v0.0.8

01 Nov 22:29
f45469f

Changelog

  • f45469f Merge pull request #8 from mostlygeek/improve-upstream-monitoring-issue5
  • 34f9fd7 Improve timeout and exit handling of child processes. fix #3 and #5
  • 8448efa revise health check logic to not error on 5 second timeout

v0.0.7

31 Oct 19:23
8cf2a38

Changelog

  • 8cf2a38 Refactor log implementation
  • 0f133f5 Add /logs endpoint to monitor upstream processes
  • 1510b3f clean up README
  • 0f8a8e7 add header image

v0.0.6

21 Oct 22:52
6c38190

This release adds a /v1/models endpoint that lists the models in the configuration. It is useful for web-based interfaces that support picking a model to use.
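
As a sketch (the second model entry and its paths are illustrative, not taken from this project's documentation), every model key in the configuration becomes a model ID returned by GET /v1/models:

models:
  "qwen-32b":
    cmd: llama-server --host 127.0.0.1 --port 8999 --model /mnt/models/Qwen2.5-32B-Instruct-Q8_0.gguf
    proxy: "http://127.0.0.1:8999"
  "llama-8b":
    cmd: llama-server --host 127.0.0.1 --port 9000 --model /mnt/models/Llama-3.1-8B-Instruct-Q8_0.gguf
    proxy: "http://127.0.0.1:9000"

With this configuration, a client that queries the OpenAI-compatible /v1/models endpoint would see "qwen-32b" and "llama-8b" in its model list.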

Changelog

  • 6c38190 Add compatibility with OpenAI /v1/models endpoint to list models

v0.0.5

20 Oct 03:10
8580f0f

This release adds support for multi-line cmd configuration.

This:

models:
  "qwen-32b":
    cmd: llama-server --host 127.0.0.1 --port 8999 -ngl 99 --flash-attn -sm row --metrics --cache-type-k q8_0 --cache-type-v q8_0 --ctx-size 80000 --model /mnt/models/Qwen2.5-32B-Instruct-Q8_0.gguf
    proxy: "http://127.0.0.1:8999"

Can now be written like this:

models:
  "qwen-32b":
    cmd: >
      /mnt/nvme/models/llama-server-66c2c9
      --host 127.0.0.1 --port 8999
      -ngl 99
      --flash-attn 
      -sm row 
      --cache-type-k q8_0 --cache-type-v q8_0
      --metrics 
      --ctx-size 80000
      --model /mnt/nvme/models/Qwen2.5-32B-Instruct-Q8_0.gguf
    proxy: "http://127.0.0.1:8999"

Changelog

  • 8580f0f Merge pull request #6 from mostlygeek/multiline-config
  • be82d1a Support multiline cmds in YAML configuration

v0.0.4

12 Oct 05:13
6cf0962

This release adds support for configuring a custom endpoint to check when the upstream server is ready. The llama.cpp server's /health endpoint is no longer hardcoded as a dependency, so llama-swap should now work with anything that provides an OpenAI-compatible API.
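
A minimal sketch of what this could look like (the checkEndpoint field name and value are assumptions for illustration, not confirmed by these release notes):

models:
  "qwen-32b":
    cmd: llama-server --host 127.0.0.1 --port 8999 --model /mnt/models/Qwen2.5-32B-Instruct-Q8_0.gguf
    proxy: "http://127.0.0.1:8999"
    # assumed option: path polled on the upstream until it responds, marking the server ready
    checkEndpoint: "/v1/models"

Any path the upstream serves can be used, so servers that lack llama.cpp's /health endpoint can still be proxied.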

Changelog

  • 6cf0962 Add custom check endpoint
  • 8eb5b7b Add custom check endpoint
  • 5a57688 add .vscode to .gitignore
  • b79b7ef add goreleaser config to limit GOOS and GOARCH builds

v0.0.3

05 Oct 04:45
476086c

Changelog

  • 476086c Add Cmd.Wait() to prevent creation of zombie child processes see: #1
  • 4fae7cf update docs

v0.0.2

05 Oct 03:48
cc94425

Changelog