Releases: mostlygeek/llama-swap

v173

17 Nov 18:49
86e9b93

This release includes a set of quality-of-life features for configuring and using llama-swap (a combined config sketch follows the feature list):

  • add JSON schema for configuration file (#393)
  • build, commit and version information in the UI (#395)
  • enable model aliases in v1/models (#400)
  • logTimeFormat: enable and configure timestamps in the proxy's log output (#401)
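
Taken together, a config using these features might look like the sketch below. This is only illustrative: the schema path and the logTimeFormat value are assumptions, so check config.example.yaml in the repo for the authoritative syntax.

    # yaml-language-server: $schema=./config.schema.json   # hypothetical schema path; enables editor validation (#393)
    logTimeFormat: "2006/01/02 15:04:05"                   # assumed Go-style layout string (#401)

    models:
      my-model:
        cmd: llama-server --port ${PORT} -m /models/my-model.gguf
        aliases:
          - my-model-alias   # aliases are now also listed by v1/models (#400)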

Shout out to @ryan-steed-usa and @nint8835 for their contributions to this release.

Changelog

  • 86e9b93 proxy,ui: add version endpoint and display version info in UI (#395)
  • 3acace8 proxy: add configurable logging timestamp format (#401)
  • 554d29e feat: enhance model listing to include aliases (#400)
  • 3567b7d Update image in README.md for web UI section
  • 3873852 config.example.yaml: add modeline for schema validation
  • c0fc858 Add configuration file JSON schema (#393)
  • b429349 add /ui/ to wol-proxy polling (#388)
  • eab2efd feat: improve llama.cpp base image tag for cpu (#391)
  • 6aedbe1 cmd/wol-proxy: show a loading page for / (#381)
  • b24467a fix: update containerfile user/group management commands (#379)

v172

03 Nov 13:33
12b69fb

Changelog

  • 12b69fb proxy: recover from panic in Process.statusUpdate (#378)
  • f91a8b2 refactor: update Containerfile to support non-root user execution and improve security (#368)

v171

29 Oct 07:12
a89b803

This release includes a unique feature that shows model-loading progress in the reasoning content. When enabled in the config, llama-swap streams a small amount of data so there is no silence while waiting for the model to swap and load.

  • Add a new global config setting: sendLoadingState: true
  • Add a new model override setting, model.sendLoadingState: true, to control it on a per-model basis (see the sketch below)
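
A minimal sketch of how the two settings might combine (the model definition around them is an assumption):

    sendLoadingState: true          # global: stream loading progress for all models

    models:
      big-model:
        cmd: llama-server --port ${PORT} -m /models/big-model.gguf
        sendLoadingState: false     # per-model override: disable it for this model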

Demo:

(demo video: llama-swap-issue-366.mp4)

Thanks to @ServeurpersoCom for the very cool idea!

Changelog

  • a89b803 Stream loading state when swapping models (#371)

v170

26 Oct 03:44
f852689

Fixes a bug where a panic() could cause llama-swap to lock up or exit. Updating is recommended.

Changelog

  • f852689 proxy: add panic recovery to Process.ProxyRequest (#363)

v169

26 Oct 00:41
e250e71

This update adds usage tracking for API calls made to POST /upstream/{model}/{api}. Chats in the llama-server UI now show up in the Activities tab, and any request to this endpoint that includes usage or timing information (infill, embeddings, etc.) will appear there as well.

Changelog

  • e250e71 Include metrics from upstream chat requests (#361)
  • d18dc26 cmd/wol-proxy: tweak logs to show what is causing wake ups (#356)

v168

24 Oct 05:25
8357714

Changelog

  • 8357714 ui: fix avg token/sec calculation on models page (#357)

Averages were replaced with percentiles and a histogram:

(screenshot: percentile and histogram view on the models page)

v167

21 Oct 03:57
c07179d

This release adds cmd/wol-proxy, a Wake-on-LAN proxy for llama-swap. If llama-swap lives on a server with high idle power draw that suspends after a period of inactivity, wol-proxy automatically wakes the server and then reverse-proxies requests to it.

A niche use case, but hopefully it saves a lot of energy otherwise wasted on idle GPUs.

Changelog

  • c07179d cmd/wol-proxy: add wol-proxy (#352)
  • 7ff5063 Update README for setup instructions clarity [skip ci]
  • 9fc0431 Clean up and Documentation (#347) [skip ci]

v166

16 Oct 02:35
6516532

This release adds support for TLS certificates, contributed by @dwrz!

To use it:

    ./llama-swap --tls-cert-file /path/to/cert.pem --tls-key-file /path/to/key.pem ...

Generating a self-signed certificate:

    openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -days 365 -nodes

Changelog

v165

11 Oct 19:19
5392783

Changelog

v164

07 Oct 06:00
00b738c

Changelog