Releases: mostlygeek/llama-swap

v143

24 Jul 15:36
8c693e7

Changelog

  • 8c693e7 Add endpoint aliases for reranking models (#201)

v142

23 Jul 22:23
8f2af26

Changelog

v141

23 Jul 06:12
01d4838

This release changes how tokens/second is calculated on the activities page. The previous method divided the number of generated tokens by the total request time. Because the total request time also includes prompt processing, the result understated generation speed and was too misleading to be useful.

This release changes the logic to:

  • use llama-server's timings record, if it exists, for tokens/second
  • send -1 when timings is not available. The UI will render this as "unknown".

Support for timing information from other inference engines will come in future PRs.

Tokens/second and duration now match llama-server's output precisely.

Changelog

v140

22 Jul 06:08
cce0bc6

v140 includes a new feature to track chat completion activity in llama-swap.

Thanks to @g2mt for the contribution!

Changelog

  • cce0bc6 add guard to ensure ls-real-model-name is set in context
  • 36e2512 UI tidy [skip ci]
  • 9a54273 Update UI with new Activity event stream from #195
  • 87dce5f Add metrics logging for chat completion requests (#195)
  • 307e619 remove old eventsources from UI

v139

16 Jul 01:06
6299c1b

Changelog

v138

15 Jul 17:16
a906cd4

Changelog

  • a906cd4 Strip comments before macro expansion in config (#193)

v137

02 Jul 23:17
78b2bc3

Changelog

  • 78b2bc3 add toggle to hide/show unlisted models (#187)

v136

02 Jul 17:27
6a058e4

Changelog

  • 6a058e4 Change fsnotify to watch config directory instead of file

v135

02 Jul 05:20
1921e57

Changelog

v134

01 Jul 06:05
c867a6c

Changelog