This release changes how tokens/second is calculated on the activities page. The previous method divided the number of tokens generated by the total request time. Because the total request time also includes prompt processing, the resulting number was too misleading to be useful.
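To illustrate why the old number was misleading, here is a small worked example. The timings and token count are hypothetical values chosen for the arithmetic, not measurements from this project, and the function names are illustrative only.

```python
def naive_tokens_per_second(tokens_generated: int, total_request_seconds: float) -> float:
    """Old approach: divide by the whole request time, prompt processing included."""
    return tokens_generated / total_request_seconds


def generation_tokens_per_second(tokens_generated: int, generation_seconds: float) -> float:
    """Divide only by the time spent generating tokens."""
    return tokens_generated / generation_seconds


# Hypothetical request: a long prompt followed by a short completion.
prompt_seconds = 3.0       # time spent processing the prompt
generation_seconds = 2.0   # time spent generating tokens
tokens_generated = 200

naive = naive_tokens_per_second(tokens_generated, prompt_seconds + generation_seconds)
actual = generation_tokens_per_second(tokens_generated, generation_seconds)

print(f"naive: {naive:.1f} tokens/second")
print(f"generation only: {actual:.1f} tokens/second")
```

With these made-up numbers the old calculation understates generation speed by more than half, and the gap grows as the prompt gets longer.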
This release changes the logic to:
Use llama-server's timings record for tokens/second if it exists.
Send -1 when timings is not available. The UI renders this as "unknown".
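The two rules above can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: it assumes the upstream response is a JSON object with an optional `timings` object, and that the generation rate is stored under `predicted_per_second`. That field name is an assumption about llama-server's output, and `tokens_per_second` and `render` are hypothetical helper names.

```python
import json

UNKNOWN = -1.0  # sentinel sent when no timing information is available


def tokens_per_second(response_body: str) -> float:
    """Return the rate from the timings record, or -1 if it is missing."""
    try:
        payload = json.loads(response_body)
    except json.JSONDecodeError:
        return UNKNOWN

    timings = payload.get("timings") if isinstance(payload, dict) else None
    if not isinstance(timings, dict):
        return UNKNOWN

    # Assumed field name; adjust to whatever the timings record provides.
    rate = timings.get("predicted_per_second")
    if isinstance(rate, (int, float)) and not isinstance(rate, bool):
        return float(rate)
    return UNKNOWN


def render(rate: float) -> str:
    """Mimic the UI rule: a negative rate is displayed as "unknown"."""
    return "unknown" if rate < 0 else f"{rate:.2f}"


print(render(tokens_per_second('{"timings": {"predicted_per_second": 42.5}}')))
print(render(tokens_per_second('{"choices": []}')))
```

Using a negative sentinel keeps the field numeric while still letting the UI distinguish "no data" from a real measurement, since a genuine rate is never below zero.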
Support for timing information from other inference engines will come in future PRs.
Tokens/second and duration now match llama-server's output precisely: