Inspired by what you've done with sendLoadingState, I'm curious about using that same capability to report prompt processing progress.

As someone who's VRAM poor, I find that processing larger prompts takes a long time even with efficient models. It would be convenient, when running inference in an app like Open WebUI, to be able to see how far along the model has gotten with processing the prompt.
I can see prompt processing progress in the console output from llama-server. Would it be possible to pipe those progress amounts back into the reasoning block at all?
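To make the idea concrete, here's a rough sketch of what I'm imagining. Everything in it is hypothetical: the log format, the `reasoning_content` field, and the function name are my assumptions, not an existing API in llama-server or this project. The idea is just to parse the progress value from the server's console output and forward it to the client as a streamed reasoning delta, which clients like Open WebUI would then show in the collapsible reasoning block.

```python
import json
import re

# Stand-in pattern for the progress lines llama-server prints while it
# evaluates the prompt; the exact format varies between versions, so this
# regex is illustrative, not authoritative.
PROGRESS_RE = re.compile(r"progress\s*=\s*(?P<frac>[0-9.]+)")


def progress_to_chunk(log_line: str, model: str = "local-model"):
    """Turn one progress log line into an OpenAI-style streaming chunk.

    The chunk carries the progress text in `reasoning_content`, which is
    where some clients (including Open WebUI) render the reasoning block.
    Returns None if the line doesn't contain a progress value.
    """
    match = PROGRESS_RE.search(log_line)
    if match is None:
        return None
    pct = float(match.group("frac")) * 100
    chunk = {
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [{
            "index": 0,
            "delta": {"reasoning_content": f"Processing prompt... {pct:.0f}%\n"},
            "finish_reason": None,
        }],
    }
    # Server-sent-events framing, as used by OpenAI-compatible endpoints.
    return f"data: {json.dumps(chunk)}\n\n"


if __name__ == "__main__":
    # Made-up sample line; a real implementation would read the server's
    # stderr (or use a progress callback, if one existed) instead.
    sample = "prompt processing: n_past = 512, progress = 0.25"
    print(progress_to_chunk(sample), end="")
```

Even just surfacing a periodic "Processing prompt... NN%" line like this, the same way the loading state is surfaced today, would be enough for my use case.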