Replies: 4 comments
I think we can already compute stats for input string length, output string length, and time. That should be sufficient for Tabby's optimization purposes, shouldn't it?
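As a rough illustration of the string-length and time stats mentioned above, here is a minimal sketch that wraps any generation call. `RequestStats` and `measure` are hypothetical names invented for this example and are not part of Tabby; the closure stands in for whatever actually calls the model.

```rust
use std::time::{Duration, Instant};

/// Per-request stats measured outside the model: string lengths and wall time.
/// (Hypothetical type, not part of Tabby.)
#[derive(Debug)]
struct RequestStats {
    input_chars: usize,
    output_chars: usize,
    elapsed: Duration,
}

/// Runs `generate` (a stand-in for any text-generation call) and records stats.
fn measure<F>(prompt: &str, generate: F) -> (String, RequestStats)
where
    F: FnOnce(&str) -> String,
{
    let start = Instant::now();
    let output = generate(prompt);
    let elapsed = start.elapsed();
    let stats = RequestStats {
        input_chars: prompt.chars().count(),
        output_chars: output.chars().count(),
        elapsed,
    };
    (output, stats)
}

fn main() {
    // Fake generator for demonstration; a real caller would invoke the model here.
    let (output, stats) = measure("fn add(a, b)", |p| format!("{p} {{ a + b }}"));
    println!("{output:?} -> {stats:?}");
}
```

Note that character counts are only a proxy: the number of tokens the model processes depends on its tokenizer, so these figures cannot be converted into token throughput directly.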
@wsxiaoys yeah, but as you know,
Yes - that's why I emphasize that
Is there a way (besides the Tabby server logs) to get a sense of inference performance (i.e. tokens/s) as part of the response, for benchmarking? I am new to this space, so could you also suggest better alternatives for benchmarking the server's performance?
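One client-side option, sketched here under assumptions, is to time a batch of requests yourself and divide the output token count by the elapsed time. `approx_tokens`, `tokens_per_second`, and `benchmark` are hypothetical helpers written for this example; the closure stands in for a real request to the server, and the whitespace-based token count is only an approximation of what the model's tokenizer would report.

```rust
use std::time::{Duration, Instant};

/// Crude token count: whitespace-separated words. A real benchmark should
/// use the model's own tokenizer; this is only an approximation.
fn approx_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Tokens per second, or None if no measurable time elapsed.
fn tokens_per_second(tokens: usize, elapsed: Duration) -> Option<f64> {
    let secs = elapsed.as_secs_f64();
    if secs > 0.0 {
        Some(tokens as f64 / secs)
    } else {
        None
    }
}

/// Sends every prompt through `complete` and returns
/// (total approximate output tokens, total wall time).
fn benchmark<F>(prompts: &[&str], mut complete: F) -> (usize, Duration)
where
    F: FnMut(&str) -> String,
{
    let start = Instant::now();
    let mut tokens = 0;
    for &prompt in prompts {
        tokens += approx_tokens(&complete(prompt));
    }
    (tokens, start.elapsed())
}

fn main() {
    // Hypothetical stand-in for a request to the completion server.
    let fake_server = |prompt: &str| format!("{prompt} a + b");
    let (tokens, elapsed) = benchmark(&["fn add", "fn sub"], fake_server);
    match tokens_per_second(tokens, elapsed) {
        Some(rate) => println!("{tokens} tokens in {elapsed:?} ({rate:.1} tokens/s)"),
        None => println!("{tokens} tokens, elapsed time too small to measure"),
    }
}
```

Measured this way, the rate includes network and queueing overhead, so it reflects end-to-end throughput rather than raw model speed.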
tabby/crates/tabby-inference/src/lib.rs
Lines 23 to 31 in 99d49a9
The current TextGeneration trait is simple, but it doesn't expose the statistics we need for monitoring and optimizing the inference server.
For example, the input_token_length and output_token_length stats are really important for measuring the inference server's throughput.
My initial idea would be something like this: