feat: add generation stats to `TextGeneration` trait #860

erfanium · 2023-11-21T14:53:04Z

erfanium
Nov 21, 2023

Lines 23 to 31 in 99d49a9

    
#[async_trait]
pub trait TextGeneration: Sync + Send {
    async fn generate(&self, prompt: &str, options: TextGenerationOptions) -> String;
    async fn generate_stream(
        &self,
        prompt: &str,
        options: TextGenerationOptions,
    ) -> BoxStream<String>;
}

The current TextGeneration trait is simple, but it doesn't expose the statistics we need for monitoring and optimizing the inference server.

For example, input_token_length and output_token_length stats are really important for measuring the inference server's throughput.

My initial idea would be something like this:

pub struct TextGenerationStat {
    pub prompt_tokens_length: u32,
    pub output_tokens_length: Option<u32>, // not available in stream responses
    pub time: Option<u32>, // not available in stream responses
}

pub struct TextGenerationResponse {
    pub text: String,
    pub stat: Option<TextGenerationStat>,
}

pub struct TextGenerationStreamResponse<'a> {
    pub stream: BoxStream<'a, String>, // a struct field needs an explicit lifetime
    pub stat: Option<TextGenerationStat>,
}

#[async_trait]
pub trait TextGeneration: Sync + Send {
    async fn generate(
        &self,
        prompt: &str,
        options: TextGenerationOptions,
    ) -> TextGenerationResponse;
    async fn generate_stream(
        &self,
        prompt: &str,
        options: TextGenerationOptions,
    ) -> TextGenerationStreamResponse;
}
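
As a rough sketch of how the proposed stats could feed a throughput number: the struct below copies `TextGenerationStat` from the proposal, while the helper `output_tokens_per_second` and the millisecond unit for `time` are assumptions made for illustration, not part of Tabby.

```rust
/// Copy of the struct proposed above.
#[derive(Debug, Clone, PartialEq)]
pub struct TextGenerationStat {
    pub prompt_tokens_length: u32,
    pub output_tokens_length: Option<u32>, // not available in stream responses
    pub time: Option<u32>, // assumed unit: milliseconds (the proposal does not say)
}

/// Hypothetical helper: output tokens per second.
/// Returns `None` when either field is missing (as in stream responses)
/// or when the elapsed time is zero.
pub fn output_tokens_per_second(stat: &TextGenerationStat) -> Option<f64> {
    let tokens = stat.output_tokens_length?;
    let millis = stat.time?;
    if millis == 0 {
        return None;
    }
    Some(f64::from(tokens) * 1000.0 / f64::from(millis))
}
```

Because both fields are `Option`, a stream response simply yields no throughput value instead of a misleading zero.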

wsxiaoys · 2023-11-21T23:21:34Z

wsxiaoys
Nov 21, 2023
Maintainer

I think we can already compute stats for input string length, output string length, and time. Shouldn't that be sufficient for Tabby's optimization purposes?
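
A minimal sketch of what computing those stats outside the trait might look like. It is simplified to a synchronous closure so that it only needs the standard library (the real trait is async); `StringStats` and `generate_with_string_stats` are hypothetical names, not existing Tabby APIs.

```rust
use std::time::{Duration, Instant};

/// String-level stats that can be collected without touching the trait.
#[derive(Debug)]
pub struct StringStats {
    pub prompt_chars: usize,
    pub output_chars: usize,
    pub elapsed: Duration,
}

/// Runs `generate` on `prompt`, then records the character counts of the
/// prompt and the output together with the wall-clock time of the call.
pub fn generate_with_string_stats<F>(prompt: &str, generate: F) -> (String, StringStats)
where
    F: FnOnce(&str) -> String,
{
    let start = Instant::now();
    let text = generate(prompt);
    let elapsed = start.elapsed();
    let stats = StringStats {
        prompt_chars: prompt.chars().count(),
        output_chars: text.chars().count(),
        elapsed,
    };
    (text, stats)
}
```

The wrapper only sees strings, so it cannot report token counts; those would have to come from the backend that owns the tokenizer.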

0 replies

erfanium · 2023-11-21T23:42:55Z

erfanium
Nov 21, 2023
Author

@wsxiaoys Yeah, but as you know, input string length != input token length. Metrics based on string length are not standard.
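
To illustrate the mismatch, here is a toy sketch. `toy_token_count` splits on whitespace and is only a stand-in for a real subword tokenizer, which would give different counts; both function names are made up for this example.

```rust
/// Number of Unicode scalar values in the text (the "string length" metric).
pub fn char_count(text: &str) -> usize {
    text.chars().count()
}

/// Toy stand-in for a tokenizer: counts whitespace-separated pieces.
/// Real model tokenizers split text into subword units instead.
pub fn toy_token_count(text: &str) -> usize {
    text.split_whitespace().count()
}
```

With these definitions, "a b c d" and "abcdefg" both have 7 characters but 4 and 1 toy tokens respectively, so the same string length can correspond to very different token counts.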

0 replies

wsxiaoys · 2023-11-21T23:45:51Z

wsxiaoys
Nov 21, 2023
Maintainer

Yes, that's why I emphasized that this is for Tabby's optimization purposes :).

0 replies

sundaraa-deshaw · 2023-11-23T05:11:27Z

sundaraa-deshaw
Nov 23, 2023

Is there a way (besides the Tabby server logs) to get a sense of inference performance (i.e. tokens/s) as part of the response, for benchmarking? Since I am new to this space, can you please suggest better alternatives for benchmarking the server's performance?

0 replies
