Description
Missing Functionality
When running queries against Azure OpenAI, the API always returns the number of tokens consumed (input tokens + output tokens), as shown below. This information is currently not returned by the evaluation package.
```json
{
  "body": {
    ....
    "choices": [
      {
        "text": "es\n\nWhat do you call a mango who's in charge?\n\nThe head mango.",
        "index": 0,
        "finish_reason": "stop",
        "logprobs": null
      }
    ],
    "usage": {  ## !!! This is what I need !!!
      "completion_tokens": 20,
      "prompt_tokens": 6,
      "total_tokens": 26
    }
  }
}
```
Why is this necessary?
To guarantee acceptable latency for a solution in production, customers need to estimate the required PTUs (Provisioned Throughput Units), taking into account not only the number of requests but also how many tokens the evaluation process itself will consume.
Currently, the only way to estimate the total number of tokens is to manually load the prompt files for the different metrics (relevance, groundedness) and use tiktoken to count their tokens. The same has to be done for the answer and the reasoning in the evaluation's response, as sketched below.
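For illustration, this is roughly what the manual workaround looks like today. The prompt file names below are hypothetical placeholders; the real templates ship inside the evaluation package and would need to be located there:

```python
import tiktoken

# Hypothetical file names; the actual metric prompt templates live
# inside the evaluation package and must be located manually.
PROMPT_FILES = ["relevance.prompty", "groundedness.prompty"]

# Use the tokenizer that matches the deployed model.
encoding = tiktoken.encoding_for_model("gpt-4")

def count_tokens(text: str) -> int:
    return len(encoding.encode(text))

total = 0
for path in PROMPT_FILES:
    with open(path, encoding="utf-8") as f:
        total += count_tokens(f.read())

# The answer being evaluated and the reasoning in the evaluator's
# response have to be counted the same way (placeholders here).
total += count_tokens("<answer>")
total += count_tokens("<reasoning>")

print(f"Estimated tokens per evaluation call: {total}")
```

This is error-prone: it duplicates the package's internal prompt assembly and silently drifts out of date whenever the templates change, which is exactly why the actual consumption should be surfaced by the package itself.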
Expected Behaviour
In the response of each evaluation request, I'd expect an additional field ("usage", as in the AOAI response above) that lists the number of tokens used, for example:
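The metric fields and numbers below are only illustrative; the point is the per-request "usage" object mirroring AOAI's:

```json
{
  "gpt_relevance": 4.0,
  "reason": "...",
  "usage": {
    "prompt_tokens": 412,
    "completion_tokens": 37,
    "total_tokens": 449
  }
}
```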