There are two reasons to avoid using the usage_per_model field to compute request prices:
```python
for usage in usage_per_model:
    point = await make_point(
        logger,
        deployment,
        usage["model"],
        project_id,
        None,
        None,
        user_hash,
        user_title,
        timestamp,
        request,
        response,
        type,
        usage,
        topic_model,
        rates_calculator,
        parent_deployment,
        trace,
        execution_path,
    )
```
Unreliability of usage_per_model
chat_completion_response.statistics.usage_per_model is expected to contain a list of all model usages that were initiated directly or indirectly by the given request.
Currently, populating this field is the responsibility of the application developer (which is not ideal: DIAL Core should populate these fields, but this is how it works right now).
Therefore, it may not be provided at all, or it may give false information.
Double counting issue
The value of usage_per_model provides information about all transitive calls.
Therefore, nested calls may share the same usage_per_model entries. If these are summed naively, without accounting for this sharing, the same tokens are counted more than once.
E.g., the tokens used by the following chain of calls:
app1 -> app2 -> app3 -> ... -> app(N-1) -> gpt-4
will be computed as app1.usage_per_model + app2.usage_per_model + ... + gpt-4.usage = N * gpt-4.usage,
whereas it should simply be gpt-4.usage.
Therefore, we end up overestimating both the token count and the price by a factor of N.
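The overcounting can be illustrated with a minimal sketch. Everything in it is hypothetical and is not taken from the analytics code: the `LogRecord` class, the `build_chain` and `naive_total` helpers, and the token numbers are assumptions made for illustration. For simplicity, the final gpt-4 hop is modelled with the same `usage_per_model` shape as the applications.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    """Hypothetical analytics record for one hop in a call chain."""

    deployment: str
    # Tokens reported by this hop, covering all transitive model calls.
    usage_per_model: dict[str, int]


def build_chain(n_hops: int, model_tokens: int) -> list[LogRecord]:
    """Build app1 -> app2 -> ... -> app(N-1) -> gpt-4, N records in total.

    Every hop reports the same transitive gpt-4 usage.
    """
    names = [f"app{i}" for i in range(1, n_hops)] + ["gpt-4"]
    return [LogRecord(name, {"gpt-4": model_tokens}) for name in names]


def naive_total(records: list[LogRecord]) -> int:
    """Sum usage_per_model over every record, ignoring token sharing."""
    return sum(sum(r.usage_per_model.values()) for r in records)


actual_tokens = 100  # what gpt-4 really consumed (assumed value)
chain = build_chain(n_hops=4, model_tokens=actual_tokens)
counted_tokens = naive_total(chain)

# The naive sum is N times the real usage (here N = 4).
print(counted_tokens, counted_tokens // actual_tokens)
```

With four hops, the naive sum reports 400 tokens for a request that consumed only 100.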
The solution is to simply avoid using usage_per_model in analytics.
DIAL Core already supplies the correct price of the request in the price and deployment_price fields.
The loop quoted above is from ai-dial-analytics-realtime/aidial_analytics_realtime/analytics.py, lines 362 to 382 in 68ab942.