After finally catching up on the "Alive and Kicking" interview - I remembered I'd suggested being able to get the costings for a response.
The litellm project maintains a huge JSON blob with the token costs for a massive number of models and providers. So I wondered if it was worth 'borrowing' how they use it, so we could do something like `$response->costs()` or `$response->costs()->input`. Something like that anyway.
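Roughly what I had in mind, sketched in PHP. The `ResponseCosts` class and `costsFor()` helper are made up for illustration, and `input_cost_per_token` / `output_cost_per_token` are the keys I believe litellm's `model_prices_and_context_window.json` uses, so treat the field names as assumptions rather than a finished design:

```php
<?php

// Hypothetical sketch only - not an existing API.
final class ResponseCosts
{
    public function __construct(
        public readonly float $input,
        public readonly float $output,
    ) {}

    public function total(): float
    {
        return $this->input + $this->output;
    }
}

function costsFor(string $model, int $promptTokens, int $completionTokens): ResponseCosts
{
    // Assumes a local copy of litellm's pricing file, which (as far as I can
    // tell) maps model names to per-token prices.
    static $prices = null;
    $prices ??= json_decode(
        file_get_contents(__DIR__ . '/model_prices_and_context_window.json'),
        associative: true,
    );

    $entry = $prices[$model]
        ?? throw new InvalidArgumentException("No pricing entry for model: {$model}");

    return new ResponseCosts(
        input: $promptTokens * ($entry['input_cost_per_token'] ?? 0.0),
        output: $completionTokens * ($entry['output_cost_per_token'] ?? 0.0),
    );
}

// Usage:
// $costs = costsFor('gpt-4o', promptTokens: 1200, completionTokens: 350);
// echo $costs->input . ' + ' . $costs->output . ' = ' . $costs->total();
```

The pricing file would presumably need to be bundled or fetched and cached, since litellm update it fairly often.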
It's very handy when comparing model responses to get an idea of time vs. quality vs. price, especially when processing a lot of long documents: seeing how input vs. output costs compare, and what sort of cache-hit price reduction you're getting from different model providers.
If it's of any interest, I could dig into the original Python code to see how they process it.
(Edit: there's also models.dev, which opencode uses.)