Incorrect parameter counts: Claude 3.5 Sonnet listed with fewer parameters than Claude 3 Sonnet #134
Hello @cleophas-dlg, thanks for the feedback.

As you know, these figures are only “guesstimates” that we make about the models, and we are trying to improve the documentation on this. In case you have not seen them, there is a page and a spreadsheet describing the assumptions we use for proprietary models.

As detailed in the spreadsheet, to estimate the architecture of the Claude 3 Opus and Sonnet models we compared them to OpenAI GPT-4 and GPT-4-turbo. The GPT-4 architecture was leaked, and GPT-4-turbo is considered a “compressed” version of that model, so we used the pricing difference between the two to assess the downscaling factor.

By the time Claude 3.5 Sonnet was released, GPT-4o had come out, and we saw that the two had comparable performance on benchmarks. GPT-4o is likely a smaller model than GPT-4-turbo, so we again used the pricing difference at the release date to determine the architecture. That’s why Claude 3.5 Sonnet does not have the same architecture as Claude 3 Sonnet: we felt it was fair to assume that Anthropic optimized its model the same way OpenAI did.

Today, Claude 3.5/3.7 Sonnet and GPT-4o have performance comparable to models ranging from Llama 3.1 405B to DeepSeek V3 (671B, and 685B for the latest version).

So again, all of this can be challenged, but that’s our approach to guessing the architecture of models. I hope it is clearer for you! 🙏

PS: That being said, I am curious: why compare these two models? I would strongly recommend using one of the latest models, such as Claude 3.5 or 3.7 Sonnet, over Claude 3 Sonnet! 😄
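The pricing-based reasoning in the reply can be pictured as a simple proportional scaling: take a reference model with an assumed parameter count and shrink it by the ratio of the two models’ prices. This is a minimal sketch that assumes parameter count scales linearly with per-token price. The function name `estimate_params_from_price` and every number in the usage example are hypothetical placeholders, not EcoLogits code or the values in its spreadsheet.

```python
def estimate_params_from_price(reference_params_b: float,
                               reference_price: float,
                               target_price: float) -> float:
    """Estimate a model's parameter count (in billions) from a reference model.

    Hypothetical assumption: parameter count is proportional to the API price,
    so the downscaling factor is simply target_price / reference_price.
    """
    if reference_price <= 0 or target_price <= 0:
        raise ValueError("prices must be positive")
    # Multiply first, then divide, to keep round inputs exact in floating point.
    return reference_params_b * target_price / reference_price


# Hypothetical inputs: a 900B reference model priced at 30 units per token
# and a newer model priced at 10 units gives an estimate of 300B parameters.
estimate = estimate_params_from_price(900, 30, 10)
print(estimate)  # 300.0
```

Chaining the same step (reference model, then its “compressed” successor, then a cheaper model with comparable benchmark results) is what makes a newer model end up with a smaller estimate than an older one, even though that may feel counterintuitive.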
Description
Hello,
I'm planning to use ecologits to compare utilization between Claude 3 and Claude 3.5, but I've noticed an inconsistency in the model configurations: https://github.com/genai-impact/ecologits/blob/main/ecologits/data/models.json
The file lists:
While I understand these are estimates due to the unknown architecture, it's surprising that Claude 3 Sonnet, which is older, is listed with more parameters than Claude 3.5 Sonnet.
Could you please clarify or verify these parameter counts? It seems counterintuitive that the newer model would have fewer parameters.