Incorrect parameter counts: Claude 3.5 Sonnet listed with fewer parameters than Claude 3 Sonnet #134

cleophas-dlg · 2025-03-31T12:52:22Z

Description

Hello,

I'm planning to use ecologits to compare utilization between Claude 3 and Claude 3.5, but I've noticed an inconsistency in the model configurations: https://github.com/genai-impact/ecologits/blob/main/ecologits/data/models.json

The file lists:

"claude-3-5-sonnet-latest" with 440B parameters (55 - 220B active)
"claude-3-sonnet-20240229" with 800B parameters (100 - 400B active)

While I understand these are estimates due to the unknown architecture, it's surprising that Claude 3 Sonnet, which is older, is listed with more parameters than Claude 3.5 Sonnet.
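
To make the gap concrete, here is a small sketch that compares the two entries. The numbers are hard-coded from the figures quoted above rather than read from models.json, and the dictionary layout and the `ratios` helper are only illustrative, not the file's actual schema.

```python
# Figures as quoted in this post, in billions of parameters.
# They are typed in by hand, not loaded from models.json.
listed = {
    "claude-3-sonnet-20240229": {"total": 800, "active_min": 100, "active_max": 400},
    "claude-3-5-sonnet-latest": {"total": 440, "active_min": 55, "active_max": 220},
}


def ratios(newer, older):
    """Return the newer/older ratio for each listed field."""
    return {key: newer[key] / older[key] for key in older}


scale = ratios(
    listed["claude-3-5-sonnet-latest"],
    listed["claude-3-sonnet-20240229"],
)
for field, value in scale.items():
    print(f"{field}: {value:.2f}")
```

All three ratios come out at 0.55, so the Claude 3.5 Sonnet entry looks like the Claude 3 Sonnet entry scaled down by a single factor.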

Could you please clarify or verify these parameter counts? It seems counterintuitive that the newer model would have fewer parameters.

samuelrince (Maintainer) · 2025-04-03T16:48:01Z

Hello @cleophas-dlg,

Thanks for the feedback. As you know, these figures are only “guesstimates” of the models' architectures, and we are trying to improve the documentation on this. In case you have not seen them, there is a page and a spreadsheet describing the assumptions we use for proprietary models.

As detailed in the spreadsheet, to estimate the architecture of the Claude 3 Opus and Sonnet models we compared them to OpenAI's GPT-4 and GPT-4-turbo. The GPT-4 architecture was leaked, and GPT-4-turbo is considered a “compressed” version of it, so we used the pricing difference between the two to assess the downscaling factor.

By the time Claude 3.5 Sonnet was released, GPT-4o had come out, and the two models showed comparable performance on benchmarks. GPT-4o is likely a smaller model than GPT-4-turbo, so we again used the pricing difference at the release date to determine the architecture.
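
As a rough illustration of the pricing-based reasoning, here is a minimal sketch. It assumes the parameter count scales linearly with price, which is a simplification and not necessarily the exact calculation used in the spreadsheet. The function name `scale_by_price` and the input values are hypothetical placeholders, not real prices or parameter counts.

```python
def scale_by_price(reference_params_b, reference_price, new_price):
    """Scale a reference parameter count (in billions) by a price ratio.

    Assumes parameter count is proportional to price. This is an
    illustrative simplification, not a confirmed methodology.
    """
    if reference_price <= 0:
        raise ValueError("reference_price must be positive")
    return reference_params_b * (new_price / reference_price)


# Placeholder inputs chosen only to show the arithmetic.
estimate = scale_by_price(
    reference_params_b=1000.0,  # hypothetical reference model size
    reference_price=10.0,       # hypothetical price of the reference model
    new_price=5.0,              # hypothetical price of the newer model
)
print(estimate)
```

With these placeholder values, halving the price halves the estimated size. The real inputs are the ones documented in the spreadsheet.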

That’s why Claude 3.5 Sonnet does not have the same estimated architecture as Claude 3 Sonnet. We felt it was fair to assume that Anthropic optimized its model the same way OpenAI did. Today, Claude 3.5/3.7 Sonnet and GPT-4o have performance comparable to models ranging from Llama 3.1 405B to DeepSeek V3 (671B, or 685B for the latest version).

So again, all this can be challenged, but that’s our approach to guessing the architecture of models.

I hope it is clearer for you! 🙏

PS: That being said, I am curious: why compare these two models? I would strongly recommend using one of the latest available models, like Claude 3.5 or 3.7 Sonnet, over Claude 3 Sonnet! 😄

1 reply

cleophas-dlg Apr 4, 2025
Author

Hello @samuelrince, thank you for your answer. It was really interesting!

We wanted to compare Claude 3 and Claude 3.5 because both are available on AWS and we are currently using both, so we wanted to compare their energy consumption.
