Incorrect active parameter count for MoE models

Hi, 

first of all thank you for the very interesting paper!

I believe there is a mistake in the code that calculates the FLOPs for Mixtral. Mixtral is an 8x7B MoE model with 2 active experts per token.
Therefore, in the forward pass, only about 1/4 of the expert parameters (roughly 13B parameters in total, including the shared attention weights) are actually used for computation, yet the `model_size` dict (which is used in all FLOP calculations) contains the full number of ~47B parameters:
https://github.com/valentyn1boreiko/llm-threat-model/blob/3de6c6a146842e4ea780536a50c01fd2e28877a4/aggregate_csv.py#L12

This likely leads to inflated FLOP counts for results using the MoE model (e.g., PAIR, AutoDAN).
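
For reference, here is a rough sketch of how the active parameter count could be estimated. It is not the repository's code: the `MoEConfig` class and `param_count` helper are hypothetical names, and the dimensions are what I assume to be the published Mixtral-8x7B configuration (32 layers, hidden size 4096, expert FFN size 14336, 8 KV heads, vocabulary of 32000). It counts embeddings and the output head as well, so the ratio is only approximate for a non-embedding count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoEConfig:
    # Assumed values for Mixtral-8x7B; adjust if the actual config differs.
    hidden_size: int = 4096
    intermediate_size: int = 14336
    num_layers: int = 32
    num_attention_heads: int = 32
    num_kv_heads: int = 8
    vocab_size: int = 32000
    num_experts: int = 8
    experts_per_token: int = 2


def param_count(cfg: MoEConfig, experts_used: int) -> int:
    """Count parameters when `experts_used` expert FFNs per layer are included."""
    head_dim = cfg.hidden_size // cfg.num_attention_heads
    kv_dim = cfg.num_kv_heads * head_dim
    # Q and O projections are hidden x hidden; K and V use grouped-query heads.
    attention = 2 * cfg.hidden_size * cfg.hidden_size + 2 * cfg.hidden_size * kv_dim
    # Each expert is a gated FFN with three weight matrices.
    expert = 3 * cfg.hidden_size * cfg.intermediate_size
    router = cfg.hidden_size * cfg.num_experts
    norms = 2 * cfg.hidden_size
    per_layer = attention + experts_used * expert + router + norms
    # Input embedding plus untied output head, plus the final norm.
    embeddings = 2 * cfg.vocab_size * cfg.hidden_size
    return cfg.num_layers * per_layer + embeddings + cfg.hidden_size


cfg = MoEConfig()
total_params = param_count(cfg, cfg.num_experts)
active_params = param_count(cfg, cfg.experts_per_token)

print(f"total:  {total_params / 1e9:.1f}B")
print(f"active: {active_params / 1e9:.1f}B")
print(f"ratio:  {active_params / total_params:.2f}")
```

Under these assumptions this gives about 46.7B total and 12.9B active parameters, so using the active count in `model_size` would reduce the Mixtral FLOP estimates to a bit more than a quarter of their current values.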

Best
Tim
