Describe the bug
Currently, the magnitude, weight, and flops metrics do not extract the correct subset of weights for GQA (grouped-query attention). In addition, the tests for these modules are not randomized: all weights are set to ones, which makes the magnitude tests pass even when the wrong subset of weights is selected.
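A minimal sketch of why all-ones weights hide this kind of bug. It does not use the project's actual code: the shapes, the `group_magnitude` helper, and the specific mistake (always reading KV head 0) are assumptions made up for illustration. The point is only that any slice with the right number of elements gives the same magnitude when every weight is one, while random weights expose the wrong slice.

```python
import numpy as np

# Hypothetical GQA layout: 4 query heads share 2 KV heads.
NUM_HEADS, NUM_KV_HEADS, HEAD_DIM, HIDDEN = 4, 2, 4, 16
Q_PER_KV = NUM_HEADS // NUM_KV_HEADS


def head_rows(index, width):
    """Row slice covering block `index` of `width` rows."""
    return slice(index * width, (index + 1) * width)


def group_magnitude(q_w, k_w, group, kv_head=None):
    """L1 magnitude of one GQA group: its query heads plus one KV head.

    By default the KV head is the one that belongs to the group.
    Passing another `kv_head` simulates an incorrect slice.
    """
    if kv_head is None:
        kv_head = group
    q_part = np.abs(q_w[head_rows(group, Q_PER_KV * HEAD_DIM)]).sum()
    k_part = np.abs(k_w[head_rows(kv_head, HEAD_DIM)]).sum()
    return q_part + k_part


def wrong_slice_goes_unnoticed(q_w, k_w, group=1):
    """True if a wrong KV slice yields the same magnitude as the right one."""
    correct = group_magnitude(q_w, k_w, group)
    wrong = group_magnitude(q_w, k_w, group, kv_head=0)  # assumed bug
    return bool(np.isclose(correct, wrong))


q_shape = (NUM_HEADS * HEAD_DIM, HIDDEN)
k_shape = (NUM_KV_HEADS * HEAD_DIM, HIDDEN)

# All-ones weights: every slice of the same size sums to the same value.
print(wrong_slice_goes_unnoticed(np.ones(q_shape), np.ones(k_shape)))

# Random weights: different slices give different magnitudes.
rng = np.random.default_rng(0)
print(wrong_slice_goes_unnoticed(rng.standard_normal(q_shape),
                                 rng.standard_normal(k_shape)))
```

The first call reports that the wrong slice is indistinguishable from the correct one, and the second reports that it is detected, which is why randomized weights in the tests would catch an incorrect subset.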