Your question
Megatron's RotaryEmbedding is an nn.Module. If the entire model is cast with to(torch.bfloat16), the dtype of inv_freq changes along with it. However, keeping the subsequent sin/cos computation in float32 seems like the wise choice.
Is this a potential precision issue that could introduce unnecessary numerical error?
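To make the concern concrete, here is a minimal sketch (not Megatron's actual code) assuming inv_freq is registered as a buffer, which is what makes module-level to(torch.bfloat16) cast it. Even if forward upcasts back to float32 before sin/cos, the inv_freq values have already been rounded to bfloat16 precision at that point:

```python
import torch
import torch.nn as nn

class RotaryEmbedding(nn.Module):
    """Simplified RoPE sketch to illustrate the dtype-casting behavior."""
    def __init__(self, dim: int):
        super().__init__()
        inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
        # As a registered buffer, inv_freq is cast by module.to(dtype)
        self.register_buffer("inv_freq", inv_freq)

    def forward(self, seq_len: int):
        t = torch.arange(seq_len, device=self.inv_freq.device, dtype=torch.float32)
        # Upcast so sin/cos run in float32 -- but if the buffer was already
        # cast to bfloat16, its low-order mantissa bits are gone for good
        freqs = torch.outer(t, self.inv_freq.float())
        return freqs.cos(), freqs.sin()

rope = RotaryEmbedding(64).to(torch.bfloat16)
print(rope.inv_freq.dtype)  # torch.bfloat16: the buffer was cast with the model
cos, sin = rope(8)
print(cos.dtype)            # torch.float32: the forward pass upcasts
```

One common mitigation is to keep inv_freq out of the cast path, e.g. by recomputing it in float32 inside forward or by excluding it from module-wide dtype conversion, so the sin/cos inputs retain full float32 precision.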