[TorchToLinalg] `AtenMatmulOp` lowering doesn't promote to an accumulator type

The lowering should be aligned with those of the other contraction ops (bmm, mm, conv, etc.), which promote to an accumulator type. Currently, a bf16 matmul does not accumulate in f32, which is almost certainly incorrect.
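To illustrate why the accumulator type matters, here is a minimal pure-Python sketch that simulates a bf16 dot product with two different accumulator widths. This is not the torch-mlir lowering; `to_bf16`, `to_f32`, and `dot` are hypothetical helpers written for this example, and the rounding model (round-to-nearest-even, no NaN/inf handling) is an assumption.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def to_bf16(x: float) -> float:
    """Round to the nearest bfloat16 value (round-to-nearest-even).

    Assumption: finite inputs only; NaN/inf/overflow are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Rounding bias: 0x7FFF plus the lowest bit that survives truncation.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000  # keep sign, exponent, and the top 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def dot(a, b, round_acc):
    """Dot product of bf16 inputs; round_acc models the accumulator type."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = round_acc(acc + to_bf16(x) * to_bf16(y))
    return to_bf16(acc)  # the result is stored as bf16 in both cases

ones = [1.0] * 4096
bf16_result = dot(ones, ones, to_bf16)  # accumulate in bf16
f32_result = dot(ones, ones, to_f32)    # accumulate in f32
print(bf16_result, f32_result)  # 256.0 4096.0
```

With a bf16 accumulator the sum stalls at 256, because 257 is not representable in bf16 and rounds back to 256 on every subsequent step. With an f32 accumulator the sum reaches 4096, which is exactly representable in bf16.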