Use bucketized model averaging for LocalSGD

We want to implement the bucketization from https://github.com/pytorch/pytorch/blob/main/torch/distributed/algorithms/model_averaging/utils.py#L22-L50, which packs the parameters into a flat buffer and averages them with a single all-reduce.

This would replace our naive per-parameter reduction here: https://github.com/pytorch/torchft/blob/main/torchft/local_sgd.py#L179-L181
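
To make the proposal concrete, here is a rough sketch of what a bucketized average could look like. It is not the torchft or PyTorch implementation: `bucketized_average`, the `bucket_cap_bytes` parameter and its 32 MiB default, and the injectable `all_reduce` callable are all hypothetical names chosen for illustration. It assumes floating-point parameters that share one dtype and one device, and an `all_reduce` that sums in place.

```python
from typing import Callable, Iterable, List, Optional

import torch
import torch.distributed as dist


def bucketized_average(
    params: Iterable[torch.Tensor],
    world_size: int,
    bucket_cap_bytes: int = 32 * 1024 * 1024,  # hypothetical default
    all_reduce: Optional[Callable[[torch.Tensor], object]] = None,
) -> None:
    """Average params in place, issuing one all-reduce per bucket."""
    if all_reduce is None:
        # Assumes a default process group has already been initialized.
        all_reduce = dist.all_reduce

    bucket: List[torch.Tensor] = []
    bucket_bytes = 0

    def flush() -> None:
        nonlocal bucket, bucket_bytes
        if not bucket:
            return
        # Pack the bucket into one contiguous buffer (torch.cat copies).
        flat = torch.cat([p.data.reshape(-1) for p in bucket])
        # Pre-divide so that a SUM all-reduce yields the mean.
        flat /= world_size
        all_reduce(flat)
        # Copy the averaged values back into the original tensors.
        offset = 0
        for p in bucket:
            n = p.numel()
            p.data.copy_(flat[offset : offset + n].view_as(p))
            offset += n
        bucket, bucket_bytes = [], 0

    for p in params:
        size = p.numel() * p.element_size()
        # Start a new bucket once adding this tensor would exceed the cap.
        if bucket and bucket_bytes + size > bucket_cap_bytes:
            flush()
        bucket.append(p)
        bucket_bytes += size
    flush()
```

Compared with one collective per parameter, this issues one collective per bucket, at the cost of a temporary buffer of at most roughly `bucket_cap_bytes` (a single tensor larger than the cap still gets its own bucket). Passing a custom `all_reduce` would let LocalSGD route the call through whatever reduction it already uses.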