TODO:
- implement the stochastic BERT and the dataloaders # Hande, Raul
- implement the training loop # Trung