Thanks for your great work!
I'm having trouble reproducing the results. Although I trained the model as described in the tutorial, there is still a big gap between my results and those reported in the paper. Could you provide the hyperparameters needed to reproduce them, such as batch size, number of epochs, learning rate, and beam size? Thanks!