We use a GRU with 2 layers.
Much of this implementation is derived from https://pytorch.org/tutorials/intermediate/char_rnn_generation_tutorial.html, with some modifications.
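
For illustration, a 2-layer GRU model in PyTorch might look roughly like the sketch below. This is not the actual implementation: the class name `CharGRU`, the embedding layer, the vocabulary size of 60, and the hidden size of 128 are all assumptions chosen only to make the example run.

```python
import torch
import torch.nn as nn


class CharGRU(nn.Module):
    """Hypothetical character-level model built around a 2-layer GRU."""

    def __init__(self, vocab_size, hidden_size, num_layers=2):
        super().__init__()
        # Assumption: characters are fed as integer ids through an embedding.
        self.embed = nn.Embedding(vocab_size, hidden_size)
        # num_layers=2 stacks two GRU layers on top of each other.
        self.gru = nn.GRU(hidden_size, hidden_size,
                          num_layers=num_layers, batch_first=True)
        # Project each GRU output to one raw score (logit) per character.
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, tokens, hidden=None):
        x = self.embed(tokens)           # (batch, seq, hidden)
        y, hidden = self.gru(x, hidden)  # y: (batch, seq, hidden)
        return self.out(y), hidden       # logits: (batch, seq, vocab)


# Illustrative sizes only.
model = CharGRU(vocab_size=60, hidden_size=128)
tokens = torch.randint(0, 60, (4, 10))   # batch of 4 sequences, 10 chars each
logits, hidden = model(tokens)
print(logits.shape, hidden.shape)
```

The returned hidden state has one slice per GRU layer, so its first dimension is 2 here.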
We use cross-entropy loss instead of NLL loss.
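
To make the relationship between the two losses concrete: in PyTorch, `nn.CrossEntropyLoss` combines a log-softmax with `nn.NLLLoss`, so it expects raw logits, whereas `nn.NLLLoss` expects log-probabilities. The framework-free sketch below shows this for a single example. The function names and the sample logits are illustrative, not taken from the implementation.

```python
import math


def log_softmax(logits):
    """Turn raw scores into log-probabilities (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def nll_loss(log_probs, target):
    """Negative log-likelihood of the target class, given log-probabilities."""
    return -log_probs[target]


def cross_entropy(logits, target):
    """Cross entropy on raw logits: log-softmax followed by NLL."""
    return nll_loss(log_softmax(logits), target)


# Made-up scores for a 3-class problem; class 0 is the correct one.
logits = [2.0, 0.5, -1.0]
target = 0
loss = cross_entropy(logits, target)
print(loss)
```

Under this view, switching from NLL loss to cross-entropy loss mainly changes what the final layer should output: raw logits rather than log-probabilities.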
We get an accuracy of 89%, with a validation error of about 0.6.