For the pretrained tokenizer that is released, what is the size of the training set?