To keep the same proportion of masked tokens to total tokens as in the paper's pretraining: the paper masks 400 of 512 tokens, which is about 78 % (400 / 512 = 0.78125). We have 499 tokens, so we need 390 masked tokens (499 * 400 / 512 ≈ 389.8, rounded to 390).
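
The arithmetic can be checked with a minimal sketch like the one below. The names `PAPER_MASKED`, `PAPER_TOTAL`, and `scaled_mask_count` are hypothetical and chosen only for illustration. Rounding to the nearest integer is an assumption; the paper may handle fractional counts differently.

```python
# Values taken from the note above: the paper masks 400 of 512 tokens.
PAPER_MASKED = 400
PAPER_TOTAL = 512


def scaled_mask_count(num_tokens: int) -> int:
    """Return the mask count that keeps the paper's mask ratio.

    Assumption: fractional results are rounded to the nearest integer.
    """
    ratio = PAPER_MASKED / PAPER_TOTAL  # 0.78125
    return round(num_tokens * ratio)


# Our sequence length is 499 tokens.
num_masked = scaled_mask_count(499)
print(num_masked)
```

Deriving the count from the ratio, rather than hard-coding 390, keeps the proportion intact if the sequence length changes later.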