Important for stability! With a hidden size of 14336, use an init std of 0.00482. See https://github.com/huggingface/llm_training_handbook/tree/main/instabilities#std-init
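The note gives the value but not how it was obtained. One rule that reproduces it, assuming 14336 is the model's hidden size, is std = sqrt(1 / (3 * hidden_size)). The sketch below is illustrative only: `suggested_init_std` is a hypothetical helper name, and the formula is an assumption about where 0.00482 comes from, not something the note itself states.

```python
import math


def suggested_init_std(hidden_size: int) -> float:
    """Return sqrt(1 / (3 * hidden_size)).

    Assumption: this is the rule behind the 0.00482 figure; the helper
    name and the choice of formula are illustrative, not from the note.
    """
    if hidden_size <= 0:
        raise ValueError("hidden_size must be positive")
    return math.sqrt(1.0 / (3.0 * hidden_size))


if __name__ == "__main__":
    # Hidden size taken from the note above.
    print(f"{suggested_init_std(14336):.5f}")  # prints 0.00482
```

Under this rule the std shrinks as the hidden size grows, so a value tuned for a smaller model should be recomputed rather than reused.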