NVIDIA Megatron-LM Discussions
[QUESTION] What is the internal difference in training when setting only "fp8-format" versus setting "fp8-format" + "bf16"?
(stale: no activity in 60 days on issue or PR)
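For context on the question, a minimal sketch of the two launch configurations being compared. This assumes a typical Megatron-LM `pretrain_gpt.py` launch via `torchrun`; the specific model and data arguments are elided, and the interpretation of the flags (FP8 GEMMs via Transformer Engine, with the remaining tensors in the default dtype unless `--bf16` switches them to bfloat16) is a reading of the documented options, not a definitive statement of the internals.

```shell
# Sketch of the two configurations in question (assumed typical launch):
#
# 1) FP8 GEMMs only; non-FP8 tensors stay in the default dtype.
torchrun --nproc_per_node=8 pretrain_gpt.py \
    --fp8-format hybrid \
    ...  # remaining model/data arguments elided

# 2) FP8 GEMMs plus bfloat16 for the non-FP8 activations/gradients.
torchrun --nproc_per_node=8 pretrain_gpt.py \
    --fp8-format hybrid \
    --bf16 \
    ...  # remaining model/data arguments elided
```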
[QUESTION] Why does Megatron-LM use the gloo backend when creating parallel groups?
(stale: no activity in 60 days on issue or PR)
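A minimal sketch of the property the question is likely about: unlike NCCL, the gloo backend supports collectives on CPU tensors, so a framework can keep a gloo group alongside its NCCL groups for CPU-side communication (the single-process world size below is for illustration only; the stated motivation is an assumption, not a confirmed description of Megatron-LM's internals).

```python
import torch
import torch.distributed as dist

# gloo works on CPU tensors, which NCCL does not; this is the usual
# reason a training framework keeps a gloo group next to NCCL groups.
dist.init_process_group(
    backend="gloo",
    init_method="tcp://127.0.0.1:29617",  # assumed free local port
    rank=0,
    world_size=1,
)

t = torch.tensor([1.0, 2.0])  # a CPU tensor: fine under gloo
dist.all_reduce(t)            # sums across ranks (trivial with 1 rank)
print(t.tolist())             # unchanged, since world_size == 1
dist.destroy_process_group()
```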
[QUESTION] Which torch version works with ring_exchange?
(stale: no activity in 60 days on issue or PR)
[QUESTION] Training Mixtral 8x7B on 16 x H100 achieves only 130 TFLOPS of throughput
(stale: no activity in 60 days on issue or PR)
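To put the reported number in perspective, a back-of-envelope model FLOP utilization (MFU) calculation. The 989 TFLOPS peak is an assumption (H100 SXM dense BF16; PCIe parts are lower), and the "well-tuned" range cited in the comment is a rough rule of thumb, not a benchmark result.

```python
# Back-of-envelope MFU for the reported per-GPU throughput.
achieved_tflops_per_gpu = 130
h100_peak_bf16_tflops = 989   # assumed: H100 SXM dense BF16 peak

mfu = achieved_tflops_per_gpu / h100_peak_bf16_tflops
print(f"MFU ~ {mfu:.1%}")     # roughly 13%, well below the ~35-45%
                              # often seen for well-tuned dense training
```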