Description
Describe the bug
Several modules may incorrectly detect TransformerEngine as available when the transformer_engine package is not installed.
The issue is caused by importing from megatron.core.extensions.transformer_engine before explicitly checking whether transformer_engine is installed.
Since megatron.core.extensions.transformer_engine falls back to MagicMock, the import can still succeed even when TransformerEngine is unavailable. As a result, HAVE_TE may be incorrectly set to True, which can enable TE-dependent logic in a no-TE environment.
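The failure mode can be reproduced without Megatron-LM at all. The sketch below is only an illustration under stated assumptions: `fake_te_extension`, `surely_not_installed_backend_xyz`, and `TENorm` are hypothetical stand-ins, and the real extension module may implement its MagicMock fallback differently.

```python
import sys
import types
from unittest.mock import MagicMock

# Hypothetical wrapper module that substitutes MagicMock when its backend
# is missing. All names here are illustrative, not Megatron-LM's actual code.
fake_ext = types.ModuleType("fake_te_extension")
try:
    import surely_not_installed_backend_xyz  # noqa: F401  (assumed absent)
    fake_ext.TENorm = object
except ImportError:
    fake_ext.TENorm = MagicMock()
sys.modules["fake_te_extension"] = fake_ext

# Pattern described above: availability is inferred from import success.
try:
    from fake_te_extension import TENorm
    HAVE_TE = True
except ImportError:
    HAVE_TE = False

# The import succeeded because the wrapper exposes a mock, so the flag is
# set even though the backend package could not be imported.
print(HAVE_TE)
```

Because the wrapper always defines the name, the `except ImportError` branch is never reached, and the flag says nothing about whether the backend exists.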
Affected files include:
- megatron/core/transformer/multi_latent_attention.py
- megatron/core/transformer/moe/shared_experts.py
- examples/multimodal/layer_specs.py
- examples/multimodal/radio/radio_g.py
Steps/Code to reproduce bug
- Prepare an environment where transformer_engine is not installed.
- Run one of the following examples:
    from megatron.core.transformer.multi_latent_attention import HAVE_TE
    print(HAVE_TE)
- Observe that HAVE_TE may evaluate to True even though the transformer_engine package is not installed.
Expected behavior
HAVE_TE should be False when the transformer_engine package is not installed.
TE-dependent logic should only be enabled after Megatron-LM confirms that TransformerEngine is actually available.
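One way to get this behavior is to check for the package itself before touching any wrapper module. The following is a minimal sketch, not necessarily the approach taken in the linked PR; `is_package_installed` is a hypothetical helper name, and `importlib.util.find_spec` is the standard-library call it relies on.

```python
import importlib.util


def is_package_installed(name: str) -> bool:
    """Return True if a top-level package can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


# Decide availability from the real package, not from a wrapper that may
# fall back to a mock object.
HAVE_TE = is_package_installed("transformer_engine")

if HAVE_TE:
    # Only at this point would TE-backed symbols be imported from the
    # extension module (import omitted here to keep the sketch standalone).
    pass

print(HAVE_TE)
```

With this ordering, HAVE_TE reflects whether transformer_engine can actually be found, and the mock fallback can no longer flip the flag to True.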
Additional context
I have already opened a PR for this issue:
#3763