
Incorrect HAVE_TE detection when transformer_engine is not installed #3764

@returnL

Description


Describe the bug

Several modules may incorrectly detect TransformerEngine as available when the transformer_engine package is not installed.

The issue is caused by importing from megatron.core.extensions.transformer_engine before explicitly checking whether transformer_engine is installed.

Since megatron.core.extensions.transformer_engine falls back to MagicMock, the import can still succeed even when TransformerEngine is unavailable. As a result, HAVE_TE may be incorrectly set to True, which can enable TE-dependent logic in a no-TE environment.
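The failure mode above can be shown without Megatron at all. This is a minimal illustrative sketch (the names `extensions` and `TENorm` stand in for the real wrapper module and symbol): when a module shadows a missing dependency with a `MagicMock`, a downstream try/except import check never fires, so the flag comes out truthy.

```python
from unittest.mock import MagicMock

# Stand-in for megatron.core.extensions.transformer_engine after it has
# fallen back to MagicMock because transformer_engine is not installed.
extensions = MagicMock()

try:
    # Attribute access on a MagicMock always succeeds, so this "import"
    # cannot raise even though the real package is absent.
    TENorm = extensions.TENorm
    HAVE_TE = True
except (ImportError, AttributeError):
    HAVE_TE = False

print(HAVE_TE)  # True, even in a no-TE environment
```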

Affected files include:

  • megatron/core/transformer/multi_latent_attention.py
  • megatron/core/transformer/moe/shared_experts.py
  • examples/multimodal/layer_specs.py
  • examples/multimodal/radio/radio_g.py

Steps/Code to reproduce bug

  1. Prepare an environment where transformer_engine is not installed.

  2. Run the following example:

     from megatron.core.transformer.multi_latent_attention import HAVE_TE
     print(HAVE_TE)

  3. Observe that HAVE_TE may evaluate to True even though the transformer_engine package is not installed.

Expected behavior

HAVE_TE should be False when the transformer_engine package is not installed.

TE-dependent logic should only be enabled after Megatron-LM confirms that TransformerEngine is actually available.
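One way to express the expected behavior is to probe the transformer_engine package itself rather than importing through a wrapper module that may substitute a mock. This is a sketch of that idea, not necessarily the approach taken in the linked PR:

```python
import importlib.util

# Check for the real package directly; find_spec returns None when the
# package cannot be located, so no mock fallback can mask its absence.
HAVE_TE = importlib.util.find_spec("transformer_engine") is not None

# HAVE_TE is False when transformer_engine is not installed, and
# TE-dependent imports/logic should be gated behind it.
print(HAVE_TE)
```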

Additional context

I have already opened a PR for this issue:
#3763
