
Fix: TransformerEnginePrecision conversion for layers with bias=False #20805


Open · wants to merge 3 commits into master

Conversation

@KAVYANSHTYAGI KAVYANSHTYAGI commented May 9, 2025

What does this PR do?
This PR resolves a runtime AttributeError occurring in the TransformerEnginePrecision plugin during layer conversion.

When the convert_module() method replaces standard PyTorch layers (nn.Linear, nn.LayerNorm, etc.) with their Transformer Engine counterparts, it clones both the weight and bias tensors. However, layers may be instantiated with bias=False, leaving child.bias as None. The following line in the plugin then crashes:

```python
replacement.bias.data = child.bias.data.clone()
```
This PR introduces a guard that checks both `child.bias` and `replacement.bias` are not None before accessing `.data`, ensuring compatibility with modules that omit the bias, such as:

```python
nn.Linear(in_features=16, out_features=32, bias=False)
```
Additionally, the PR includes a targeted unit test (test_convert_module_handles_linear_without_bias) that defines a model using such a bias-less layer and verifies that the conversion logic runs without error. This prevents regression and guards against similar issues in future updates.

The fix is surgical, backward-compatible, and maintains the expected behavior for all other layers with bias enabled.

Fixes #20803
cc @Lightning-AI/fabric-maintainers

Summary of changes
Guarded `.bias.data.clone()` with `if child.bias is not None and replacement.bias is not None`

Added test test_convert_module_handles_linear_without_bias in test_transformer_engine.py

No breaking changes introduced.

Before submitting
Was this discussed/agreed via a GitHub issue? No

Did you read the contributor guideline, Pull Request section? Yes

Did you make sure your PR does only one thing, instead of bundling different changes together? Yes

Did you write any new necessary tests? Yes, for the edge case bias=False

Did you verify new and existing tests pass locally with your changes? Yes

Did you update the CHANGELOG? Not applicable for this bugfix

🙃 Did you have fun coding?

Absolutely. Excited to contribute more improvements soon!


📚 Documentation preview 📚: https://pytorch-lightning--20805.org.readthedocs.build/en/20805/

@github-actions github-actions bot added the fabric lightning.fabric.Fabric label May 9, 2025
Successfully merging this pull request may close these issues.

LayerNorm with bias=False or elementwise_affine=False fails to convert when using transformer-engine