Fix torch.compile breaking toggle_optimizer / untoggle_optimizer (#21686)

gaurav0107 · deependujha · web-flow · commit 1120456eef88 · 2026-06-01T12:32:33.000+01:00
* Fix toggle_optimizer breaking under torch.compile (#21513)

`LightningModule.toggle_optimizer` and `untoggle_optimizer` mutate `requires_grad` on parameters to implement multi-optimizer gradient masking. Dynamo/AOTAutograd does not support `setattr()` on `Tensor.requires_grad` because it can change a tensor's leaf-ness mid-graph. As a result, when the `LightningModule` is wrapped with `torch.compile`, tracing either graph-breaks with "Unsupported: setattr() on Tensor.requires_grad" or raises a `KeyError` on the internal `param_requires_grad_state` mapping when the traced parameter references diverge from those held by `trainer.optimizers`.

Decorate both helpers with `@torch.compiler.disable` (the same pattern already used for logging bookkeeping in `logger_connector/result.py`) so they run as opaque Python when called from a compiled `training_step`. Eager behavior is unchanged.

Adds a CPU regression test that compiles a two-optimizer `LightningModule` calling `toggle_optimizer` / `untoggle_optimizer` in `training_step` and exercises one training iteration, plus a CHANGELOG entry.

* Narrow test_toggle_untoggle to check compiler.disable attribute (#21513)

The previous regression test compiled a `LightningModule` end-to-end and called `self.optimizers()` inside the compiled `training_step`. That call trips a separate Dynamo limitation, unrelated to the toggle_optimizer fix: tracing `self.trainer.strategy._lightning_optimizers` raises `InternalTorchDynamoError: GetAttrVariable(...) has no type` across all CI platforms and torch versions.

The shipped fix (`@torch.compiler.disable` on `toggle_optimizer` / `untoggle_optimizer`) does not require a full compiled trainer run to verify; it only guarantees that Dynamo skips those two methods. Replace the integration test with a direct attribute check that both methods carry the `_torchdynamo_disable` marker installed by `torch.compiler.disable`, following the same `has_dynamo(fn)` pattern already used by `tests/utilities/test_compile.py::test_compile_uncompile`.

Toggle/untoggle functional correctness remains covered by the existing `test_toggle_untoggle_2_optimizers_no_shared_parameters` and `test_toggle_untoggle_3_optimizers_shared_parameters` tests in this file.

---------

Co-authored-by: Deependu <deependujha21@gmail.com>
diff --git a/src/lightning/pytorch/CHANGELOG.md b/src/lightning/pytorch/CHANGELOG.md
@@ -28,6 +28,8 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 
 - Fixed `SIGTERMException` producing a zero exit code instead of 143 (128 + SIGTERM) ([#21623](https://github.com/Lightning-AI/pytorch-lightning/issues/21623))
 
+- Fixed `LightningModule.toggle_optimizer` / `untoggle_optimizer` breaking under `torch.compile` by disabling Dynamo tracing on these bookkeeping helpers ([#21513](https://github.com/Lightning-AI/pytorch-lightning/issues/21513))
+
 ---
 
 ## [2.6.4] - 2026-05-20
diff --git a/src/lightning/pytorch/core/module.py b/src/lightning/pytorch/core/module.py
@@ -1136,12 +1136,24 @@ def backward(self, loss):
         else:
             loss.backward(*args, **kwargs)
 
+    @torch.compiler.disable
     def toggle_optimizer(self, optimizer: Union[Optimizer, LightningOptimizer]) -> None:
         """Makes sure only the gradients of the current optimizer's parameters are calculated in the training step to
         prevent dangling gradients in multiple-optimizer setup.
 
         It works with :meth:`untoggle_optimizer` to make sure ``param_requires_grad_state`` is properly reset.
 
+        .. note::
+            This method is decorated with :func:`torch.compiler.disable` so that it is executed as regular
+            Python when the ``LightningModule`` is wrapped with :func:`torch.compile`. Mutating
+            ``requires_grad`` on parameters is not supported by Dynamo/AOTAutograd (it can change a
+            tensor's leaf-ness mid-graph), so tracing this bookkeeping helper would either fail with
+            ``Unsupported: setattr() on Tensor.requires_grad`` or produce a ``KeyError`` on the
+            internal ``param_requires_grad_state`` mapping when the traced parameter references diverge
+            from those held by ``trainer.optimizers``. Disabling the compiler on this method keeps the
+            behavior identical for eager users while making it safe to call from a compiled
+            ``training_step``.
+
         Args:
             optimizer: The optimizer to toggle.
 
@@ -1165,9 +1177,13 @@ def toggle_optimizer(self, optimizer: Union[Optimizer, LightningOptimizer]) -> N
                 param.requires_grad = param_requires_grad_state[param]
         self._param_requires_grad_state = param_requires_grad_state
 
+    @torch.compiler.disable
     def untoggle_optimizer(self, optimizer: Union[Optimizer, LightningOptimizer]) -> None:
         """Resets the state of required gradients that were toggled with :meth:`toggle_optimizer`.
 
+        See :meth:`toggle_optimizer` for details on why this method is decorated with
+        :func:`torch.compiler.disable`.
+
         Args:
             optimizer: The optimizer to untoggle.
 
diff --git a/tests/tests_pytorch/core/test_lightning_module.py b/tests/tests_pytorch/core/test_lightning_module.py
@@ -298,6 +298,27 @@ def configure_optimizers(self):
     trainer.fit(model)
 
 
+@RunIf(dynamo=True)
+def test_toggle_untoggle_optimizer_are_compiler_disabled():
+    """Regression test for https://github.com/Lightning-AI/pytorch-lightning/issues/21513.
+
+    ``toggle_optimizer`` / ``untoggle_optimizer`` mutate ``requires_grad`` on Parameters, which
+    Dynamo/AOTAutograd does not support because it can change a tensor's leaf-ness mid-graph.
+    Tracing these helpers either graph-breaks with ``Unsupported: setattr() on Tensor.requires_grad``
+    or raises a ``KeyError`` on the internal ``param_requires_grad_state`` mapping when the traced
+    parameter references diverge from those held by ``trainer.optimizers``. Both methods are
+    decorated with ``@torch.compiler.disable`` so that Dynamo never enters them. This test verifies
+    the decorator is attached via the ``_torchdynamo_disable`` attribute the decorator installs
+    (the same assertion pattern used by ``tests/utilities/test_compile.py::test_compile_uncompile``).
+    """
+
+    def is_compiler_disabled(fn):
+        return any(el.startswith("_torchdynamo_disable") for el in dir(fn))
+
+    assert is_compiler_disabled(LightningModule.toggle_optimizer)
+    assert is_compiler_disabled(LightningModule.untoggle_optimizer)
+
+
 @pytest.mark.parametrize(
     ("accelerator", "device"),
     [