Commit a6f5739

Merge branch 'master' into fix/ddp-accumulate-grad-stream-mismatch-warning
2 parents 12b4d26 + 1120456 commit a6f5739

12 files changed

Lines changed: 78 additions & 13 deletions

.github/workflows/_legacy-checkpoints.yml

Lines changed: 1 addition & 1 deletion
@@ -60,7 +60,7 @@ jobs:
       - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
 
       - name: Install uv and set Python version
-        uses: astral-sh/setup-uv@37802adc94f370d6bfd71619e3f0bf239e1f3b78 # v7.6.0
+        uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b # v8.1.0
         with:
           python-version: "3.10"
           # TODO: Avoid activating environment like this

.github/workflows/ci-tests-fabric.yml

Lines changed: 1 addition & 1 deletion
@@ -74,7 +74,7 @@ jobs:
       - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
 
       - name: Install uv and set Python version
-        uses: astral-sh/setup-uv@37802adc94f370d6bfd71619e3f0bf239e1f3b78 # v7.6.0
+        uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b # v8.1.0
         with:
           python-version: ${{ matrix.config.python-version || '3.10' }}
           # TODO: Avoid activating environment like this

.github/workflows/ci-tests-pytorch.yml

Lines changed: 1 addition & 1 deletion
@@ -79,7 +79,7 @@ jobs:
       - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
 
       - name: Install uv and set Python version
-        uses: astral-sh/setup-uv@37802adc94f370d6bfd71619e3f0bf239e1f3b78 # v7.6.0
+        uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b # v8.1.0
         with:
           python-version: ${{ matrix.config.python-version || '3.10' }}
           # TODO: Avoid activating environment like this

.github/workflows/code-checks.yml

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ jobs:
       - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
 
       - name: Install uv and set Python version
-        uses: astral-sh/setup-uv@37802adc94f370d6bfd71619e3f0bf239e1f3b78 # v7.6.0
+        uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b # v8.1.0
         with:
           python-version: "3.11"
           # TODO: Avoid activating environment like this

.github/workflows/docs-build.yml

Lines changed: 1 addition & 1 deletion
@@ -74,7 +74,7 @@ jobs:
           lfs: ${{ matrix.pkg-name == 'pytorch' }}
 
       - name: Install uv and set Python version
-        uses: astral-sh/setup-uv@37802adc94f370d6bfd71619e3f0bf239e1f3b78 # v7.6.0
+        uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b # v8.1.0
         with:
           python-version: "3.10"
           # TODO: Avoid activating environment like this

.github/workflows/release-pkg.yml

Lines changed: 1 addition & 1 deletion
@@ -154,7 +154,7 @@ jobs:
 
       - name: Publish distribution 📦 to PyPI
         # pypa/gh-action-pypi-publish v1.13.0
-        uses: pypa/gh-action-pypi-publish@ed0c53931b1dc9bd32cbe73a98c7f6766f8a527e
+        uses: pypa/gh-action-pypi-publish@cef221092ed1bacb1cc03d23a2d87d1d172e277b
         with:
           packages_dir: dist/${{ steps.folder.outputs.pkg }}
           verbose: true

src/lightning/pytorch/CHANGELOG.md

Lines changed: 5 additions & 0 deletions
@@ -24,10 +24,15 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 
 ### Fixed
 
+- Fixed non-zero process exits in `CombinedLoader.reset()` with large tensors and persistent spawned workers by avoiding explicit `_shutdown_workers()` calls and relying on iterator cleanup via `del` [#21708](https://github.com/Lightning-AI/pytorch-lightning/issues/21708)
+
 - Fixed `SIGTERMException` producing a zero exit code instead of 143 (128 + SIGTERM) ([#21623](https://github.com/Lightning-AI/pytorch-lightning/issues/21623))
 
 - fixed AccumulateGrad stream mismatch warning when using DDP with Trainer ([#21746](https://github.com/Lightning-AI/pytorch-lightning/pull/21746))
 
+- Fixed `LightningModule.toggle_optimizer` / `untoggle_optimizer` breaking under `torch.compile` by disabling Dynamo tracing on these bookkeeping helpers ([#21513](https://github.com/Lightning-AI/pytorch-lightning/issues/21513))
+
+
 ---
 
 ## [2.6.4] - 2026-05-20

src/lightning/pytorch/core/module.py

Lines changed: 16 additions & 0 deletions
@@ -1136,12 +1136,24 @@ def backward(self, loss):
         else:
             loss.backward(*args, **kwargs)
 
+    @torch.compiler.disable
     def toggle_optimizer(self, optimizer: Union[Optimizer, LightningOptimizer]) -> None:
         """Makes sure only the gradients of the current optimizer's parameters are calculated in the training step to
         prevent dangling gradients in multiple-optimizer setup.
 
         It works with :meth:`untoggle_optimizer` to make sure ``param_requires_grad_state`` is properly reset.
 
+        .. note::
+            This method is decorated with :func:`torch.compiler.disable` so that it is executed as regular
+            Python when the ``LightningModule`` is wrapped with :func:`torch.compile`. Mutating
+            ``requires_grad`` on parameters is not supported by Dynamo/AOTAutograd (it can change a
+            tensor's leaf-ness mid-graph), so tracing this bookkeeping helper would either fail with
+            ``Unsupported: setattr() on Tensor.requires_grad`` or produce a ``KeyError`` on the
+            internal ``param_requires_grad_state`` mapping when the traced parameter references diverge
+            from those held by ``trainer.optimizers``. Disabling the compiler on this method keeps the
+            behavior identical for eager users while making it safe to call from a compiled
+            ``training_step``.
+
         Args:
             optimizer: The optimizer to toggle.
 
@@ -1165,9 +1177,13 @@ def toggle_optimizer(self, optimizer: Union[Optimizer, LightningOptimizer]) -> N
                 param.requires_grad = param_requires_grad_state[param]
         self._param_requires_grad_state = param_requires_grad_state
 
+    @torch.compiler.disable
     def untoggle_optimizer(self, optimizer: Union[Optimizer, LightningOptimizer]) -> None:
         """Resets the state of required gradients that were toggled with :meth:`toggle_optimizer`.
 
+        See :meth:`toggle_optimizer` for details on why this method is decorated with
+        :func:`torch.compiler.disable`.
+
         Args:
             optimizer: The optimizer to untoggle.
 
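
For readers unfamiliar with the bookkeeping these two helpers perform, the following is a minimal, framework-free sketch. It assumes the behaviour suggested by the hunk context above (a `param_requires_grad_state` mapping keyed by parameter object). `FakeParam`, `FakeOptimizer`, `toggle` and `untoggle` are hypothetical stand-ins, not Lightning or PyTorch APIs, and `untoggle` is simplified to restore every recorded flag. The lookup in the second loop also shows how a `KeyError` can arise when the parameter objects being looked up are not the ones that were recorded.

```python
class FakeParam:
    """Hypothetical stand-in for a parameter; it only tracks requires_grad."""

    def __init__(self, name, requires_grad=True):
        self.name = name
        self.requires_grad = requires_grad


class FakeOptimizer:
    """Hypothetical stand-in exposing a param_groups list of dicts."""

    def __init__(self, params):
        self.param_groups = [{"params": list(params)}]


def toggle(all_optimizers, current):
    """Freeze every parameter except those owned by `current`; return the saved flags."""
    state = {}
    for opt in all_optimizers:
        for group in opt.param_groups:
            for param in group["params"]:
                if param in state:
                    continue
                state[param] = param.requires_grad  # remember the original flag
                param.requires_grad = False
    for group in current.param_groups:
        for param in group["params"]:
            # Raises KeyError if `param` is not one of the recorded objects.
            param.requires_grad = state[param]
    return state


def untoggle(state):
    """Simplified restore: put every recorded flag back."""
    for param, flag in state.items():
        param.requires_grad = flag


gen_w, disc_w = FakeParam("gen_w"), FakeParam("disc_w")
gen_opt, disc_opt = FakeOptimizer([gen_w]), FakeOptimizer([disc_w])

saved = toggle([gen_opt, disc_opt], current=gen_opt)
during_step = (gen_w.requires_grad, disc_w.requires_grad)  # only gen_w stays trainable
untoggle(saved)
after_step = (gen_w.requires_grad, disc_w.requires_grad)  # both flags restored
```

Because the mapping is keyed by object identity, passing an optimizer whose parameters are different objects from the recorded ones fails at the lookup, which is the divergence the docstring note describes.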

src/lightning/pytorch/utilities/combined_loader.py

Lines changed: 1 addition & 1 deletion
@@ -397,7 +397,7 @@ def _load_state_dicts(self, states: list[dict[str, Any]]) -> None:
 def _shutdown_workers_and_reset_iterator(dataloader: object) -> None:
     if hasattr(dataloader, "_iterator"):
         if isinstance(dataloader._iterator, _MultiProcessingDataLoaderIter):
-            dataloader._iterator._shutdown_workers()
+            del dataloader._iterator
         dataloader._iterator = None
 
 
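
The change above swaps an explicit shutdown call for dropping the reference and letting the iterator clean up after itself. A minimal sketch of that pattern in plain Python follows. It assumes the cleanup lives in the object's finalizer and that the interpreter is CPython, where the finalizer runs as soon as the last reference disappears. `FakeWorkerIterator`, `FakeLoader` and `reset_iterator` are hypothetical stand-ins, not the PyTorch or Lightning classes.

```python
import gc


class FakeWorkerIterator:
    """Hypothetical stand-in for a multi-process iterator that owns workers."""

    def __init__(self, events):
        self._events = events

    def __del__(self):
        # Finalizer: runs when the last reference to the iterator goes away.
        self._events.append("workers released")


class FakeLoader:
    """Hypothetical stand-in for a dataloader that caches its iterator."""

    def __init__(self, events):
        self._iterator = FakeWorkerIterator(events)


def reset_iterator(loader):
    if hasattr(loader, "_iterator"):
        if isinstance(loader._iterator, FakeWorkerIterator):
            del loader._iterator  # drop the reference instead of calling a shutdown method
        loader._iterator = None  # leave the attribute in a known state


events = []
loader = FakeLoader(events)
reset_iterator(loader)
gc.collect()  # not needed on CPython; a nudge for interpreters without reference counting
```

Note that `del` removes the attribute entirely, so the assignment that follows is what keeps later `hasattr` checks and attribute reads working.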

tests/tests_pytorch/core/test_lightning_module.py

Lines changed: 21 additions & 0 deletions
@@ -298,6 +298,27 @@ def configure_optimizers(self):
     trainer.fit(model)
 
 
+@RunIf(dynamo=True)
+def test_toggle_untoggle_optimizer_are_compiler_disabled():
+    """Regression test for https://github.com/Lightning-AI/pytorch-lightning/issues/21513.
+
+    ``toggle_optimizer`` / ``untoggle_optimizer`` mutate ``requires_grad`` on Parameters, which
+    Dynamo/AOTAutograd does not support because it can change a tensor's leaf-ness mid-graph.
+    Tracing these helpers either graph-breaks with ``Unsupported: setattr() on Tensor.requires_grad``
+    or raises a ``KeyError`` on the internal ``param_requires_grad_state`` mapping when the traced
+    parameter references diverge from those held by ``trainer.optimizers``. Both methods are
+    decorated with ``@torch.compiler.disable`` so that Dynamo never enters them. This test verifies
+    the decorator is attached via the ``_torchdynamo_disable`` attribute the decorator installs
+    (the same assertion pattern used by ``tests/utilities/test_compile.py::test_compile_uncompile``).
+    """
+
+    def is_compiler_disabled(fn):
+        return any(el.startswith("_torchdynamo_disable") for el in dir(fn))
+
+    assert is_compiler_disabled(LightningModule.toggle_optimizer)
+    assert is_compiler_disabled(LightningModule.untoggle_optimizer)
+
+
 @pytest.mark.parametrize(
     ("accelerator", "device"),
     [
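
The regression test above detects the decorator by looking for a marker attribute on the function rather than by running the compiler. The sketch below shows that detection pattern without importing torch. `fake_compiler_disable` is a hypothetical stand-in that only mimics the marker named in the test's docstring; it is not how `torch.compiler.disable` is implemented, and `Module` is an illustrative class, not `LightningModule`.

```python
import functools


def fake_compiler_disable(fn):
    """Hypothetical stand-in for a 'do not trace' decorator; NOT torch's implementation."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)

    # Marker name taken from the test's docstring; set here purely for illustration.
    wrapper._torchdynamo_disable = True
    return wrapper


def is_compiler_disabled(fn):
    # Same check as in the test: look for any attribute with the marker prefix.
    return any(name.startswith("_torchdynamo_disable") for name in dir(fn))


class Module:
    @fake_compiler_disable
    def toggle(self):
        return "toggled"

    def plain(self):
        return "plain"


decorated = is_compiler_disabled(Module.toggle)
undecorated = is_compiler_disabled(Module.plain)
```

Matching on a prefix instead of an exact name lets the check tolerate additional marker attributes that share the prefix, at the cost of depending on a private naming convention.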
