Description
🐛 Bug
When I use the pruning callback (ModelPruning('l1_unstructured', amount=0.5)) together with profiler='pytorch', I get the following error:
zz2hyoqiv6-algo-1-vqegg | Traceback (most recent call last):
zz2hyoqiv6-algo-1-vqegg | File "train.py", line 51, in <module>
zz2hyoqiv6-algo-1-vqegg | main(args)
zz2hyoqiv6-algo-1-vqegg | File "train.py", line 44, in main
zz2hyoqiv6-algo-1-vqegg | else: trainer.fit(model, dm)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 771, in fit
zz2hyoqiv6-algo-1-vqegg | self._call_and_handle_interrupt(
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 722, in _call_and_handle_interrupt
zz2hyoqiv6-algo-1-vqegg | return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
zz2hyoqiv6-algo-1-vqegg | return function(*args, **kwargs)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 812, in _fit_impl
zz2hyoqiv6-algo-1-vqegg | results = self._run(model, ckpt_path=self.ckpt_path)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1174, in _run
zz2hyoqiv6-algo-1-vqegg | self._call_setup_hook() # allow user to setup lightning_module in accelerator environment
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1493, in _call_setup_hook
zz2hyoqiv6-algo-1-vqegg | self._call_callback_hooks("setup", stage=fn)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1636, in _call_callback_hooks
zz2hyoqiv6-algo-1-vqegg | fn(self, self.lightning_module, *args, **kwargs)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/callbacks/pruning.py", line 378, in setup
zz2hyoqiv6-algo-1-vqegg | self.original_layers.setdefault(id, _LayerRef(data=deepcopy(module), names=[]))
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 172, in deepcopy
zz2hyoqiv6-algo-1-vqegg | y = _reconstruct(x, memo, *rv)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 270, in _reconstruct
zz2hyoqiv6-algo-1-vqegg | state = deepcopy(state, memo)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 146, in deepcopy
zz2hyoqiv6-algo-1-vqegg | y = copier(x, memo)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 230, in _deepcopy_dict
zz2hyoqiv6-algo-1-vqegg | y[deepcopy(key, memo)] = deepcopy(value, memo)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 172, in deepcopy
zz2hyoqiv6-algo-1-vqegg | y = _reconstruct(x, memo, *rv)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 296, in _reconstruct
zz2hyoqiv6-algo-1-vqegg | value = deepcopy(value, memo)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 172, in deepcopy
zz2hyoqiv6-algo-1-vqegg | y = _reconstruct(x, memo, *rv)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 264, in _reconstruct
zz2hyoqiv6-algo-1-vqegg | y = func(*args)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 263, in <genexpr>
zz2hyoqiv6-algo-1-vqegg | args = (deepcopy(arg, memo) for arg in args)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 146, in deepcopy
zz2hyoqiv6-algo-1-vqegg | y = copier(x, memo)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 237, in _deepcopy_method
zz2hyoqiv6-algo-1-vqegg | return type(x)(x.func, deepcopy(x.self, memo))
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 172, in deepcopy
zz2hyoqiv6-algo-1-vqegg | y = _reconstruct(x, memo, *rv)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 270, in _reconstruct
zz2hyoqiv6-algo-1-vqegg | state = deepcopy(state, memo)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 146, in deepcopy
zz2hyoqiv6-algo-1-vqegg | y = copier(x, memo)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 230, in _deepcopy_dict
zz2hyoqiv6-algo-1-vqegg | y[deepcopy(key, memo)] = deepcopy(value, memo)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 172, in deepcopy
zz2hyoqiv6-algo-1-vqegg | y = _reconstruct(x, memo, *rv)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 270, in _reconstruct
zz2hyoqiv6-algo-1-vqegg | state = deepcopy(state, memo)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 146, in deepcopy
zz2hyoqiv6-algo-1-vqegg | y = copier(x, memo)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 230, in _deepcopy_dict
zz2hyoqiv6-algo-1-vqegg | y[deepcopy(key, memo)] = deepcopy(value, memo)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 172, in deepcopy
zz2hyoqiv6-algo-1-vqegg | y = _reconstruct(x, memo, *rv)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 296, in _reconstruct
zz2hyoqiv6-algo-1-vqegg | value = deepcopy(value, memo)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 172, in deepcopy
zz2hyoqiv6-algo-1-vqegg | y = _reconstruct(x, memo, *rv)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 270, in _reconstruct
zz2hyoqiv6-algo-1-vqegg | state = deepcopy(state, memo)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 146, in deepcopy
zz2hyoqiv6-algo-1-vqegg | y = copier(x, memo)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 210, in _deepcopy_tuple
zz2hyoqiv6-algo-1-vqegg | y = [deepcopy(a, memo) for a in x]
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 210, in <listcomp>
zz2hyoqiv6-algo-1-vqegg | y = [deepcopy(a, memo) for a in x]
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 146, in deepcopy
zz2hyoqiv6-algo-1-vqegg | y = copier(x, memo)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 210, in _deepcopy_tuple
zz2hyoqiv6-algo-1-vqegg | y = [deepcopy(a, memo) for a in x]
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 210, in <listcomp>
zz2hyoqiv6-algo-1-vqegg | y = [deepcopy(a, memo) for a in x]
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 172, in deepcopy
zz2hyoqiv6-algo-1-vqegg | y = _reconstruct(x, memo, *rv)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 270, in _reconstruct
zz2hyoqiv6-algo-1-vqegg | state = deepcopy(state, memo)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 146, in deepcopy
zz2hyoqiv6-algo-1-vqegg | y = copier(x, memo)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 230, in _deepcopy_dict
zz2hyoqiv6-algo-1-vqegg | y[deepcopy(key, memo)] = deepcopy(value, memo)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 172, in deepcopy
zz2hyoqiv6-algo-1-vqegg | y = _reconstruct(x, memo, *rv)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 270, in _reconstruct
zz2hyoqiv6-algo-1-vqegg | state = deepcopy(state, memo)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 146, in deepcopy
zz2hyoqiv6-algo-1-vqegg | y = copier(x, memo)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 230, in _deepcopy_dict
zz2hyoqiv6-algo-1-vqegg | y[deepcopy(key, memo)] = deepcopy(value, memo)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 146, in deepcopy
zz2hyoqiv6-algo-1-vqegg | y = copier(x, memo)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 205, in _deepcopy_list
zz2hyoqiv6-algo-1-vqegg | append(deepcopy(a, memo))
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 172, in deepcopy
zz2hyoqiv6-algo-1-vqegg | y = _reconstruct(x, memo, *rv)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 270, in _reconstruct
zz2hyoqiv6-algo-1-vqegg | state = deepcopy(state, memo)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 146, in deepcopy
zz2hyoqiv6-algo-1-vqegg | y = copier(x, memo)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 230, in _deepcopy_dict
zz2hyoqiv6-algo-1-vqegg | y[deepcopy(key, memo)] = deepcopy(value, memo)
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/copy.py", line 161, in deepcopy
zz2hyoqiv6-algo-1-vqegg | rv = reductor(4)
zz2hyoqiv6-algo-1-vqegg | TypeError: cannot pickle '_io.TextIOWrapper' object
zz2hyoqiv6-algo-1-vqegg | Exception ignored in: <function BaseProfiler.__del__ at 0x7fd53b5c7940>
zz2hyoqiv6-algo-1-vqegg | Traceback (most recent call last):
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/profiler/base.py", line 199, in __del__
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/profiler/pytorch.py", line 509, in teardown
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/profiler/pytorch.py", line 494, in _delete_profilers
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/profiler/pytorch.py", line 489, in _cache_functions_events
zz2hyoqiv6-algo-1-vqegg | File "/opt/conda/lib/python3.8/site-packages/torch/profiler/profiler.py", line 382, in events
zz2hyoqiv6-algo-1-vqegg | AssertionError:
zz2hyoqiv6-algo-1-vqegg | 2022-03-21 12:47:50,301 sagemaker-training-toolkit ERROR Reporting training FAILURE
zz2hyoqiv6-algo-1-vqegg | 2022-03-21 12:47:50,301 sagemaker-training-toolkit ERROR ExecuteUserScriptError:
zz2hyoqiv6-algo-1-vqegg | ExitCode 1
zz2hyoqiv6-algo-1-vqegg | ErrorMessage "TypeError: cannot pickle '_io.TextIOWrapper' object
zz2hyoqiv6-algo-1-vqegg | Exception ignored in: <function BaseProfiler.__del__ at 0x7fd53b5c7940> Traceback (most recent call last): File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/profiler/base.py", line 199, in __del__ File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/profiler/pytorch.py", line 509, in teardown File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/profiler/pytorch.py", line 494, in _delete_profilers File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/profiler/pytorch.py", line 489, in _cache_functions_events File "/opt/conda/lib/python3.8/site-packages/torch/profiler/profiler.py", line 382, in events AssertionError:"
When I comment out profiler='pytorch', the fit call runs fine. I think this may be related to pytorch/pytorch#37322.
If it's not fixable, maybe Lightning could emit a warning and disable one or the other?
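To illustrate the "warn instead of crash" idea, here is a rough sketch in plain Python. The helper name deepcopy_or_warn and the HoldsFile class are hypothetical and are not part of Lightning; the sketch only shows catching the TypeError from deepcopy and turning it into a warning.

```python
import copy
import os
import warnings


def deepcopy_or_warn(obj, name="object"):
    """Hypothetical helper: return a deep copy, or warn and return None on failure."""
    try:
        return copy.deepcopy(obj)
    except TypeError as err:
        # deepcopy raises TypeError when it reaches something it cannot copy,
        # such as an open text file handle.
        warnings.warn(f"Could not deepcopy {name}: {err}", RuntimeWarning)
        return None


class HoldsFile:
    """Hypothetical object that keeps an open text stream as an attribute."""

    def __init__(self):
        self.stream = open(os.devnull, "w")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    holder = HoldsFile()
    result = deepcopy_or_warn(holder, name="HoldsFile")  # warns, returns None
    holder.stream.close()
    plain = deepcopy_or_warn({"a": [1, 2]}, name="dict")  # copies normally
```

Whether a fallback like this is acceptable inside the pruning callback is a separate question, since the callback presumably needs the copy it asks for.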
To Reproduce
I have not yet been able to reproduce this with the BoringModel. If I manage to, I will update this issue.
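Independently of Lightning, the final TypeError in the traceback can be triggered with a few lines of plain Python. This is only a sketch of the failure mode, not a reproduction of the bug itself: FakeProfiler and FakeModule are hypothetical stand-ins, and how the real module ends up referencing an open file is an assumption that the traceback does not confirm.

```python
import copy
import os


class FakeProfiler:
    """Hypothetical stand-in for an object that keeps an open output file."""

    def __init__(self):
        self.stream = open(os.devnull, "w")  # an _io.TextIOWrapper


class FakeModule:
    """Hypothetical stand-in for a module that can reach the profiler."""

    def __init__(self, profiler):
        self.weight = [1.0, 2.0]
        self.profiler = profiler  # deepcopy follows this reference


profiler = FakeProfiler()
module = FakeModule(profiler)

error = None
try:
    copy.deepcopy(module)  # walks module -> profiler -> stream and fails there
except TypeError as err:
    error = err
finally:
    profiler.stream.close()

print(type(error).__name__, error)
```

Any attribute chain from the copied module to an open file handle would fail the same way, which would fit the pruning callback's deepcopy(module) call in setup.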
Expected behavior
No error.
Environment
- PyTorch Lightning Version (e.g., 1.5.0): mainline
- PyTorch Version (e.g., 1.10): 1.10
- Python version (e.g., 3.9): 3.8
- OS (e.g., Linux): Ubuntu
- CUDA/cuDNN version: 11.3
- GPU models and configuration: V100
- How you installed PyTorch (conda, pip, source): pip git mainline
cc @carmocca @kaushikb11 @ninginthecloud @rohitgr7 @nbcsm @guotuofeng