This repository was archived by the owner on Sep 18, 2024. It is now read-only.
Error when running Resnet18 with Slim Pruner #3947
Open
Description
Describe the issue:
SlimPruner raises an error when given a very small sparsity level. I am training ResNet18 on the ImageNet dataset.
Config: [{'sparsity': 3.9214609171977314e-06, 'op_types': ['BatchNorm2d']}]
I am using sparsifying_training_epochs=3
Please let me know if you need more details.
Error log:
Traceback (most recent call last):
....
File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/nni/algorithms/compression/pytorch/pruning/iterative_pruner.py", line 89, in compress
self.update_mask()
File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/nni/algorithms/compression/pytorch/pruning/dependency_aware_pruner.py", line 78, in update_mask
super(DependencyAwarePruner, self).update_mask()
File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/nni/compression/pytorch/compressor.py", line 339, in update_mask
masks = self.calc_mask(wrapper, wrapper_idx=wrapper_idx)
File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/nni/algorithms/compression/pytorch/pruning/dependency_aware_pruner.py", line 65, in calc_mask
sparsity=sparsity, wrapper=wrapper, wrapper_idx=wrapper_idx)
File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/nni/algorithms/compression/pytorch/pruning/structured_pruning_masker.py", line 702, in calc_mask
self._get_global_threshold()
File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/nni/algorithms/compression/pytorch/pruning/structured_pruning_masker.py", line 695, in _get_global_threshold
all_bn_weights.view(-1), k, largest=False)[0].max()
RuntimeError: operation does not have an identity.
/pytorch/aten/src/THC/THCTensorTopK.cuh:107: gatherTopK: block: [0,0,0], thread: [992,0,0] Assertion `writeIndex < outputSliceSize` failed.
/pytorch/aten/src/THC/THCTensorTopK.cuh:107: gatherTopK: block: [0,0,0], thread: [993,0,0] Assertion `writeIndex < outputSliceSize` failed.
....
Environment:
- NNI version: 2.3
- Training service (local|remote|pai|aml|etc): local
- Client OS: ubuntu 18.04
- Python version: 3.7
- PyTorch/TensorFlow version: PyTorch 1.8.1
- Is conda/virtualenv/venv used?: conda
- Is running in Docker?: no
How to reproduce it?:
Run SlimPruner with the above config on a ResNet18 model trained on ImageNet.
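Possible cause (not confirmed against the NNI source): the traceback ends in a top-k call over all BatchNorm weights followed by `.max()`. If `k` is computed by truncating `total_channels * sparsity` to an integer, the sparsity above gives `k = 0`, the top-k result is empty, and `.max()` on an empty tensor raises a RuntimeError. The sketch below only checks that arithmetic. The channel counts assume torchvision's ResNet18 layout, and `num_pruned_channels` and `check_sparsity` are hypothetical helpers, not NNI functions.

```python
# BatchNorm2d channel counts, assuming torchvision's ResNet18 layout:
# one stem BN (64), then per stage 2 blocks x 2 BN layers, plus one
# downsample BN in each of stages 2-4. This is an assumption about the
# model used in this report.
BN_CHANNELS = [64] + [64] * 4 + [128] * 5 + [256] * 5 + [512] * 5


def num_pruned_channels(total_channels: int, sparsity: float) -> int:
    """Hypothetical model of how k is derived: truncate total * sparsity."""
    return int(total_channels * sparsity)


def check_sparsity(total_channels: int, sparsity: float) -> int:
    """Return k, or raise if the sparsity is too small to select any channel."""
    k = num_pruned_channels(total_channels, sparsity)
    if k < 1:
        raise ValueError(
            f"sparsity {sparsity!r} selects 0 of {total_channels} channels; "
            f"use a value of roughly 1/{total_channels} or larger"
        )
    return k


total = sum(BN_CHANNELS)                    # 4800 BN channels in total
sparsity = 3.9214609171977314e-06           # value from the config above
k = num_pruned_channels(total, sparsity)    # 4800 * 3.92e-06 is about 0.019, so k is 0
print(f"total BN channels: {total}, k: {k}")
```

If that assumption holds, any sparsity below roughly 1/4800 (about 2.1e-4) on this model would hit the same error, and a pre-flight check such as `check_sparsity(total, sparsity)` would fail early with a clearer message.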