ModelSpeedup fails when setting deterministic algorithms #4406
Description
Describe the issue:
ModelSpeedup fails when PyTorch deterministic algorithms are enabled. The existing unit tests do not cover this case.
Environment:
- NNI version: 2.5
- Training service (local|remote|pai|aml|etc): local
- Client OS: Ubuntu 18.04.6 LTS
- Server OS (for remote mode only): N/A
- Python version: Python 3.7.10
- PyTorch/TensorFlow version: PyTorch 1.10.1
- Is conda/virtualenv/venv used?: Yes
- Is running in Docker?: No
Configuration:
- Experiment config (remember to remove secrets!): N/A
- Search space: N/A
Log message:
- nnimanager.log: N/A
- dispatcher.log: N/A
- nnictl stdout and stderr: N/A
How to reproduce it?:
This NNI test case passes on both the master branch (72087f8a178eff6b1890616705f6021cabd8f072) and v2.5:
PYTHONPATH=test python -c "from ut.compression.v1.test_model_speedup import SpeedupTestCase; SpeedupTestCase().test_speedup_integration_small()"
It fails after enabling PyTorch deterministic algorithms:
PYTHONPATH=test python -c "from ut.compression.v1.test_model_speedup import SpeedupTestCase; import os, torch; os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'; torch.use_deterministic_algorithms(True); SpeedupTestCase().test_speedup_integration_small()"
The error message is:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/yucdai/nni/test/ut/compression/v1/test_model_speedup.py", line 365, in test_speedup_integration_small
    self.speedup_integration(model_list)
  File "/home/yucdai/nni/test/ut/compression/v1/test_model_speedup.py", line 426, in speedup_integration
    ms.speedup_model()
  File "/home/yucdai/nni/nni/compression/pytorch/speedup/compressor.py", line 504, in speedup_model
    fix_mask_conflict(self.masks, self.bound_model, self.dummy_input)
  File "/home/yucdai/nni/nni/compression/pytorch/utils/mask_conflict.py", line 54, in fix_mask_conflict
    masks = fix_channel_mask.fix_mask()
  File "/home/yucdai/nni/nni/compression/pytorch/utils/mask_conflict.py", line 288, in fix_mask
    new_mask[merged_index, :, :, :] = 1.
RuntimeError: linearIndex.numel()*sliceSize*nElemBefore == value.numel()INTERNAL ASSERT FAILED at "/opt/conda/conda-bld/pytorch_1639180594101/work/aten/src/ATen/native/cuda/Indexing.cu":250, please report a bug to PyTorch. number of flattened indices did not match number of elements in the value tensor66151
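For reference, here is a minimal sketch of the indexing pattern from the last frame, outside of NNI. It assumes (not confirmed here) that the assert is triggered because the scalar 1. is not broadcast to the indexed shape when the deterministic CUDA path is taken, so the element count of the value tensor does not match. The tensor shapes, the index values, and the helper name fill_rows_explicit are hypothetical and only for illustration; the explicit-value assignment is a possible workaround, not the fix adopted in NNI.

```python
import torch


def fill_rows_explicit(mask: torch.Tensor, rows: torch.Tensor) -> torch.Tensor:
    """Set mask[rows] to 1 using a value tensor whose shape matches the
    indexed region, instead of relying on scalar broadcasting.

    Hypothetical helper: the original line is
        new_mask[merged_index, :, :, :] = 1.
    """
    out = mask.clone()
    # out[rows] has shape (len(rows), C, H, W); ones_like gives a value
    # tensor with exactly that many elements.
    out[rows] = torch.ones_like(out[rows])
    return out


# Use the GPU when present (the reported failure is in a CUDA kernel);
# fall back to CPU so the sketch still runs elsewhere.
device = "cuda" if torch.cuda.is_available() else "cpu"
torch.use_deterministic_algorithms(True)

# Hypothetical stand-ins for the tensors in fix_mask().
new_mask = torch.zeros(8, 3, 3, 3, device=device)
merged_index = torch.tensor([1, 4, 6], device=device)

filled = fill_rows_explicit(new_mask, merged_index)

# Restore the default so later code in the same process is unaffected.
torch.use_deterministic_algorithms(False)
```

If the assumption holds, replacing the scalar assignment with an explicitly shaped value tensor should avoid the element-count mismatch reported in the assert, at the cost of one extra temporary tensor.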