ModelSpeedup fails when setting deterministic algorithms

**Describe the issue**:
`ModelSpeedup` fails when PyTorch deterministic algorithms are enabled via `torch.use_deterministic_algorithms(True)`. The existing unit tests do not cover this case.


**Environment**:
- NNI version: 2.5
- Training service (local|remote|pai|aml|etc): local
- Client OS: Ubuntu 18.04.6 LTS
- Server OS (for remote mode only): N/A
- Python version: Python 3.7.10
- PyTorch/TensorFlow version: PyTorch 1.10.1
- Is conda/virtualenv/venv used?: Yes
- Is running in Docker?: No


**Configuration**:
 - Experiment config (remember to remove secrets!): N/A
 - Search space: N/A


**Log message**:
 - nnimanager.log: N/A
 - dispatcher.log: N/A
 - nnictl stdout and stderr: N/A
 



**How to reproduce it?**:
This NNI test case passes on both the master branch (`72087f8a178eff6b1890616705f6021cabd8f072`) and v2.5:

```
PYTHONPATH=test python -c "from ut.compression.v1.test_model_speedup import SpeedupTestCase; SpeedupTestCase().test_speedup_integration_small()"
```

It fails once PyTorch deterministic algorithms are enabled:

```
PYTHONPATH=test python -c "from ut.compression.v1.test_model_speedup import SpeedupTestCase; import os, torch; os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'; torch.use_deterministic_algorithms(True); SpeedupTestCase().test_speedup_integration_small()"
```

The error message is:

```
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/yucdai/nni/test/ut/compression/v1/test_model_speedup.py", line 365, in test_speedup_integration_small
    self.speedup_integration(model_list)
  File "/home/yucdai/nni/test/ut/compression/v1/test_model_speedup.py", line 426, in speedup_integration
    ms.speedup_model()
  File "/home/yucdai/nni/nni/compression/pytorch/speedup/compressor.py", line 504, in speedup_model
    fix_mask_conflict(self.masks, self.bound_model, self.dummy_input)
  File "/home/yucdai/nni/nni/compression/pytorch/utils/mask_conflict.py", line 54, in fix_mask_conflict
    masks = fix_channel_mask.fix_mask()
  File "/home/yucdai/nni/nni/compression/pytorch/utils/mask_conflict.py", line 288, in fix_mask
    new_mask[merged_index, :, :, :] = 1.
RuntimeError: linearIndex.numel()*sliceSize*nElemBefore == value.numel()INTERNAL ASSERT FAILED at "/opt/conda/conda-bld/pytorch_1639180594101/work/aten/src/ATen/native/cuda/Indexing.cu":250, please report a bug to PyTorch. number of flattened indices did not match number of elements in the value tensor66151
```
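The traceback points at a single indexed assignment in `fix_mask`, so it may be possible to isolate the failure without running the whole test. The following is a minimal sketch, not a confirmed reproduction: the helper names `fill_channels` and `run_case`, the tensor shape, and the index values are all made up for illustration, and it assumes `merged_index` is a 1-D integer tensor of channel indices.

```python
import os

# Assumption: same cuBLAS workspace setting as in the failing command above.
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")

import torch


def fill_channels(new_mask, merged_index):
    """Mirror the statement from the traceback: set whole channels to 1."""
    new_mask[merged_index, :, :, :] = 1.
    return new_mask


def run_case(device="cpu", deterministic=False):
    """Run the assignment on a small hypothetical mask and return the result."""
    torch.use_deterministic_algorithms(deterministic)
    try:
        # Hypothetical shape (out_channels, in_channels, kH, kW) and indices.
        new_mask = torch.zeros(8, 3, 3, 3, device=device)
        merged_index = torch.tensor([0, 2, 5], device=device)
        return fill_channels(new_mask, merged_index)
    finally:
        # Restore the default so later code is not affected.
        torch.use_deterministic_algorithms(False)


if __name__ == "__main__":
    print(run_case("cpu", deterministic=True).sum().item())
    if torch.cuda.is_available():
        # On PyTorch 1.10.1 this call might raise the assertion shown above.
        print(run_case("cuda", deterministic=True).sum().item())
```

On CPU the assignment should succeed in both modes. If the CUDA call with `deterministic=True` raises the same `INTERNAL ASSERT FAILED` error, that would suggest the problem lies in PyTorch's deterministic indexed-assignment path for a scalar value rather than in NNI's mask logic.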
