This repository was archived by the owner on Sep 18, 2024. It is now read-only.

ModelSpeedup fails when setting deterministic algorithms #4406

Open
@icyblade

Describe the issue:
ModelSpeedup fails when PyTorch deterministic algorithms are enabled. The existing unit tests don't cover this case.
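To cover this case in a unit test, the deterministic setting could be scoped to a single test so it does not leak into others. The following is only a sketch: `deterministic_algorithms` is a hypothetical helper name, not part of NNI or PyTorch, and the `:4096:8` workspace value is simply the one used in the reproduction command in this report.

```python
import contextlib
import os

import torch


@contextlib.contextmanager
def deterministic_algorithms(workspace=":4096:8"):
    """Enable PyTorch deterministic algorithms inside a `with` block.

    Hypothetical helper for illustration only. CUBLAS_WORKSPACE_CONFIG is
    assumed to be set before cuBLAS is first used in the process.
    """
    previous_flag = torch.are_deterministic_algorithms_enabled()
    previous_env = os.environ.get("CUBLAS_WORKSPACE_CONFIG")
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = workspace
    torch.use_deterministic_algorithms(True)
    try:
        yield
    finally:
        # Restore the previous state so other tests are unaffected.
        torch.use_deterministic_algorithms(previous_flag)
        if previous_env is None:
            os.environ.pop("CUBLAS_WORKSPACE_CONFIG", None)
        else:
            os.environ["CUBLAS_WORKSPACE_CONFIG"] = previous_env
```

A test could then wrap the existing speedup call, for example `with deterministic_algorithms(): SpeedupTestCase().test_speedup_integration_small()`.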

Environment:

  • NNI version: 2.5
  • Training service (local|remote|pai|aml|etc): local
  • Client OS: Ubuntu 18.04.6 LTS
  • Server OS (for remote mode only): N/A
  • Python version: Python 3.7.10
  • PyTorch/TensorFlow version: PyTorch 1.10.1
  • Is conda/virtualenv/venv used?: Yes
  • Is running in Docker?: No

Configuration:

  • Experiment config (remember to remove secrets!): N/A
  • Search space: N/A

Log message:

  • nnimanager.log: N/A
  • dispatcher.log: N/A
  • nnictl stdout and stderr: N/A

How to reproduce it?:
The following NNI test case passes on both the master branch (72087f8a178eff6b1890616705f6021cabd8f072) and v2.5:

PYTHONPATH=test python -c "from ut.compression.v1.test_model_speedup import SpeedupTestCase; SpeedupTestCase().test_speedup_integration_small()"

It fails once PyTorch deterministic algorithms are enabled:

PYTHONPATH=test python -c "from ut.compression.v1.test_model_speedup import SpeedupTestCase; import os, torch; os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'; torch.use_deterministic_algorithms(True); SpeedupTestCase().test_speedup_integration_small()"

The error message is:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/yucdai/nni/test/ut/compression/v1/test_model_speedup.py", line 365, in test_speedup_integration_small
    self.speedup_integration(model_list)
  File "/home/yucdai/nni/test/ut/compression/v1/test_model_speedup.py", line 426, in speedup_integration
    ms.speedup_model()
  File "/home/yucdai/nni/nni/compression/pytorch/speedup/compressor.py", line 504, in speedup_model
    fix_mask_conflict(self.masks, self.bound_model, self.dummy_input)
  File "/home/yucdai/nni/nni/compression/pytorch/utils/mask_conflict.py", line 54, in fix_mask_conflict
    masks = fix_channel_mask.fix_mask()
  File "/home/yucdai/nni/nni/compression/pytorch/utils/mask_conflict.py", line 288, in fix_mask
    new_mask[merged_index, :, :, :] = 1.
RuntimeError: linearIndex.numel()*sliceSize*nElemBefore == value.numel()INTERNAL ASSERT FAILED at "/opt/conda/conda-bld/pytorch_1639180594101/work/aten/src/ATen/native/cuda/Indexing.cu":250, please report a bug to PyTorch. number of flattened indices did not match number of elements in the value tensor66151
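The traceback ends at an indexed assignment of a Python scalar (`new_mask[merged_index, :, :, :] = 1.`), and the assertion text says the number of flattened indices did not match the number of elements in the value tensor. One possible workaround, shown here as an untested sketch and not as the NNI fix, is to write the rows with `Tensor.index_fill_`, which takes the scalar directly instead of going through indexed assignment. `set_rows_to_one` is a hypothetical helper name, and whether this avoids the failure on CUDA with PyTorch 1.10.1 is an assumption that has not been verified here.

```python
import torch


def set_rows_to_one(mask: torch.Tensor, rows) -> torch.Tensor:
    """Set mask[rows, ...] to 1 in place and return the mask.

    Hypothetical helper for illustration; `rows` is a list or tensor of
    indices along dimension 0.
    """
    index = torch.as_tensor(rows, dtype=torch.long, device=mask.device)
    # index_fill_ fills every slice selected along dim 0 with the scalar.
    return mask.index_fill_(0, index, 1.0)


# Compare against the original indexed assignment on a small CPU tensor.
new_mask = torch.zeros(4, 2, 3, 3)
merged_index = [0, 2]

expected = torch.zeros(4, 2, 3, 3)
expected[merged_index, :, :, :] = 1.0

set_rows_to_one(new_mask, merged_index)
assert torch.equal(new_mask, expected)
```

On CPU the two forms produce the same mask; the sketch only changes which PyTorch operation performs the write.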
