Description
I am training with 64*64 on a V100 (32 GB), but it still goes OOM. Why?
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/ma-user/work//Vista-main/train.py", line 926, in
[rank0]: raise error
[rank0]: File "/home/ma-user/work/Vista-main/train.py", line 906, in
[rank0]: trainer.fit(model, data, ckpt_path=ckpt_resume_path)
[rank0]: File "/home/ma-user/anaconda3/envs/PyTorch-2.0.0/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 538, in fit
[rank0]: call._call_and_handle_interrupt(
[rank0]: File "/home/ma-user/anaconda3/envs/PyTorch-2.0.0/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 68, in _call_and_handle_interrupt
[rank0]: trainer._teardown()
[rank0]: File "/home/ma-user/anaconda3/envs/PyTorch-2.0.0/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1004, in _teardown
[rank0]: self.strategy.teardown()
[rank0]: File "/home/ma-user/anaconda3/envs/PyTorch-2.0.0/lib/python3.9/site-packages/pytorch_lightning/strategies/ddp.py", line 419, in teardown
[rank0]: super().teardown()
[rank0]: File "/home/ma-user/anaconda3/envs/PyTorch-2.0.0/lib/python3.9/site-packages/pytorch_lightning/strategies/parallel.py", line 133, in teardown
[rank0]: super().teardown()
[rank0]: File "/home/ma-user/anaconda3/envs/PyTorch-2.0.0/lib/python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 535, in teardown
[rank0]: self.lightning_module.cpu()
[rank0]: File "/home/ma-user/anaconda3/envs/PyTorch-2.0.0/lib/python3.9/site-packages/lightning_fabric/utilities/device_dtype_mixin.py", line 82, in cpu
[rank0]: return super().cpu()
[rank0]: File "/home/ma-user/anaconda3/envs/PyTorch-2.0.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1118, in cpu
[rank0]: return self._apply(lambda t: t.cpu())
[rank0]: File "/home/ma-user/anaconda3/envs/PyTorch-2.0.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 900, in _apply
[rank0]: module._apply(fn)
[rank0]: File "/home/ma-user/anaconda3/envs/PyTorch-2.0.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 900, in _apply
[rank0]: module._apply(fn)
[rank0]: File "/home/ma-user/anaconda3/envs/PyTorch-2.0.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 900, in _apply
[rank0]: module._apply(fn)
[rank0]: [Previous line repeated 1 more time]
[rank0]: File "/home/ma-user/anaconda3/envs/PyTorch-2.0.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 927, in _apply
[rank0]: param_applied = fn(param)
[rank0]: File "/home/ma-user/anaconda3/envs/PyTorch-2.0.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1118, in
[rank0]: return self._apply(lambda t: t.cpu())
[rank0]: RuntimeError: CUDA error: misaligned address
[rank0]: Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
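
The traceback above ends in `CUDA error: misaligned address` raised during teardown, not in an explicit out-of-memory message. CUDA errors can be reported asynchronously, so a rerun with synchronous kernel launches might show where the failure actually starts. A minimal sketch, assuming the training is launched as a plain `python train.py` (the real command line and its arguments are not shown here, and `build_debug_env` and `rerun_training` are hypothetical helper names); `CUDA_LAUNCH_BLOCKING` is the standard CUDA environment variable that forces synchronous launches:

```python
import os
import subprocess
import sys


def build_debug_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so the Python
    # traceback is more likely to point at the call that actually failed.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def rerun_training(script="train.py", extra_args=()):
    """Rerun the (assumed) training script under the debug environment."""
    cmd = [sys.executable, script, *extra_args]
    # check=False: the run is expected to fail again; only the exit code is returned.
    return subprocess.run(cmd, env=build_debug_env(), check=False).returncode
```

Calling `rerun_training(extra_args=[...])` with the same arguments as the original run should reproduce the failure, possibly with a more informative traceback. Training will be slower in this mode, so it is only meant for debugging.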