
CUDA error: misaligned address #52

@BlackTea-c

I'm using 64*64 on a V100 (32 GB), but training still fails. Is it still running out of memory? The traceback is below:

[rank0]: Traceback (most recent call last):
[rank0]: File "/home/ma-user/work//Vista-main/train.py", line 926, in
[rank0]: raise error
[rank0]: File "/home/ma-user/work/Vista-main/train.py", line 906, in
[rank0]: trainer.fit(model, data, ckpt_path=ckpt_resume_path)
[rank0]: File "/home/ma-user/anaconda3/envs/PyTorch-2.0.0/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 538, in fit
[rank0]: call._call_and_handle_interrupt(
[rank0]: File "/home/ma-user/anaconda3/envs/PyTorch-2.0.0/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 68, in _call_and_handle_interrupt
[rank0]: trainer._teardown()
[rank0]: File "/home/ma-user/anaconda3/envs/PyTorch-2.0.0/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1004, in _teardown
[rank0]: self.strategy.teardown()
[rank0]: File "/home/ma-user/anaconda3/envs/PyTorch-2.0.0/lib/python3.9/site-packages/pytorch_lightning/strategies/ddp.py", line 419, in teardown
[rank0]: super().teardown()
[rank0]: File "/home/ma-user/anaconda3/envs/PyTorch-2.0.0/lib/python3.9/site-packages/pytorch_lightning/strategies/parallel.py", line 133, in teardown
[rank0]: super().teardown()
[rank0]: File "/home/ma-user/anaconda3/envs/PyTorch-2.0.0/lib/python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 535, in teardown
[rank0]: self.lightning_module.cpu()
[rank0]: File "/home/ma-user/anaconda3/envs/PyTorch-2.0.0/lib/python3.9/site-packages/lightning_fabric/utilities/device_dtype_mixin.py", line 82, in cpu
[rank0]: return super().cpu()
[rank0]: File "/home/ma-user/anaconda3/envs/PyTorch-2.0.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1118, in cpu
[rank0]: return self._apply(lambda t: t.cpu())
[rank0]: File "/home/ma-user/anaconda3/envs/PyTorch-2.0.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 900, in _apply
[rank0]: module._apply(fn)
[rank0]: File "/home/ma-user/anaconda3/envs/PyTorch-2.0.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 900, in _apply
[rank0]: module._apply(fn)
[rank0]: File "/home/ma-user/anaconda3/envs/PyTorch-2.0.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 900, in _apply
[rank0]: module._apply(fn)
[rank0]: [Previous line repeated 1 more time]
[rank0]: File "/home/ma-user/anaconda3/envs/PyTorch-2.0.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 927, in _apply
[rank0]: param_applied = fn(param)
[rank0]: File "/home/ma-user/anaconda3/envs/PyTorch-2.0.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1118, in
[rank0]: return self._apply(lambda t: t.cpu())
[rank0]: RuntimeError: CUDA error: misaligned address
[rank0]: Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
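For reference, the final error above reads "misaligned address", not PyTorch's usual "CUDA out of memory" message. CUDA kernel errors are reported asynchronously, so the fault may only have surfaced at teardown (`self.lightning_module.cpu()`) rather than at the kernel that actually failed. Below is a minimal sketch for telling the two cases apart from the message text and for re-running with synchronous launches (`CUDA_LAUNCH_BLOCKING=1`). The helpers `classify_cuda_error` and `blocking_env` are hypothetical names for illustration and are not part of Vista, Lightning, or PyTorch.

```python
import os
from typing import Mapping, Optional


def classify_cuda_error(message: str) -> str:
    """Roughly classify a PyTorch CUDA RuntimeError message (hypothetical helper).

    Allocation failures mention "out of memory" (e.g. "CUDA out of memory.
    Tried to allocate ..."). Any other "CUDA error: ..." message, such as
    "misaligned address", points to a faulting kernel rather than an OOM.
    """
    text = message.lower()
    if "out of memory" in text:
        return "oom"
    if "cuda error" in text:
        return "kernel-fault"
    return "unknown"


def blocking_env(base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of an environment with CUDA_LAUNCH_BLOCKING=1 (hypothetical helper).

    The variable must be set before CUDA is initialized, so it is meant to be
    passed to a fresh process rather than set inside an already-running one.
    """
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


if __name__ == "__main__":
    print(classify_cuda_error("CUDA error: misaligned address"))
    print(classify_cuda_error("CUDA out of memory. Tried to allocate 2.00 GiB"))
```

As a usage sketch, `subprocess.run([sys.executable, "train.py", ...], env=blocking_env())` would relaunch training with synchronous launches; the actual `train.py` arguments are not shown in this issue and would need to be filled in. With blocking launches the traceback should point closer to the kernel that raised the misaligned-address fault, at the cost of slower training.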
