
[RTX4090] Bus error (core dumped) when running embedder 4 #19

@daxinY

Description

  1. Specific problem encountered: Running the command "python3 render.py --config configs/waymo_val_173.yaml mode diffusion" fails with "Bus error (core dumped)". Single-stepping with pdb shows that the problem occurs in "def forward(self, batch: Dict, force_zero_embeddings: Optional[List] = None) -> Dict" of "class GeneralConditioner(nn.Module)", specifically in the loop "for i in range(batch[embedder.input_key].shape[0]): emb_out_1 = embedder(batch[embedder.input_key][i].unsqueeze(0)); emb_out_1s.append(emb_out_1)". Embedders 1, 2, and 3 run normally. For embedder 4, the line "for i in range(batch[embedder.input_key].shape[0])" cannot be executed. The "render.py" script is run on a single RTX 4090 with 24 GB of video memory.
  2. When I run this code on its own, the same error message is displayed: the process ends as soon as the 4th embedder is executed. What is going on?
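
A bus error normally kills the interpreter without a Python traceback, which makes the failing line hard to confirm. The following is a minimal diagnostic sketch, not a fix: it uses the standard-library faulthandler module to dump the Python stack when the process receives SIGBUS, and it reports the size of /dev/shm. The helper names enable_crash_traceback and shared_memory_report are hypothetical, and the /dev/shm check rests on an unconfirmed assumption: a small shared-memory mount is one common cause of bus errors inside Docker containers, but nothing in this report shows that it is the cause here.

```python
import faulthandler
import os
import shutil
import sys


def enable_crash_traceback(stream=sys.stderr):
    """Dump the Python stack if the process receives SIGBUS, SIGSEGV, etc.

    Hypothetical helper: call it before the code that crashes.
    """
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()


def shared_memory_report(path="/dev/shm"):
    """Return (total_MiB, free_MiB) for the given mount, or None if it is absent.

    Assumption: /dev/shm is the shared-memory mount inside the container.
    """
    if not os.path.isdir(path):
        return None
    usage = shutil.disk_usage(path)
    mib = 1024 * 1024
    return usage.total // mib, usage.free // mib


if __name__ == "__main__":
    enable_crash_traceback()
    report = shared_memory_report()
    if report is None:
        print("/dev/shm not found")
    else:
        print(f"/dev/shm: total={report[0]} MiB, free={report[1]} MiB")
```

The same stack dump can be obtained without editing any code by starting the interpreter with "python3 -X faulthandler render.py ...". If the reported /dev/shm size turns out to be very small, restarting the container with a larger "--shm-size" (or with "--ipc=host") may be worth trying, again only as a hypothesis to test.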

Environment information

  • Docker environment: nvidia/cuda:12.1.0-devel-ubuntu20.04
  • Graphics card: NVIDIA RTX 4090 (24GB)
  • Python version: 3.9.5
  • PyTorch version: 2.4.0+cu121

