
CUDA error: CUBLAS_STATUS_EXECUTION_FAILED #528

@lyxiang-casia

Description


When reproducing the results of Score Distillation via Reparametrized DDIM (SDI), I encountered the following error during backpropagation:

[INFO] Using 16bit Automatic Mixed Precision (AMP)
[INFO] GPU available: True (cuda), used: True
[INFO] TPU available: False, using: 0 TPU cores
[INFO] HPU available: False, using: 0 HPUs
[INFO] You are using a CUDA device ('NVIDIA GeForce RTX 5090') that has Tensor Cores. To properly utilize them, you should set torch.set_float32_matmul_precision('medium' | 'high') which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
[INFO] LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [1]
[INFO]
  | Name       | Type                           | Params | Mode
------------------------------------------------------------------
0 | geometry   | ImplicitVolume                 | 12.6 M | train
1 | material   | DiffuseWithPointLightMaterial  | 0      | train
2 | background | NeuralEnvironmentMapBackground | 448    | train
3 | renderer   | NeRFVolumeRenderer             | 0      | train
------------------------------------------------------------------
12.6 M    Trainable params
0         Non-trainable params
12.6 M    Total params
50.419    Total estimated model params size (MB)
28        Modules in train mode
0         Modules in eval mode
[INFO] Validation results will be saved to outputs/score-distillation-via-inversion/pumpkin_head_zombie,_skinny,_highly_detailed,_photorealistic@20250908-134636/save
[INFO] Using prompt [pumpkin head zombie, skinny, highly detailed, photorealistic] and negative prompt []
[INFO] Using view-dependent prompts [side]:[pumpkin head zombie, skinny, highly detailed, photorealistic, side view] [front]:[pumpkin head zombie, skinny, highly detailed, photorealistic, front view] [back]:[pumpkin head zombie, skinny, highly detailed, photorealistic, back view] [overhead]:[pumpkin head zombie, skinny, highly detailed, photorealistic, overhead view]
[INFO] Loading Stable Diffusion ...
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 5.50it/s]
[INFO] Loaded Stable Diffusion!
Epoch 0: | | 0/? [00:00<?, ?it/s]
Backward failed!

RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasGemmEx( handle, opa, opb, m, n, k, alpha_ptr, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, beta_ptr, c, std::is_same_v<C_Dtype, float> ? CUDA_R_32F : CUDA_R_16F, ldc, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)
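A minimal diagnostic sketch for narrowing this down, not part of the original report. It records the PyTorch/CUDA build and the GPU's compute capability, then runs a single linear layer forward and backward under `torch.autocast`. On CUDA this goes through half-precision cuBLAS GEMMs similar to the failing `cublasGemmEx` call above. The helper names `environment_report` and `fp16_gemm_backward_check` are hypothetical. The RTX 5090 is a Blackwell GPU (compute capability 12.0), so if `arch_list` lacks `sm_120`, a PyTorch build without Blackwell support is one plausible cause worth ruling out, not a confirmed diagnosis. Setting `CUDA_LAUNCH_BLOCKING=1` makes asynchronous CUDA errors surface at the call that actually failed.

```python
import os

# Report CUDA errors at the failing call instead of at a later synchronisation.
# This must be set before CUDA is initialised, hence before importing torch.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")

import torch


def environment_report() -> dict:
    """Collect version info that is usually needed to triage a cuBLAS failure."""
    report = {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if torch.cuda.is_available():
        report["device"] = torch.cuda.get_device_name(0)
        report["capability"] = torch.cuda.get_device_capability(0)
        # Architectures this PyTorch build ships kernels for (e.g. 'sm_120').
        report["arch_list"] = torch.cuda.get_arch_list()
    return report


def fp16_gemm_backward_check(device: str = "cuda") -> bool:
    """Run one linear layer forward and backward under autocast.

    Uses float16 on CUDA (as 16-bit AMP does) and bfloat16 on CPU so the
    script still runs without a GPU. A cuBLAS failure raises RuntimeError.
    """
    layer = torch.nn.Linear(64, 64).to(device)
    x = torch.randn(32, 64, device=device, requires_grad=True)
    dtype = torch.float16 if device == "cuda" else torch.bfloat16
    with torch.autocast(device_type=device, dtype=dtype):
        y = layer(x)
    loss = y.float().sum()
    loss.backward()  # the backward GEMMs are where the reported error occurs
    if device == "cuda":
        torch.cuda.synchronize()
    return x.grad is not None and bool(torch.isfinite(x.grad).all())


if __name__ == "__main__":
    print(environment_report())
    dev = "cuda" if torch.cuda.is_available() else "cpu"
    print("autocast GEMM backward OK:", fp16_gemm_backward_check(dev))
```

If this small check passes on the RTX 5090 while the full run still fails, the basic fp16 cuBLAS path is working and the problem is more likely tied to the specific workload than to the installation.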
