CUDA error: CUBLAS_STATUS_EXECUTION_FAILED

When reproducing the results of Score Distillation via Reparametrized DDIM (SDI), I encountered the following error during backpropagation:



[INFO] Using 16bit Automatic Mixed Precision (AMP)
[INFO] GPU available: True (cuda), used: True
[INFO] TPU available: False, using: 0 TPU cores
[INFO] HPU available: False, using: 0 HPUs
[INFO] You are using a CUDA device ('NVIDIA GeForce RTX 5090') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
[INFO] LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [1]
[INFO]
  | Name       | Type                           | Params | Mode
----------------------------------------------------------------------
0 | geometry   | ImplicitVolume                 | 12.6 M | train
1 | material   | DiffuseWithPointLightMaterial  | 0      | train
2 | background | NeuralEnvironmentMapBackground | 448    | train
3 | renderer   | NeRFVolumeRenderer             | 0      | train
----------------------------------------------------------------------
12.6 M    Trainable params
0         Non-trainable params
12.6 M    Total params
50.419    Total estimated model params size (MB)
28        Modules in train mode
0         Modules in eval mode
[INFO] Validation results will be saved to outputs/score-distillation-via-inversion/pumpkin_head_zombie,_skinny,_highly_detailed,_photorealistic@20250908-134636/save
[INFO] Using prompt [pumpkin head zombie, skinny, highly detailed, photorealistic] and negative prompt []
[INFO] Using view-dependent prompts [side]:[pumpkin head zombie, skinny, highly detailed, photorealistic, side view] [front]:[pumpkin head zombie, skinny, highly detailed, photorealistic, front view] [back]:[pumpkin head zombie, skinny, highly detailed, photorealistic, back view] [overhead]:[pumpkin head zombie, skinny, highly detailed, photorealistic, overhead view]
[INFO] Loading Stable Diffusion ...
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00,  5.50it/s]
[INFO] Loaded Stable Diffusion!
Epoch 0: |                                                                                                                                      | 0/? [00:00<?, ?it/s]
Backward failed!

RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmEx( handle, opa, opb, m, n, k, alpha_ptr, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, beta_ptr, c, std::is_same_v<C_Dtype, float> ? CUDA_R_32F : CUDA_R_16F, ldc, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
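
To help narrow this down, here is a minimal sketch that runs a half-precision matrix multiply with a backward pass outside the training code and prints which GPU architectures the installed PyTorch build targets. The helper names `arch_tag` and `build_targets_device` are hypothetical, and the exact-match check is only a rough heuristic. A mismatch between the GPU and the PyTorch build is one possible cause that has not been confirmed here.

```python
def arch_tag(capability):
    """Format a (major, minor) compute capability as an 'sm_XY' tag."""
    major, minor = capability
    return f"sm_{major}{minor}"


def build_targets_device(capability, arch_list):
    """Rough check: True if the build lists this exact architecture."""
    return arch_tag(capability) in arch_list


def main():
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return
    if not torch.cuda.is_available():
        print("No CUDA device visible")
        return

    capability = torch.cuda.get_device_capability(0)
    arch_list = torch.cuda.get_arch_list()
    print("torch", torch.__version__, "CUDA", torch.version.cuda)
    print("device", torch.cuda.get_device_name(0), arch_tag(capability))
    print("build arch list", arch_list)
    print("exact arch match:", build_targets_device(capability, arch_list))

    # Standalone fp16 GEMM plus backward, independent of the training code.
    a = torch.randn(256, 256, device="cuda", dtype=torch.float16, requires_grad=True)
    b = torch.randn(256, 256, device="cuda", dtype=torch.float16, requires_grad=True)
    (a @ b).float().sum().backward()
    torch.cuda.synchronize()  # surface asynchronous CUDA errors here
    print("fp16 matmul backward OK")


if __name__ == "__main__":
    main()
```

If the standalone multiply fails in the same way, the problem is more likely in the environment (driver, CUDA, or PyTorch build) than in the SDI code. If it passes, rerunning the training with `CUDA_LAUNCH_BLOCKING=1` should give a more precise stack trace.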


