
Run Inference stage with command line on Ubuntu22.04, RuntimeError: CUDA error fixed! #20

@j-yi-11

Description


My Modification for Command Line Usage:

Smooth Diffusion is a great project! I tried to run it from the command line to reproduce the results. To achieve this, I mainly removed the Gradio part of the original app.py and set up argparse instead, with details in the attached file app.txt.
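For reference, a minimal sketch of what such an argparse replacement might look like; the flag names come from the command shown below in this report, while the function name and help strings are my own assumptions:

```python
import argparse

def parse_args(argv=None):
    # Hypothetical CLI front-end replacing the Gradio UI in app.py;
    # flags mirror the command used in this report.
    parser = argparse.ArgumentParser(description="Smooth Diffusion command-line inference")
    parser.add_argument('--mode', default='interpolation',
                        help="inference mode, e.g. 'interpolation'")
    parser.add_argument('--img0', required=True, help="path to the first input image")
    parser.add_argument('--img1', required=True, help="path to the second input image")
    parser.add_argument('--txt0', default='', help="text prompt for the first image")
    parser.add_argument('--txt1', default='', help="text prompt for the second image")
    return parser.parse_args(argv)
```

The parsed values can then be passed straight to the interpolation entry point instead of being read from Gradio widgets.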

Bug output

When I run python app.py --mode interpolation --img0 './assets/images/interpolation/cityview1.png' --img1 './assets/images/interpolation/cityview2.png' --txt0 'A city view' --txt1 'Another city view' on Ubuntu 22.04, the following error occurs:

│    596 │   │   else:                                                                             │
│    597 │   │   │   print("self.tag_diffuser = ", self.tag_diffuser)                              │
│    598 │   │   │   print("self.tag_lora = ", self.tag_lora)                                      │
│ ❱  599 │   │   │   data0, data1 = self.nullinvdual_or_loadcachedual(                             │
│    600 │   │   │   │   img0, img1, {'txt0':txt0, 'txt1':txt1, 'step':step,                       │
│    601 │   │   │   │   │   │   │    'cfg_scale':cfg_scale, 'inner_step':inner_step,              │
│    602 │   │   │   │   │   │   │    'diffuser' : self.tag_diffuser, 'lora' : self.tag_lora,}, f  │
│                                                                                                  │
│ /home/oppo2/jy/smooth-Diffusion-main/app.py:400 in nullinvdual_or_loadcachedual                  │
│                                                                                                  │
│    397 │   │   │   │   emb0 = txt_to_emb(self.net, txt0)                                         │
│    398 │   │   │   │   emb1 = txt_to_emb(self.net, txt1)                                         │
│    399 │   │   │                                                                                 │
│ ❱  400 │   │   │   xt0, xt1, nemb = null_inversion_model.null_invert_dual(                       │
│    401 │   │   │   │   img0, img1, txt0, txt1, num_inner_steps=inner_step)                       │
│    402 │   │   │   cache_data = {                                                                │
│    403 │   │   │   │   'step' : step, 'cfg_scale' : cfg_scale,                                   │
│                                                                                                  │
│ /home/oppo2/jy/smooth-Diffusion-main/nulltxtinv_wrapper.py:460 in null_invert_dual               │
│                                                                                                  │
│   457 │   │   nemb = nemb.to(device)                                                             │
│   458 │   │                                                                                      │
│   459 │   │   # nulltext inversion                                                               │
│ ❱ 460 │   │   nembs = self.null_optimization_dual(                                               │
│   461 │   │   │   ddim_latents_0, ddim_latents_1, emb0, emb1, nemb, num_inner_steps, early_sto   │
│   462 │   │                                                                                      │
│   463 │   │   self.model.scheduler = scheduler_save                                              │
│                                                                                                  │
│ /home/oppo2/jy/smooth-Diffusion-main/nulltxtinv_wrapper.py:407 in null_optimization_dual         │
│                                                                                                  │
│   404 │   │   │   │   │      nnf.mse_loss(latents_prev_rec1, latent_prev1)                       │
│   405 │   │   │   │                                                                              │
│   406 │   │   │   │   optimizer.zero_grad()                                                      │
│ ❱ 407 │   │   │   │   loss.backward()                                                            │
│   408 │   │   │   │   optimizer.step()                                                           │
│   409 │   │   │   │   loss_item = loss.item()                                                    │
│   410 │   │   │   │   bar.update()                                                               │
│                                                                                                  │
│ /home/oppo2/anaconda3/envs/smooth-diffusion/lib/python3.9/site-packages/torch/_tensor.py:487 in  │
│ backward                                                                                         │
│                                                                                                  │
│    484 │   │   │   │   create_graph=create_graph,                                                │
│    485 │   │   │   │   inputs=inputs,                                                            │
│    486 │   │   │   )                                                                             │
│ ❱  487 │   │   torch.autograd.backward(                                                          │
│    488 │   │   │   self, gradient, retain_graph, create_graph, inputs=inputs                     │
│    489 │   │   )                                                                                 │
│    490                                                                                           │
│                                                                                                  │
│ /home/oppo2/anaconda3/envs/smooth-diffusion/lib/python3.9/site-packages/torch/autograd/__init__. │
│ py:200 in backward                                                                               │
│                                                                                                  │
│   197 │   # The reason we repeat same the comment below is that                                  │
│   198 │   # some Python versions print out the first line of a multi-line function               │
│   199 │   # calls in the traceback and some print out the last line                              │
│ ❱ 200 │   Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the bac   │
│   201 │   │   tensors, grad_tensors_, retain_graph, create_graph, inputs,                        │
│   202 │   │   allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to ru   │
│   203                                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: CUDA error: invalid argument
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

My Solution

In nulltxtinv_wrapper.py, I added:

                latent_prev = latent_prev.detach().clone()
                latent_prev.requires_grad = True
                latents_prev_rec = latents_prev_rec.detach().clone()
                latents_prev_rec.requires_grad = True

in the function null_optimization, and I added:

                latents_prev_rec0 = latents_prev_rec0.detach().clone()
                latents_prev_rec0.requires_grad = True
                latents_prev_rec1 = latents_prev_rec1.detach().clone()
                latents_prev_rec1.requires_grad = True
                latent_prev0 = latent_prev0.detach().clone()
                latent_prev0.requires_grad = True
                latent_prev1 = latent_prev1.detach().clone()
                latent_prev1.requires_grad = True

in the function null_optimization_dual; both changes are detailed in the attached file nulltxtinv_wrapper.txt.
With these modifications, running from the command line succeeds.
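To illustrate why this works, here is a minimal, self-contained sketch of the detach pattern. The tensor names and the MSE loss mirror the snippets above; the shapes and helper function are illustrative assumptions. Calling detach().clone() cuts each latent out of any autograd graph left over from earlier diffusion steps, so loss.backward() only traverses the current iteration's graph instead of a stale one:

```python
import torch
import torch.nn.functional as nnf

def detach_for_opt(t):
    # Cut the tensor out of any prior autograd graph, then re-enable
    # gradients so backward() sees it as a fresh leaf tensor.
    t = t.detach().clone()
    t.requires_grad = True
    return t

# Illustrative latents; the real ones come from the DDIM inversion loop.
latent_prev0 = detach_for_opt(torch.randn(1, 4, 8, 8))
latents_prev_rec0 = detach_for_opt(torch.randn(1, 4, 8, 8))

loss = nnf.mse_loss(latents_prev_rec0, latent_prev0)
loss.backward()  # backpropagates only through the fresh leaves
```

In the actual optimization loop, only the null-text embedding being optimized strictly needs gradients; detaching the cached latents each iteration keeps the graph small and avoids backpropagating into tensors produced in earlier steps.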
