
Deformable DETR Finetuning Errors #506

@iamsashank09

Hi @NielsRogge,

For the past few days I've been trying to finetune the Deformable DETR models (https://huggingface.co/SenseTime/deformable-detr-with-box-refine-two-stage) on a custom object detection dataset, using the DETR finetuning notebook you suggested (https://github.com/NielsRogge/Transformers-Tutorials/blob/master/DETR/Fine_tuning_DetrForObjectDetection_on_custom_dataset_(balloon).ipynb). I swapped the model name for the Deformable DETR checkpoint above wherever needed, but I keep running into errors, two in particular:

File /libraries/env/lib/python3.11/site-packages/transformers/loss/loss_deformable_detr.py:55, in (.0)
     52 cost_matrix = cost_matrix.view(batch_size, num_queries, -1).cpu()
     54 sizes = [len(v["boxes"]) for v in targets]
---> 55 indices = [linear_sum_assignment(c[i]) for i, c in enumerate(cost_matrix.split(sizes, -1))]
     56 return [(torch.as_tensor(i, dtype=torch.int64), torch.as_tensor(j, dtype=torch.int64)) for i, j in indices]

ValueError: matrix contains invalid numeric entries
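The first error comes from the Hungarian matching step. The sketch below mirrors, in NumPy, the two traceback lines above: split the batched cost matrix by each image's target count, then call scipy.optimize.linear_sum_assignment on each image's slice. np.split with cumulative indices stands in for torch's Tensor.split(sizes, -1); the helper name match_per_image and the random costs are illustrative, not library code, and NumPy and SciPy are assumed to be installed. A single NaN in an image's slice is enough for SciPy to reject it, so the real question is where the NaN in the model's predictions comes from.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_per_image(cost_matrix, sizes):
    """Hypothetical helper: split a batched cost matrix per image and match.

    cost_matrix: shape (batch_size, num_queries, total_targets),
    where total_targets == sum(sizes).
    """
    # The columns for image i are the i-th chunk along the last axis
    # (equivalent to torch's cost_matrix.split(sizes, -1)).
    splits = np.split(cost_matrix, np.cumsum(sizes)[:-1], axis=-1)
    # Image i is matched only against its own row block, chunk[i].
    return [linear_sum_assignment(chunk[i]) for i, chunk in enumerate(splits)]


rng = np.random.default_rng(0)
sizes = [2, 3]                           # image 0 has 2 boxes, image 1 has 3
cost = rng.random((2, 5, sum(sizes)))    # 2 images, 5 queries each

indices = match_per_image(cost, sizes)
print([len(rows) for rows, cols in indices])  # one match per target: [2, 3]

# One NaN entry (e.g. from a NaN predicted box) poisons image 1's slice.
cost[1, 0, 4] = np.nan
try:
    match_per_image(cost, sizes)
except ValueError as err:
    print("ValueError:", err)  # matrix contains invalid numeric entries
```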

(On several forums this was addressed by turning off AMP, which in your example notebook can be done by passing precision=32 to the PyTorch Lightning Trainer.)
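That disabling AMP helps is consistent with a numeric-range problem. One plausible mechanism, offered as an assumption rather than a diagnosis of this checkpoint: float16 can only represent magnitudes up to 65504, so an intermediate value that overflows becomes inf, and an expression such as inf - inf then yields NaN, which can propagate into the predicted boxes and the matcher's cost matrix. A NumPy-only illustration:

```python
import numpy as np

# Largest finite float16 value.
print(float(np.finfo(np.float16).max))  # 65504.0

# Suppress the overflow/invalid RuntimeWarnings so the output stays readable.
with np.errstate(over="ignore", invalid="ignore"):
    big = np.float16(60000.0)
    overflowed = big * np.float16(2.0)  # exceeds the float16 range -> inf
    poisoned = overflowed - overflowed  # inf - inf -> nan
print(overflowed, poisoned)  # inf nan

# The same arithmetic is unremarkable in float32 (full precision).
big32 = np.float32(60000.0)
print(big32 * np.float32(2.0))  # 120000.0
```

If this is what is happening, float32 training avoids it at the cost of speed and memory, but it would not fix NaNs that originate elsewhere.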

When I do get past that, I am immediately hit by:

"/libraries/env/lib/python3.11/site-packages/transformers/loss/loss_for_object_detection.py", line 418, in generalized_box_iou
[rank1]: raise ValueError(f"boxes1 must be in [x0, y0, x1, y1] (corner) format, but got {boxes1}")
[rank1]: ValueError: boxes1 must be in [x0, y0, x1, y1] (corner) format, but got tensor([[nan, nan, nan, nan],
[rank1]: [nan, nan, nan, nan],
[rank1]: [nan, nan, nan, nan],
[rank1]: ...,
[rank1]: [nan, nan, nan, nan],
[rank1]: [nan, nan, nan, nan],
[rank1]: [nan, nan, nan, nan]], device='cuda:1')
Epoch 0: 0%| | 1/4358 [00:03<4:23:53, 0.28it/s, v_num=20, training_loss_step=nan.0]
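The second error is plausibly a downstream symptom of the same NaNs rather than a genuine box-format problem: every predicted box in the message is NaN, and every comparison involving NaN evaluates to False, so a "x1 >= x0 and y1 >= y0" style check fails and reports a misleading format error. The function below is a NumPy approximation of that kind of check; the name is_corner_format is hypothetical and this is not the transformers implementation.

```python
import numpy as np


def is_corner_format(boxes):
    """Hypothetical check: True if every box has x1 >= x0 and y1 >= y0.

    boxes: array-like of shape (N, 4) in [x0, y0, x1, y1] order.
    """
    boxes = np.asarray(boxes, dtype=np.float32)
    # Compare the (x1, y1) columns against the (x0, y0) columns elementwise.
    return bool((boxes[:, 2:] >= boxes[:, :2]).all())


good = [[0.1, 0.1, 0.5, 0.6], [0.2, 0.3, 0.4, 0.9]]
nan_boxes = [[np.nan] * 4, [np.nan] * 4]

print(is_corner_format(good))       # True
print(is_corner_format(nan_boxes))  # False: any comparison with NaN is False
```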

I for the life of me can't figure out what is going on. I tried the same notebook code with my dataset using the original DETR model you had listed, and it works perfectly well.

As a sanity check, I went back and ran the balloon dataset as-is in your notebook, but with the Deformable DETR model and processor, and I ran into the same errors, which suggests my data isn't the issue.
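For anyone reproducing this on their own data, it is still cheap to rule out degenerate annotations before blaming the model. A hypothetical helper, assuming COCO-style annotations whose bbox is [x, y, width, height] in pixels; it flags non-finite values, zero or negative width/height, and boxes extending past the image:

```python
import math


def find_bad_coco_boxes(annotations, image_sizes):
    """Hypothetical helper: return ids of annotations with degenerate bboxes.

    annotations: list of dicts with "id", "image_id" and "bbox" = [x, y, w, h].
    image_sizes: dict mapping image_id -> (width, height).
    """
    bad = []
    for ann in annotations:
        x, y, w, h = ann["bbox"]
        img_w, img_h = image_sizes[ann["image_id"]]
        # Check finiteness first: comparisons with NaN are always False.
        finite = all(math.isfinite(v) for v in (x, y, w, h))
        if (not finite or w <= 0 or h <= 0
                or x < 0 or y < 0 or x + w > img_w or y + h > img_h):
            bad.append(ann["id"])
    return bad


anns = [
    {"id": 1, "image_id": 0, "bbox": [10, 20, 30, 40]},
    {"id": 2, "image_id": 0, "bbox": [50, 50, 0, 10]},   # zero width
    {"id": 3, "image_id": 0, "bbox": [90, 90, 20, 20]},  # runs past the edge
]
print(find_bad_coco_boxes(anns, {0: (100, 100)}))  # [2, 3]
```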

I'd love your help understanding why your notebook doesn't work with the Deformable DETR checkpoints I linked above, since it works perfectly well with the DETR ones.

Other Env details:

  • GPU: V100
  • torch 2.6.0+cu126
  • transformers 4.57.1

Thanks a lot! For my use case the Deformable family of DETRs is the best fit, so I'm really trying to make it work.
