Description
Hi @NielsRogge ,
I've been trying for the past few days to fine-tune a Deformable DETR model (https://huggingface.co/SenseTime/deformable-detr-with-box-refine-two-stage) on a custom object detection dataset, using the DETR fine-tuning notebook you suggested (https://github.com/NielsRogge/Transformers-Tutorials/blob/master/DETR/Fine_tuning_DetrForObjectDetection_on_custom_dataset_(balloon).ipynb). I swapped in the Deformable DETR checkpoint above wherever the model name is needed, and I keep running into errors, two in particular:
File /libraries/env/lib/python3.11/site-packages/transformers/loss/loss_deformable_detr.py:55, in <listcomp>(.0)
     52 cost_matrix = cost_matrix.view(batch_size, num_queries, -1).cpu()
     54 sizes = [len(v["boxes"]) for v in targets]
---> 55 indices = [linear_sum_assignment(c[i]) for i, c in enumerate(cost_matrix.split(sizes, -1))]
     56 return [(torch.as_tensor(i, dtype=torch.int64), torch.as_tensor(j, dtype=torch.int64)) for i, j in indices]
ValueError: matrix contains invalid numeric entries
(On several forums this was addressed by turning off AMP, which in your example notebook, since it uses Trainer, can be done by passing precision=32.)
Once I get past that, I am immediately hit by:
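For reference, this first error can be reproduced outside the model: SciPy's `linear_sum_assignment` rejects any cost matrix that contains NaN, so a single non-finite entry is enough to trigger it. The following is a minimal sketch, assuming NumPy and SciPy are installed; `assignment_error` is a hypothetical helper written for this illustration and is not part of transformers.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def assignment_error(cost):
    """Return the ValueError message raised by the solver, or None if it succeeds.

    Hypothetical helper for illustration only; not part of transformers.
    """
    try:
        linear_sum_assignment(cost)
    except ValueError as exc:
        return str(exc)
    return None


clean = np.array([[1.0, 2.0], [2.0, 1.0]])
poisoned = clean.copy()
poisoned[0, 0] = np.nan  # one NaN entry in an otherwise valid matrix

clean_error = assignment_error(clean)        # solver succeeds on finite costs
poisoned_error = assignment_error(poisoned)  # solver raises ValueError on NaN

# A cheap finiteness check that could be run on a cost matrix before matching.
all_finite = bool(np.isfinite(poisoned).all())
```

In other words, the matcher is only where the problem surfaces. The NaNs are produced earlier, in the model outputs or in the cost terms computed from them.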
"/libraries/env/lib/python3.11/site-packages/transformers/loss/loss_for_object_detection.py", line 418, in generalized_box_iou
[rank1]: raise ValueError(f"boxes1 must be in [x0, y0, x1, y1] (corner) format, but got {boxes1}")
[rank1]: ValueError: boxes1 must be in [x0, y0, x1, y1] (corner) format, but got tensor([[nan, nan, nan, nan],
[rank1]: [nan, nan, nan, nan],
[rank1]: [nan, nan, nan, nan],
[rank1]: ...,
[rank1]: [nan, nan, nan, nan],
[rank1]: [nan, nan, nan, nan],
[rank1]: [nan, nan, nan, nan]], device='cuda:1')
Epoch 0: 0%| | 1/4358 [00:03<4:23:53, 0.28it/s, v_num=20, training_loss_step=nan.0]
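The "corner format" wording in this second error can be misleading. Every ordering comparison involving NaN evaluates to false, so an all-NaN tensor fails a check of the form x1 >= x0 and y1 >= y0 regardless of how the boxes are laid out. The sketch below shows this in plain Python. It assumes the library check amounts to that comparison, and both helper names are hypothetical.

```python
import math


def looks_like_corner_format(boxes):
    """Hypothetical stand-in for the check: x1 >= x0 and y1 >= y0 for every box."""
    return all(x1 >= x0 and y1 >= y0 for x0, y0, x1, y1 in boxes)


def first_non_finite(boxes):
    """Return the index of the first box containing NaN or inf, or None."""
    for index, box in enumerate(boxes):
        if not all(math.isfinite(value) for value in box):
            return index
    return None


nan = float("nan")
valid_boxes = [(0.1, 0.2, 0.5, 0.6)]
nan_boxes = [(0.1, 0.2, 0.5, 0.6), (nan, nan, nan, nan)]

valid_ok = looks_like_corner_format(valid_boxes)  # ordering holds for finite corners
nan_ok = looks_like_corner_format(nan_boxes)      # fails because nan >= nan is False
bad_index = first_non_finite(nan_boxes)           # locates the offending box
```

Both tracebacks are therefore consistent with one cause, which is predictions that are already NaN by the first training step, in line with the `training_loss_step=nan.0` shown in the progress bar.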
For the life of me, I can't figure out what is going on. I ran the same notebook code on my dataset using the original DETR model you listed, and it works perfectly well.
As a sanity check, I went back and ran the balloon dataset as-is in your notebook, but with the Deformable DETR model and processor, and I hit the same errors, which suggests my data isn't the issue.
I would love your help in understanding why your notebook doesn't work with the Deformable DETR checkpoint linked above, given that it works perfectly well with the DETR ones.
Other environment details:
- GPU: V100
- torch 2.6.0+cu126
- transformers 4.57.1
Thanks a lot. The Deformable family of DETRs is the best fit for my use case, so I'm keen to get this working.