Description
Search before asking
- I have searched the RF-DETR issues and found no similar bug report.
Bug
I am fine-tuning RF-DETR with custom dataset transforms, since I have some very specific requirements regarding image sizes. I ran into the following error:
│ │
│ 312 │ │ │ num_w_patches_per_window = num_w_patches // self.config.num_windows │
│ 313 │ │ │ num_h_patches_per_window = num_h_patches // self.config.num_windows │
│ 314 │ │ │ num_windows = self.config.num_windows │
│ ❱ 315 │ │ │ windowed_pixel_tokens = pixel_tokens_with_pos_embed.reshape(batch_size * num │
│ 316 │ │ │ windowed_pixel_tokens = windowed_pixel_tokens.permute(0, 2, 1, 3, 4) │
│ 317 │ │ │ windowed_pixel_tokens = windowed_pixel_tokens.reshape(batch_size * num_windo │
│ 318 │ │ │ windowed_cls_token_with_pos_embed = cls_token_with_pos_embed.repeat(num_wind │
│ │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ _ = 3 │ │
│ │ batch_size = 4 │ │
│ │ bool_masked_pos = None │ │
│ │ cls_token_with_pos_embed = <Tensor shape=(4, 1, 384), dtype=torch.float32, device=cuda:0> │ │
│ │ cls_tokens = <Tensor shape=(4, 1, 384), dtype=torch.float32, device=cuda:0> │ │
│ │ embeddings = <Tensor shape=(4, 4161, 384), dtype=torch.float32, │ │
│ │ device=cuda:0> │ │
│ │ height = 832 │ │
│ │ num_h_patches = 52 │ │
│ │ num_h_patches_per_window = 26 │ │
│ │ num_w_patches = 80 │ │
│ │ num_w_patches_per_window = 40 │ │
│ │ num_windows = 2 │ │
│ │ pixel_tokens_with_pos_embed = <Tensor shape=(4, 52, 80, 384), dtype=torch.float32, │ │
│ │ device=cuda:0> │ │
│ │ pixel_values = <Tensor shape=(4, 3, 832, 1280), dtype=torch.float32, │ │
│ │ device=cuda:0> │ │
│ │ self = WindowedDinov2WithRegistersEmbeddings( │ │
│ │ (patch_embeddings): Dinov2WithRegistersPatchEmbeddings( │ │
│ │ │ (projection): Conv2d(3, 384, kernel_size=(16, 16), │ │
│ │ stride=(16, 16)) │ │
│ │ ) │ │
│ │ (dropout): Dropout(p=0.0, inplace=False) │ │
│ │ ) │ │
│ │ target_dtype = torch.float32 │ │
│ │ width = 1280 │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: shape '[8, 26, 2, 26, -1]' is invalid for input of size 6389760
I am not 100% sure this is actually a bug, since my input sizes might be incorrect. I don't think that's the case, though: the reshape fails because 8 × 26 × 2 × 26 = 10,816 does not evenly divide the 4 × 52 × 80 × 384 = 6,389,760 elements of pixel_tokens_with_pos_embed, whereas it would succeed if line 315 used both num_h_patches_per_window and num_w_patches_per_window instead of num_h_patches_per_window twice.
To sum up, my suspicion is that line 315 should be
windowed_pixel_tokens = pixel_tokens_with_pos_embed.reshape(batch_size * num_windows, num_h_patches_per_window, num_windows, num_w_patches_per_window, -1)
As far as I can tell, this effectively fixes the issue.
Let me know if I'm wrong!
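For completeness, here is a small standalone sketch (plain PyTorch, shapes copied from the locals dump above; the random tensor is just a stand-in for pixel_tokens_with_pos_embed) that reproduces the shape mismatch and checks the proposed reshape:
import torch

# Shapes taken from the locals dump above
batch_size, num_h_patches, num_w_patches, dim = 4, 52, 80, 384
num_windows = 2
num_h_patches_per_window = num_h_patches // num_windows  # 26
num_w_patches_per_window = num_w_patches // num_windows  # 40

# Stand-in for pixel_tokens_with_pos_embed: 4 * 52 * 80 * 384 = 6,389,760 elements
x = torch.randn(batch_size, num_h_patches, num_w_patches, dim)

# Current line 315 uses num_h_patches_per_window twice -> RuntimeError
try:
    x.reshape(batch_size * num_windows, num_h_patches_per_window,
              num_windows, num_h_patches_per_window, -1)
except RuntimeError as e:
    print(e)  # shape '[8, 26, 2, 26, -1]' is invalid for input of size 6389760

# Proposed fix: use num_w_patches_per_window for the width dimension
windowed = x.reshape(batch_size * num_windows, num_h_patches_per_window,
                     num_windows, num_w_patches_per_window, -1)
print(windowed.shape)  # torch.Size([8, 26, 2, 40, 384])
Note that when num_h_patches == num_w_patches the two versions coincide, so the bug only shows up with non-square inputs like mine.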
Environment
- RF-DETR: 1.3.0
- OS: Ubuntu 24.04.3 LTS
- GPU: RTX 3090
- CUDA: 12.9
- Python: 3.12.7
Minimal Reproducible Example
import torch
from rfdetr import RFDETRNano

# Underlying PyTorch model (CPU is fine; the reshape error is device-independent)
model = RFDETRNano().model.model.to("cpu")

# Batch of 4 images at 832x1280 -> 52x80 patches with the 16x16 patch embedding
ins = torch.randn(4, 3, 832, 1280)
outs = model(ins)
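On rfdetr 1.3.0 this should raise the same RuntimeError as above (shape '[8, 26, 2, 26, -1]' is invalid for input of size 6389760).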
Additional
No response
Are you willing to submit a PR?
- Yes, I'd like to help by submitting a PR!