Simple Question about object detection code #11

@AlexCo1d

def forward(self, video_o):
    bsize, numc, numf, numr, fdim = video_o.shape
    video_o = video_o.view(bsize, numc*numf, numr, fdim)
    roi_feat = video_o[:, :, :, :self.dim_feat]
    roi_bbox = video_o[:, :, :, self.dim_feat:(self.dim_feat+self.dim_bbox)]
    bbox_pos = self.bbox_conv(roi_bbox.permute(
        0, 3, 1, 2)).permute(0, 2, 3, 1)
    bbox_features = torch.cat([roi_feat, bbox_pos], dim=-1)
    bbox_feat = self.tohid(bbox_features)
    return bbox_feat

Hi, I am reading your object detection code and found the method above in your EncoderVid.py.
Do you remember why you chose 5 dimensions (dim_bbox) for the positional embedding? What is the source of this approach (Faster R-CNN or Detectron)?
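
To make the question concrete, here is a minimal runnable sketch of how the method above handles shapes. It is only an illustration under assumptions: the class name `EncoderVidSketch`, the `dim_pos` parameter, the choice of a single 1x1 `nn.Conv2d` for `bbox_conv` and a single `nn.Linear` for `tohid`, and all sizes are hypothetical and may not match the actual layers in EncoderVid.py. Reading `numc`, `numf`, `numr` as clips, frames, and regions is also a guess from the variable names.

```python
import torch
import torch.nn as nn


class EncoderVidSketch(nn.Module):
    """Hypothetical stand-in for the encoder; the layer choices are assumptions."""

    def __init__(self, dim_feat=2048, dim_bbox=5, dim_pos=64, dim_hidden=128):
        super().__init__()
        self.dim_feat = dim_feat
        self.dim_bbox = dim_bbox
        # Assumed: a 1x1 conv that lifts each dim_bbox-D box vector to dim_pos channels.
        self.bbox_conv = nn.Conv2d(dim_bbox, dim_pos, kernel_size=1)
        # Assumed: a linear projection of [appearance, position] to dim_hidden.
        self.tohid = nn.Linear(dim_feat + dim_pos, dim_hidden)

    def forward(self, video_o):
        bsize, numc, numf, numr, fdim = video_o.shape
        # Merge the second and third axes into one axis of length numc*numf.
        video_o = video_o.view(bsize, numc * numf, numr, fdim)
        # The last axis holds appearance features first, then the box values.
        roi_feat = video_o[..., :self.dim_feat]
        roi_bbox = video_o[..., self.dim_feat:self.dim_feat + self.dim_bbox]
        # Conv2d expects channels first, so move the box axis to dim 1 and back.
        bbox_pos = self.bbox_conv(roi_bbox.permute(0, 3, 1, 2)).permute(0, 2, 3, 1)
        return self.tohid(torch.cat([roi_feat, bbox_pos], dim=-1))


# Toy input: batch of 2, numc=4, numf=3, numr=5, 16 appearance dims + 5 box dims.
x = torch.randn(2, 4, 3, 5, 16 + 5)
model = EncoderVidSketch(dim_feat=16, dim_bbox=5, dim_pos=8, dim_hidden=32)
out = model(x)
print(out.shape)
```

In this sketch the last `dim_bbox` entries of each region vector are treated as an opaque 5-D box descriptor; what those five values encode is exactly what the question is asking.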

Thank you for your prompt response! Thanks for your great work!
