
Integration of dinov3, including optional train and inference scripts. #324

Open
S-Mahoney wants to merge 7 commits into roboflow:develop from S-Mahoney:dinov3_integration

Conversation

@S-Mahoney commented Aug 17, 2025

Description

Integration of a DINOv3 wrapper into the RF-DETR pipeline.
Included are scripts that allow training with both v2 and v3 encoders using rfdetr, as well as inference test scripts.

Because DINOv3 was only recently released, the required weights can be accessed either by setting your HUGGINGFACE_HUB_TOKEN for private access (requires permission) or by requesting access to the DINOv3 weights from Meta and cloning the DINOv3 repo locally.
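
For illustration, a minimal sketch of the two access paths (the HF model id, hub entry point, and weight filename below are assumptions, not confirmed names):

import torch
from transformers import AutoModel

# Option 1: gated HuggingFace Hub access; requires an approved
# HUGGINGFACE_HUB_TOKEN in the environment. The model id is assumed.
hf_backbone = AutoModel.from_pretrained("facebook/dinov3-vits16-pretrain-lvd1689m")

# Option 2: a local clone of Meta's DINOv3 repo plus downloaded weights.
# The entry point name and weights kwarg are assumed from the repo's hubconf.
local_backbone = torch.hub.load(
    "/path/to/dinov3",    # local clone of the DINOv3 repository
    "dinov3_vits16",      # hub entry point (assumed)
    source="local",
    weights="/path/to/dinov3_vits16_weights.pth",
)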

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How has this change been tested? Please provide a test case or example of how you tested the change.

Tested via the supplied training and inference scripts. The change still allows activation and use of v2 while also enabling the user to use v3. The pre-trained weights need further training before inference is on par with the DINOv2 pre-trained RF-DETR.

Any specific deployment considerations

Licensing requirements need to be checked by the RF-DETR owners before this branch is deployed.

I have read the CLA Document and I sign the CLA.


CLAassistant commented Aug 17, 2025

CLA assistant check
All committers have signed the CLA.

@S-Mahoney (Author)

I have read the CLA Document and I sign the CLA.

@john09282922

Changing the backbone to DINOv3 is much better for object detection, though there might be a latency issue...

@RoyiAvital

I'd add the option to use ConvNeXt Tiny as a backbone.
It is faster to run and comparable performance-wise.

Yet I think the main issue is the license of the DINOv3 model.

@Je1zzz

Je1zzz commented Dec 13, 2025

> I'd add the option to use ConvNeXt Tiny as a backbone. It is faster to run and comparable performance-wise.
>
> Yet I think the main issue is the license of the DINOv3 model.

Hey there, I wonder: do you just load the dinov3_convnext_tiny weights, or do you load a different model's weights?

@Borda added the enhancement (New feature or request) label on Jan 22, 2026
@stedavkle

Hi @S-Mahoney, thank you for the work! Could you sync the pull request to include the newest changes from the main repo?
Best regards

@Borda requested review from Copilot and removed the request for isaacrob-roboflow on February 6, 2026 at 10:25
Copilot AI left a comment

Pull request overview

This pull request integrates DINOv3 (the latest version of Meta's DINO vision transformer) as an alternative backbone encoder into the RF-DETR object detection framework. The integration allows users to choose between DINOv2 and DINOv3 encoders through configuration, with support for loading weights from either HuggingFace Hub or local PyTorch Hub repositories.
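
A hypothetical usage sketch of that configuration switch (the class name and encoder strings are assumptions about this branch, not a confirmed API):

from rfdetr import RFDETRBase  # assumed import path

# Assumed encoder identifiers; per this PR the backbone branches on the
# "dinov2"/"dinov3" prefix of the configured encoder name.
model_v2 = RFDETRBase(encoder="dinov2_small")
model_v3 = RFDETRBase(encoder="dinov3_small")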

Changes:

  • Added DINOv3 wrapper class with flexible weight loading (HuggingFace or local repo)
  • Extended configuration system to support DINOv3 encoder variants (small, base, large) with automatic parameter validation and adjustment
  • Improved training engine with better AMP handling and gradient context management
  • Added example training and inference scripts demonstrating v2/v3 encoder selection

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 31 comments.

Summary per file:

  • rfdetr/models/backbone/dinov3.py: New DINOv3 wrapper implementing multi-path forward logic with HF and PyTorch Hub support
  • rfdetr/models/backbone/backbone.py: Extended to branch between dinov2 and dinov3 encoders based on the model name prefix
  • rfdetr/models/backbone/dinov3_configs/*.json: Configuration files for the dinov3 small/base/large model architectures
  • rfdetr/config.py: Added the EncoderName type, DINOv3 config fields, and validators for automatic parameter adjustment
  • rfdetr/engine.py: Updated AMP context manager logic and replaced inference_mode with no_grad for interpolation
  • rfdetr/train_v2_or_v3.py: New training script with encoder aliasing and environment-based configuration
  • rfdetr/inference_test.py: New demo script for testing DINOv2/v3 inference with URL-based image input
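
As a rough sketch of the environment-based selection described for train_v2_or_v3.py (the variable name and aliases are assumptions, not the script's actual contents):

import os

# Hypothetical aliases mapping short names to full encoder identifiers
ENCODER_ALIASES = {
    "v2": "dinov2_small",
    "v3": "dinov3_small",
}

def resolve_encoder() -> str:
    """Pick the encoder from the environment, defaulting to DINOv2."""
    raw = os.environ.get("RFDETR_ENCODER", "v2")
    return ENCODER_ALIASES.get(raw, raw)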


Comment on lines +133 to +143
torch.set_grad_enabled(True)  # safety
with torch.inference_mode(False):
    with autocast(**get_autocast_args(args)):
        outputs = model(new_samples, new_targets)
        loss_dict = criterion(outputs, new_targets)
        weight_dict = criterion.weight_dict
        losses = sum(
            (1 / args.grad_accum_steps) * loss_dict[k] * weight_dict[k]
            for k in loss_dict.keys()
            if k in weight_dict
        )

Copilot AI Feb 6, 2026


The indentation of the autocast block and its contents appears to have changed. While the torch.inference_mode(False) wrapper was added for safety, the call to torch.set_grad_enabled(True) at line 133 is redundant since torch.inference_mode(False) already ensures gradients are enabled. Consider removing the torch.set_grad_enabled(True) call to simplify the code.
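
For reference, the same block with that simplification applied would read as follows (this assumes grad mode is not disabled globally elsewhere in the loop):

# Simplified per the suggestion: the explicit set_grad_enabled(True) is dropped.
with torch.inference_mode(False):
    with autocast(**get_autocast_args(args)):
        outputs = model(new_samples, new_targets)
        loss_dict = criterion(outputs, new_targets)
        weight_dict = criterion.weight_dict
        losses = sum(
            (1 / args.grad_accum_steps) * loss_dict[k] * weight_dict[k]
            for k in loss_dict
            if k in weight_dict
        )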

 num_select: int = 300
 projector_scale: List[Literal["P3", "P4", "P5"]] = ["P4"]
-out_feature_indexes: List[int] = [2, 5, 8, 11]
+out_feature_indexes: List[int] = [2, 4, 5, 9]

Copilot AI Feb 6, 2026


The default out_feature_indexes is changed from [2, 5, 8, 11] to [2, 4, 5, 9], but there's a validator at lines 101-108 that forces it to [8, 11] when using dinov3 encoders. This creates inconsistency and makes it unclear what the actual indexes will be. Consider documenting why these specific indexes were chosen and whether the default should be different for v2 vs v3.

Comment on lines +183 to +196
if cand is not None:
    if torch.is_tensor(cand) and cand.dim() == 4:
        # Already a spatial map [B, C, Hp, Wp]
        # Repeat to match requested out_feature_indexes count
        C = cand.shape[1]
        if C != self.hidden_size:
            self.hidden_size = int(C)
            self._out_feature_channels = [self.hidden_size] * len(self.out_feature_indexes)
        return [cand for _ in self.out_feature_indexes]
    # Otherwise assume tokens
    tokens = cand
    # If [HW, C] or [B*HW, C], _tokens_to_map will handle reshape
    feats = [self._tokens_to_map(tokens, B, H, W) for _ in self.out_feature_indexes]
    return feats

Copilot AI Feb 6, 2026


In the forward_features fallback path, when no suitable candidate is found in the dictionary (line 179), the code falls through without raising an error. This could lead to unexpected behavior. Consider raising a more informative error if cand remains None after attempting to extract features from the dictionary.
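
For instance, a guard along these lines (the output-dict variable name is assumed):

if cand is None:
    # Fail loudly instead of falling through with no features.
    raise RuntimeError(
        "forward_features: no spatial map or token tensor found in the "
        f"encoder output dict; available keys: {sorted(out.keys())}"
    )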


# auto-fit out_feature_indexes to avoid projector shape mismatches
@field_validator("out_feature_indexes", mode="after")
def _coerce_out_feats_for_backbone(cls, v, info: ValidationInfo):

Copilot AI Feb 6, 2026


Normal methods should have 'self', rather than 'cls', as their first parameter.
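
For context, Pydantic v2 field validators operate at class level, so the idiomatic form keeps cls and stacks @classmethod. A sketch (the [8, 11] coercion mirrors the validator discussed above; the field names and defaults are otherwise assumed):

from pydantic import BaseModel, ValidationInfo, field_validator

class ModelConfig(BaseModel):
    encoder: str = "dinov2_small"
    out_feature_indexes: list[int] = [2, 5, 8, 11]

    @field_validator("out_feature_indexes", mode="after")
    @classmethod
    def _coerce_out_feats_for_backbone(cls, v, info: ValidationInfo):
        # DINOv3 variants need different block indexes, so coerce them here.
        encoder = info.data.get("encoder", "")
        if encoder.startswith("dinov3"):
            return [8, 11]
        return v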

Remove leftover ChatGPT appendix

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@S-Mahoney requested a review from isaacrob as a code owner on February 6, 2026 at 10:36
S-Mahoney and others added 5 commits February 6, 2026 10:37
Removal of unused import 'Field'

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Removal of unused import 'List'

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Removal of unused import 'platform'

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Removal of unused import 'nullcontext'

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Removing commented code clutter

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@Borda force-pushed the develop branch 4 times, most recently from 60b16c1 to 523f9df, on February 14, 2026 at 06:46
Labels: enhancement (New feature or request), has conflicts
