Commit 91ffb68

[docs] Updated README (#315)

* [docs] updated the README with convnext
* adding linear probing dense evals

1 parent: 54694f7

File tree: 1 file changed — README.md (+28 additions, −23 deletions)
@@ -1,4 +1,8 @@
-🆕 [2025-09-17] :fire: DINOv3 backbones are now supported by the [PyTorch Image Models / timm](https://github.com/huggingface/pytorch-image-models/) library starting with version [1.0.20](https://github.com/huggingface/pytorch-image-models/releases/tag/v1.0.20)
+🆕 [2025-11-20] Distillation code and configurations for ConvNeXt backbones are now released!
+
+🆕 [2025-10-13] [Semantic segmentation](https://github.com/facebookresearch/dinov3?tab=readme-ov-file#linear-segmentation-with-data-augmentation-on-ade20k) (ADE20K) and [monocular depth estimation](https://github.com/facebookresearch/dinov3?tab=readme-ov-file#linear-depth-estimation-on-nyuv2-depth) (NYUv2-Depth) linear probing code are now released!
+
+[2025-09-17] DINOv3 backbones are now supported by the [PyTorch Image Models / timm](https://github.com/huggingface/pytorch-image-models/) library starting with version [1.0.20](https://github.com/huggingface/pytorch-image-models/releases/tag/v1.0.20)
 
 [2025-08-29] DINOv3 backbones are [supported](https://huggingface.co/docs/transformers/model_doc/dinov3) by released versions of the Hugging Face [Transformers](https://huggingface.co/docs/transformers/index) library starting with version [4.56.0](https://github.com/huggingface/transformers/releases/tag/v4.56.0)

@@ -197,7 +201,7 @@ image = load_image(url)
 
 feature_extractor = pipeline(
     model="facebook/dinov3-convnext-tiny-pretrain-lvd1689m",
-    task="image-feature-extraction",
+    task="image-feature-extraction",
 )
 features = feature_extractor(image)
 ```
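The `image-feature-extraction` pipeline in the hunk above returns per-token feature vectors as nested Python lists. As a minimal, dependency-free sketch of what one might do with two such outputs (the helper names and toy values below are illustrative, not part of the DINOv3 or Transformers APIs), here is mean pooling followed by cosine similarity:

```python
import math

def mean_pool(tokens):
    """Average a list of per-token feature vectors into a single embedding."""
    dim = len(tokens[0])
    return [sum(tok[d] for tok in tokens) / len(tokens) for d in range(dim)]

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for two pipeline outputs (real DINOv3 features have far
# more tokens and dimensions; these values are illustrative only).
features_a = [[2.0, 4.0], [4.0, 4.0]]
features_b = [[6.0, 8.0], [0.0, 0.0]]

emb_a = mean_pool(features_a)  # [3.0, 4.0]
emb_b = mean_pool(features_b)  # [3.0, 4.0]
print(cosine_similarity(emb_a, emb_b))  # → 1.0
```

A similarity near 1.0 indicates the two pooled embeddings point in the same direction; with real features this is a common way to rank images by visual similarity.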
@@ -213,8 +217,8 @@ image = load_image(url)
 pretrained_model_name = "facebook/dinov3-convnext-tiny-pretrain-lvd1689m"
 processor = AutoImageProcessor.from_pretrained(pretrained_model_name)
 model = AutoModel.from_pretrained(
-    pretrained_model_name,
-    device_map="auto",
+    pretrained_model_name,
+    device_map="auto",
 )
 
 inputs = processor(images=image, return_tensors="pt").to(model.device)
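For ViT-style backbones, the model output is a token sequence that mixes a class token, register tokens, and patch tokens. A pure-Python sketch of separating them (the [CLS, registers, patches] layout and the register count of 4 are assumptions for illustration, not values read from a DINOv3 config):

```python
def split_tokens(tokens, num_register_tokens=4):
    """Split a ViT token sequence [CLS, registers..., patches...] into parts.

    The layout and default register count here are assumptions made for
    illustration; consult the actual model config for real values.
    """
    cls_token = tokens[0]
    registers = tokens[1:1 + num_register_tokens]
    patches = tokens[1 + num_register_tokens:]
    side = int(len(patches) ** 0.5)  # assume a square patch grid
    grid = [patches[row * side:(row + 1) * side] for row in range(side)]
    return cls_token, registers, grid

# 1 CLS + 4 registers + a 3x3 patch grid, with integer ids standing in for vectors
cls_token, registers, grid = split_tokens(list(range(14)))
print(cls_token, len(registers), len(grid), len(grid[0]))  # → 0 4 3 3
```

The class token is a natural global descriptor, while the reshaped patch grid is what dense tasks (segmentation, depth) consume.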
@@ -409,20 +413,6 @@ output_dir=<PATH/TO/OUTPUT/DIR>
 - One can also save prediction results using `result_config.save_results=true`.
 
 
-#### Linear depth estimation on NYUv2 Depth
-```shell
-PYTHONPATH=. python -m dinov3.run.submit dinov3/eval/depth/run.py \
-model.dino_hub=dinov3_vit7b16 \
-config=dinov3/eval/depth/configs/config-nyu.yaml \
-datasets.root=<PATH/TO/DATASET> \
---output-dir <PATH/TO/OUTPUT/DIR>
-```
-
-After the job completes, you will find in the output path directory you specified
-- `depth_config.yaml` that contains the config you trained the model with;
-- `model_final.pth`, the final linear head checkpoint at the end of training; and
-- `results-depth.csv` with the final metrics.
-
 ### Pretrained heads - Detector trained on COCO2017 dataset
 
 <table style="margin: auto">
@@ -523,7 +513,7 @@ transform = make_transform(img_size)
 with torch.inference_mode():
     with torch.autocast('cuda', dtype=torch.bfloat16):
         batch_img = transform(img)[None]
-        pred_vit7b = segmentor(batch_img)  # raw predictions
+        pred_vit7b = segmentor(batch_img)  # raw predictions
         # actual segmentation map
         segmentation_map_vit7b = make_inference(
             batch_img,
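The "raw predictions" in the hunk above are per-class score maps that still need to be reduced to a label per pixel. A minimal pure-Python sketch of that argmax step (nested lists stand in for tensors; real code would use `tensor.argmax(dim=...)`):

```python
def logits_to_segmentation_map(logits):
    """Reduce per-class score maps, indexed [class][row][col], to a label map
    by taking the argmax over classes at each pixel."""
    num_classes = len(logits)
    height, width = len(logits[0]), len(logits[0][0])
    return [
        [max(range(num_classes), key=lambda c: logits[c][y][x]) for x in range(width)]
        for y in range(height)
    ]

# Two classes over a 2x2 image: class 0 wins only at the top-left pixel.
scores = [
    [[0.9, 0.1], [0.2, 0.3]],  # class 0 scores
    [[0.1, 0.8], [0.7, 0.6]],  # class 1 scores
]
print(logits_to_segmentation_map(scores))  # → [[0, 1], [1, 1]]
```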
@@ -689,7 +679,7 @@ PYTHONPATH=${PWD} python -m dinov3.run.submit dinov3/train/train.py \
 --config-file dinov3/configs/train/dinov3_vit7b16_gram_anchor.yaml \
 --output-dir <PATH/TO/OUTPUT/DIR> \
 train.dataset_path=<DATASET>:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET> \
-gram.ckpt=<PATH/TO/GRAM_TEACHER_FROM_PREVIOUS_STEP>
+gram.ckpt=<PATH/TO/GRAM_TEACHER_FROM_PREVIOUS_STEP>
 ```
 
 #### High-resolution adaptation
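The `train.dataset_path` override above uses a colon-separated `Name:key=value` spec. To make that shape concrete, here is a small parser; the grammar actually accepted by dinov3 may differ, so treat this as a sketch of the format, not the library's parser:

```python
def parse_dataset_path(spec):
    """Parse a 'Name:key=value:key=value' spec into (name, options).

    Illustrative only: mirrors the shape of the train.dataset_path
    override, not necessarily the exact grammar dinov3 accepts.
    """
    name, *pairs = spec.split(":")
    options = dict(pair.split("=", 1) for pair in pairs)
    return name, options

name, options = parse_dataset_path("MyDataset:root=/data/train:extra=/data/extra")
print(name, options)  # → MyDataset {'root': '/data/train', 'extra': '/data/extra'}
```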
@@ -705,7 +695,7 @@ PYTHONPATH=${PWD} python -m dinov3.run.submit dinov3/train/train.py \
 student.resume_from_teacher_chkpt=<PATH/TO/TEACHER_FROM_GRAM>
 ```
 
-## Multi-distillation
+## Multi-distillation
 
 ### Test setup:
 
@@ -771,20 +761,35 @@ After the job completes, you will find in the output path directory you specified
 - `model_final.pth`, the final linear head checkpoint at the end of training; and
 - `results-semantic-segmentation.csv` with the final metrics.
 
+
+#### Linear depth estimation on NYUv2 Depth
+```shell
+PYTHONPATH=. python -m dinov3.run.submit dinov3/eval/depth/run.py \
+model.dino_hub=dinov3_vit7b16 \
+config=dinov3/eval/depth/configs/config-nyu.yaml \
+datasets.root=<PATH/TO/DATASET> \
+--output-dir <PATH/TO/OUTPUT/DIR>
+```
+
+After the job completes, you will find in the output path directory you specified
+- `depth_config.yaml` that contains the config you trained the model with;
+- `model_final.pth`, the final linear head checkpoint at the end of training; and
+- `results-depth.csv` with the final metrics.
+
 ### Text alignment on DINOv3 using dino.txt
 
 Text alignment can be done following the method from `dino.txt` aka [DINOv2 Meets Text](https://arxiv.org/abs/2412.16334).
 
 ```shell
 PYTHONPATH=${PWD} python -m dinov3.run.submit dinov3/eval/text/train_dinotxt.py \
 --nodes 4 \
-# An example config for text alignment is here: dinov3/eval/text/configs/dinov3_vitl_text.yaml \
+# An example config for text alignment is here: dinov3/eval/text/configs/dinov3_vitl_text.yaml \
 trainer_config_file="<PATH/TO/DINOv3/TEXT/CONFIG>" \
 output-dir=<PATH/TO/OUTPUT/DIR>
 ```
 Launching the above trains text alignment on 4 nodes with 8 gpus each (32 gpus in total).
 Please note that the text alignment model in the DINOv3 paper was trained on a private dataset and here we have given an example config in ```dinov3/eval/text/configs/dinov3_vitl_text.yaml``` using ```CocoCaptions``` dataset for illustration purposes.
-Please adapt the provided ```CocoCaptions``` dataset class, the dataset can be found [here](https://www.kaggle.com/datasets/nikhil7280/coco-image-caption)
+Please adapt the provided ```CocoCaptions``` dataset class, the dataset can be found [here](https://www.kaggle.com/datasets/nikhil7280/coco-image-caption)
 
 ## License

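The evaluations touched by this commit each write a final metrics CSV (`results-depth.csv`, `results-semantic-segmentation.csv`). A small stdlib-only sketch of pulling the best value out of such a file; the column names below are invented for the example, since the README does not document the real schema:

```python
import csv
import io

# Hypothetical contents of a metrics CSV such as `results-depth.csv`;
# the real column names are not documented in this README.
sample = """iteration,rmse
1000,0.42
2000,0.38
"""

def best_metric(csv_text, column, lower_is_better=True):
    """Return the best value of `column` over all rows of a metrics CSV."""
    rows = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in rows]
    return min(values) if lower_is_better else max(values)

print(best_metric(sample, "rmse"))  # → 0.38
```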