
facebookresearch/univlg



Unifying 2D and 3D Vision-Language Understanding



Ayush Jain*1,2  Alexander Swerdlow*1  Yuzhou Wang1  Sergio Arnaud2  Ada Martin2  Alexander Sax2  Franziska Meier2  Katerina Fragkiadaki1 

1 Carnegie Mellon University  2 Meta AI 

arXiv | Webpage

Project Updates

News: 2025/02/25: We achieved 1st place on the ScanRefer localization leaderboard!

Hugging Face models

The UniVLG checkpoints are available on Hugging Face.

Getting Started

To install the dependencies, see docs/INSTALL.md.

Checkpoints

mkdir ckpts
uvx --from huggingface_hub huggingface-cli download katefgroup/UniVLG --include "univlg.pth" --local-dir ckpts

To download the 3D-only model, replace univlg.pth with univlg_3d_only.pth in the command above. Alternatively, to download all checkpoints, run:

uvx --from huggingface_hub huggingface-cli download katefgroup/UniVLG --local-dir ckpts
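If you switch between the two variants often, the download command above can be wrapped in a small shell helper. This is a hypothetical sketch, not part of the repository: the function names `ckpt_name` and `download_ckpt` and the variant labels `full` and `3d_only` are our own. The only command it runs is the `uvx ... huggingface-cli download` invocation already given in this README.

```shell
#!/usr/bin/env bash
# Hypothetical convenience wrapper (not shipped with UniVLG).
set -euo pipefail

# Map a variant label (our own naming) to the checkpoint file listed in this README.
ckpt_name() {
  case "$1" in
    full)    echo "univlg.pth" ;;
    3d_only) echo "univlg_3d_only.pth" ;;
    *)       echo "unknown variant: $1 (expected: full, 3d_only)" >&2; return 1 ;;
  esac
}

# Download a single checkpoint into ./ckpts using the README's command.
download_ckpt() {
  local name
  name="$(ckpt_name "$1")"   # exits early on an unknown variant
  mkdir -p ckpts
  uvx --from huggingface_hub huggingface-cli download katefgroup/UniVLG \
    --include "$name" --local-dir ckpts
}
```

For example, after sourcing the script, `download_ckpt 3d_only` fetches `ckpts/univlg_3d_only.pth`. Mapping the label to a filename in one `case` statement keeps the accepted variants in a single place and fails loudly on a typo instead of silently downloading nothing.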

Training and Evaluation

See docs/RUN.md for training and evaluation commands.

Citation

To cite our work, please use the following:

@article{jain2025unifying,
  title={Unifying 2D and 3D Vision-Language Understanding},
  author={Jain, Ayush and Swerdlow, Alexander and Wang, Yuzhou and Arnaud, Sergio and Martin, Ada and Sax, Alexander and Meier, Franziska and Fragkiadaki, Katerina},
  journal={arXiv preprint arXiv:2503.10745},
  year={2025}
}

Credits

Notice

The majority of UniVLG is licensed under CC-BY-NC; however, portions of the project are available under separate license terms: Odin and Pointcept are each licensed under the MIT license.