
Ayush Jain*1,2
Alexander Swerdlow*1
Yuzhou Wang1
Sergio Arnaud2
Ada Martin2
Alexander Sax2
Franziska Meier2
Katerina Fragkiadaki1
1 Carnegie Mellon University 2 Meta AI
News: 2025/02/25: We achieved 1st place on the ScanRefer localization leaderboard!
The UniVLG checkpoints are available on Hugging Face.
To install the dependencies, see docs/INSTALL.md.
mkdir ckpts
uvx --from huggingface_hub huggingface-cli download katefgroup/UniVLG --include "univlg.pth" --local-dir ckpts
To download the 3D-only model, replace univlg.pth with univlg_3d_only.pth in the command above. Alternatively, to download all checkpoints, run:
uvx --from huggingface_hub huggingface-cli download katefgroup/UniVLG --local-dir ckpts
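The single-file download commands above differ only in the value passed to --include, so they can be wrapped in a small shell function. This is a minimal sketch, assuming uvx is on your PATH; download_ckpt is a hypothetical helper name used for illustration and is not part of the repository.

```shell
# Hypothetical helper (not part of the UniVLG repo): download one checkpoint
# file from the katefgroup/UniVLG Hugging Face repository into ./ckpts.
download_ckpt() {
  # $1 is the checkpoint filename, e.g. univlg.pth or univlg_3d_only.pth
  mkdir -p ckpts &&
    uvx --from huggingface_hub huggingface-cli download katefgroup/UniVLG \
      --include "$1" --local-dir ckpts
}
```

After defining the function, running download_ckpt univlg_3d_only.pth should fetch the 3D-only model, and download_ckpt univlg.pth the default one.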
See docs/RUN.md for training and evaluation commands.
To cite our work, please use the following:
@article{jain2025unifying,
title={Unifying 2D and 3D Vision-Language Understanding},
author={Jain, Ayush and Swerdlow, Alexander and Wang, Yuzhou and Arnaud, Sergio and Martin, Ada and Sax, Alexander and Meier, Franziska and Fragkiadaki, Katerina},
journal={arXiv preprint arXiv:2503.10745},
year={2025}
}
The majority of UniVLG is licensed under CC-BY-NC; however, portions of the project are available under separate license terms: Odin is licensed under the MIT license, and Pointcept is licensed under the MIT license.