The official PyTorch implementation of our paper:
[TPAMI 2025] JointFormer: A Unified Framework with Joint Modeling for Video Object Segmentation
Authors: Jiaming Zhang, Yutao Cui, Gangshan Wu, Limin Wang
A unified video object segmentation (VOS) framework that jointly models three elements: features, correspondence, and our proposed compressed memory.
- See INSTALL.md for instructions on installing the required Python packages.
- See DATASET.md for dataset download and preparation.
- See TRAINING.md for training details.
- See INFERENCE.md for inference details and pretrained model downloads.
This project is built upon XMem and ConvMAE. Thanks to the contributors of these great codebases.
@ARTICLE{10949703,
author={Zhang, Jiaming and Cui, Yutao and Wu, Gangshan and Wang, Limin},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
title={JointFormer: A Unified Framework with Joint Modeling for Video Object Segmentation},
year={2025},
volume={},
number={},
pages={1-17},
keywords={Feature extraction;Transformers;Pipelines;Object segmentation;Data mining;Benchmark testing;Correlation;Computer vision;Aggregates;Video sequences;Video object segmentation;joint modeling;compressed memory;vision transformer},
doi={10.1109/TPAMI.2025.3557841}
}