
3D-Aware Multi-Task Learning with Cross-View Correlations for Dense Scene Understanding

Xiaoye Wang, Chen Tang, Xiangyu Yue, Wei-Hong Li

arXiv: https://arxiv.org/abs/2511.20646

Demo: from left to right, RGB input and predictions of depth, segmentation, edge, and normal.

Updates

  • Nov 2025: Paper released on arXiv. Code is coming soon.

Overview

Current approaches in multi-task learning (MTL) mainly capture cross-task relations in the 2D image space, often producing unstructured features that lack 3D awareness and predictions that are inconsistent across views (see (a) and (c), top). We argue that 3D awareness is vital for modeling cross-task correlations, and we propose to address this problem by integrating correlations across views as a geometric-consistency signal in the MTL network (see (b) and (c), bottom).

As shown in the figure above, given an image, we feed it and a neighbouring view into the MTL encoder and extract MTL features. In parallel, our lightweight cross-view module (CvM) takes both views as input. Inside CvM, a spatial-aware encoder produces geometrically biased features, followed by a multi-view transformer that exchanges information across views and outputs cross-view features. A cost volume module then converts the cross-view features into a cost volume by warping features from one view to the other, given the relative camera parameters, and matching features across views. Finally, both the cost volume and the cross-view features are concatenated with the MTL features, forming geometry-aware MTL features for predicting multiple dense vision tasks. The module is architecture-agnostic and can be applied to both single-view and multi-view data. Extensive results on NYUv2 and PASCAL-Context demonstrate that our method effectively injects geometric consistency into existing MTL methods and improves their performance.
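To make the pipeline concrete, below is a minimal PyTorch sketch of how such a cross-view module could be wired together. All names, shapes, and hyper-parameters here (the feature dimension, the single shared attention layer, the assumed depth range, and shared intrinsics across views) are illustrative assumptions, not our released implementation; the official code will be published in this repository.

```python
# Minimal sketch of the cross-view module (CvM): spatial-aware encoder ->
# multi-view transformer -> plane-sweep cost volume. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossViewModule(nn.Module):
    def __init__(self, dim=128, num_depths=32, num_heads=4):
        super().__init__()
        # Spatial-aware encoder: a small CNN biased toward local geometry
        # (downsamples the input by a factor of 8).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, dim, 7, stride=4, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Multi-view transformer: cross-attention exchanging information
        # between the two views (one shared layer shown for brevity).
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.num_depths = num_depths

    def exchange(self, fa, fb):
        """Cross-attend each view's features to the other view's features."""
        b, c, h, w = fa.shape
        ta = fa.flatten(2).transpose(1, 2)   # (B, HW, C)
        tb = fb.flatten(2).transpose(1, 2)
        ua, _ = self.cross_attn(ta, tb, tb)  # view a queries view b
        ub, _ = self.cross_attn(tb, ta, ta)  # view b queries view a
        ua = ua.transpose(1, 2).reshape(b, c, h, w)
        ub = ub.transpose(1, 2).reshape(b, c, h, w)
        return ua, ub

    def cost_volume(self, fa, fb, K, R, t, depths):
        """Plane-sweep cost volume: warp view-b features into view a for each
        hypothesised depth and correlate with view-a features.
        Assumes K is already scaled to the feature-map resolution and that
        (R, t) is a single relative pose shared across the batch."""
        b, c, h, w = fa.shape
        ys, xs = torch.meshgrid(
            torch.arange(h, device=fa.device, dtype=fa.dtype),
            torch.arange(w, device=fa.device, dtype=fa.dtype), indexing="ij")
        pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).view(3, -1)  # (3, HW)
        K_inv = torch.inverse(K)
        costs = []
        for d in depths:
            # Back-project view-a pixels to depth d, transform into view b.
            pts_b = R @ (K_inv @ pix) * d + t.view(3, 1)
            uv = K @ pts_b
            uv = uv[:2] / uv[2:].clamp(min=1e-6)
            # Normalise pixel coordinates to [-1, 1] for grid_sample.
            grid = torch.stack(
                [uv[0] / (w - 1) * 2 - 1, uv[1] / (h - 1) * 2 - 1], dim=-1
            ).view(1, h, w, 2).expand(b, -1, -1, -1)
            fb_warp = F.grid_sample(fb, grid, align_corners=True)
            costs.append((fa * fb_warp).mean(1, keepdim=True))  # correlation
        return torch.cat(costs, 1)  # (B, num_depths, H, W)

    def forward(self, img_a, img_b, K, R, t):
        fa, fb = self.encoder(img_a), self.encoder(img_b)
        fa, fb = self.exchange(fa, fb)
        # Assumed depth range; a real model would tune this per dataset.
        depths = torch.linspace(0.5, 10.0, self.num_depths, device=img_a.device)
        cv = self.cost_volume(fa, fb, K, R, t, depths)
        # Concatenate cost volume and cross-view features; downstream, this
        # is fused with the MTL encoder features.
        return torch.cat([fa, cv], dim=1)
```

In the full model, the output of this module would be concatenated with the MTL encoder features to form the geometry-aware features consumed by the task-specific decoder heads.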

Contact

For any questions, please contact Wei-Hong Li or Xiaoye Wang.

Acknowledgements

We would like to thank the authors of DepthSplat, MVSplat, UniMatch, and SAK for their source code.

Cite

@article{wang3dawaremtl,
      title={3D-Aware Multi-Task Learning with Cross-View Correlations for Dense Scene Understanding},
      author={Xiaoye Wang and Chen Tang and Xiangyu Yue and Wei-Hong Li},
      year={2025},
      eprint={2511.20646},
      archivePrefix={arXiv},
      url={https://arxiv.org/abs/2511.20646},
}
