MultimodalStudio: A Heterogeneous Sensor Dataset and Framework for Neural Rendering across Multiple Imaging Modalities
Project Page | arXiv | Dataset
Federico Lincetto1, Gianluca Agresti2, Mattia Rossi2, Pietro Zanuttigh1
1University of Padova; 2Sony Europe Limited
Accepted at CVPR 2025
Official repository of MultimodalStudio, a project that includes MMS-DATA and MMS-FW. MMS-DATA is a geometrically calibrated multi-view multi-sensor dataset; MMS-FW is a multimodal NeRF framework that supports mosaicked, demosaicked, distorted, and undistorted frames of different modalities.
We conducted in-depth investigations showing that using multiple imaging modalities improves the novel view rendering quality of each involved modality.
```
git clone https://github.com/LTTM/MultimodalStudio.git
cd MultimodalStudio
```
We suggest using a conda-based CLI (e.g. micromamba) for convenience, as it simplifies the environment setup. However, you can use any other environment manager.
```
conda env create -f requirements.yaml
conda activate multimodalstudio
```
tiny-cuda-nn is a fast neural network library developed in C++/CUDA, designed for efficient training and inference of small neural networks and multi-resolution hash encodings, especially in neural graphics and neural rendering applications.
Requirements:
- The CUDA toolkit (including the CUDA compiler `nvcc`) must be installed and available on your system.
- The CUDA toolkit version must match the CUDA version installed along with PyTorch during the environment creation.
If you are unsure about your CUDA version, check it with:
```
nvcc --version
```
and ensure it matches the CUDA version reported by:
```
# Activate the correct environment first
python -c "import torch; print(torch.version.cuda)"
```
Install tiny-cuda-nn:
```
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
```
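To confirm that the extension built against the CUDA version PyTorch expects, you can run a quick sanity check along these lines (a minimal sketch, not part of the repository; it assumes the bindings are importable as `tinycudann` and that a CUDA GPU is available):
```python
import torch
import tinycudann as tcnn  # import name of the official tiny-cuda-nn PyTorch bindings

print("PyTorch CUDA version:", torch.version.cuda)

# Instantiating a small hash-grid encoding forces the CUDA extension to load.
encoding = tcnn.Encoding(
    n_input_dims=3,
    encoding_config={
        "otype": "HashGrid",
        "n_levels": 8,
        "n_features_per_level": 2,
        "log2_hashmap_size": 15,
        "base_resolution": 16,
        "per_level_scale": 1.5,
    },
)
x = torch.rand(128, 3, device="cuda")
print("Encoded features:", encoding(x).shape)  # expected: (128, 16) = (batch, n_levels * n_features_per_level)
```
If this runs without compilation or import errors, tiny-cuda-nn is ready to use.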
The working directory is where the main scripts and configurations are located. We suggest adding this folder to your `PYTHONPATH` environment variable for easier access to the modules.
```
cd ./src
export PYTHONPATH=$(pwd):$PYTHONPATH
```
You have two options for preparing the dataset:
- Download Preprocessed Data: preprocessed datasets are available on the Dataset Page. These datasets are ready to be used for training without any additional steps.
- Use a Custom Dataset: if you want to use your own dataset, you need to preprocess it using the provided scripts. The preprocessing step ensures that the dataset is formatted correctly and ready for training.
The preprocessing script requires a specific folder structure for the input data:
```
<scene_folder>/
    calibration.json
    modalities/
        <modality1>/
            0000.png
            0001.png
            ...
        <modality2>/
            0000.png
            0001.png
            ...
```
- Each modality should have its own subfolder inside `modalities/`, containing the corresponding frames.
- The `calibration.json` file must be placed directly inside `<scene_folder>`.
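Before running the preprocessing, you may want to sanity-check the layout with a small script like the following (a hypothetical helper, not shipped with the repository):
```python
from pathlib import Path

def check_scene_folder(scene_folder: str) -> None:
    """Check that a scene folder matches the layout expected by the preprocessing script."""
    scene = Path(scene_folder)
    assert (scene / "calibration.json").is_file(), "calibration.json is missing"
    modalities_dir = scene / "modalities"
    assert modalities_dir.is_dir(), "modalities/ folder is missing"
    for modality in sorted(p for p in modalities_dir.iterdir() if p.is_dir()):
        frames = sorted(modality.glob("*.png"))
        assert frames, f"no frames found for modality '{modality.name}'"
        print(f"{modality.name}: {len(frames)} frames ({frames[0].name} ... {frames[-1].name})")

check_scene_folder("<scene_folder>")  # replace with the path to your scene
```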
The calibration.json file must follow this structure:
```
{
    "modality1": {
        "sensor": "<sensor_name>",
        "width": <image_width>,
        "height": <image_height>,
        "fx": <focal_length_x>,
        "fy": <focal_length_y>,
        "cx": <principal_point_x>,
        "cy": <principal_point_y>,
        "distortion_params": [
            <k1>, <k2>, <p1>, <p2>, <k3>, <k4>
        ],
        "mosaick_pattern": [
            [<pattern_row_1>],
            [<pattern_row_2>]
        ],
        "camera2reference": [
            [<r11>, <r12>, <r13>, <t1>],
            [<r21>, <r22>, <r23>, <t2>],
            [<r31>, <r32>, <r33>, <t3>],
            [0.0, 0.0, 0.0, 1.0]
        ]
    },
    ...additional modalities...
}
```
- Each modality (e.g., `modality1`, `modality2`) must have its own entry.
- Key parameters:
  - `fx`, `fy`: focal lengths in pixels.
  - `cx`, `cy`: principal point coordinates in pixels.
  - `distortion_params`: radial and tangential distortion coefficients.
  - `mosaick_pattern`: the mosaick pattern of the modality.
  - `camera2reference`: transformation matrix from the camera coordinate system to the coordinate system of the reference modality camera. For example, if the reference modality is RGB, then every other modality has a `camera2reference` matrix that maps its coordinate system to the RGB coordinate system. The reference modality is the first one listed in the `modalities` argument during preprocessing; its frames are used to compute the camera poses with COLMAP.
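For reference, the calibration file can be loaded and inspected as follows (a minimal sketch; the intrinsic-matrix assembly follows the standard pinhole convention and is not code taken from the framework):
```python
import json
import numpy as np

with open("<scene_folder>/calibration.json") as f:
    calibration = json.load(f)

for name, cam in calibration.items():
    # 3x3 pinhole intrinsics built from the per-modality parameters.
    K = np.array([
        [cam["fx"], 0.0,       cam["cx"]],
        [0.0,       cam["fy"], cam["cy"]],
        [0.0,       0.0,       1.0],
    ])
    # 4x4 rigid transform mapping this camera's frame to the reference camera's frame.
    cam2ref = np.array(cam["camera2reference"])
    assert cam2ref.shape == (4, 4)
    print(f"{name}: {cam['width']}x{cam['height']}, "
          f"{len(cam['distortion_params'])} distortion coefficients")
```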
The following script preprocesses a custom dataset, preparing the data for training:
```
python src/preprocessing/preprocess_custom_dataset.py \
    --source-path <scene_folder> \
    --output-path <output_path> \
    --colmap-path <colmap_path> \
    --modalities <modality1> <modality2> ... \
    --run-colmap \
    --calibration <scene_folder>/calibration.json \
    --scale 1.0 \
    --undistort \
    --demosaick \
    --raw-input
```
For more details on the arguments, run:
```
python src/preprocessing/preprocess_custom_dataset.py --help
```
If you want to preprocess the MMS-DATA dataset instead, first download the "Source Data" version from the Dataset Page, then use the preprocessing script specific to MMS-DATA:
```
python src/preprocessing/preprocess_mmsdata.py \
    --source-path <scene_folder> \
    --output-path <output_path> \
    --colmap-path <colmap_path> \
    --modalities rgb infrared mono polarization multispectral \
    --run-colmap \
    --calibration <scene_folder>/calibration.json \
    --scale 1.0 \
    --undistort \
    --demosaick
```
Adjust the arguments according to your needs.
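The `--undistort` and `--demosaick` flags control whether frames are stored undistorted and demosaicked. Conceptually, these correspond to standard image processing operations such as the ones in this illustrative OpenCV sketch (not the repository's implementation; it assumes the RGB modality is stored under the key `rgb`, a Bayer pattern matching `cv2.COLOR_BayerRG2BGR`, and uses only the first five distortion coefficients, i.e. OpenCV's plain 5-parameter model):
```python
import json
import cv2
import numpy as np

cam = json.load(open("<scene_folder>/calibration.json"))["rgb"]
raw = cv2.imread("<scene_folder>/modalities/rgb/0000.png", cv2.IMREAD_UNCHANGED)  # mosaicked, single channel

# Demosaicking: interpolate the Bayer mosaic into a 3-channel image.
rgb = cv2.cvtColor(raw, cv2.COLOR_BayerRG2BGR)

# Undistortion with the intrinsics and distortion coefficients from calibration.json.
K = np.array([[cam["fx"], 0.0, cam["cx"]],
              [0.0, cam["fy"], cam["cy"]],
              [0.0, 0.0, 1.0]])
dist = np.array(cam["distortion_params"][:5])  # k1, k2, p1, p2, k3
undistorted = cv2.undistort(rgb, K, dist)
cv2.imwrite("undistorted_0000.png", undistorted)
```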
The main launcher script is:
```
python src/launcher.py \
    --mode <train_or_eval> \
    --conf_path <path_to_config_file> \
    --scene <path_to_preprocessed_scene_folder> \
    --version <experiment_version_name>
```
- `--mode`: the mode of operation, either `train` or `eval`.
- `--conf_path`: path to the configuration file (e.g., `confs/grid_raw.yaml`).
- `--scene`: path to the preprocessed scene folder.
- `--version`: (optional) a name or identifier for the experiment version.
- `--view_ids`: (optional, use with `--mode eval`) the view indices to evaluate the model on. If not provided, the script evaluates all the views specified in the `confs/<config_file>.yaml` passed to `--conf_path`.
Example:
```
python src/launcher.py \
    --mode train \
    --conf_path confs/grid_raw.yaml \
    --scene /path/to/processed/dataset/scene_name \
    --version my_first_test
```
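After training, the same launcher can be run with `--mode eval`. If you prefer driving runs from Python, for example to evaluate several views or scenes in a loop, a sketch like the following works (paths and view indices are placeholders; it assumes `--view_ids` accepts space-separated indices):
```python
import subprocess

scene = "/path/to/processed/dataset/scene_name"
# Evaluate the trained model of version "my_first_test" on a few specific views.
subprocess.run([
    "python", "src/launcher.py",
    "--mode", "eval",
    "--conf_path", "confs/grid_raw.yaml",
    "--scene", scene,
    "--version", "my_first_test",
    "--view_ids", "0", "10", "20",
], check=True)
```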
Configure your experiment by editing the configuration file in `./confs/<config_file>.yaml`. For more information on how to edit the configuration files and use the modularity features, check the guide in the docs folder (see `docs/modularity_documentation.md`).
To evaluate the quality of rendered frames, use:
```
python scripts/evaluate_average_metrics.py
```
Edit the script to set the correct paths for:
- `general_path`: training output folder
- `consistent_mask_path`: mask output folder
- `source_data_path`: ground truth data
The script will compute PSNR, SSIM, and LPIPS metrics for each modality and print average results.
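These are standard full-reference metrics; the sketch below shows how PSNR and SSIM can be computed per frame with `scikit-image` (illustrative only, not the internals of `evaluate_average_metrics.py`; LPIPS additionally requires the `lpips` package):
```python
import numpy as np
from skimage import io
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Load a rendered frame and the corresponding ground-truth frame (placeholder paths, 8-bit RGB assumed).
pred = io.imread("renders/rgb/0000.png").astype(np.float32) / 255.0
gt = io.imread("ground_truth/rgb/0000.png").astype(np.float32) / 255.0

psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
ssim = structural_similarity(gt, pred, data_range=1.0, channel_axis=-1)
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.3f}")
```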
To reproduce the results reported in the MultimodalStudio paper, train the framework on all the scenes using the method configurations provided in `src/configs/method_configs.py`, the config files in `confs/`, and the data provided on the Dataset page.
Below we report the average PSNR and SSIM metrics (over all scenes) for a 5-modality training, obtained by training with raw frames and multiresolution hash grid models:
| Modality | PSNR (↑) | SSIM (↑) |
|---|---|---|
| RGB | 32.45 | - |
| Mono | 32.75 | 0.94 |
| NIR | 34.06 | 0.93 |
| Polarization | 30.91 | - |
| Multispectral | 31.27 | - |
Note:
These results are slightly better than those reported in the paper. This is because, for these experiments, we used an MLP to estimate the background instead of a multiresolution hash grid (to save memory space), and we employed slightly deeper modality heads. All other settings match the original paper.
For more details, refer to the comments in each script and the documentation in the repository.
If you use this code or dataset, please cite:
```
@inproceedings{lincetto2025multimodalstudio,
    author    = {Lincetto, Federico and Agresti, Gianluca and Rossi, Mattia and Zanuttigh, Pietro},
    title     = {MultimodalStudio: A Heterogeneous Sensor Dataset and Framework for Neural Rendering across Multiple Imaging Modalities},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    year      = {2025},
}
```
This project was funded by Sony Europe Limited.
This project was inspired by NeRFStudio and SDFStudio.
Moreover, tiny-cuda-nn and polanalyser are used in this project.
We thank their authors for their contributions to the field and for providing excellent resources for the community.
