Masked Autoencoders (MAE) have demonstrated exceptional performance in natural language processing and 2D vision tasks and have now been introduced into point cloud representation learning. We propose Point-UMAE, a novel self-supervised learning method based on a Unet-like structure, designed to better capture both local details and global semantics in point clouds. Point-UMAE employs an asymmetric encoder-decoder architecture with a top-down fine-grained masking strategy to improve multi-scale consistency. The pre-trained model achieves state-of-the-art performance across various downstream tasks. Compared to the baseline Point-BERT, our method improves classification accuracy by 1% and 4.1% on the ModelNet40 and ScanObjectNN datasets, respectively. We also investigate the impact of masking strategies and encoder structures on performance.
- Python 3.8
- PyTorch 1.13.1
- CUDA 11.7
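
To compare an environment against the versions listed above, a small check along these lines may help. This is a minimal sketch and not part of the repository: the helper names `detect_versions` and `mismatches` are hypothetical, and it only reports differences; it does not decide whether other versions would also work.

```python
import sys

# Versions copied from the requirements list above.
EXPECTED_VERSIONS = {"python": "3.8", "torch": "1.13.1", "cuda": "11.7"}


def detect_versions():
    """Return detected versions; torch/cuda stay None when unavailable."""
    found = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "torch": None,
        "cuda": None,
    }
    try:
        import torch
    except ImportError:
        return found
    # Strip a local build suffix such as "+cu117" from the torch version.
    found["torch"] = torch.__version__.split("+")[0]
    # torch.version.cuda is None for CPU-only builds.
    found["cuda"] = torch.version.cuda
    return found


def mismatches(found, expected=EXPECTED_VERSIONS):
    """Map each differing component to a (found, expected) pair."""
    return {
        name: (found.get(name), want)
        for name, want in expected.items()
        if found.get(name) != want
    }


if __name__ == "__main__":
    for name, (got, want) in mismatches(detect_versions()).items():
        print(f"{name}: found {got}, README lists {want}")
```

An empty result means the environment matches the listed versions exactly.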
# Clone repository
git clone https://github.com/robhlzeng/Point-UMAE.git
cd Point-UMAE
# Create conda environment
conda create -n point-umae python=3.8 -y
conda activate point-umae
# Install PyTorch (adjust CUDA version as needed)
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
# Install dependencies
pip install -r requirements.txt
# Install pointnet2_ops (requires CUDA toolkit)
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"
# Install Chamfer Distance
cd extensions/chamfer_dist && python setup.py install --user && cd ../..

Download all datasets using the provided script:
bash scripts/download_data.sh all

Or download individually:

| Dataset | Command | Source |
|---|---|---|
| ShapeNet-55 | bash scripts/download_data.sh shapenet | Point-BERT |
| ModelNet40 | bash scripts/download_data.sh modelnet40 | Stanford |
| ScanObjectNN | bash scripts/download_data.sh scanobjectnn | Official |
| ShapeNetPart | bash scripts/download_data.sh shapenetpart | Stanford |

Expected data directory structure:
data/
├── ShapeNet55/
│ ├── shapenet_pc/
│ ├── train.txt
│ └── test.txt
├── ModelNet40/
│ ├── train_files.txt
│ ├── test_files.txt
│ └── ply_data_*.h5
├── ScanObjectNN/
│ └── h5_files/main_split/
└── shapenetcore_partanno_segmentation_benchmark_v0_normal/
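
Before starting a run, it can be useful to confirm that the layout above is in place. The following is a minimal sketch, not a script shipped with the repository: the function name `missing_entries` is hypothetical, the default root `data` is taken from the tree above, and the check only tests for existence, not for file contents.

```python
from pathlib import Path

# Entries copied from the expected directory structure above.
EXPECTED_ENTRIES = [
    "ShapeNet55/shapenet_pc",
    "ShapeNet55/train.txt",
    "ShapeNet55/test.txt",
    "ModelNet40/train_files.txt",
    "ModelNet40/test_files.txt",
    "ModelNet40/ply_data_*.h5",
    "ScanObjectNN/h5_files/main_split",
    "shapenetcore_partanno_segmentation_benchmark_v0_normal",
]


def missing_entries(root="data"):
    """Return the expected entries with no match under `root`."""
    root = Path(root)
    # glob handles both literal paths and the ply_data_*.h5 wildcard.
    return [entry for entry in EXPECTED_ENTRIES if not any(root.glob(entry))]


if __name__ == "__main__":
    for entry in missing_entries():
        print(f"missing: data/{entry}")
```
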
# Single GPU
bash scripts/pretrain.sh
# Multi-GPU (4 GPUs)
CUDA_VISIBLE_DEVICES=0,1,2,3 bash scripts/pretrain.sh 4

# ModelNet40
bash scripts/finetune_cls.sh cfgs/classification/modelnet40.yaml <pretrain_ckpt>
# ScanObjectNN (PB_T50_RS, hardest variant)
bash scripts/finetune_cls.sh cfgs/classification/scanobjectnn_hardest.yaml <pretrain_ckpt>
# Test with voting
bash scripts/finetune_cls.sh cfgs/classification/modelnet40.yaml <best_ckpt> 1 --test --vote

python main.py --config cfgs/fewshot/modelnet40_fewshot.yaml --ckpts <pretrain_ckpt> --way 5 --shot 10
python main.py --config cfgs/fewshot/modelnet40_fewshot.yaml --ckpts <pretrain_ckpt> --way 5 --shot 20
python main.py --config cfgs/fewshot/modelnet40_fewshot.yaml --ckpts <pretrain_ckpt> --way 10 --shot 10
python main.py --config cfgs/fewshot/modelnet40_fewshot.yaml --ckpts <pretrain_ckpt> --way 10 --shot 20

# ShapeNetPart
bash scripts/finetune_seg.sh <pretrain_ckpt>
# Test
python main.py --config cfgs/segmentation/shapenetpart.yaml --test --ckpts <best_ckpt>

| Task | Dataset | Metric | Point-UMAE |
|---|---|---|---|
| Pre-training + SVM | ShapeNet → ModelNet40 | Accuracy | 93.1% |
| Classification | ModelNet40 | Accuracy | 94.2% |
| Classification | ScanObjectNN (hardest) | Accuracy | 87.0% |
| Part Segmentation | ShapeNetPart | mIoU | 86.1% |
| Few-shot (5w10s) | ModelNet40 | Accuracy | 97.1 ± 1.9% |
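
The few-shot row above uses the N-way K-shot convention (5w10s means 5 classes with 10 labeled samples each) and reports a mean with a spread. The sketch below illustrates that convention only; it is not the repository's `runner_fewshot.py`. The function names, the `n_query` parameter, the use of the sample standard deviation, and the accuracy values in the usage example are all assumptions made for illustration.

```python
import random
import statistics
from collections import defaultdict


def sample_episode(labels, way, shot, n_query, rng):
    """Pick `way` classes, then `shot` support and `n_query` query indices per class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    classes = rng.sample(sorted(by_class), way)
    support, query = [], []
    for cls in classes:
        # Draw without replacement so support and query never overlap.
        picked = rng.sample(by_class[cls], shot + n_query)
        support += picked[:shot]
        query += picked[shot:]
    return classes, support, query


def summarize(accuracies):
    """Format per-episode accuracies (in percent) as 'mean ± std%'."""
    mean = statistics.mean(accuracies)
    std = statistics.stdev(accuracies)  # sample std; pstdev is the alternative
    return f"{mean:.1f} ± {std:.1f}%"


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [i % 40 for i in range(2000)]  # toy labels, 40 classes
    classes, support, query = sample_episode(labels, way=5, shot=10, n_query=20, rng=rng)
    print(len(classes), len(support), len(query))
    # Illustrative accuracies, not results from the paper.
    print(summarize([96.0, 98.0, 97.0, 99.0, 95.0]))
```

Each episode would train a classifier on the support indices and score it on the query indices; the per-episode accuracies are then passed to `summarize`.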
Point-UMAE/
├── models/ # Point-UMAE model architectures
│ ├── point_umae.py # Pre-training model (MAE)
│ ├── point_umae_cls.py # Classification fine-tuning model
│ ├── point_umae_seg.py # Part segmentation model
│ ├── transformer.py # Transformer blocks and U-shaped encoder
│ └── build.py # Model registry and builder
├── datasets/ # Dataset loaders
│ ├── ShapeNet55Dataset.py # ShapeNet for pre-training
│ ├── ModelNetDataset.py # ModelNet40 classification
│ ├── ScanObjectNNDataset.py # ScanObjectNN classification
│ └── ShapeNetPartDataset.py # ShapeNetPart segmentation
├── tools/ # Training and evaluation runners
│ ├── runner_pretrain.py # Pre-training runner
│ ├── runner_finetune.py # Classification fine-tuning runner
│ ├── runner_fewshot.py # Few-shot evaluation runner
│ └── runner_seg.py # Part segmentation runner
├── utils/ # Utilities (config, logging, metrics)
├── cfgs/ # YAML configuration files
│ ├── pretrain/ # Pre-training configs
│ ├── classification/ # Classification configs
│ ├── fewshot/ # Few-shot configs
│ └── segmentation/ # Segmentation configs
├── extensions/ # CUDA extensions (Chamfer Distance)
├── scripts/ # Shell scripts for training and data download
├── main.py # Main entry point
└── requirements.txt # Python dependencies
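
The `extensions/` directory above builds a CUDA Chamfer Distance kernel. For readers unfamiliar with the metric, a plain-Python reference can help when sanity-checking outputs on tiny inputs. This sketch assumes the symmetric, mean-reduced, squared-L2 formulation; the variant and reduction used by the compiled extension are not stated here and may differ. The function name `chamfer_distance` is hypothetical.

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer Distance between two non-empty point lists.

    Uses squared Euclidean distance and averages over each set
    (an assumed variant; other definitions exist).
    """
    def sq_dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    def one_way(src, dst):
        # Mean squared distance from each source point to its nearest target.
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


if __name__ == "__main__":
    a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    b = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
    print(chamfer_distance(a, b))
```

This runs in O(|a| · |b|) time and is meant only for small inputs; a GPU kernel is what makes the same computation practical on full point clouds.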
This project builds upon several excellent open-source projects:
- Point-MAE for the masked autoencoder framework
- Point-BERT for the point cloud tokenizer
- Point-M2AE for the multi-scale architecture
- Pointnet2_PyTorch for the PointNet++ implementation
@inproceedings{zeng2025pointumae,
title = {Point-UMAE: Unet-like Masked Autoencoders for Point Cloud Self-supervised Learning},
author = {Zeng, Hongliang and Zhang, Ping and Li, Fang and Ye, Tingyu and Wang, Jiahua and Yang, Xianbo},
booktitle = {ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
year = {2025},
doi = {10.1109/ICASSP49660.2025.10890081},
}

This project is licensed under the MIT License - see the LICENSE file for details.

