Point-UMAE: Unet-like Masked Autoencoders for Point Cloud Self-supervised Learning

Paper (IEEE) | Code

Abstract

Masked Autoencoders (MAE) have demonstrated exceptional performance in natural language processing and 2D vision tasks and have now been introduced into point cloud representation learning. We propose Point-UMAE, a novel self-supervised learning method built on a Unet-like structure and designed to better capture both local details and global semantics in point clouds. Point-UMAE employs an asymmetric encoder-decoder architecture with a top-down fine-grained masking strategy to improve multi-scale consistency. The pre-trained model achieves state-of-the-art performance across various downstream tasks. Compared to the baseline Point-BERT, our method improves classification accuracy by 1% on ModelNet40 and by 4.1% on ScanObjectNN. We also investigate the impact of masking strategies and encoder structures on performance.
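
To make the idea of multi-scale-consistent masking concrete, here is a minimal sketch. It is one plausible reading of a top-down scheme, not the paper's implementation: patches are masked at the coarsest scale, and the mask is propagated to finer scales so that a fine patch is hidden whenever its coarse parent is hidden. The function name `top_down_mask`, the `parents_per_level` layout, the default mask ratio, and the two-level hierarchy are all illustrative assumptions.

```python
import random

def top_down_mask(num_coarse, parents_per_level, mask_ratio=0.6, seed=0):
    """Sample a mask at the coarsest scale and push it down to finer scales.

    parents_per_level[i][j] is the index, at the next-coarser level, of the
    parent of patch j at finer level i (hypothetical hierarchy layout).
    Returns a list of boolean masks, coarsest first; True means "masked".
    """
    rng = random.Random(seed)
    num_masked = int(round(num_coarse * mask_ratio))
    masked_ids = set(rng.sample(range(num_coarse), num_masked))
    masks = [[i in masked_ids for i in range(num_coarse)]]
    for parents in parents_per_level:
        coarser = masks[-1]
        # A fine patch inherits the masked/visible state of its parent.
        masks.append([coarser[p] for p in parents])
    return masks

# 4 coarse patches, each with 2 children at the finer scale.
parents = [[0, 0, 1, 1, 2, 2, 3, 3]]
coarse_mask, fine_mask = top_down_mask(4, parents, mask_ratio=0.5)
```

Because every fine patch copies the state of its parent, the visible regions agree across scales by construction, which is the consistency property the abstract refers to.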

Installation

Requirements

  • Python 3.8
  • PyTorch 1.13.1
  • CUDA 11.7

# Clone repository
git clone https://github.com/robhlzeng/Point-UMAE.git
cd Point-UMAE

# Create conda environment
conda create -n point-umae python=3.8 -y
conda activate point-umae

# Install PyTorch (adjust CUDA version as needed)
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117

# Install dependencies
pip install -r requirements.txt

# Install pointnet2_ops (requires CUDA toolkit)
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"

# Install Chamfer Distance
cd extensions/chamfer_dist && python setup.py install --user && cd ../..
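
After installation, a quick sanity check can confirm that the main packages are visible to the interpreter. The sketch below uses only the standard library and merely tests whether each package can be found; it does not import them or verify CUDA. The helper name `check_environment` is hypothetical, and the Chamfer Distance extension is not checked because its installed module name is not stated here.

```python
import importlib.util
import sys

def check_environment(modules=("torch", "torchvision", "pointnet2_ops")):
    """Report which required packages are importable, without importing them."""
    report = {name: importlib.util.find_spec(name) is not None for name in modules}
    report["python>=3.8"] = sys.version_info >= (3, 8)
    return report

if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:15s} {'OK' if ok else 'MISSING'}")
```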

Data Preparation

Download all datasets using the provided script:

bash scripts/download_data.sh all

Or download individually:

| Dataset | Command | Source |
|---|---|---|
| ShapeNet-55 | bash scripts/download_data.sh shapenet | Point-BERT |
| ModelNet40 | bash scripts/download_data.sh modelnet40 | Stanford |
| ScanObjectNN | bash scripts/download_data.sh scanobjectnn | Official |
| ShapeNetPart | bash scripts/download_data.sh shapenetpart | Stanford |

Expected data directory structure:

data/
├── ShapeNet55/
│   ├── shapenet_pc/
│   ├── train.txt
│   └── test.txt
├── ModelNet40/
│   ├── train_files.txt
│   ├── test_files.txt
│   └── ply_data_*.h5
├── ScanObjectNN/
│   └── h5_files/main_split/
└── shapenetcore_partanno_segmentation_benchmark_v0_normal/
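
Before launching training, it can help to confirm that the layout above is in place. The following is a minimal sketch, assuming the datasets live under a `data/` directory relative to the working directory; the helper name `missing_entries` is hypothetical, and the check only looks at the paths shown in the tree, not at file contents.

```python
from pathlib import Path

# Paths taken from the directory tree above, relative to the data root.
EXPECTED = [
    "ShapeNet55/shapenet_pc",
    "ShapeNet55/train.txt",
    "ShapeNet55/test.txt",
    "ModelNet40/train_files.txt",
    "ModelNet40/test_files.txt",
    "ScanObjectNN/h5_files/main_split",
    "shapenetcore_partanno_segmentation_benchmark_v0_normal",
]

def missing_entries(data_root="data"):
    """Return the expected paths that do not exist under data_root."""
    root = Path(data_root)
    missing = [rel for rel in EXPECTED if not (root / rel).exists()]
    # The tree lists ply_data_*.h5 as a pattern; require at least one match.
    if not any((root / "ModelNet40").glob("ply_data_*.h5")):
        missing.append("ModelNet40/ply_data_*.h5")
    return missing

if __name__ == "__main__":
    for rel in missing_entries():
        print("missing:", rel)
```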

Pre-training

# Single GPU
bash scripts/pretrain.sh

# Multi-GPU (4 GPUs)
CUDA_VISIBLE_DEVICES=0,1,2,3 bash scripts/pretrain.sh 4

Downstream Tasks

Shape Classification

# ModelNet40
bash scripts/finetune_cls.sh cfgs/classification/modelnet40.yaml <pretrain_ckpt>

# ScanObjectNN (PB_T50_RS, hardest variant)
bash scripts/finetune_cls.sh cfgs/classification/scanobjectnn_hardest.yaml <pretrain_ckpt>

# Test with voting
bash scripts/finetune_cls.sh cfgs/classification/modelnet40.yaml <best_ckpt> 1 --test --vote
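
The exact behaviour of `--vote` is defined by the repository's runner. A common scheme in this family of codebases averages class scores over several randomly augmented copies of each test shape, and the sketch below assumes that scheme. The `predict` callable, the scale range, and the number of votes are hypothetical stand-ins, not values taken from this project.

```python
import random

def vote_predict(predict, points, num_votes=10, seed=0):
    """Average class scores over several randomly rescaled copies of a shape.

    predict: callable mapping a list of (x, y, z) points to a list of class scores.
    Returns (predicted_class_index, mean_scores).
    """
    rng = random.Random(seed)
    totals = None
    for _ in range(num_votes):
        scale = rng.uniform(0.8, 1.2)  # illustrative augmentation range
        scaled = [(x * scale, y * scale, z * scale) for x, y, z in points]
        scores = list(predict(scaled))
        totals = scores if totals is None else [t + s for t, s in zip(totals, scores)]
    mean_scores = [t / num_votes for t in totals]
    best = max(range(len(mean_scores)), key=mean_scores.__getitem__)
    return best, mean_scores
```

Averaging scores rather than taking a majority of hard labels keeps the confidence of each vote, which is why it is the more common choice.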

Few-shot Classification

python main.py --config cfgs/fewshot/modelnet40_fewshot.yaml --ckpts <pretrain_ckpt> --way 5 --shot 10
python main.py --config cfgs/fewshot/modelnet40_fewshot.yaml --ckpts <pretrain_ckpt> --way 5 --shot 20
python main.py --config cfgs/fewshot/modelnet40_fewshot.yaml --ckpts <pretrain_ckpt> --way 10 --shot 10
python main.py --config cfgs/fewshot/modelnet40_fewshot.yaml --ckpts <pretrain_ckpt> --way 10 --shot 20
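
The four commands above differ only in `--way` and `--shot`, so they can be generated rather than typed. A small sketch, assuming the same flags and config path as shown; the helper name `fewshot_commands` and the checkpoint path are placeholders.

```python
import itertools
import shlex

def fewshot_commands(ckpt, ways=(5, 10), shots=(10, 20),
                     config="cfgs/fewshot/modelnet40_fewshot.yaml"):
    """Build one `python main.py ...` command per (way, shot) setting."""
    commands = []
    for way, shot in itertools.product(ways, shots):
        args = ["python", "main.py", "--config", config, "--ckpts", ckpt,
                "--way", str(way), "--shot", str(shot)]
        # Quote each argument so paths with spaces survive the shell.
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands

for cmd in fewshot_commands("path/to/pretrain.pth"):  # hypothetical checkpoint path
    print(cmd)
```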

Part Segmentation

# ShapeNetPart
bash scripts/finetune_seg.sh <pretrain_ckpt>

# Test
python main.py --config cfgs/segmentation/shapenetpart.yaml --test --ckpts <best_ckpt>

Results

| Task | Dataset | Metric | Point-UMAE |
|---|---|---|---|
| Pre-training + SVM | ShapeNet → ModelNet40 | Accuracy | 93.1% |
| Classification | ModelNet40 | Accuracy | 94.2% |
| Classification | ScanObjectNN (hardest) | Accuracy | 87.0% |
| Part Segmentation | ShapeNetPart | mIoU | 86.1% |
| Few-shot (5w10s) | ModelNet40 | Accuracy | 97.1 ± 1.9% |
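
The few-shot entry is reported as a mean with a spread, which is the usual form when accuracy is aggregated over several independent runs. The sketch below shows how such a figure can be formatted. The number of runs and the choice of population rather than sample standard deviation are assumptions, and the accuracies are placeholder values, not results from this project.

```python
import statistics

def summarize(accuracies):
    """Format a list of per-run accuracies (in percent) as 'mean ± std%'."""
    mean = statistics.mean(accuracies)
    std = statistics.pstdev(accuracies)  # population std; an assumption
    return f"{mean:.1f} ± {std:.1f}%"

runs = [95.0, 97.0, 99.0, 96.0, 98.0]  # placeholder values, not real results
print(summarize(runs))
```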

Project Structure

Point-UMAE/
├── models/              # Point-UMAE model architectures
│   ├── point_umae.py    # Pre-training model (MAE)
│   ├── point_umae_cls.py # Classification fine-tuning model
│   ├── point_umae_seg.py # Part segmentation model
│   ├── transformer.py   # Transformer blocks and U-shaped encoder
│   └── build.py         # Model registry and builder
├── datasets/            # Dataset loaders
│   ├── ShapeNet55Dataset.py    # ShapeNet for pre-training
│   ├── ModelNetDataset.py      # ModelNet40 classification
│   ├── ScanObjectNNDataset.py  # ScanObjectNN classification
│   └── ShapeNetPartDataset.py  # ShapeNetPart segmentation
├── tools/               # Training and evaluation runners
│   ├── runner_pretrain.py      # Pre-training runner
│   ├── runner_finetune.py      # Classification fine-tuning runner
│   ├── runner_fewshot.py       # Few-shot evaluation runner
│   └── runner_seg.py           # Part segmentation runner
├── utils/               # Utilities (config, logging, metrics)
├── cfgs/                # YAML configuration files
│   ├── pretrain/        # Pre-training configs
│   ├── classification/  # Classification configs
│   ├── fewshot/         # Few-shot configs
│   └── segmentation/    # Segmentation configs
├── extensions/          # CUDA extensions (Chamfer Distance)
├── scripts/             # Shell scripts for training and data download
├── main.py              # Main entry point
└── requirements.txt     # Python dependencies
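
The `extensions/` directory provides a CUDA implementation of Chamfer Distance. For intuition, a slow pure-Python reference is sketched below. It uses the squared-L2, mean-reduced symmetric form, which is an assumption: the repository's extension may use a different norm or reduction.

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point sets (squared L2 variant).

    a, b: non-empty lists of (x, y, z) tuples. Runs in O(len(a) * len(b)).
    """
    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_way(src, dst):
        # Mean over src of the squared distance to the nearest point in dst.
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)
```

A reference like this is far too slow for training, but it is handy for checking a compiled extension on a handful of points.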

Acknowledgements

This project builds upon several excellent open-source projects:

Citation

@inproceedings{zeng2025pointumae,
  title     = {Point-UMAE: Unet-like Masked Autoencoders for Point Cloud Self-supervised Learning},
  author    = {Zeng, Hongliang and Zhang, Ping and Li, Fang and Ye, Tingyu and Wang, Jiahua and Yang, Xianbo},
  booktitle = {ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year      = {2025},
  doi       = {10.1109/ICASSP49660.2025.10890081},
}

License

This project is licensed under the MIT License - see the LICENSE file for details.
