cvg/megaflow

MegaFlow: Zero-Shot Large Displacement Optical Flow

Dingxi Zhang1     Fangjinhua Wang1     Marc Pollefeys1,2     Haofei Xu1,3

1 ETH Zurich       2 Microsoft       3 University of Tübingen, Tübingen AI Center

Project Page · arXiv · Demo · Open In Colab

MegaFlow Teaser Overview

MegaFlow is a simple, powerful, and unified model for zero-shot large displacement optical flow and point tracking.

MegaFlow leverages pre-trained Vision Transformer features to naturally capture extreme motion, followed by a lightweight iterative refinement for sub-pixel accuracy. This approach achieves state-of-the-art zero-shot performance across major optical flow benchmarks (Sintel, KITTI, Spring) while delivering highly competitive zero-shot generalizability on long-range point tracking benchmarks.

Highlights

  • 🏆 Strong zero-shot performance across Sintel, Spring, and KITTI
  • 🎯 Excels in large displacement optical flow estimation
  • 📹 Flexible temporal window: seamlessly processes any number of frames
  • 🔄 General motion backbone: naturally extends to point tracking

Installation

# Clone the repository
git clone https://github.com/cvg/megaflow.git
cd megaflow

# Create local conda environment
conda create -n megaflow python=3.12 -y
conda activate megaflow

# Install dependencies
pip install -e .

# (Optional) Install FlashAttention-3 for faster inference on Hopper GPUs
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention/hopper
python setup.py install
cd ../..

Or install directly:

pip install git+https://github.com/cvg/megaflow.git

Requirements: Python ≥ 3.12, PyTorch ≥ 2.7, CUDA recommended.
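After installing, a quick sanity check confirms that PyTorch is importable and reports whether CUDA is visible (the snippet below is a generic check, not part of the MegaFlow package):

```python
# Verify the PyTorch install and CUDA visibility before running MegaFlow.
import torch

print(f"PyTorch {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
```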

Pretrained Models

Pretrained checkpoints are available on 🤗 HuggingFace and are auto-downloaded:

Model Name              Description
----------              -----------
megaflow-flow           Optical flow (default)
megaflow-chairs-things  Optical flow trained on FlyingThings and FlyingChairs
megaflow-track          Point tracking (Kubric fine-tuned)

Quick Start
import torch
from megaflow import MegaFlow
from megaflow.utils.basic import gridcloud2d

device = "cuda" if torch.cuda.is_available() else "cpu"

# Prepare video tensor [1, T, 3, H, W] in float32, range [0, 255]
video = ...
_, _, _, H, W = video.shape  # H, W are needed below for the tracking grid

with torch.inference_mode():
    with torch.autocast(device_type=device, dtype=torch.bfloat16, enabled=True):
        # --- Task 1: Optical Flow ---
        flow_model = MegaFlow.from_pretrained("megaflow-flow").eval().to(device)
        # Returns flow predictions for consecutive frame pairs (0->1, 1->2, ...)
        flow_predictions = flow_model(video, num_reg_refine=8)["flow_preds"][-1]

        # --- Task 2: Point Tracking ---
        track_model = MegaFlow.from_pretrained("megaflow-track").eval().to(device)
        # Returns tracking offsets between the first frame and each query frame (0->t)
        flows_e = track_model.forward_track(video, num_reg_refine=8)["flow_final"]
        # Add absolute grid coordinates to get final point tracks
        grid_xy = gridcloud2d(1, H, W, norm=False, device=device).float()
        grid_xy = grid_xy.permute(0, 2, 1).reshape(1, 1, 2, H, W)
        tracking_predictions = flows_e + grid_xy
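One way to produce the expected [1, T, 3, H, W] input from decoded frames is sketched below. The helper name frames_to_video_tensor is hypothetical (not part of the MegaFlow API); it assumes you already have frames as a [T, H, W, 3] uint8 array from any video reader:

```python
import numpy as np
import torch

def frames_to_video_tensor(frames: np.ndarray, device: str = "cpu") -> torch.Tensor:
    """Convert [T, H, W, 3] uint8 frames to the [1, T, 3, H, W] float32
    tensor in range [0, 255] that MegaFlow expects."""
    video = torch.from_numpy(frames).permute(0, 3, 1, 2).float()  # [T, 3, H, W]
    return video.unsqueeze(0).to(device)                          # [1, T, 3, H, W]
```

Note that the values are kept in [0, 255]; no normalization to [0, 1] is applied, matching the comment in the snippet above.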

Demo

Optical Flow Estimation

# Processes the video and auto-downloads the megaflow-flow model
python demo_flow.py --input assets/longboard.mp4 --output output/longboard_flow.mp4

Point Tracking

# Tracks points and auto-downloads the megaflow-track model
python demo_track.py --input assets/apple.mp4 --grid_size 8

You can also run python demo_gradio.py to launch a local web UI, try our HuggingFace demo, or open the Colab notebook for an interactive demo directly in the browser.
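For inspecting flow predictions yourself, a minimal flow-to-color sketch is shown below (angle mapped to hue, magnitude to saturation). This is a generic visualization, not the colormap used by the demo scripts, and it assumes matplotlib is available:

```python
import numpy as np
from matplotlib.colors import hsv_to_rgb

def flow_to_rgb(flow: np.ndarray) -> np.ndarray:
    """Map an [H, W, 2] flow field to an [H, W, 3] uint8 color image."""
    u, v = flow[..., 0], flow[..., 1]
    mag = np.hypot(u, v)
    hue = (np.arctan2(v, u) + np.pi) / (2 * np.pi)       # direction -> hue
    sat = np.clip(mag / (mag.max() + 1e-8), 0.0, 1.0)    # magnitude -> saturation
    hsv = np.stack([hue, sat, np.ones_like(hue)], axis=-1)
    return (hsv_to_rgb(hsv) * 255).astype(np.uint8)
```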

Datasets

To train and evaluate MegaFlow, you will need to download the required datasets: FlyingChairs, FlyingThings3D, Sintel, KITTI, HD1K, TartanAir, and Spring.

For tracking, you will need to download the processed Kubric data from AllTracker and the TAP-Vid benchmarks:

  • Kubric: Download the 24-frame data (kubric_au.tar.gz) and the 64-frame data parts (part1, part2, part3).
  • TAP-Vid: Download the TAP-Vid-DAVIS, TAP-Vid-RGB-stacking and TAP-Vid-Kinetics datasets from here for evaluation.

Merge the 64-frame Kubric parts by concatenating them:

cat ce64_kub_aa ce64_kub_ab ce64_kub_ac > ce64_kub.tar.gz

By default, datasets.py searches for the datasets in the locations below. You can create symbolic links in the datasets folder pointing to wherever the datasets were downloaded:

datasets/
├── FlyingChairs_release/
├── FlyingThings3D/
├── Sintel/
├── KITTI/
├── HD1K/
├── spring/
├── TartanAir/
├── kubric_au/
└── TAP_Vid/
    ├── tapvid_davis/
    ├── tapvid_kinetics/
    └── tapvid_rgb_stacking/
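If the raw data lives elsewhere on disk, the symbolic links can be created with a small helper like the one below (link_dataset is a hypothetical name; the source paths are placeholders for your own download locations):

```python
import os

def link_dataset(src: str, name: str, root: str = "datasets") -> None:
    """Symlink a downloaded dataset directory into the expected layout."""
    os.makedirs(root, exist_ok=True)
    dst = os.path.join(root, name)
    if not os.path.exists(dst):
        os.symlink(os.path.abspath(src), dst)

# e.g. link_dataset("/data/flow/FlyingChairs_release", "FlyingChairs_release")
```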

Training

MegaFlow was trained with a multi-stage curriculum, where each stage loads the checkpoint from the previous stage via the restore_ckpt field in its config JSON.

Please refer to train.sh for the complete training curriculum.

Note: Adjust --nproc_per_node based on the number of available GPUs. The effective_batch_size in the config will be split across all GPUs and nodes automatically. Update restore_ckpt in each config to point to the checkpoint from the previous stage.

Evaluation

# Zero-shot evaluation (Sintel + KITTI)
python -m scripts.evaluate --cfg config/eval/zero-shot.json

# Point tracking (TAP-Vid)
python -m scripts.evaluate --cfg config/eval/tapvid.json

Note: Update the restore_ckpt field in each eval config to point to your trained checkpoints.

Citation

If you find MegaFlow useful in your research, please cite:

@article{zhang2026megaflow,
  title   = {MegaFlow: Zero-Shot Large Displacement Optical Flow},
  author  = {Zhang, Dingxi and Wang, Fangjinhua and Pollefeys, Marc and Xu, Haofei},
  journal = {arXiv preprint arXiv:2603.25739},
  year    = {2026}
}

Acknowledgements

We thank the original authors of the following projects for their excellent open-source work: Unimatch, GMFlow, VGGT, AllTracker, SEA-RAFT, and MEMFOF.

License

This project is released under the Apache 2.0 License.
