Dingxi Zhang<sup>1</sup> · Fangjinhua Wang<sup>1</sup> · Marc Pollefeys<sup>1,2</sup> · Haofei Xu<sup>1,3</sup>

<sup>1</sup> ETH Zurich · <sup>2</sup> Microsoft · <sup>3</sup> University of Tübingen, Tübingen AI Center
MegaFlow is a simple, powerful, and unified model for zero-shot large displacement optical flow and point tracking.
MegaFlow leverages pre-trained Vision Transformer features to naturally capture extreme motion, followed by lightweight iterative refinement for sub-pixel accuracy. This approach achieves state-of-the-art zero-shot performance on major optical flow benchmarks (Sintel, KITTI, Spring) while delivering highly competitive zero-shot generalization on long-range point tracking benchmarks.
- 🏆 Strong zero-shot performance across Sintel, Spring, and KITTI
- 🎯 Excels in large displacement optical flow estimation
- 📹 Flexible temporal window: seamlessly processes any number of frames
- 🔄 General motion backbone: naturally extends to point tracking
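The two-stage design described above (global matching on transformer features, then iterative residual refinement) can be sketched in a toy form. This is an illustrative assumption, not MegaFlow's actual code: real ViT feature maps, correlation volumes, and a learned update operator replace the placeholders below.

```python
import numpy as np

def coarse_flow_via_matching(feat0, feat1):
    # Toy global matching: cosine similarity between all pixel pairs;
    # the argmax gives the matched location, and the displacement between
    # source and matched pixel is the coarse flow. Because matching is
    # global, arbitrarily large displacements can be recovered.
    H, W, C = feat0.shape
    f0 = feat0.reshape(H * W, C)
    f1 = feat1.reshape(H * W, C)
    f0 = f0 / np.linalg.norm(f0, axis=1, keepdims=True)
    f1 = f1 / np.linalg.norm(f1, axis=1, keepdims=True)
    match = (f0 @ f1.T).argmax(axis=1)           # best target index per source pixel
    ys, xs = np.divmod(np.arange(H * W), W)
    my, mx = np.divmod(match, W)
    return np.stack([mx - xs, my - ys], axis=-1).reshape(H, W, 2).astype(np.float32)

def refine(flow, num_steps=8):
    # Stand-in for the lightweight iterative refinement: each step adds a
    # predicted residual (a real model regresses it from local correlation).
    for _ in range(num_steps):
        flow = flow + np.zeros_like(flow)        # placeholder residual
    return flow

rng = np.random.default_rng(0)
feat0 = rng.normal(size=(8, 8, 16))
feat1 = np.roll(feat0, shift=(1, 2), axis=(0, 1))  # simulate motion of (dx=2, dy=1)
flow = refine(coarse_flow_via_matching(feat0, feat1))
```

With perfectly shifted features the coarse stage already recovers the (2, 1) displacement for interior pixels; in practice the refinement stage is what sharpens the matching-based estimate to sub-pixel accuracy.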
```shell
# Clone the repository
git clone https://github.com/cvg/megaflow.git
cd megaflow

# Create local conda environment
conda create -n megaflow python=3.12 -y
conda activate megaflow

# Install dependencies
pip install -e .

# (Optional) Install FlashAttention-3 for faster inference on Hopper GPUs
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention/hopper
python setup.py install
cd ../..
```

Or install directly:

```shell
pip install git+https://github.com/cvg/megaflow.git
```

Requirements: Python ≥ 3.12, PyTorch ≥ 2.7, CUDA recommended.
Pretrained checkpoints are available on 🤗 HuggingFace and are auto-downloaded:
| Model Name | Description |
|---|---|
| `megaflow-flow` | Optical flow (default) |
| `megaflow-chairs-things` | Optical flow trained on FlyingChairs and FlyingThings |
| `megaflow-track` | Point tracking (Kubric fine-tuned) |
```python
import torch
from megaflow import MegaFlow
from megaflow.utils.basic import gridcloud2d

device = "cuda" if torch.cuda.is_available() else "cpu"

# Prepare video tensor [1, T, 3, H, W] in float32, range [0, 255]
video = ...

with torch.inference_mode():
    with torch.autocast(device_type=device, dtype=torch.bfloat16, enabled=True):
        # --- Task 1: Optical Flow ---
        flow_model = MegaFlow.from_pretrained("megaflow-flow").eval().to(device)
        # Returns flow predictions for consecutive frame pairs (0->1, 1->2, ...)
        flow_predictions = flow_model(video, num_reg_refine=8)["flow_preds"][-1]

        # --- Task 2: Point Tracking ---
        track_model = MegaFlow.from_pretrained("megaflow-track").eval().to(device)
        # Returns tracking offsets between the first frame and each query frame (0->t)
        flows_e = track_model.forward_track(video, num_reg_refine=8)["flow_final"]
        # Add absolute grid coordinates to get final point tracks
        _, _, _, H, W = video.shape
        grid_xy = gridcloud2d(1, H, W, norm=False, device=device).float()
        grid_xy = grid_xy.permute(0, 2, 1).reshape(1, 1, 2, H, W)
        tracking_predictions = flows_e + grid_xy
```

```shell
# Processes the video and auto-downloads the megaflow-flow model
python demo_flow.py --input assets/longboard.mp4 --output output/longboard_flow.mp4

# Tracks points and auto-downloads the megaflow-track model
python demo_track.py --input assets/apple.mp4 --grid_size 8
```

You can also run `python demo_gradio.py` to launch a local web UI, try our HuggingFace demo, or open the Colab notebook for an interactive online demo directly in the browser.
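The quick-start snippet expects a video tensor of shape `[1, T, 3, H, W]` in float32 with values in `[0, 255]`. As a minimal sketch of that layout (the zero-filled frames stand in for frames decoded with any library of your choice, e.g. imageio or OpenCV):

```python
import numpy as np

# Placeholder for T decoded RGB frames, each H x W x 3 uint8.
T, H, W = 4, 64, 96
frames = [np.zeros((H, W, 3), dtype=np.uint8) for _ in range(T)]

# Stack to (T, H, W, 3), move channels first, add a batch dim -> (1, T, 3, H, W),
# and keep the raw [0, 255] range as float32 (no normalization).
video_np = np.stack(frames).transpose(0, 3, 1, 2)[None].astype(np.float32)
# video = torch.from_numpy(video_np)  # tensor to pass to MegaFlow
```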
To train and evaluate MegaFlow, you will need to download the required datasets: FlyingChairs, FlyingThings3D, Sintel, KITTI, HD1K, TartanAir, and Spring.
For tracking, you will need to download processed Kubric from AllTracker and TAP-Vid:
- Kubric: Download the 24-frame data (kubric_au.tar.gz) and the 64-frame data parts (part1, part2, part3).
- TAP-Vid: Download the TAP-Vid-DAVIS, TAP-Vid-RGB-stacking and TAP-Vid-Kinetics datasets from here for evaluation.
Merge the point tracking splits by concatenating:
```shell
cat ce64_kub_aa ce64_kub_ab ce64_kub_ac > ce64_kub.tar.gz
```

By default `datasets.py` will search for the datasets in these locations. You can create symbolic links from the `datasets` folder to wherever the datasets were downloaded:
```
datasets
├── FlyingChairs_release
├── FlyingThings3D
├── Sintel
├── KITTI
├── HD1K
├── spring
├── TartanAir
├── kubric_au/
└── TAP_Vid/
    ├── tapvid_davis/
    ├── tapvid_kinetics/
    └── tapvid_rgb_stacking/
```

MegaFlow was trained with a multi-stage curriculum, where each stage loads the checkpoint from the previous stage via the `restore_ckpt` field in the config JSON.
Please refer to train.sh for the complete training curriculum.
Note: Adjust `--nproc_per_node` based on the number of available GPUs. The `effective_batch_size` in the config will be split across all GPUs and nodes automatically. Update `restore_ckpt` in each config to point to the checkpoint from the previous stage.
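As a concrete illustration of the automatic split (hypothetical numbers; the trainer handles this for you):

```python
# effective_batch_size is divided evenly over all workers
# (nproc_per_node GPUs per node x num_nodes nodes).
effective_batch_size = 64
nproc_per_node = 4        # matches --nproc_per_node
num_nodes = 2
per_gpu_batch = effective_batch_size // (nproc_per_node * num_nodes)
print(per_gpu_batch)      # 8 samples per GPU per iteration
```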
```shell
# Zero-shot evaluation (Sintel + KITTI)
python -m scripts.evaluate --cfg config/eval/zero-shot.json

# Point tracking (TAP-Vid)
python -m scripts.evaluate --cfg config/eval/tapvid.json
```

Note: Update the `restore_ckpt` field in each eval config to point to your trained checkpoints.
If you find MegaFlow useful in your research, please cite:
```bibtex
@article{zhang2026megaflow,
  title   = {MegaFlow: Zero-Shot Large Displacement Optical Flow},
  author  = {Zhang, Dingxi and Wang, Fangjinhua and Pollefeys, Marc and Xu, Haofei},
  journal = {arXiv preprint arXiv:2603.25739},
  year    = {2026}
}
```

We thank the original authors of the following projects for their excellent open-source work: Unimatch, GMFlow, VGGT, AllTracker, SEA-RAFT, and MEMFOF.
This project is released under the Apache 2.0 License.
