This repository provides a PyTorch re-implementation of FlowTok for the text-to-image generation task. Compared to the original paper, this implementation extends the generation capability to 512×512 resolution.
FlowTok: Flowing Seamlessly Across Text and Image Tokens
ICCV 2025
Ju He | Qihang Yu | Qihao Liu | Liang-Chieh Chen
[project page] | [paper] | [arxiv]
The code has been tested with PyTorch 2.1.2 and CUDA 12.1.
An example of installation commands is provided as follows:
git clone git@github.com:tacju/FlowTok.git
cd FlowTok
pip3 install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121
pip3 install -U --pre triton
pip3 install -r requirements.txt
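After installation, a quick sanity check of the environment can look like the following (a minimal sketch; it only assumes the PyTorch build installed above and prints whatever versions are actually present on your machine):

# Confirm that PyTorch imports, reports its CUDA build, and can see a GPU.
python3 -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"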
We provide a training script for text-to-image (T2I) generation in train_flowtok.sh.
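A minimal way to launch it is sketched below; this assumes the dataset paths and hyperparameters are configured inside train_flowtok.sh itself, so check and adjust them for your setup before running.

# Launch T2I training with the settings defined in the script.
bash train_flowtok.sh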
This project is intended for research purposes.
This codebase is built upon the following repository:
Much appreciation for their outstanding efforts.
If you use our work in your research, please use the following BibTeX entries.
@inproceedings{he2025flowtok,
  author    = {Ju He and Qihang Yu and Qihao Liu and Liang-Chieh Chen},
  title     = {FlowTok: Flowing Seamlessly Across Text and Image Tokens},
  booktitle = {ICCV},
  year      = {2025}
}

@inproceedings{liu2025crossflow,
  author    = {Qihao Liu and Xi Yin and Alan Yuille and Andrew Brown and Mannat Singh},
  title     = {Flowing from Words to Pixels: A Noise-Free Framework for Cross-Modality Evolution},
  booktitle = {CVPR},
  year      = {2025}
}

@inproceedings{kim2025democratizing,
  author    = {Dongwon Kim and Ju He and Qihang Yu and Chenglin Yang and Xiaohui Shen and Suha Kwak and Liang-Chieh Chen},
  title     = {Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens},
  booktitle = {ICCV},
  year      = {2025}
}

@inproceedings{yu2024an,
  author    = {Qihang Yu and Mark Weber and Xueqing Deng and Xiaohui Shen and Daniel Cremers and Liang-Chieh Chen},
  title     = {An Image is Worth 32 Tokens for Reconstruction and Generation},
  booktitle = {NeurIPS},
  year      = {2024}
}