This repository provides a PyTorch re-implementation of FlowTok for the text-to-image generation task. Compared to the original paper, this implementation extends the generation capability to 512×512 resolution.
FlowTok: Flowing Seamlessly Across Text and Image Tokens
ICCV 2025
Ju He | Qihang Yu | Qihao Liu | Liang-Chieh Chen
[project page] | [paper] | [arxiv]
The code has been tested with PyTorch 2.1.2 and CUDA 12.1.
An example of installation commands is provided as follows:
git clone git@github.com:tacju/FlowTok.git
cd FlowTok
pip3 install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121
pip3 install -U --pre triton
pip3 install -r requirements.txt
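After installation, a quick sanity check of the environment can look like the following (a minimal sketch; it only assumes the PyTorch build installed above and prints whatever versions are actually present on your machine):

# Confirm that PyTorch imports, reports its CUDA build, and can see a GPU.
python3 -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"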
We provide a training script for text-to-image (T2I) generation in train_flowtok.sh.
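A minimal way to launch it is sketched below; this assumes the dataset paths and hyperparameters are configured inside train_flowtok.sh itself, so check and adjust them for your setup before running.

# Launch T2I training with the settings defined in the script.
bash train_flowtok.sh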
This project is intended for research purposes.
This codebase is built upon the following repository:
Much appreciation for their outstanding efforts.
If you use our work in your research, please use the following BibTeX entries.
@inproceedings{he2025flowtok,
  author    = {Ju He and Qihang Yu and Qihao Liu and Liang-Chieh Chen},
  title     = {FlowTok: Flowing Seamlessly Across Text and Image Tokens},
  booktitle = {ICCV},
  year      = {2025}
}

@inproceedings{liu2025crossflow,
  author    = {Qihao Liu and Xi Yin and Alan Yuille and Andrew Brown and Mannat Singh},
  title     = {Flowing from Words to Pixels: A Noise-Free Framework for Cross-Modality Evolution},
  booktitle = {CVPR},
  year      = {2025}
}

@inproceedings{kim2025democratizing,
  author    = {Dongwon Kim and Ju He and Qihang Yu and Chenglin Yang and Xiaohui Shen and Suha Kwak and Liang-Chieh Chen},
  title     = {Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens},
  booktitle = {ICCV},
  year      = {2025}
}

@inproceedings{yu2024an,
  author    = {Qihang Yu and Mark Weber and Xueqing Deng and Xiaohui Shen and Daniel Cremers and Liang-Chieh Chen},
  title     = {An Image is Worth 32 Tokens for Reconstruction and Generation},
  booktitle = {NeurIPS},
  year      = {2024}
}