
SoulX-FlashHead: Oracle-guided Generation of Infinite Real-time Streaming Talking Heads

Tan Yu*, Qian Qiao*✉, Le Shen*, Ke Zhou, Jincheng Hu, Dian Sheng, Bo Hu, Haoming Qin, Jun Gao, Changhai Zhou, Shunshun Yin, Siyuan Liu✉

*Equal Contribution ✉Corresponding Author

Dataset

⚡ Highlights

  • Model_Lite released: reaches 96 FPS, or 3 concurrent real-time (25+ FPS) streams, on a single RTX 4090.
  • Model_Pro released: generates high-quality videos at 10.8 FPS on a single RTX 4090, or in real time (25+ FPS) on two RTX 5090s.
  • Model_Pretrained is coming soon, providing high-performance weights and an experimental foundation for community research.

🔥 News

  • 2026.03.09 - The online demo on HuggingFace is now available; you can try it out directly.
  • 2026.03.04 - The Gradio app is now available. Both common and streaming modes are supported.
  • 2026.03.02 - The ComfyUI node is now available. Thanks to HM-RunningHub for the ComfyUI support.
  • 2026.02.12 - The online demo is now available via the Soul App. Download it today to try it out.
  • 2026.02.12 - We have released the inference code and the model weights.
  • 2026.02.12 - We released the project page for SoulX-FlashHead.
  • 2026.02.07 - We released the Dataset.
  • 2026.02.07 - We released the SoulX-FlashHead technical report on arXiv and the GitHub repository.

📑 Todo List

  • Technical report
  • Project Page
  • Inference code
  • Streaming online demo on HuggingFace
  • Distilled Checkpoint of Pro-Model & Lite-Model release
  • Pretrained Checkpoint release

🌰 Examples

More examples are available on the project page.

qitiandasheng.mp4
chengdu.mp4
einstein.mp4

📖 Quickstart

🔧 Installation

1. Create a Conda environment

conda create -n flashhead python=3.10
conda activate flashhead

2. Install PyTorch with CUDA support

pip install torch==2.7.1 torchvision==0.22.1 --index-url https://download.pytorch.org/whl/cu128

3. Install other dependencies

pip install -r requirements.txt

4. Install FlashAttention

pip install ninja
pip install flash_attn==2.8.0.post2 --no-build-isolation

If the build takes a long time, we recommend installing from a prebuilt wheel instead:

  1. Download the wheel file from here
  2. Run pip install xxx.whl

5. Install SageAttention (optional)

pip install sageattention==2.2.0 --no-build-isolation

6. Install FFmpeg

# Ubuntu / Debian
apt-get install ffmpeg
# CentOS / RHEL
yum install ffmpeg ffmpeg-devel

or

# Conda (no root required) 
conda install -c conda-forge ffmpeg==7
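After installing, it can be worth verifying that an ffmpeg binary is actually on PATH before running inference. A minimal check (a convenience sketch, not part of this repo):

```python
import shutil

def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None

print("ffmpeg found:", ffmpeg_available())
```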

🤗 Model download

| Model Component | Description | Link |
| --- | --- | --- |
| SoulX-FlashHead-1_3B | Our 1.3B model | 🤗 Huggingface |
| wav2vec2-base-960h | wav2vec2-base-960h | 🤗 Huggingface |
# If you are in mainland China, run this first: export HF_ENDPOINT=https://hf-mirror.com
pip install "huggingface_hub[cli]"
huggingface-cli download Soul-AILab/SoulX-FlashHead-1_3B --local-dir ./models/SoulX-FlashHead-1_3B
huggingface-cli download facebook/wav2vec2-base-960h --local-dir ./models/wav2vec2-base-960h
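The two CLI downloads above can also be scripted via huggingface_hub's snapshot_download. A sketch (the helper names are ours; the import is deferred so the functions can be defined before huggingface_hub is installed):

```python
def target_dir(local_root: str, repo_id: str) -> str:
    """Local directory for a repo, mirroring the CLI --local-dir values."""
    return f"{local_root}/{repo_id.split('/')[-1]}"

def download_weights(local_root: str = "./models") -> None:
    """Download both model repos, equivalent to the huggingface-cli commands above."""
    from huggingface_hub import snapshot_download  # needs: pip install "huggingface_hub[cli]"
    for repo_id in ("Soul-AILab/SoulX-FlashHead-1_3B", "facebook/wav2vec2-base-960h"):
        # e.g. ./models/SoulX-FlashHead-1_3B and ./models/wav2vec2-base-960h
        snapshot_download(repo_id=repo_id, local_dir=target_dir(local_root, repo_id))
```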

🚀 Inference

# Infer with [Pro-Model] on a single GPU
bash inference_script_single_gpu_pro.sh


# Infer with [Pro-Model] on multiple GPUs
bash inference_script_multi_gpu_pro.sh
# Real-time inference with the Pro-Model requires two RTX 5090s with SageAttention.


# Infer with [Lite-Model] on a single GPU
bash inference_script_single_gpu_lite.sh
# Real-time inference is supported on a single RTX 4090 (up to 3 concurrent streams).
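Before launching the scripts, a quick preflight can confirm that the packages from the installation steps (including the optional attention backends) are importable. A stdlib-only sketch; the helper name is ours:

```python
from importlib.util import find_spec

def available_backends() -> dict:
    """Map each relevant package to whether it is importable in this environment."""
    return {name: find_spec(name) is not None
            for name in ("torch", "flash_attn", "sageattention")}

print(available_backends())
```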

⚡️ Gradio Demo

# The Gradio demo requires gradio==5.50.0; Chrome is recommended.

# common gradio demo
python gradio_app.py

# streaming gradio demo (only supports a single GPU)
python gradio_app_streaming.py
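Since the demo is pinned to gradio==5.50.0, a small version check can fail fast before launch. A convenience sketch using only the stdlib (the helper name is ours):

```python
from importlib import metadata

REQUIRED_GRADIO = "5.50.0"  # pin stated above

def gradio_version_ok(required: str = REQUIRED_GRADIO) -> bool:
    """Return True only if gradio is installed at exactly the pinned version."""
    try:
        return metadata.version("gradio") == required
    except metadata.PackageNotFoundError:
        return False
```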

🤗 Streaming online demo

Click here to experience the real-time streaming demo on HuggingFace Spaces.

👋 Online Experience

For a real-time interactive experience, scan the QR code to enter the event link. [2026.2.12~2026.3.11]

SoulApp event QR Code
Real-time Online Experience

📧 Contact Us

If you are interested in our work, feel free to email [email protected], [email protected], [email protected], [email protected], or [email protected].

We have opened a WeChat group. Additionally, on behalf of SoulApp, we warmly welcome everyone to download the app and join our Soul group for further technical discussions and updates!

WeChat Group QR Code
Join WeChat Group
Soul App Group QR Code
Download SoulApp & Join Group

📚 Citation

If you find our work useful in your research, please consider citing:

@misc{yu2026soulxflashheadoracleguidedgenerationinfinite,
      title={SoulX-FlashHead: Oracle-guided Generation of Infinite Real-time Streaming Talking Heads}, 
      author={Tan Yu and Qian Qiao and Le Shen and Ke Zhou and Jincheng Hu and Dian Sheng and Bo Hu and Haoming Qin and Jun Gao and Changhai Zhou and Shunshun Yin and Siyuan Liu},
      year={2026},
      eprint={2602.07449},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2602.07449}, 
}

🙇 Acknowledgement

  • Wan: the base model we built upon.
  • LTX-Video: the VAE of our Lite-Model.
  • Self forcing: the codebase we built upon.
  • DMD and Self forcing++: the key distillation techniques used by our method.
  • SoulX-FlashTalk is another model developed by our team, featuring 14B parameters and real-time capabilities.

Tip

If you find our work useful, please also consider starring the original repositories of these foundational methods.

💡 Star History

Star History Chart

About

SoulX-FlashHead: A unified 1.3B-parameter framework designed for high-fidelity, infinite-length, and real-time streaming portrait video generation.
