SoulX-LiveAct: Towards Hour-Scale Real-Time Human Animation with Neighbor Forcing and ConvKV Memory

Dingcheng Zhen*^✉ · Xu Zheng* · Ruixin Zhang* · Zhiqi Jiang*

SoulX-LiveAct presents a novel framework that enables lifelike, multimodal-controlled, high-fidelity human animation video generation for real-time streaming interactions.

(I) We identify diffusion-step-aligned neighbor latents as a key inductive bias for autoregressive (AR) diffusion, and build on it to propose Neighbor Forcing, a principled and theoretically grounded method for step-consistent AR video generation.

(II) We introduce ConvKV Memory, a lightweight plug-in compression mechanism that enables constant-memory hour-scale video generation with negligible overhead.

(III) We develop an optimized real-time system that achieves 20 FPS on only two H100/H200 GPUs at 720×416 or 512×512 resolution, using end-to-end adaptive FP8 precision, sequence parallelism, and operator fusion.
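The constant-memory idea in (II) can be illustrated with a toy sketch. This is not the authors' ConvKV Memory: the class name `BoundedKVMemory`, the window sizes, and the use of mean pooling as a stand-in for a learned convolution are all assumptions made for illustration. It only shows how compressing the oldest entries keeps the memory size bounded no matter how long generation runs.

```python
from collections import deque


class BoundedKVMemory:
    """Toy fixed-budget memory (hypothetical): recent entries stay exact,
    older entries are pooled into a small compressed store."""

    def __init__(self, recent_size=4, compressed_size=4, stride=2):
        self.recent = deque()
        self.compressed = deque()
        self.recent_size = recent_size
        self.compressed_size = compressed_size
        self.stride = stride

    @staticmethod
    def _pool(vectors):
        # Stand-in for a learned strided convolution: element-wise mean.
        n = len(vectors)
        return [sum(column) / n for column in zip(*vectors)]

    def append(self, vector):
        self.recent.append(list(vector))
        # Once the exact window overflows by a full stride, pool the oldest entries.
        if len(self.recent) >= self.recent_size + self.stride:
            oldest = [self.recent.popleft() for _ in range(self.stride)]
            self.compressed.append(self._pool(oldest))
        # Keep the compressed store bounded by merging its two oldest slots.
        while len(self.compressed) > self.compressed_size:
            first = self.compressed.popleft()
            second = self.compressed.popleft()
            self.compressed.appendleft(self._pool([first, second]))

    def __len__(self):
        return len(self.recent) + len(self.compressed)


if __name__ == "__main__":
    memory = BoundedKVMemory()
    for step in range(72_000):  # one hour of frames at 20 FPS
        memory.append([float(step), 1.0])
    print("entries held:", len(memory))
```

With these toy settings the memory never holds more than `recent_size + stride - 1 + compressed_size` entries, which is the property that makes hour-scale generation feasible in principle.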

🔥🔥🔥 News

⚡ May 27, 2026: We updated FP4 GEMM support for B-series GPUs, including RTX 5090, RTX PRO 6000, B100, and B200.
🔼 May 11, 2026: We updated the model and fixed several bugs to reduce excessive sharpening around the lips, teeth, and other facial details.
📢 Mar 18, 2026: We now support consumer GPUs (e.g., RTX 4090, RTX 5090) with FP8 KV cache and CPU model offloading. In our tests, the 18B model (14B Wan2.1 + 4B audio module) achieves a throughput of 6 FPS on a single RTX 5090.
👋 Mar 16, 2026: We release the inference code and model weights of SoulX-LiveAct.

🎥 Demo

👫 Podcast

concat_cut_h265.1.mp4

🎤 Music & Talk Show

teaser1_h265_10m.mp4

teaser2_h265_10m.mp4

📱 FaceTime

1_h265.mp4

2_h265.mp4

📑 Open-source Plan

Release inference code and checkpoints
GUI demo support
End-to-end adaptive FP8 precision
Support model offloading for consumer GPUs (e.g., RTX 4090, RTX 5090) to reduce memory usage
Support FP4 precision for B-series GPUs (e.g., RTX 5090, B100, B200)
Release training code

▶️ Quick Start

🛠️ Dependencies and Installation

Step 1: Install Basic Dependencies

conda create -n liveact python=3.10
conda activate liveact
pip install -r requirements.txt
conda install conda-forge::sox -y

Step 2: Install SageAttention

To enable the FP8 attention kernel, install SageAttention:

git clone https://github.com/thu-ml/SageAttention.git
cd SageAttention
git checkout v2.2.0
python setup.py install

(Optional) Install the modified version of SageAttention. To enable QKV operator fusion with SageAttention, install the modified version with the following commands:
```
git clone https://github.com/ZhiqiJiang/SageAttentionFusion.git
cd SageAttentionFusion
python setup.py install
```

Step 3: Install vLLM

To enable the FP8 GEMM kernel, install vLLM:

pip install vllm==0.11.0

Step 4: Install LightVAE

git clone https://github.com/ModelTC/LightX2V
cd LightX2V
python setup_vae.py install
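After the four installation steps, a quick import check can confirm that the optional kernels are visible to Python. This is a minimal sketch: the helper `missing_modules` is hypothetical, and the module names `torch`, `sageattention`, and `vllm` are assumed to be the import names of the packages installed above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the packages installed in Steps 1-3.
    missing = missing_modules(["torch", "sageattention", "vllm"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All checked modules are importable.")
```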

🤗 Download Checkpoints

Model Cards

Model Name	Download
SoulX-LiveAct	🤗 Hugging Face, 魔搭 ModelScope
chinese-wav2vec2-base	🤗 Hugging Face
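The checkpoints can also be fetched from a script. The sketch below only builds `huggingface-cli download` commands. The helper `download_command` is hypothetical, and the repository IDs are placeholders: take the real IDs from the download links in the table above.

```python
import shlex


def download_command(repo_id, local_dir):
    """Build a `huggingface-cli download` command for one model repository."""
    return ["huggingface-cli", "download", repo_id, "--local-dir", local_dir]


if __name__ == "__main__":
    # Placeholder repository IDs (assumptions): replace them with the IDs
    # behind the download links before running the printed commands.
    targets = {
        "REPLACE_ME/SoulX-LiveAct": "MODEL_PATH",
        "REPLACE_ME/chinese-wav2vec2-base": "chinese-wav2vec2-base",
    }
    for repo_id, local_dir in targets.items():
        print(shlex.join(download_command(repo_id, local_dir)))
```

The local directory names match the `--ckpt_dir MODEL_PATH` and `--wav2vec_dir chinese-wav2vec2-base` values used in the inference commands.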

🔑 Inference

Usage of LiveAct

1. Run real-time streaming inference on two H100/H200 GPUs

USE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0,1 \
torchrun --nproc_per_node=2 --master_port=$(shuf -n 1 -i 10000-65535)  \
    generate.py \
    --size 416*720 \
    --ckpt_dir MODEL_PATH \
    --wav2vec_dir chinese-wav2vec2-base \
    --fps 20 \
    --dura_print \
    --input_json examples/example.json \
    --steam_audio
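When the same launch is needed from a script, the command above can be assembled programmatically. This is a minimal sketch: the helper `build_generate_command` is hypothetical, it uses only flags that appear in this README (including `--steam_audio`, spelled as the CLI spells it), and it omits `--master_port` so `torchrun` falls back to its default port.

```python
import shlex


def build_generate_command(ckpt_dir, wav2vec_dir, input_json,
                           size="416*720", fps=20, num_gpus=2, extra_flags=()):
    """Assemble the generate.py launch command as a list of arguments."""
    if num_gpus > 1:
        launcher = ["torchrun", f"--nproc_per_node={num_gpus}", "generate.py"]
    else:
        launcher = ["python", "generate.py"]
    args = [
        "--size", size,
        "--ckpt_dir", ckpt_dir,
        "--wav2vec_dir", wav2vec_dir,
        "--fps", str(fps),
        "--input_json", input_json,
    ]
    return launcher + args + list(extra_flags)


if __name__ == "__main__":
    command = build_generate_command(
        "MODEL_PATH", "chinese-wav2vec2-base", "examples/example.json",
        extra_flags=("--dura_print", "--steam_audio"),
    )
    # Environment variables from the command above still need to be set by the caller.
    print("USE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0,1 " + shlex.join(command))
```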

2. Run with action or emotion editing at real-time streaming performance

USE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0,1 \
torchrun --nproc_per_node=2 --master_port=$(shuf -n 1 -i 10000-65535)  \
    generate.py \
    --size 512*512 \
    --ckpt_dir MODEL_PATH \
    --wav2vec_dir chinese-wav2vec2-base \
    --fps 24 \
    --input_json examples/example_edit.json

3. Run with the best performance settings

USE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0,1 \
torchrun --nproc_per_node=2 --master_port=$(shuf -n 1 -i 10000-65535)  \
    generate.py \
    --size 480*832 \
    --ckpt_dir MODEL_PATH \
    --wav2vec_dir chinese-wav2vec2-base \
    --fps 24 \
    --input_json examples/example.json

4. Run on RTX 4090/RTX 5090 GPUs

Note: FP8 KV cache may slightly affect generation quality.
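To give a feel for why FP8 storage can affect quality, the sketch below simulates a round trip through the FP8 E4M3 grid in pure Python. It is not the repository's implementation: the per-tensor scale and the helper names `round_to_e4m3`, `quantize`, and `dequantize` are assumptions chosen for illustration.

```python
import math

E4M3_MAX = 448.0  # largest finite value in the FP8 E4M3 format


def round_to_e4m3(x):
    """Round a float to the nearest value on the FP8 E4M3 grid (saturating)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    # frexp gives a = m * 2**ex with 0.5 <= m < 1, so the binade exponent is ex - 1.
    exponent = max(math.frexp(a)[1] - 1, -6)  # -6 is the smallest normal exponent
    step = 2.0 ** (exponent - 3)              # 3 mantissa bits: 8 steps per binade
    return sign * round(a / step) * step


def quantize(values):
    """Scale so the largest magnitude maps to E4M3_MAX, then round each value."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / E4M3_MAX if max_abs > 0 else 1.0
    return [round_to_e4m3(v / scale) for v in values], scale


def dequantize(codes, scale):
    """Map stored codes back to the original value range."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    keys = [0.5, -1.25, 3.0, 0.01]
    codes, scale = quantize(keys)
    for original, restored in zip(keys, dequantize(codes, scale)):
        print(f"{original:+.4f} -> {restored:+.4f}")
```

Each stored value takes one byte instead of the two bytes BF16 needs, at the cost of a relative rounding error of up to about 6% per element for values in the normal range.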

USE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0 \
python generate.py \
    --size 416*720 \
    --ckpt_dir MODEL_PATH \
    --wav2vec_dir chinese-wav2vec2-base \
    --fps 24 \
    --input_json examples/example.json \
    --fp8_kv_cache \
    --block_offload \
    --t5_cpu
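A rough weight-memory estimate shows why `--block_offload` and `--t5_cpu` matter on consumer cards. The sketch counts weights only for the 18B model mentioned in the News section (14B Wan2.1 + 4B audio module) and ignores activations, the KV cache, and the T5 and VAE models. Which parts of the model are actually stored at reduced precision is not stated here, so the FP8 and FP4 rows are hypothetical lower bounds.

```python
GIB = 1024 ** 3


def weight_memory_gib(num_params, bytes_per_param):
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


if __name__ == "__main__":
    params = 18e9  # 14B Wan2.1 + 4B audio module
    for name, nbytes in [("BF16", 2), ("FP8", 1), ("FP4", 0.5)]:
        print(f"{name}: {weight_memory_gib(params, nbytes):.1f} GiB")
```

In BF16 the weights alone come to roughly 33.5 GiB, which exceeds the 24 GB of an RTX 4090 and the 32 GB of an RTX 5090, so some form of offloading or reduced precision is needed on a single consumer GPU.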

5. Run on a single GPU for evaluation

USE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0 \
python generate.py \
    --size 480*832 \
    --ckpt_dir MODEL_PATH \
    --wav2vec_dir chinese-wav2vec2-base \
    --fps 24 \
    --input_json examples/example.json \
    --audio_cfg 1.7 \
    --t5_cpu
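The `--audio_cfg 1.7` value above is a classifier-free guidance scale. A minimal sketch of the standard formulation follows. It assumes the usual rule of extrapolating from the audio-unconditional prediction toward the audio-conditional one, and the function name `apply_guidance` is hypothetical. The repository may combine its predictions differently.

```python
def apply_guidance(uncond, cond, scale):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


if __name__ == "__main__":
    uncond = [0.0, 1.0]  # prediction without the audio condition
    cond = [1.0, 3.0]    # prediction with the audio condition
    print(apply_guidance(uncond, cond, 1.0))  # scale 1.0 reproduces the conditional prediction
    print(apply_guidance(uncond, cond, 1.7))  # larger scales push further toward the condition
```

Under this formulation a scale of 1.0 needs no separate unconditional pass, which would be consistent with guidance above 1.0 being reserved for offline evaluation rather than real-time streaming.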

Command Line Arguments

Argument	Type	Required	Default	Description
`--size`	str	Yes	-	The width and height of the generated video (e.g., `416*720`).
`--t5_cpu`	bool	No	false	Whether to place the T5 model on the CPU.
`--offload_cache`	bool	No	-	Whether to place the KV cache on the CPU.
`--fps`	int	Yes	-	The target FPS of the generated video.
`--audio_cfg`	float	No	1.0	Classifier-free guidance scale for audio control.
`--dura_print`	bool	No	false	Whether to print the duration of every block.
`--input_json`	str	Yes	-	Path to the condition JSON file used to generate the video.
`--seed`	int	No	42	The random seed used for generating the video.
`--steam_audio`	bool	No	false	Whether to run inference with streaming audio.
`--mean_memory`	bool	No	false	Whether to use the mean memory strategy during inference for further performance improvement.
`--fp8_kv_cache`	bool	No	false	Whether to store the KV cache in FP8 and dequantize it to BF16 on use. FP8 KV cache may slightly affect generation quality.
`--block_offload`	bool	No	false	Whether to offload model blocks to the CPU between block forwards.
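For wrappers that need to validate arguments before launching, the table can be mirrored in an `argparse` parser. This is a sketch reconstructed from the table, not the parser in `generate.py`: treating the boolean options as `store_true` switches is an assumption, and options used in the commands but not listed in the table (`--ckpt_dir`, `--wav2vec_dir`) are left out.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the argument table (not the repository's own)."""
    parser = argparse.ArgumentParser(description="SoulX-LiveAct argument sketch")
    parser.add_argument("--size", type=str, required=True)
    parser.add_argument("--fps", type=int, required=True)
    parser.add_argument("--input_json", type=str, required=True)
    parser.add_argument("--audio_cfg", type=float, default=1.0)
    parser.add_argument("--seed", type=int, default=42)
    # Boolean options are assumed to be simple on/off switches.
    for flag in ("t5_cpu", "offload_cache", "dura_print", "steam_audio",
                 "mean_memory", "fp8_kv_cache", "block_offload"):
        parser.add_argument(f"--{flag}", action="store_true")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--size", "416*720", "--fps", "24",
         "--input_json", "examples/example.json",
         "--fp8_kv_cache", "--block_offload", "--t5_cpu"]
    )
    print(args)
```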

💻 GUI demo

Run SoulX-LiveAct inference on the GUI demo and evaluate real-time performance.

demo_h265.mp4

Note: The first few blocks during the initial run require warm-up. Normal performance will be observed from the second run onward.
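Because of this warm-up, throughput is best measured with the first blocks excluded. The helper below is a hypothetical sketch: `block_seconds` stands for per-block durations you have collected yourself (for example from `--dura_print` output), and `frames_per_block` and `warmup_blocks` are assumed values, not settings taken from the repository.

```python
def steady_state_fps(block_seconds, frames_per_block, warmup_blocks=2):
    """Average FPS over the blocks that remain after skipping the warm-up blocks."""
    timed = block_seconds[warmup_blocks:]
    if not timed:
        raise ValueError("need at least one block after warm-up")
    return frames_per_block * len(timed) / sum(timed)


if __name__ == "__main__":
    # Illustrative numbers only: two slow warm-up blocks, then steady blocks.
    durations = [5.0, 2.0, 0.5, 0.5]
    print(steady_state_fps(durations, frames_per_block=10))
```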

1. Run real-time streaming inference on two H100/H200 GPUs

USE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0,1 \
torchrun --nproc_per_node=2 --master_port=$(shuf -n 1 -i 10000-65535) \
  demo.py \
  --ckpt_dir MODEL_PATH \
  --wav2vec_dir chinese-wav2vec2-base \
  --size 416*720 \
  --video_save_path ./generated_videos

2. Run on RTX 4090/RTX 5090 GPUs

USE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0 \
torchrun --nproc_per_node=1 --master_port=$(shuf -n 1 -i 10000-65535) \
  demo.py \
  --ckpt_dir MODEL_PATH \
  --wav2vec_dir chinese-wav2vec2-base \
  --size 416*720 \
  --fp8_kv_cache \
  --block_offload \
  --t5_cpu \
  --video_save_path ./generated_videos

📚 Citation

@misc{zhen2026soulxliveacthourscalerealtimehuman,
      title={SoulX-LiveAct: Towards Hour-Scale Real-Time Human Animation with Neighbor Forcing and ConvKV Memory}, 
      author={Dingcheng Zhen and Xu Zheng and Ruixin Zhang and Zhiqi Jiang and Yichao Yan and Ming Tao and Shunshun Yin},
      year={2026},
      eprint={2603.11746},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.11746}, 
}

📮 Contact Us

If you would like to leave a message about our work, feel free to email dingchengzhen@soulapp.cn.

You’re welcome to join our WeChat group or Soul group for technical discussions.
