LiveAct Logo

SoulX-LiveAct: Towards Hour-Scale Real-Time Human Animation with Neighbor Forcing and ConvKV Memory

Dingcheng Zhen*✉ · Xu Zheng* · Ruixin Zhang* · Zhiqi Jiang*

Yichao Yan · Ming Tao · Shunshun Yin

LiveAct is a framework for lifelike, multimodally controlled, high-fidelity human animation video generation, designed for real-time streaming interaction.

(I) We identify diffusion-step-aligned neighbor latents as a key inductive bias for AR diffusion, yielding Neighbor Forcing, a principled and theoretically grounded scheme for step-consistent AR video generation.

(II) We introduce ConvKV Memory, a lightweight plug-in compression mechanism that enables constant-memory hour-scale video generation with negligible overhead.

(III) We develop an optimized real-time system that achieves 20 FPS at 720×416 or 512×512 resolution using only two H100/H200 GPUs, with end-to-end adaptive FP8 precision, sequence parallelism, and operator fusion.

🔥🔥🔥 News

  • 👋 Mar 16, 2026: We release the inference code and model weights of LiveAct.

🎥 Demo

Note: Due to GitHub limitations, the videos are heavily compressed. Please refer to the demo page for the original results.

👫 Podcast

podcast_h265.mp4

🎤 Music & Talk Show

teaser1_h265_10m.mp4
teaser2_h265_10m.mp4

📱 FaceTime

1_h265.mp4
2_h265.mp4

📑 Open-source Plan

  • Release inference code and checkpoints
  • GUI demo support
  • End-to-end adaptive FP8 precision
  • Support model offloading for consumer GPUs (e.g., RTX 4090, RTX 5090) to reduce memory usage
  • Support FP4 precision for B-series GPUs (e.g., RTX 5090, B100, B200)
  • Release training code

▶️ Quick Start

πŸ› οΈ Dependencies and Installation

Step 1: Install Basic Dependencies

conda create -n liveact python=3.10
conda activate liveact
pip install -r requirements.txt
conda install conda-forge::sox -y

Step 2: Install SageAttention

To enable the FP8 attention kernel, install SageAttention:

  • Install SageAttention:

    git clone https://github.com/thu-ml/SageAttention.git
    cd SageAttention
    git checkout v2.2.0
    python setup.py install
  • (Optional) Install the modified SageAttention: to enable SageAttention with fused QKV operators, install it with the following commands:

    git clone https://github.com/ZhiqiJiang/SageAttentionFusion.git
    cd SageAttentionFusion
    python setup.py install

Step 3: Install vLLM

To enable the FP8 GEMM kernel, install vLLM:

pip install vllm==0.11.0

Step 4: Install LightVAE

git clone https://github.com/ModelTC/LightX2V
cd LightX2V
python setup_vae.py install
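After the four steps above, a quick sanity check can confirm which of the acceleration packages are importable in the `liveact` environment. This snippet is not part of the repository; the module names (`torch`, `sageattention`, `vllm`) are assumptions based on the install steps above.

```python
import importlib.util

def check_deps(names=("torch", "sageattention", "vllm")):
    """Report which of the expected packages can be found without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

if __name__ == "__main__":
    for name, found in check_deps().items():
        print(f"{name}: {'OK' if found else 'MISSING'}")
```

Any package reported as MISSING will fall back to slower kernels or fail at launch, so it is worth running this before starting a multi-GPU job.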

🤗 Download Checkpoints

Model Cards

| Model Name | Download |
|---|---|
| LiveAct | 🤗 Hugging Face |
| chinese-wav2vec2-base | 🤗 Hugging Face |

🔑 Inference

Usage of LiveAct

1. Run real-time streaming inference on two H100/H200 GPUs

USE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0,1 \
torchrun --nproc_per_node=2 --master_port=$(shuf -n 1 -i 10000-65535)  \
    generate.py \
    --size 416*720 \
    --ckpt_dir MODEL_PATH \
    --wav2vec_dir chinese-wav2vec2-base \
    --fps 20 \
    --dura_print \
    --input_json examples/example.json \
    --steam_audio

2. Run on a single GPU for evaluation

USE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=7 \
python generate.py \
    --size 480*832 \
    --ckpt_dir MODEL_PATH \
    --wav2vec_dir chinese-wav2vec2-base \
    --fps 24 \
    --input_json examples/example.json \
    --audio_cfg 1.7 \
    --t5_cpu
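Both commands pass conditions through `--input_json`. The schema of `examples/example.json` is not documented in this README, so every key below is a hypothetical placeholder, meant only to illustrate the kind of conditions (reference image, driving audio, text prompt) such a file could carry; consult `examples/example.json` in the repository for the actual format.

```json
{
  "ref_image": "examples/ref.png",
  "audio": "examples/speech.wav",
  "prompt": "A person talking in a studio"
}
```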

Command Line Arguments

| Argument | Type | Required | Default | Description |
|---|---|---|---|---|
| --size | str | Yes | - | The width and height of the generated video. |
| --t5_cpu | bool | No | false | Whether to place the T5 model on CPU. |
| --offload_cache | bool | No | false | Whether to place the KV cache on CPU. |
| --fps | int | Yes | - | The target FPS of the generated video. |
| --audio_cfg | float | No | 1.0 | Classifier-free guidance scale for audio control. |
| --dura_print | bool | No | false | Whether to print the duration of every block. |
| --input_json | str | Yes | - | Path to the condition JSON file used to generate the video. |
| --seed | int | No | 42 | The seed used for generating the image or video. |
| --steam_audio | bool | No | false | Whether to run inference with streaming audio. |
| --mean_memory | bool | No | false | Whether to use the mean-memory strategy during inference for further performance improvement. |
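The `--size` flag packs both dimensions into one `*`-separated string (e.g. `416*720`). A minimal sketch of how such a value can be split into integers; the actual parsing lives in `generate.py`, and which of the two numbers is width versus height is an assumption to verify there.

```python
def parse_size(size: str) -> tuple[int, int]:
    """Split a size string such as '416*720' into two ints.

    Assumption: the mapping of the two numbers to width vs. height is
    decided by generate.py; check the repository code before relying on it.
    """
    first, second = size.split("*")
    return int(first), int(second)

if __name__ == "__main__":
    print(parse_size("416*720"))
```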

💻 GUI demo

Run LiveAct inference on the GUI demo and evaluate real-time performance.

demo_h265.mp4

Note: The first few blocks during the initial run require warm-up. Normal performance will be observed from the second run onward.

USE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0,1 \
torchrun --nproc_per_node=2 --master_port=$(shuf -n 1 -i 10000-65535) \
  demo.py \
  --ckpt_dir MODEL_PATH \
  --wav2vec_dir chinese-wav2vec2-base \
  --size 416*720 \
  --video_save_path ./generated_videos

📚 Citation

@misc{zhen2026soulxliveacthourscalerealtimehuman,
      title={SoulX-LiveAct: Towards Hour-Scale Real-Time Human Animation with Neighbor Forcing and ConvKV Memory}, 
      author={Dingcheng Zhen and Xu Zheng and Ruixin Zhang and Zhiqi Jiang and Yichao Yan and Ming Tao and Shunshun Yin},
      year={2026},
      eprint={2603.11746},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.11746}, 
}
