
Requirements

Operating System

The full pipeline requires two machines: one for the client and one for the server.

For the 2D avatar, RAG runs on the server and all other components run on the client.
For the 3D avatar, RAG and lip sync (SAiD) run on the server and the other components run on the client.

Hardware

Both the Windows (client) and Ubuntu (server) machines should be equipped with one Intel Arc A770 16GB GPU.

Environment Setup for Client (Windows)

This guide assumes the use of Visual Studio Code and Conda-forge; however, you can use any other source code editor you prefer.

Install Visual Studio Code and Conda-forge

Download the Visual Studio Code installer and install the tool. Then, set up Git in VS Code.

Download the Conda-forge installer and install the tool. You need to add the tool to the system's PATH if you want to use it in the VS Code terminal.

After this configuration, you can proceed with the setup using the VS Code terminal.

Prepare project code and models on client

Organize the project code accordingly and make sure the downloaded third-party models are placed in the file structure presented below.

MuseTalk

  1. Follow the MuseTalk documentation and place all the models under the resource/musetalk_models directory.

    resource/musetalk_models/
    ├── musetalk
    │   ├── musetalk.json
    │   └── pytorch_model.bin
    ├── dwpose
    │   └── dw-ll_ucoco_384.pth
    ├── face-parse-bisent
    │   ├── 79999_iter.pth
    │   └── resnet18-5c106cde.pth
    ├── sd-vae-ft-mse
    │   ├── config.json
    │   └── diffusion_pytorch_model.bin
    └── whisper
        └── tiny.pt
    
  2. Convert the MuseTalk model to OpenVINO IR (Intermediate Representation):

    python -m da.util.musetalk_torch2ov

    After a successful conversion, the OpenVINO IR is saved to the following directory (a quick load check is sketched after this list):

    resource/musetalk_models/
    └── musetalk
        ├── unet-vae-b4.bin
        └── unet-vae-b4.xml
    
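To verify the conversion, you can load and compile the IR with the OpenVINO runtime. This is a minimal sketch, not part of the pipeline code; it assumes the openvino Python package is available in the da environment and uses the IR path from the step above.

import openvino as ov

# Minimal load check for the converted MuseTalk IR (path from the conversion step above).
core = ov.Core()
model = core.read_model("resource/musetalk_models/musetalk/unet-vae-b4.xml")
# "GPU" targets the Intel Arc A770; use "CPU" if the GPU plugin is unavailable.
compiled = core.compile_model(model, "GPU")
print("Model inputs:", [inp.get_any_name() for inp in compiled.inputs])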

FunASR

  1. Use Git to download the FunASR speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch model, which serves as the automatic speech recognition (ASR) model in the pipeline.

    cd resource/funasr_models
    git clone https://www.modelscope.cn/iic/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch.git
    cd ../..
  2. Convert the FunASR model to ONNX and apply quantization using the command below:

    python -m da.util.funasr_torch2onnx

    After a successful conversion, the quantized ONNX model is saved to the following directory (a quick load check is sketched after this list):

    resource/funasr_models/
    └── speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch
        └── model.onnx
    
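As a quick sanity check, the quantized model can be loaded with ONNX Runtime. This is a minimal sketch, assuming the onnxruntime package is available in the da environment; it only verifies that the exported graph loads.

import onnxruntime as ort

# Minimal load check for the quantized FunASR model (path from the conversion step above).
session = ort.InferenceSession(
    "resource/funasr_models/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/model.onnx",
    providers=["CPUExecutionProvider"],
)
print("Model inputs:", [i.name for i in session.get_inputs()])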

Setup Python environment

Run the commands below to create the Python environment and install the required dependencies.

conda create --name da python=3.10 ffmpeg webrtcvad
conda activate da
pip install -r requirements.txt
mim install -r requirements-mim.txt
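
To confirm the environment is set up correctly and the Arc A770 is visible to the runtime, you can list the devices OpenVINO detects. A minimal sketch, assuming the openvino package is pulled in by requirements.txt:

import openvino as ov

# The Arc A770 should appear as "GPU" in the device list.
print(ov.Core().available_devices)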

Environment Setup for Server (Ubuntu)

Install Docker

To install Docker, refer to the official website.

Prepare project code and models on server

  1. Organize the project code accordingly:
  • On Ubuntu, the required code should be in the said_docker folder.

  • The SAiD model is used for 3D lip sync. Place SAiD.pth under the said_docker/said_models directory:

    said_docker
    └── said_models
        └── SAiD.pth
    
  • Other downloaded third-party models should be saved following a similar file structure.

  2. Follow "SAiD on A770" to build and set up the server.

Prepare RAG

To set up a Retrieval-Augmented Generation (RAG) pipeline, refer to the guide.

Learn More