- Windows 10/11
- Python 3.8 (conda env: rvc38)
- CUDA 11.7 + PyTorch 2.0.1
- fairseq 0.12.2 (built from source)
- GPU: RTX 3050 4GB
conda create -n rvc38 python=3.8
conda activate rvc38
Download: https://visualstudio.microsoft.com/visual-cpp-build-tools/ and, during installation, select the "Desktop development with C++" workload
pip install -r requirements_rvc38.txt
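Once the requirements are installed, a quick check can confirm that the interpreter and PyTorch build match the versions listed at the top. The following is a minimal sketch, assuming that requirements_rvc38.txt installs PyTorch (the file's contents are not shown here); the function name `describe_environment` is made up for illustration.

```python
import sys


def describe_environment():
    """Collect the versions this guide depends on, without failing if torch is absent."""
    info = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "python_is_3_8": sys.version_info[:2] == (3, 8),
        "torch": None,
        "cuda_build": None,
        "cuda_available": False,
    }
    try:
        import torch  # assumption: installed by requirements_rvc38.txt
    except ImportError:
        return info
    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda  # None on CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` comes back False inside the rvc38 environment, the installed PyTorch build or the GPU driver is worth checking before going further.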
git clone https://github.com/facebookresearch/fairseq --branch v0.12.2
# Patch setup.py: change hydra-core==1.0.7 to hydra-core>=1.0.7
# and change omegaconf<2.1 to omegaconf<3.0
# Then build from inside the cloned fairseq directory:
set DISTUTILS_USE_SDK=1
pip install -e . --no-build-isolation --no-deps
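The two setup.py edits noted above can also be applied with a small script, run before the build, instead of by hand. This is a sketch only: it does plain string replacement on the exact pins quoted in these notes and reports any pin it cannot find, so a checkout whose setup.py is worded differently still needs a manual edit. The helper names `patch_requirements` and `patch_file` are hypothetical.

```python
from pathlib import Path

# (old, new) pairs copied from the patch notes in this README.
REPLACEMENTS = [
    ("hydra-core==1.0.7", "hydra-core>=1.0.7"),
    ("omegaconf<2.1", "omegaconf<3.0"),
]


def patch_requirements(text):
    """Return (patched_text, missing), where missing lists pins not found in text."""
    missing = []
    for old, new in REPLACEMENTS:
        if old in text:
            text = text.replace(old, new)
        else:
            missing.append(old)
    return text, missing


def patch_file(path):
    """Patch the given setup.py in place and return the pins that were not found."""
    path = Path(path)
    patched, missing = patch_requirements(path.read_text(encoding="utf-8"))
    path.write_text(patched, encoding="utf-8")
    return missing


# Usage, from inside the fairseq checkout:
#     for pin in patch_file("setup.py"):
#         print("not found, edit by hand:", pin)
```

Running it a second time changes nothing and simply reports both pins as not found, since they have already been rewritten.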
# Additional dependencies:
pip install bitarray sacrebleu Cython
git clone https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI.git
Copy infer-web.py / train.py / preprocess.py from this repository over the corresponding files in the cloned RVC repository
Download from hf-mirror.com/lj1995/VoiceConversionWebUI:
- assets/hubert/hubert_base.pt
- assets/rmvpe/rmvpe.pt
- assets/pretrained/*.pth (12 files)
- assets/pretrained_v2/*.pth (12 files)
- ffmpeg.exe / ffprobe.exe (place in the repository root)
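Before launching, it can help to confirm that every item in the download list is in place. The sketch below checks the four single files and counts the .pth files in the two pretrained directories. It assumes the paths are relative to the RVC repository root, and it only counts files, because the individual .pth names are not listed here. `missing_assets` is a hypothetical helper name.

```python
from pathlib import Path

# Single files named in the download list.
REQUIRED_FILES = [
    "assets/hubert/hubert_base.pt",
    "assets/rmvpe/rmvpe.pt",
    "ffmpeg.exe",
    "ffprobe.exe",
]

# Directories expected to hold 12 .pth files each.
REQUIRED_PTH_COUNTS = {
    "assets/pretrained": 12,
    "assets/pretrained_v2": 12,
}


def missing_assets(root):
    """Return human-readable problems; an empty list means everything was found."""
    root = Path(root)
    problems = []
    for rel in REQUIRED_FILES:
        if not (root / rel).is_file():
            problems.append(f"missing file: {rel}")
    for rel, expected in REQUIRED_PTH_COUNTS.items():
        directory = root / rel
        found = len(list(directory.glob("*.pth"))) if directory.is_dir() else 0
        if found < expected:
            problems.append(f"{rel}: found {found} of {expected} .pth files")
    return problems


# Usage, from the repository root:
#     for problem in missing_assets("."):
#         print(problem)
```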
Place the 巷口星尘 dry (unprocessed) vocal recordings in the train_data/ directory
python infer-web.py --port 7930
python train_final.py
- train.py: num_workers set to 0, prefetch_factor set to None (to fit the 4 GB GPU)
- preprocess.py: argv access rewritten defensively, with default values
- infer-web.py: server_name="127.0.0.1", share=True
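The train.py and preprocess.py changes follow two small patterns, sketched below. This is not the actual patched code: `loader_kwargs` and `arg_or_default` are hypothetical helpers, and the positional arguments and defaults in the usage section are placeholders. In PyTorch 2.0, `prefetch_factor` may only be given when `num_workers` is greater than 0, which is why the two settings change together; the value 2 used for the multi-worker case mirrors PyTorch's documented default.

```python
import sys


def loader_kwargs(num_workers=0):
    """Build DataLoader keyword arguments that stay valid in single-process mode."""
    kwargs = {"num_workers": num_workers}
    # With num_workers == 0 there are no worker processes to prefetch for,
    # so prefetch_factor is left as None.
    kwargs["prefetch_factor"] = None if num_workers == 0 else 2
    return kwargs


def arg_or_default(argv, index, default, cast=str):
    """Return cast(argv[index]), or default if the argument is absent or invalid."""
    try:
        return cast(argv[index])
    except (IndexError, ValueError):
        return default


if __name__ == "__main__":
    # Hypothetical arguments and defaults; the real preprocess.py may differ.
    input_dir = arg_or_default(sys.argv, 1, "train_data")
    sample_rate = arg_or_default(sys.argv, 2, 40000, cast=int)
    print(input_dir, sample_rate, loader_kwargs())
```

The returned dictionary would be passed as `DataLoader(dataset, **loader_kwargs())`, so the loader runs in the main process and starts no worker processes.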