Singing Voice Conversion & Singing Voice Cloning for OSX

audiohacking/so-vits-svc-osx

Variational Inference with adversarial learning for end-to-end Singing Voice Conversion based on VITS

5.11 Update by HorikitaSaku

  • Added data2vec as content encoder for improved semantic representation
  • Implemented hybrid pitch detection: 75% CREPE + 15% RMVPE for more robust pitch extraction
  • Added Mel Cepstrum loss for better spectral envelope matching
  • Continued improvements from previous versions

Device Support

This project now supports multiple compute devices:

  • NVIDIA GPUs (CUDA): Full support with optimal performance
  • Apple Silicon (M1/M2/M3 via MPS): Hardware acceleration via Metal Performance Shaders
  • CPU: Fallback option when no GPU is available

The device is automatically detected and selected based on availability. Priority order: CUDA > MPS > CPU.
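The priority order above can be sketched as follows. This is a minimal illustration, not the project's actual detection code: `pick_device` and `detect_device` are hypothetical names, and the only PyTorch calls used are `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return a device name following the CUDA > MPS > CPU priority."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Ask PyTorch which backends are usable; fall back to CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps does not exist before PyTorch 1.12, so look it up defensively.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = mps_backend is not None and mps_backend.is_available()
    return pick_device(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    print(detect_device())
```

Keeping the priority logic in a function that takes plain booleans makes it easy to unit test without a GPU.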

Verifying Device Detection

After installing PyTorch, you can verify that your device is correctly detected:

python verify_device.py

This script will display:

  • Your system information and architecture
  • Available compute devices (CUDA/MPS/CPU)
  • Which device will be used for training and inference
  • A simple test to verify the device is working correctly

Expected output on Apple Silicon Macs:

Selected Device:     mps
Device Type:         mps
Device Name:         MPS (Apple Silicon)
✓ Apple Silicon (MPS) GPU acceleration is available and working

Expected output on NVIDIA GPU systems:

Selected Device:     cuda:0
Device Type:         cuda
Device Name:         CUDA (NVIDIA GeForce RTX ...)
✓ CUDA GPU acceleration is available and working

Running Tests

To run the device detection test suite:

# Run all device detection tests
python -m pytest tests/test_device_detection.py -v

# Or run with unittest
python tests/test_device_detection.py

macOS Native App

SoVits-SVC can be bundled as a native macOS application using PyInstaller and pywebview.

Building the macOS App

On macOS systems, you can build a standalone .app bundle:

./build_local.sh

This creates dist/SoVitsSVC.app which can be distributed to users.

Features

  • Native macOS app with pywebview for a native window experience
  • Code signed (ad-hoc signing by default, can use Apple Developer certificate)
  • DMG installer for easy distribution
  • Apple Silicon optimized - automatically uses Metal Performance Shaders (MPS)
  • No terminal required - runs as a standard macOS application

Documentation

For detailed build instructions, see BUILD_MACOS.md.

For automated builds via GitHub Actions, see .github/workflows/build-release.yml.

Setup Environment

  1. Install PyTorch.

    For Apple Silicon (M1/M2/M3) users: PyTorch will automatically use MPS (Metal Performance Shaders) for GPU acceleration when available.

  2. Install project dependencies

    pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -r requirements.txt

    Note: whisper is already built in. Do not install it again, otherwise it will cause conflicts and errors.

  3. Download the Timbre Encoder: Speaker-Encoder by @mueller91, put best_model.pth.tar into speaker_pretrain/.

  4. Download the whisper model (whisper-large-v2 or whisper-large-v3). Make sure to download the model file and put it into whisper_pretrain/.

  5. Download the hubert_soft model and put hubert-soft-0d54a1f4.pt into hubert_pretrain/.

  6. Download RMVPE model and put rmvpe.pt into rmvpe_pretrain/.

  7. Download the pretrained model sovits5.0.pretrain.pth and put it into vits_pretrain/. You can then run a test inference:

    python svc_inference.py --config configs/base.yaml --model ./vits_pretrain/sovits5.0.pretrain.pth --spk ./configs/singers/singer0001.npy --wave test.wav
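Before running inference, it can help to confirm that the files from the steps above ended up in the right folders. The sketch below is a hypothetical helper, not a script shipped with the repository. The file names are taken from the steps above. The whisper file name depends on which model you downloaded, so only the folder is checked.

```python
from pathlib import Path
from typing import List

# Files named in the setup steps, relative to the repository root.
EXPECTED_FILES = [
    "speaker_pretrain/best_model.pth.tar",
    "hubert_pretrain/hubert-soft-0d54a1f4.pt",
    "rmvpe_pretrain/rmvpe.pt",
    "vits_pretrain/sovits5.0.pretrain.pth",
]
# The whisper file name varies by model, so only require a non-empty folder.
EXPECTED_NONEMPTY_DIRS = ["whisper_pretrain"]


def missing_assets(root: str = ".") -> List[str]:
    """Return the expected files or folders that are absent under root."""
    base = Path(root)
    missing = [name for name in EXPECTED_FILES if not (base / name).is_file()]
    for name in EXPECTED_NONEMPTY_DIRS:
        folder = base / name
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name + "/ (empty or absent)")
    return missing


if __name__ == "__main__":
    problems = missing_assets(".")
    print("All pretrained assets found." if not problems else "\n".join(problems))
```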

Troubleshooting

For Apple Silicon (M1/M2/M3) Users

If MPS is not detected on your Mac:

  1. Check PyTorch version: MPS support requires PyTorch 1.12 or later

    python -c "import torch; print(torch.__version__)"
  2. Verify macOS version: MPS requires macOS 12.3 or later

    sw_vers
  3. Install/Update PyTorch:

    pip3 install --upgrade torch torchvision torchaudio
  4. Verify MPS availability:

    python -c "import torch; print(f'MPS available: {torch.backends.mps.is_available()}')"
  5. Run the verification script to see detailed diagnostics:

    python verify_device.py
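The two version requirements from steps 1 and 2 can also be checked in one go. This is a rough sketch with hypothetical helper names (`version_tuple`, `meets_minimum`, `mps_prerequisites`), not part of verify_device.py. It compares only the major and minor version numbers.

```python
import platform
import re


def version_tuple(text: str, parts: int = 2) -> tuple:
    """Parse leading numeric components, e.g. '2.1.0+cu118' -> (2, 1)."""
    numbers = re.findall(r"\d+", text.split("+")[0])
    padded = [int(n) for n in numbers[:parts]] + [0] * parts
    return tuple(padded[:parts])


def meets_minimum(version: str, minimum: str) -> bool:
    """True if version is at least minimum (major.minor comparison)."""
    return version_tuple(version) >= version_tuple(minimum)


def mps_prerequisites(torch_version: str, macos_version: str) -> dict:
    """Check the PyTorch >= 1.12 and macOS >= 12.3 requirements."""
    return {
        "pytorch>=1.12": meets_minimum(torch_version, "1.12"),
        "macos>=12.3": meets_minimum(macos_version, "12.3"),
    }


if __name__ == "__main__":
    try:
        import torch
        torch_version = torch.__version__
    except ImportError:
        torch_version = "0"  # treated as "too old"
    # platform.mac_ver()[0] is an empty string on non-macOS systems.
    print(mps_prerequisites(torch_version, platform.mac_ver()[0] or "0"))
```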

Common Issues

"MPS backend out of memory":

  • MPS has memory limitations. Try reducing the batch size, or use the CPU for large models.
  • You can force CPU usage by setting device preference in the code.
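One way to apply the "reduce batch size" advice automatically is to retry with a smaller batch whenever an out-of-memory error occurs. The sketch below is an illustration under assumptions: `run_with_backoff` and `step` are hypothetical names, and it assumes the error surfaces as a `RuntimeError` whose message contains "out of memory". It is not how the project's training loop is implemented.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def run_with_backoff(
    step: Callable[[int], T], batch_size: int, min_batch_size: int = 1
) -> Tuple[T, int]:
    """Call step(batch_size), halving the batch size after each out-of-memory error.

    Returns the step result together with the batch size that succeeded.
    """
    while True:
        try:
            return step(batch_size), batch_size
        except RuntimeError as err:
            # Assumption: OOM is reported as a RuntimeError mentioning "out of memory".
            is_oom = "out of memory" in str(err).lower()
            if not is_oom or batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)
```

Here `step` stands for whatever function runs one training or inference pass at a given batch size. Errors that are not out-of-memory errors are re-raised unchanged.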

Performance Issues on MPS:

  • First run may be slower due to Metal shader compilation
  • Some operations may fall back to CPU automatically
  • Overall performance should still be significantly better than CPU-only

Dataset preparation

Necessary pre-processing:

  1. Separate voice and accompaniment with UVR (skip if no accompaniment)
  2. Cut the audio input into shorter clips with slicer; whisper takes input shorter than 30 seconds.
  3. Manually check the generated clips and remove any that are shorter than 2 seconds or contain obvious noise.
  4. Adjust loudness if necessary; Adobe Audition is recommended.
  5. Put the dataset into the dataset_raw directory following the structure below.
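Separately from the directory layout, the duration limits in steps 2 and 3 can be checked with a short script. This is a minimal sketch with hypothetical function names. It uses Python's standard `wave` module, which reads uncompressed PCM WAV files only, so other formats would need a different reader.

```python
import wave
from pathlib import Path

MIN_SECONDS = 2.0   # step 3: remove clips shorter than this
MAX_SECONDS = 30.0  # step 2: whisper takes input shorter than this


def wav_seconds(path: Path) -> float:
    """Duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def clips_out_of_range(folder: str) -> list:
    """Return (file name, seconds) for .wav files outside the accepted range."""
    flagged = []
    for path in sorted(Path(folder).rglob("*.wav")):
        seconds = wav_seconds(path)
        if seconds < MIN_SECONDS or seconds >= MAX_SECONDS:
            flagged.append((path.name, round(seconds, 2)))
    return flagged


if __name__ == "__main__":
    for name, seconds in clips_out_of_range("dataset_raw"):
        print(f"{name}: {seconds} s")
```

Noise still has to be checked by ear; the script only flags clips by length.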
