24 changes: 24 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
# macOS AppleDouble metadata files.
._*

# Local build and packaging outputs.
.venv-build-macos/
.venv-smoke/
.research-artifacts/
build/
dist/
*.dmg
*.spec.bak

# Downloaded model weights are runtime/user assets, not source.
models/**/*.pth
models/**/*.onnx
models/**/*.ckpt
models/**/*.th
models/Demucs_Models/v3_v4_repo/htdemucs.yaml

# Python/PyInstaller caches.
__pycache__/
*.pyc
*.pyo
*.log
134 changes: 115 additions & 19 deletions README.md
@@ -74,13 +74,13 @@ In order to use the Time Stretch or Change Pitch tool, you'll need Rubber Band.

</details>

### MacOS Installation
- Please Note:
- The MacOS Sonoma mouse clicking issue has been fixed.
- MPS (GPU) acceleration for Mac M1 has been expanded to work with Demucs v4 and all MDX-Net models.
- This bundle is intended for those running macOS Big Sur and above.
- Application functionality for systems running macOS Catalina or lower is not guaranteed.
- Application functionality for older or budget Mac systems is not guaranteed.
### MacOS Installation
- Please Note:
- The MacOS Sonoma mouse clicking issue has been fixed.
- Apple Silicon GPU acceleration is available through MPS. Some model paths may fall back to CPU when PyTorch or the selected backend cannot run an operation on Apple GPU.
- This bundle is intended for those running macOS Big Sur and above.
- Application functionality for systems running macOS Catalina or lower is not guaranteed.
- Application functionality for older or budget Mac systems is not guaranteed.
- Once everything is installed, the application may take 5-10 minutes to start the first time (depending on your Mac).

- Download the UVR dmg for MacOS via one of the links below:
@@ -232,23 +232,119 @@ If you encounter issues, refer to the [GitHub Issues](https://github.com/Anjok07

</details>

### Other Application Notes
- Nvidia GTX 1060 6GB is the minimum requirement for GPU conversions.
- Nvidia GPUs with at least 8GBs of V-RAM are recommended.
- AMD Radeon GPU supported is limited at this time.
- There is currently a working branch for AMD GPU users [here](https://github.com/Anjok07/ultimatevocalremovergui/tree/v5.6-amd-gpu)
- This application is only compatible with 64-bit platforms.
- This application relies on the Rubber Band library for the Time-Stretch and Pitch-Shift options.
### Other Application Notes
- Nvidia GTX 1060 6GB is the minimum requirement for GPU conversions.
- Nvidia GPUs with at least 8 GB of VRAM are recommended.
- Apple Silicon Macs can use Apple GPU (MPS) acceleration when running a modern PyTorch build with MPS support.
- AMD Radeon GPU support is limited at this time.
- There is currently a working branch for AMD GPU users [here](https://github.com/Anjok07/ultimatevocalremovergui/tree/v5.6-amd-gpu)
- This application is only compatible with 64-bit platforms.
- This application relies on the Rubber Band library for the Time-Stretch and Pitch-Shift options.
- This application relies on FFmpeg to process non-wav audio files.
- The application will automatically remember your settings when closed.
- Conversion times will significantly depend on your hardware.
- These models are computationally intensive.

### Performance:
- Model load times are faster.
- Importing/exporting audio files is faster.

## Troubleshooting
### Performance:
- Model load times are faster.
- Importing/exporting audio files is faster.

## Inference Backends

UVR now uses an inference backend selector behind the existing GPU controls. The available backend modes are:

- `Auto`
- `CPU`
- `CUDA`
- `Apple GPU (MPS)`
- `CoreML`

`Auto` prefers CUDA first, then Apple GPU (MPS), then CoreML for ONNX models on macOS, then CPU. On macOS Apple Silicon, the settings UI shows the relevant choices: `Auto`, `Apple GPU (MPS)`, `CoreML`, and `CPU`.
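
The preference order described above can be sketched as a small pure function. This is an illustration only, not the code in `inference_backend.py`: the `Capabilities` class and `resolve_auto_backend` are hypothetical names, and the availability flags are assumed to be probed elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical snapshot of what the current machine offers."""
    cuda: bool = False
    mps: bool = False
    coreml: bool = False
    is_macos: bool = False


def resolve_auto_backend(caps: Capabilities, is_onnx_model: bool) -> str:
    """Return the backend label that `Auto` would pick, in README order."""
    if caps.cuda:
        return "CUDA"
    if caps.mps:
        return "Apple GPU (MPS)"
    # CoreML is only considered for ONNX models on macOS.
    if caps.coreml and caps.is_macos and is_onnx_model:
        return "CoreML"
    return "CPU"
```

Keeping the decision in one function with explicit inputs makes it easy to unit-test the order without any GPU present.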

Backend coverage:

- VR and MDXC models run through PyTorch and can use CPU, CUDA, or Apple GPU (MPS).
- MDX ONNX models keep ONNX Runtime CUDA and CPU support. On Apple GPU (MPS), MDX ONNX models are converted through `onnx2pytorch` and run as PyTorch models. `CoreML` is available as an explicit experimental ONNX Runtime backend.
- Demucs v3 and v4 can attempt Apple GPU (MPS). If that path fails, UVR falls back to CPU and records the fallback in the log. Demucs v1 and v2 keep the older CPU/CUDA behavior by default.
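
The Demucs fallback behavior can be pictured with a device-agnostic sketch. It assumes the same rule as the `demucs/hdemucs.py` and `demucs/htdemucs.py` changes in this PR (errors on `cpu` or `cuda` are re-raised, other devices retry on CPU), but `run_with_cpu_fallback`, the `step` callable, and the logger name are hypothetical and no PyTorch is involved.

```python
import logging
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")
log = logging.getLogger("uvr.sketch")  # hypothetical logger name


def run_with_cpu_fallback(step: Callable[[str], T], device: str) -> Tuple[T, str, bool]:
    """Run `step(device)`; if it fails on a non-CPU/CUDA device, retry on CPU.

    Returns (result, device_used, fell_back).
    """
    try:
        return step(device), device, False
    except Exception as exc:
        # Failures on CPU or CUDA are treated as real errors and re-raised.
        if device in ("cpu", "cuda"):
            raise
        log.warning("%s path failed (%s); falling back to CPU", device, exc)
        return step("cpu"), "cpu", True
```

Returning the fallback flag alongside the result is one way to let the caller record it in the conversion log.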

The conversion log includes the selected backend, runner, model load time, and fallback status. `PYTORCH_ENABLE_MPS_FALLBACK=1` is set by default for compatibility, but the primary MPS paths are expected to work without hiding unsupported device transfers.
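
In this PR, `UVR.py` sets the variable with `os.environ.setdefault`, so a value already exported by the user is kept. The sketch below shows that behavior on plain dictionaries; `apply_mps_fallback_default` is a hypothetical helper, not a function in the codebase.

```python
import os


def apply_mps_fallback_default(env=os.environ) -> str:
    """Set PYTORCH_ENABLE_MPS_FALLBACK to "1" only if it is not already set."""
    return env.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")


# A user who exported PYTORCH_ENABLE_MPS_FALLBACK=0 keeps that value ...
strict_env = {"PYTORCH_ENABLE_MPS_FALLBACK": "0"}
apply_mps_fallback_default(strict_env)

# ... while an environment without the variable receives the default.
fresh_env = {}
apply_mps_fallback_default(fresh_env)
```

Exporting `PYTORCH_ENABLE_MPS_FALLBACK=0` before launching should therefore make unsupported MPS operations fail loudly, which can help when checking that a primary MPS path does not depend on the fallback.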

## Developer Smoke Tests

The real-model smoke harness lives in `tools/smoke_inference.py`. It intentionally avoids importing `UVR.py` because `UVR.py` starts the tkinter mainloop on import. The harness drives the separator classes directly, writes results to `.research-artifacts/smoke/smoke_results.jsonl`, and keeps generated audio under `.research-artifacts/smoke/`.

Recommended Apple Silicon setup:

```bash
uv python install 3.10
uv venv .venv-smoke --python 3.10
.venv-smoke/bin/python -m ensurepip --upgrade
.venv-smoke/bin/python -m pip install playsound==1.2.2
SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True .venv-smoke/bin/python -m pip install -r requirements.txt
.venv-smoke/bin/python -m pip uninstall -y PySoundFile
.venv-smoke/bin/python -m pip install soundfile==0.13.1 onnx2pytorch
```

Run the blocking smoke flow:

```bash
.venv-smoke/bin/python tools/smoke_inference.py prepare
.venv-smoke/bin/python tools/smoke_inference.py doctor --append
.venv-smoke/bin/python tools/smoke_inference.py probe --append
.venv-smoke/bin/python tools/smoke_inference.py run --phase blocking --continue-on-error --append
.venv-smoke/bin/python tools/smoke_inference.py summary --append
```

The blocking phase covers backend probing plus single-model real inference for VR, MDX ONNX, MDX ONNX CoreML, MDXC, and Demucs v4. Additional phases are available for release checks:

- `extended`: secondary model, vocal splitter, ensemble, FLAC output, and MP3 output.
- `backend_modes`: explicit Apple GPU (MPS) cases and Demucs v4 CPU comparison.
- `gpu_disabled`: verifies that disabling GPU forces CPU even when Auto or MPS is selected.
- `demucs_legacy`: verifies Demucs v1/v2 backend policy without requiring legacy weights.
- `format_inputs`: FLAC input, MP3 input, and mono input.
- `long_audio`: 60-second MDX MPS and 180-second VR CPU stability checks.
- `all`: blocking, extended, backend mode, GPU-disabled, Demucs legacy, format-input, and long-audio checks.
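
The rule that the `gpu_disabled` phase verifies can be stated in a few lines. This is a minimal sketch under assumed names (`effective_backend`, `auto_choice`); the real selector may take different inputs.

```python
def effective_backend(gpu_enabled: bool, selected_mode: str, auto_choice: str) -> str:
    """Return the backend to use once the GPU toggle is taken into account.

    `auto_choice` is whatever `Auto` would resolve to on this machine.
    """
    if not gpu_enabled:
        # Disabling GPU wins over any selected mode, including Auto and MPS.
        return "CPU"
    return auto_choice if selected_mode == "Auto" else selected_mode
```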

Use `summary` after a run to print the latest pass/fail state and CPU-vs-MPS timing pairs.
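
As a rough picture of what such a summary involves, the sketch below keeps the latest record per case and backend and pairs CPU and MPS timings. The field names (`case`, `backend`, `status`, `elapsed_s`) and the sample records are made up for illustration; the actual schema of `smoke_results.jsonl` is not shown in this README.

```python
import json
from collections import defaultdict


def summarize(jsonl_lines):
    """Return (latest record per (case, backend), CPU-vs-MPS timing pairs)."""
    latest = {}
    for raw in jsonl_lines:
        raw = raw.strip()
        if not raw:
            continue
        rec = json.loads(raw)
        latest[(rec["case"], rec["backend"])] = rec  # later lines win

    timings = defaultdict(dict)
    for (case, backend), rec in latest.items():
        if rec["status"] == "passed":
            timings[case][backend] = rec["elapsed_s"]

    # Only cases that passed on both backends form a timing pair.
    pairs = {
        case: (t["cpu"], t["mps"])
        for case, t in timings.items()
        if "cpu" in t and "mps" in t
    }
    return latest, pairs


# Made-up records, only to show the shape of the input.
sample = [
    '{"case": "demucs_v4", "backend": "cpu", "status": "passed", "elapsed_s": 12.0}',
    '{"case": "demucs_v4", "backend": "mps", "status": "failed", "elapsed_s": 1.0}',
    '{"case": "demucs_v4", "backend": "mps", "status": "passed", "elapsed_s": 4.0}',
]
latest, pairs = summarize(sample)
```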

The `prepare` command downloads the minimal real model set into the app's existing model folders and generates an 8-second stereo WAV input. Do not commit `.venv-smoke`, `.research-artifacts`, downloaded model weights, `__pycache__`, or AppleDouble `._*` files.

### MDX23C Apple GPU Functional Check

A full-length MDX23C functional check was run on Apple Silicon with the packaged app resources:

- Model: `dist/Ultimate Vocal Remover.app/Contents/Resources/models/MDX_Net_Models/MDX23C-8KFFT-InstVoc_HQ.ckpt`
- Input: `example_data/kamiina_ep01.wav`
- Config: `model_2_stem_full_band_8k.yaml`
- Backend mode: `Auto`
- Actual backend: `Apple GPU (MPS)`
- Torch device: `mps`
- Fallback: `false`
- Result: `passed`
- Elapsed time: `2070.068s`

The model hash did not match an entry in the bundled `model_data.json`, so the GUI does not automatically infer its MDX23C config from metadata. For this file, use the 8KFFT config `model_2_stem_full_band_8k.yaml`.

The effective MDX23C parameters for that check were:

- Segment size: `256`
- Segment default: `true`, so `inference.dim_t=256` from the config was used.
- Overlap: `8`
- Batch size: `1`
- `n_fft`: `8192`
- `dim_f`: `4096`
- `hop_length`: `1024`
- Runtime chunk size: `1024 * (256 - 1) = 261120`
- Runtime hop size: `261120 // 8 = 32640`
- Instruments: `Vocals`, `Instrumental`
- Output format: WAV `PCM_16`
- Output normalization: `false`
- Pitch shift: `0`
- Match frequency cutoff: `true`, but it is not used when pitch shift is zero.
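
The two runtime sizes in the list follow directly from the hop length, segment size, and overlap. The helper below just reproduces that arithmetic; `mdxc_chunking` is a hypothetical name, not a function in the separator code.

```python
def mdxc_chunking(hop_length: int, segment_size: int, overlap: int):
    """Return (chunk_size, hop_size) in samples for the given settings."""
    chunk_size = hop_length * (segment_size - 1)
    hop_size = chunk_size // overlap
    return chunk_size, hop_size


chunk_size, hop_size = mdxc_chunking(hop_length=1024, segment_size=256, overlap=8)
print(chunk_size, hop_size)  # 261120 32640
```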

The generated verification files live under `.research-artifacts/functional-mdxc-app/` and are intentionally not committed.

## Troubleshooting

### Common Issues

24 changes: 23 additions & 1 deletion UVR.py
@@ -9,6 +9,7 @@
import math
import natsort
import os
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")
import pickle
import psutil
from pyglet import font as pyglet_font
@@ -44,6 +45,7 @@
from kthread import KThread
from lib_v5 import spec_utils
from pathlib import Path
from inference_backend import BACKEND_MODE_OPTIONS as INFERENCE_BACKEND_MODE_OPTIONS
from separate import (
SeperateDemucs, SeperateMDX, SeperateMDXC, SeperateVR, # Model-related
save_format, clear_gpu_cache, # Utility functions
@@ -345,6 +347,7 @@ def __init__(self, model_name: str,
self.deverb_vocal_opt = DEVERB_MAPPER[root.deverb_vocal_opt_var.get()]
self.is_denoise_model = True if root.denoise_option_var.get() == DENOISE_M and os.path.isfile(DENOISER_MODEL_PATH) else False
self.is_gpu_conversion = 0 if root.is_gpu_conversion_var.get() else -1
self.backend_mode = root.backend_mode_var.get()
self.is_normalization = root.is_normalization_var.get()#
self.is_use_opencl = False#True if is_opencl_only else root.is_use_opencl_var.get()
self.is_primary_stem_only = root.is_primary_stem_only_var.get()
@@ -3174,6 +3177,8 @@ def fill_gpu_list(self):
self.cuda_device_list = [f"{torch.cuda.get_device_properties(i).name}:{i}" for i in range(torch.cuda.device_count())]
self.cuda_device_list.insert(0, DEFAULT)
#print(self.cuda_device_list)
elif mps_available:
self.cuda_device_list = [DEFAULT]

# if directml_available:
# self.opencl_list = [f"{torch_directml.device_name(i)}:{i}" for i in range(torch_directml.device_count())]
@@ -3338,6 +3343,19 @@ def set_vars_for_sample_mode(event):
#if not is_choose_arch:
self.vocal_splitter_Button_opt(settings_menu, settings_menu_format_Frame, width=SETTINGS_BUT_WIDTH-2, pady=MENU_PADDING_4)

if self.is_gpu_available:
backend_Label = self.menu_title_LABEL_SET(settings_menu_format_Frame, BACKEND_MODE_TEXT)
backend_Label.grid(pady=MENU_PADDING_2)

backend_options = (
(BACKEND_AUTO, BACKEND_MPS, BACKEND_COREML, BACKEND_CPU)
if is_macos else
INFERENCE_BACKEND_MODE_OPTIONS
)
backend_Option = ComboBoxMenu(settings_menu_format_Frame, textvariable=self.backend_mode_var, values=backend_options, width=GEN_SETTINGS_WIDTH+1)
backend_Option.grid(padx=20,pady=MENU_PADDING_1)
self.help_hints(backend_Label, text=BACKEND_MODE_HELP)

if not is_macos and self.is_gpu_available:
gpu_list_options = lambda:self.loop_gpu_list(device_set_Option, 'gpudevice', self.cuda_device_list)#self.opencl_list if is_opencl_only or self.is_use_opencl_var.get() else self.cuda_device_list)
device_set_Label = self.menu_title_LABEL_SET(settings_menu_format_Frame, CUDA_NUM_TEXT)
@@ -6823,7 +6841,8 @@ def load_saved_vars(self, data):
self.save_format_var = tk.StringVar(value=data['save_format'])
self.wav_type_set_var = tk.StringVar(value=data['wav_type_set'])#
self.device_set_var = tk.StringVar(value=data['device_set'])#
self.user_code_var = tk.StringVar(value=data['user_code'])
self.backend_mode_var = tk.StringVar(value=data['backend_mode'])
self.user_code_var = tk.StringVar(value=data['user_code'])
self.is_gpu_conversion_var = tk.BooleanVar(value=data['is_gpu_conversion'])
self.is_primary_stem_only_var = tk.BooleanVar(value=data['is_primary_stem_only'])
self.is_secondary_stem_only_var = tk.BooleanVar(value=data['is_secondary_stem_only'])
@@ -6967,6 +6986,7 @@ def load_saved_settings(self, loaded_setting: dict, process_method=None, is_defa
self.save_format_var.set(loaded_setting['save_format'])
self.wav_type_set_var.set(loaded_setting['wav_type_set'])#
self.device_set_var.set(loaded_setting['device_set'])#
self.backend_mode_var.set(loaded_setting['backend_mode'])
self.user_code_var.set(loaded_setting['user_code'])
self.phase_option_var.set(loaded_setting['phase_option'])#
self.phase_shifts_var.set(loaded_setting['phase_shifts'])#
@@ -6983,6 +7003,7 @@ def load_saved_settings(self, loaded_setting: dict, process_method=None, is_defa
self.DualBatch_inputPaths = []

self.is_gpu_conversion_var.set(loaded_setting['is_gpu_conversion'])
self.backend_mode_var.set(loaded_setting['backend_mode'])
self.is_normalization_var.set(loaded_setting['is_normalization'])#
self.is_use_opencl_var.set(False)#True if is_opencl_only else loaded_setting['is_use_opencl'])#
self.is_wav_ensemble_var.set(loaded_setting['is_wav_ensemble'])#
@@ -7087,6 +7108,7 @@ def save_values(self, app_close=True, is_restart=False, is_auto_save=False):
'pitch_rate': self.pitch_rate_var.get(),#
'is_time_correction': self.is_time_correction_var.get(),#
'is_gpu_conversion': self.is_gpu_conversion_var.get(),
'backend_mode': self.backend_mode_var.get(),
'is_primary_stem_only': self.is_primary_stem_only_var.get(),
'is_secondary_stem_only': self.is_secondary_stem_only_var.get(),
'is_testing_audio': self.is_testing_audio_var.get(),#
34 changes: 15 additions & 19 deletions demucs/hdemucs.py
@@ -770,27 +770,23 @@ def forward(self, mix):
x = x.view(B, S, -1, Fq, T)
x = x * std[:, None] + mean[:, None]

# to cpu as non-cuda GPUs don't support complex numbers
# demucs issue #435 ##432
# NOTE: in this case z already is on cpu
# TODO: remove this when mps supports complex numbers

device_type = x.device.type
device_load = f"{device_type}:{x.device.index}" if not device_type == 'mps' else device_type
x_is_other_gpu = not device_type in ["cuda", "cpu"]

if x_is_other_gpu:
x = x.cpu()

zout = self._mask(z, x)
x = self._ispec(zout, length)

# back to other device
if x_is_other_gpu:
x = x.to(device_load)
device = x.device
device_type = device.type
try:
zout = self._mask(z, x)
x = self._ispec(zout, length)
except Exception:
if device_type in ["cuda", "cpu"]:
raise
if self.hybrid:
xt = xt.cpu()
meant = meant.cpu()
stdt = stdt.cpu()
zout = self._mask(z.cpu(), x.cpu())
x = self._ispec(zout, length)

if self.hybrid:
xt = xt.view(B, S, -1, length)
xt = xt * stdt[:, None] + meant[:, None]
x = xt + x
return x
return x.to(device) if x.device != device else x
46 changes: 24 additions & 22 deletions demucs/htdemucs.py
@@ -625,30 +625,31 @@ def forward(self, mix):
x = x.view(B, S, -1, Fq, T)
x = x * std[:, None] + mean[:, None]

# to cpu as non-cuda GPUs don't support complex numbers
# demucs issue #435 ##432
# NOTE: in this case z already is on cpu
# TODO: remove this when mps supports complex numbers

device_type = x.device.type
device_load = f"{device_type}:{x.device.index}" if not device_type == 'mps' else device_type
x_is_other_gpu = not device_type in ["cuda", "cpu"]

if x_is_other_gpu:
x = x.cpu()

zout = self._mask(z, x)
if self.use_train_segment:
if self.training:
device = x.device
device_type = device.type
try:
zout = self._mask(z, x)
if self.use_train_segment:
if self.training:
x = self._ispec(zout, length)
else:
x = self._ispec(zout, training_length)
else:
x = self._ispec(zout, length)
except Exception:
if device_type in ["cuda", "cpu"]:
raise
xt = xt.cpu()
meant = meant.cpu()
stdt = stdt.cpu()
zout = self._mask(z.cpu(), x.cpu())
if self.use_train_segment:
if self.training:
x = self._ispec(zout, length)
else:
x = self._ispec(zout, training_length)
else:
x = self._ispec(zout, training_length)
else:
x = self._ispec(zout, length)

# back to other device
if x_is_other_gpu:
x = x.to(device_load)
x = self._ispec(zout, length)

if self.use_train_segment:
if self.training:
@@ -659,6 +660,7 @@ def forward(self, mix):
xt = xt.view(B, S, -1, length)
xt = xt * stdt[:, None] + meant[:, None]
x = xt + x
x = x.to(device) if x.device != device else x
if length_pre_pad:
x = x[..., :length_pre_pad]
return x