Anjok07 · natsusorahoshinochan-max · May 16, 2026
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,24 @@
+# macOS AppleDouble metadata files.
+._*
+
+# Local build and packaging outputs.
+.venv-build-macos/
+.venv-smoke/
+.research-artifacts/
+build/
+dist/
+*.dmg
+*.spec.bak
+
+# Downloaded model weights are runtime/user assets, not source.
+models/**/*.pth
+models/**/*.onnx
+models/**/*.ckpt
+models/**/*.th
+models/Demucs_Models/v3_v4_repo/htdemucs.yaml
+
+# Python/PyInstaller caches.
+__pycache__/
+*.pyc
+*.pyo
+*.log
diff --git a/README.md b/README.md
@@ -74,13 +74,13 @@ In order to use the Time Stretch or Change Pitch tool, you'll need Rubber Band.
 
 </details>
 
-### MacOS Installation
-- Please Note:
-    - The MacOS Sonoma mouse clicking issue has been fixed.
-    - MPS (GPU) acceleration for Mac M1 has been expanded to work with Demucs v4 and all MDX-Net models.
-    - This bundle is intended for those running macOS Big Sur and above.
-    - Application functionality for systems running macOS Catalina or lower is not guaranteed.
-    - Application functionality for older or budget Mac systems is not guaranteed.
+### MacOS Installation
+- Please Note:
+    - The macOS Sonoma mouse-clicking issue has been fixed.
+    - Apple Silicon GPU acceleration is available through MPS. Some model paths may fall back to CPU when PyTorch or the selected backend cannot run an operation on the Apple GPU.
+    - This bundle is intended for those running macOS Big Sur and above.
+    - Application functionality for systems running macOS Catalina or lower is not guaranteed.
+    - Application functionality for older or budget Mac systems is not guaranteed.
     - Once everything is installed, the application may take up to 5-10 minutes to start for the first time (depending on your Macbook).
 
 - Download the UVR dmg for MacOS via one of the links below:
@@ -232,23 +232,119 @@ If you encounter issues, refer to the [GitHub Issues](https://github.com/Anjok07
 
 </details>
 
-### Other Application Notes
-- Nvidia GTX 1060 6GB is the minimum requirement for GPU conversions.
-- Nvidia GPUs with at least 8GBs of V-RAM are recommended.
-- AMD Radeon GPU supported is limited at this time.
-   - There is currently a working branch for AMD GPU users [here](https://github.com/Anjok07/ultimatevocalremovergui/tree/v5.6-amd-gpu)
-- This application is only compatible with 64-bit platforms. 
-- This application relies on the Rubber Band library for the Time-Stretch and Pitch-Shift options.
+### Other Application Notes
+- NVIDIA GTX 1060 6GB is the minimum requirement for GPU conversions.
+- NVIDIA GPUs with at least 8 GB of VRAM are recommended.
+- Apple Silicon Macs can use Apple GPU (MPS) acceleration when running a PyTorch build with MPS support.
+- AMD Radeon GPU support is limited at this time.
+   - There is currently a working branch for AMD GPU users [here](https://github.com/Anjok07/ultimatevocalremovergui/tree/v5.6-amd-gpu)
+- This application is only compatible with 64-bit platforms.
+- This application relies on the Rubber Band library for the Time-Stretch and Pitch-Shift options.
 - This application relies on FFmpeg to process non-wav audio files.
 - The application will automatically remember your settings when closed.
 - Conversion times will significantly depend on your hardware. 
 - These models are computationally intensive. 
 
-### Performance:
-- Model load times are faster.
-- Importing/exporting audio files is faster.
-
-## Troubleshooting
+### Performance:
+- Model load times are faster.
+- Importing/exporting audio files is faster.
+
+## Inference Backends
+
+UVR now adds an inference backend selector alongside the existing GPU controls. When GPU conversion is disabled, inference runs on CPU regardless of the selected backend. The available backend modes are:
+
+- `Auto`
+- `CPU`
+- `CUDA`
+- `Apple GPU (MPS)`
+- `CoreML`
+
+`Auto` prefers CUDA first, then Apple GPU (MPS), then CoreML for ONNX models on macOS, and finally CPU. On Apple Silicon Macs, the settings menu shows only the choices that apply there: `Auto`, `Apple GPU (MPS)`, `CoreML`, and `CPU`.
+
+Backend coverage:
+
+- VR and MDXC models run through PyTorch and can use CPU, CUDA, or Apple GPU (MPS).
+- MDX ONNX models keep ONNX Runtime CUDA and CPU support. On Apple GPU (MPS), MDX ONNX models are converted through `onnx2pytorch` and run as PyTorch models. `CoreML` is available as an explicit experimental ONNX Runtime backend.
+- Demucs v3 and v4 can attempt Apple GPU (MPS). If that path fails, UVR falls back to CPU and records the fallback in the log. Demucs v1 and v2 keep the older CPU/CUDA behavior by default.
+
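+The conversion log includes the selected backend, runner, model load time, and fallback status. UVR sets `PYTORCH_ENABLE_MPS_FALLBACK=1` at startup unless it is already set, so PyTorch runs operations that are not implemented for MPS on the CPU instead of raising an error. This is a compatibility safety net only: the primary MPS paths are expected to work without it, so it should not be relied on to hide unsupported device transfers.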
+
+## Developer Smoke Tests
+
+The real-model smoke harness lives in `tools/smoke_inference.py`. It intentionally avoids importing `UVR.py` because `UVR.py` starts the tkinter mainloop on import. The harness drives the separator classes directly, writes results to `.research-artifacts/smoke/smoke_results.jsonl`, and keeps generated audio under `.research-artifacts/smoke/`.
+
+Recommended Apple Silicon setup:
+
+```bash
+uv python install 3.10
+uv venv .venv-smoke --python 3.10
+.venv-smoke/bin/python -m ensurepip --upgrade
+.venv-smoke/bin/python -m pip install playsound==1.2.2
+SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True .venv-smoke/bin/python -m pip install -r requirements.txt
+.venv-smoke/bin/python -m pip uninstall -y PySoundFile
+.venv-smoke/bin/python -m pip install soundfile==0.13.1 onnx2pytorch
+```
+
+Run the blocking smoke flow:
+
+```bash
+.venv-smoke/bin/python tools/smoke_inference.py prepare
+.venv-smoke/bin/python tools/smoke_inference.py doctor --append
+.venv-smoke/bin/python tools/smoke_inference.py probe --append
+.venv-smoke/bin/python tools/smoke_inference.py run --phase blocking --continue-on-error --append
+.venv-smoke/bin/python tools/smoke_inference.py summary --append
+```
+
+The blocking phase covers backend probing plus single-model real inference for VR, MDX ONNX, MDX ONNX CoreML, MDXC, and Demucs v4. Additional phases are available for release checks:
+
+- `extended`: secondary model, vocal splitter, ensemble, FLAC output, and MP3 output.
+- `backend_modes`: explicit Apple GPU (MPS) cases and Demucs v4 CPU comparison.
+- `gpu_disabled`: verifies that disabling GPU forces CPU even when Auto or MPS is selected.
+- `demucs_legacy`: verifies Demucs v1/v2 backend policy without requiring legacy weights.
+- `format_inputs`: FLAC input, MP3 input, and mono input.
+- `long_audio`: 60-second MDX MPS and 180-second VR CPU stability checks.
+- `all`: blocking, extended, backend mode, GPU-disabled, Demucs legacy, format-input, and long-audio checks.
+
+Use `summary` after a run to print the latest pass/fail state and CPU-vs-MPS timing pairs.
+
+The `prepare` command downloads the minimal real model set into the app's existing model folders and generates an 8-second stereo WAV input. Do not commit `.venv-smoke`, `.research-artifacts`, downloaded model weights, `__pycache__`, or AppleDouble `._*` files.
+
+### MDX23C Apple GPU Functional Check
+
+A full-length MDX23C functional check was run on Apple Silicon with the packaged app resources:
+
+- Model: `dist/Ultimate Vocal Remover.app/Contents/Resources/models/MDX_Net_Models/MDX23C-8KFFT-InstVoc_HQ.ckpt`
+- Input: `example_data/kamiina_ep01.wav`
+- Config: `model_2_stem_full_band_8k.yaml`
+- Backend mode: `Auto`
+- Actual backend: `Apple GPU (MPS)`
+- Torch device: `mps`
+- Fallback: `false`
+- Result: `passed`
+- Elapsed time: `2070.068s`
+
+The model hash did not match an entry in the bundled `model_data.json`, so the GUI does not automatically infer its MDX23C config from metadata. For this file, use the 8KFFT config `model_2_stem_full_band_8k.yaml`.
+
+The effective MDX23C parameters for that check were:
+
+- Segment size: `256`
+- Segment default: `true`, so `inference.dim_t=256` from the config was used.
+- Overlap: `8`
+- Batch size: `1`
+- `n_fft`: `8192`
+- `dim_f`: `4096`
+- `hop_length`: `1024`
+- Runtime chunk size: `1024 * (256 - 1) = 261120`
+- Runtime hop size: `261120 // 8 = 32640`
+- Instruments: `Vocals`, `Instrumental`
+- Output format: WAV `PCM_16`
+- Output normalization: `false`
+- Pitch shift: `0`
+- Match frequency cutoff: `true`, but it is not used when pitch shift is zero.
+
+The generated verification files live under `.research-artifacts/functional-mdxc-app/` and are intentionally not committed.
+
+## Troubleshooting
 
 ### Common Issues
 

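The runtime chunk and hop sizes in the MDX23C check follow directly from the config values. Below is a minimal sketch of that arithmetic, assuming the formulas shown in the README (`hop_length * (dim_t - 1)` and `chunk_size // overlap`) and assuming that a non-default segment size replaces `inference.dim_t`. Both helper names are hypothetical and are not UVR functions.

```python
def effective_dim_t(segment_size: int, config_dim_t: int, segment_default: bool) -> int:
    """Hypothetical: with "Segment default" on, the config's inference.dim_t is used."""
    return config_dim_t if segment_default else segment_size


def mdxc_runtime_sizes(hop_length: int, dim_t: int, overlap: int) -> tuple[int, int]:
    """Return (chunk_size, hop_size) in samples, following the README arithmetic."""
    if dim_t < 2 or overlap < 1:
        raise ValueError("dim_t must be >= 2 and overlap must be >= 1")
    chunk_size = hop_length * (dim_t - 1)  # span of dim_t STFT frames at this hop length
    hop_size = chunk_size // overlap       # stride between successive chunks
    return chunk_size, hop_size


# Values from the MDX23C-8KFFT-InstVoc_HQ check.
dim_t = effective_dim_t(segment_size=256, config_dim_t=256, segment_default=True)
chunk, hop = mdxc_runtime_sizes(hop_length=1024, dim_t=dim_t, overlap=8)
print(chunk, hop)  # 261120 32640
```

A higher overlap value shrinks the hop, so more chunks overlap each sample. This improves blending at chunk boundaries at the cost of proportionally more inference passes.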
diff --git a/UVR.py b/UVR.py
@@ -9,6 +9,7 @@
 import math
 import natsort
 import os
+os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")  # set before torch loads so ops unsupported on MPS run on CPU
 import pickle
 import psutil
 from pyglet import font as pyglet_font
@@ -44,6 +45,7 @@
 from kthread import KThread
 from lib_v5 import spec_utils
 from pathlib  import Path
+from inference_backend import BACKEND_MODE_OPTIONS as INFERENCE_BACKEND_MODE_OPTIONS
 from separate import (
     SeperateDemucs, SeperateMDX, SeperateMDXC, SeperateVR,  # Model-related
     save_format, clear_gpu_cache,  # Utility functions
@@ -345,6 +347,7 @@ def __init__(self, model_name: str,
         self.deverb_vocal_opt = DEVERB_MAPPER[root.deverb_vocal_opt_var.get()]
         self.is_denoise_model = True if root.denoise_option_var.get() == DENOISE_M and os.path.isfile(DENOISER_MODEL_PATH) else False
         self.is_gpu_conversion = 0 if root.is_gpu_conversion_var.get() else -1
+        self.backend_mode = root.backend_mode_var.get()
         self.is_normalization = root.is_normalization_var.get()#
         self.is_use_opencl = False#True if is_opencl_only else root.is_use_opencl_var.get()
         self.is_primary_stem_only = root.is_primary_stem_only_var.get()
@@ -3174,6 +3177,8 @@ def fill_gpu_list(self):
                 self.cuda_device_list = [f"{torch.cuda.get_device_properties(i).name}:{i}" for i in range(torch.cuda.device_count())]
                 self.cuda_device_list.insert(0, DEFAULT)
                 #print(self.cuda_device_list)
+            elif mps_available:
+                self.cuda_device_list = [DEFAULT]
 
             # if directml_available:
             #     self.opencl_list = [f"{torch_directml.device_name(i)}:{i}" for i in range(torch_directml.device_count())]
@@ -3338,6 +3343,19 @@ def set_vars_for_sample_mode(event):
         #if not is_choose_arch:
         self.vocal_splitter_Button_opt(settings_menu, settings_menu_format_Frame, width=SETTINGS_BUT_WIDTH-2, pady=MENU_PADDING_4)
 
+        if self.is_gpu_available:
+            backend_Label = self.menu_title_LABEL_SET(settings_menu_format_Frame, BACKEND_MODE_TEXT)
+            backend_Label.grid(pady=MENU_PADDING_2)
+
+            backend_options = (
+                (BACKEND_AUTO, BACKEND_MPS, BACKEND_COREML, BACKEND_CPU)
+                if is_macos else
+                INFERENCE_BACKEND_MODE_OPTIONS
+            )
+            backend_Option = ComboBoxMenu(settings_menu_format_Frame, textvariable=self.backend_mode_var, values=backend_options, width=GEN_SETTINGS_WIDTH+1)
+            backend_Option.grid(padx=20,pady=MENU_PADDING_1)
+            self.help_hints(backend_Label, text=BACKEND_MODE_HELP)
+
         if not is_macos and self.is_gpu_available:
             gpu_list_options = lambda:self.loop_gpu_list(device_set_Option, 'gpudevice', self.cuda_device_list)#self.opencl_list if is_opencl_only or self.is_use_opencl_var.get() else self.cuda_device_list)
             device_set_Label = self.menu_title_LABEL_SET(settings_menu_format_Frame, CUDA_NUM_TEXT)
@@ -6823,7 +6841,8 @@ def load_saved_vars(self, data):
         self.save_format_var = tk.StringVar(value=data['save_format'])
         self.wav_type_set_var = tk.StringVar(value=data['wav_type_set'])#
         self.device_set_var = tk.StringVar(value=data['device_set'])#
-        self.user_code_var = tk.StringVar(value=data['user_code']) 
+        self.backend_mode_var = tk.StringVar(value=data['backend_mode'])
+        self.user_code_var = tk.StringVar(value=data['user_code'])
         self.is_gpu_conversion_var = tk.BooleanVar(value=data['is_gpu_conversion'])
         self.is_primary_stem_only_var = tk.BooleanVar(value=data['is_primary_stem_only'])
         self.is_secondary_stem_only_var = tk.BooleanVar(value=data['is_secondary_stem_only'])
@@ -6967,6 +6986,7 @@ def load_saved_settings(self, loaded_setting: dict, process_method=None, is_defa
             self.save_format_var.set(loaded_setting['save_format'])
             self.wav_type_set_var.set(loaded_setting['wav_type_set'])#
             self.device_set_var.set(loaded_setting['device_set'])#
+            self.backend_mode_var.set(loaded_setting['backend_mode'])
             self.user_code_var.set(loaded_setting['user_code'])
             self.phase_option_var.set(loaded_setting['phase_option'])#
             self.phase_shifts_var.set(loaded_setting['phase_shifts'])#
@@ -6983,6 +7003,7 @@ def load_saved_settings(self, loaded_setting: dict, process_method=None, is_defa
             self.DualBatch_inputPaths = []
 
         self.is_gpu_conversion_var.set(loaded_setting['is_gpu_conversion'])
+        self.backend_mode_var.set(loaded_setting['backend_mode'])
         self.is_normalization_var.set(loaded_setting['is_normalization'])#
         self.is_use_opencl_var.set(False)#True if is_opencl_only else loaded_setting['is_use_opencl'])#
         self.is_wav_ensemble_var.set(loaded_setting['is_wav_ensemble'])#
@@ -7087,6 +7108,7 @@ def save_values(self, app_close=True, is_restart=False, is_auto_save=False):
             'pitch_rate': self.pitch_rate_var.get(),#
             'is_time_correction': self.is_time_correction_var.get(),#
             'is_gpu_conversion': self.is_gpu_conversion_var.get(),
+            'backend_mode': self.backend_mode_var.get(),
             'is_primary_stem_only': self.is_primary_stem_only_var.get(),
             'is_secondary_stem_only': self.is_secondary_stem_only_var.get(),
             'is_testing_audio': self.is_testing_audio_var.get(),#

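The new `backend_mode` setting stores a mode name, not a device. Below is a minimal sketch of how the `Auto` order described in the README (CUDA, then Apple GPU (MPS), then CoreML for ONNX models, then CPU) could resolve to a concrete backend. `resolve_backend` and its boolean availability flags are hypothetical, because UVR's actual logic lives in `inference_backend`, which this diff does not show. Falling back to CPU when an explicitly chosen backend is unavailable is also an assumption.

```python
AUTO = "Auto"
CPU = "CPU"
CUDA = "CUDA"
MPS = "Apple GPU (MPS)"
COREML = "CoreML"


def resolve_backend(
    mode: str,
    gpu_enabled: bool,
    cuda_available: bool,
    mps_available: bool,
    coreml_available: bool,
    is_onnx_model: bool,
) -> str:
    """Pick a concrete backend for one model (hypothetical sketch)."""
    # Disabling GPU conversion forces CPU regardless of the selected mode.
    if not gpu_enabled or mode == CPU:
        return CPU
    if mode == AUTO:
        if cuda_available:
            return CUDA
        if mps_available:
            return MPS
        # CoreML is an ONNX Runtime provider, so it only applies to ONNX models.
        if coreml_available and is_onnx_model:
            return COREML
        return CPU
    # Explicit mode (assumption): use it if usable, otherwise fall back to CPU.
    usable = {
        CUDA: cuda_available,
        MPS: mps_available,
        COREML: coreml_available and is_onnx_model,
    }
    return mode if usable.get(mode, False) else CPU
```

On an Apple Silicon Mac without CUDA, `resolve_backend(AUTO, gpu_enabled=True, cuda_available=False, mps_available=True, coreml_available=True, is_onnx_model=True)` returns `Apple GPU (MPS)`. This is consistent with the README note that MDX ONNX models run through `onnx2pytorch` on MPS.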
diff --git a/demucs/hdemucs.py b/demucs/hdemucs.py
@@ -770,27 +770,23 @@ def forward(self, mix):
         x = x.view(B, S, -1, Fq, T)
         x = x * std[:, None] + mean[:, None]
 
-        # to cpu as non-cuda GPUs don't support complex numbers
-        # demucs issue #435 ##432
-        # NOTE: in this case z already is on cpu
-        # TODO: remove this when mps supports complex numbers
-
-        device_type = x.device.type
-        device_load = f"{device_type}:{x.device.index}" if not device_type == 'mps' else device_type
-        x_is_other_gpu = not device_type in ["cuda", "cpu"]
-
-        if x_is_other_gpu:
-            x = x.cpu()
-
-        zout = self._mask(z, x)
-        x = self._ispec(zout, length)
-
-        # back to other device
-        if x_is_other_gpu:
-            x = x.to(device_load)
+        device = x.device
+        device_type = device.type
+        try:
+            zout = self._mask(z, x)
+            x = self._ispec(zout, length)
+        except Exception:  # non-CUDA GPUs (e.g. MPS) may lack complex-number ops; retry on CPU
+            if device_type in ["cuda", "cpu"]:
+                raise
+            if self.hybrid:
+                xt = xt.cpu()
+                meant = meant.cpu()
+                stdt = stdt.cpu()
+            zout = self._mask(z.cpu(), x.cpu())
+            x = self._ispec(zout, length)
 
         if self.hybrid:
             xt = xt.view(B, S, -1, length)
             xt = xt * stdt[:, None] + meant[:, None]
             x = xt + x
-        return x
+        return x.to(device) if x.device != device else x
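The Demucs change replaces the unconditional CPU round-trip with a "try on the current device, retry on CPU" pattern. Errors on CUDA or CPU are still re-raised. Below is a framework-free sketch of that control flow. `run_with_cpu_fallback` and `fake_ispec` are hypothetical stand-ins, not Demucs functions, and tensors are replaced by device strings so the sketch runs without PyTorch. The real code also moves the result back to the original device with `x.to(device)`.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def run_with_cpu_fallback(step: Callable[[str], T], device_type: str) -> Tuple[T, bool]:
    """Run step on device_type; for non-CUDA/CPU devices, retry once on CPU.

    Returns (result, fell_back). Failures on "cuda" or "cpu" are re-raised,
    mirroring the policy in the hdemucs/htdemucs forward() change.
    """
    try:
        return step(device_type), False
    except Exception:
        if device_type in ("cuda", "cpu"):
            raise
        return step("cpu"), True


def fake_ispec(device: str) -> str:
    # Stand-in for _mask/_ispec: pretend complex ops are unsupported on MPS.
    if device == "mps":
        raise RuntimeError("complex op not implemented for mps")
    return f"waveform on {device}"


result, fell_back = run_with_cpu_fallback(fake_ispec, "mps")
print(result, fell_back)  # waveform on cpu True
```

Trying the device first means that once MPS supports the complex ops, Demucs runs fully on the GPU without a code change. The cost is one failed attempt per forward call while it does not.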
diff --git a/demucs/htdemucs.py b/demucs/htdemucs.py
@@ -625,30 +625,31 @@ def forward(self, mix):
         x = x.view(B, S, -1, Fq, T)
         x = x * std[:, None] + mean[:, None]
 
-        # to cpu as non-cuda GPUs don't support complex numbers
-        # demucs issue #435 ##432
-        # NOTE: in this case z already is on cpu
-        # TODO: remove this when mps supports complex numbers
-
-        device_type = x.device.type
-        device_load = f"{device_type}:{x.device.index}" if not device_type == 'mps' else device_type
-        x_is_other_gpu = not device_type in ["cuda", "cpu"]
-
-        if x_is_other_gpu:
-            x = x.cpu()
-
-        zout = self._mask(z, x)
-        if self.use_train_segment:
-            if self.training:
+        device = x.device
+        device_type = device.type
+        try:
+            zout = self._mask(z, x)
+            if self.use_train_segment:
+                if self.training:
+                    x = self._ispec(zout, length)
+                else:
+                    x = self._ispec(zout, training_length)
+            else:
                 x = self._ispec(zout, length)
+        except Exception:  # non-CUDA GPUs (e.g. MPS) may lack complex-number ops; retry on CPU
+            if device_type in ["cuda", "cpu"]:
+                raise
+            xt = xt.cpu()
+            meant = meant.cpu()
+            stdt = stdt.cpu()
+            zout = self._mask(z.cpu(), x.cpu())
+            if self.use_train_segment:
+                if self.training:
+                    x = self._ispec(zout, length)
+                else:
+                    x = self._ispec(zout, training_length)
             else:
-                x = self._ispec(zout, training_length)
-        else:
-            x = self._ispec(zout, length)
-
-        # back to other device
-        if x_is_other_gpu:
-            x = x.to(device_load)
+                x = self._ispec(zout, length)
 
         if self.use_train_segment:
             if self.training:
@@ -659,6 +660,7 @@ def forward(self, mix):
             xt = xt.view(B, S, -1, length)
         xt = xt * stdt[:, None] + meant[:, None]
         x = xt + x
+        x = x.to(device) if x.device != device else x
         if length_pre_pad:
             x = x[..., :length_pre_pad]
         return x