Best model for isolating dialogues in a film? #605
I am using VR-Architecture 1_HP_UVR and it does a good job, but I was wondering if there is another model better suited to isolating dialogue from a film that may also have music and sound effects. I have an RTX 4080, so I can run a heavier model if needed.
  
Replies: 6 comments 2 replies
I'm not sure if it's the best, but I've been using this ensemble mode recommended by @sabaasa:
MDX-Net: Kim Vocal 1, UVR-MDX-NET inst 3 & UVR-MDX-NET inst main
Demucs: v4: htdemucs_ft
It's done a pretty good job when I've wanted to isolate dialogue from films.
Full thread with more information here: https://github.com/Anjok07/ultimatevocalremovergui/discussions/444#discussioncomment-5313230
  
I'm finding (and hearing from others) that MDX-Net models are not GPU-accelerated, or are at least very slow. Out of interest, is this your experience too? This is such a fantastic tool, and I've only just discovered it. I would never have dreamed something could do what this can do. It's like magic.
  
@codebespawler @marshalleq MDX-Net models aren't utilising the GPU? Do you have any idea why that might be happening?
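One way to narrow this down, offered as a rough sketch rather than a diagnosis: check whether the Python environment running the models can see a CUDA device at all. This assumes the MDX-Net models are executed through PyTorch or ONNX Runtime, which the thread does not confirm. `gpu_report` is a made-up helper name; the two library calls it relies on are `torch.cuda.is_available()` and `onnxruntime.get_available_providers()`.

```python
import importlib.util


def gpu_report():
    """Report whether PyTorch and ONNX Runtime can see a CUDA GPU.

    Each value is True or False, or None when the package is not installed.
    Assumption: the separation models run through one of these two backends.
    """
    report = {"torch_cuda": None, "onnxruntime_cuda": None}

    # Only import a package if it is actually present in this environment.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch_cuda"] = torch.cuda.is_available()

    if importlib.util.find_spec("onnxruntime") is not None:
        import onnxruntime as ort

        # The CUDA provider is only listed when a GPU-enabled build is installed.
        report["onnxruntime_cuda"] = (
            "CUDAExecutionProvider" in ort.get_available_providers()
        )

    return report
```

Calling `print(gpu_report())` in the same environment the tool uses shows which backends are installed and whether each reports CUDA support. A `False` or `None` there would point to the install (for example a CPU-only build) rather than to the model itself.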
  
As a complete aside, if you have access to the original 5.1 / 7.1 audio, you can basically always grab just the voice by extracting the centre channel track on its own, as nothing else generally gets put in the centre channel.
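For anyone who wants to try the centre-channel route, here is a minimal sketch that builds an ffmpeg command using the `pan` audio filter to keep only the front-centre (`FC`) channel. It assumes ffmpeg is installed and that the chosen audio stream really is a 5.1 or 7.1 layout with an `FC` channel. The function name, file names, and stream index are illustrative only.

```python
import subprocess


def center_channel_cmd(src, dst, audio_stream=0):
    """Build an ffmpeg command that writes only the front-centre (FC) channel.

    src          -- input file with a surround audio track (hypothetical path)
    dst          -- output file for the mono centre channel
    audio_stream -- index of the audio stream to read (assumed to be surround)
    """
    return [
        "ffmpeg",
        "-i", src,
        "-map", f"0:a:{audio_stream}",  # select one audio stream from the input
        "-af", "pan=mono|c0=FC",        # mono output taken from front-centre
        dst,
    ]


# Example use (not executed here; requires ffmpeg on PATH):
#   subprocess.run(center_channel_cmd("film.mkv", "centre.wav"), check=True)
```

The result is a mono file holding whatever the mixer placed in the centre channel, so any effects or music mixed there come along with the dialogue.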
  
There's a LOT of other stuff in the center channel. Believe me.
  
I use the BandIt Plus model via https://github.com/ZFTurbo/Music-Source-Separation-Training to separate dialogue / vocals from background music and sound effects (SFX). If you're willing to pay, there's also the Moises Pro Plan, which does an even better job at separating dialogue. I bought it during a Black Friday sale for $150; the regular price is $300. You can then feed the BandIt Plus and/or Moises Pro Plan output files to https://github.com/resemble-ai/resemble-enhance
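The two-step workflow above (separate, then enhance) could be scripted roughly as follows. This is a sketch under assumptions: the flag names are taken from each project's README as best understood here and should be checked against `python inference.py --help` and `resemble-enhance --help`. The `bandit` model type, the config path, and the checkpoint path are placeholders, not values confirmed in this thread.

```python
import subprocess


def separation_cmd(input_dir, stems_dir, config_path, checkpoint_path):
    """Command for inference.py in Music-Source-Separation-Training.

    Assumption: run from the repository root; config_path and
    checkpoint_path are placeholders for the BandIt Plus files you downloaded.
    """
    return [
        "python", "inference.py",
        "--model_type", "bandit",            # assumed model type name
        "--config_path", config_path,
        "--start_check_point", checkpoint_path,
        "--input_folder", input_dir,
        "--store_dir", stems_dir,
    ]


def enhance_cmd(speech_dir, enhanced_dir):
    """Command for the resemble-enhance CLI (input folder, output folder)."""
    return ["resemble-enhance", speech_dir, enhanced_dir]


# Example use (not executed here; all paths are hypothetical):
#   subprocess.run(separation_cmd("in/", "stems/", "bandit.yaml", "bandit.ckpt"), check=True)
#   subprocess.run(enhance_cmd("stems_speech/", "enhanced/"), check=True)
```

Because the separation step writes every stem to the same output folder, it is probably worth copying only the speech stem into its own folder before running the enhancement step.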
  



I'm not sure if it's the best, but I've been using this ensemble mode recommended by @sabaasa:
MDX-Net: Kim Vocal 1, UVR-MDX-NET inst 3 & UVR-MDX-NET inst main
Demucs: v4: htdemucs_ft
It's done a pretty good job when I've wanted to isolate dialogue from films.
Full thread with more information here: https://github.com/Anjok07/ultimatevocalremovergui/discussions/444#discussioncomment-5313230