speech isolation in noisy recording #1871

Harlequin83 · 2025-06-10T23:29:37Z

I recovered some old cassette recordings, and they are very noisy.
The tape recorder's background hiss sometimes covers the voice.

I tried many of the models available in UVR 5.6 Beta.
The one that gives the best results is the Bandit v2 multi model.
However, some of the speech is still inaudible.

I tried to read Karn Watcharasupat's article on Bandit v2, but my English is not good enough and I had difficulty understanding it.

It seems to me that in many cases the model trained on multilingual data is more effective than the monolingual models.
However, for French, which is the language I care about, the comparison table shows a difference of 0.4.
I don't know whether this is enough to make audible the passages of dialogue that are at the limit of intelligibility.

That's why I am making the following request:
1) Is it possible to integrate the Bandit v2 checkpoint_fra.ckpt model?

I must admit that I tried copying it into the MDX models directory, creating a new xxx.yaml config file and filling in both .json files, but my knowledge of AI and of these model files was not enough to make it work.
In the comparison table, the Faroese (FAO) model shows a significant difference of 1.1, and the English model 0.3. I don't know whether integrating these models would improve the separation of voice and noise for those two languages.

I also looked for other models and visited the uvronline.app website, which seems to use UVR as its engine.
The resemble-enhance model there appears to give better results than Bandit v2 multi.
I then tried the same model on the resemble.ai website, but the result is worse than on uvronline.app: it is marred by artifacts and other noises.

2) Could you integrate the resemble-enhance model from uvronline.app into UVR 5.6?

Thank you in advance for considering these two proposals and perhaps even integrating them.

I think resemble-enhance would be the easier of the two to integrate, since it already works on the uvronline.app site.

Best regards.

Harlequin83 · 2025-06-12T23:30:03Z

Hello,

I am replying to myself.

My first request was for an installable French version of Bandit v2 for UVR 5.6.
I have just found the solution, which was practically under my nose: it is on page 184 of the document
Instrumental, vocal & other stems separation & mix_master guide - UVR_MDX_Demucs_GSEP & others.pdf, dated 16-06-2025.
Thanks to Jarredou for the port for ZFTurbo inference, which works with UVR 5.6.
I can confirm that, in my particular case, the multi model is more effective than the French model, as Jarredou and Karn Watcharasupat (sic) say.

In conclusion, the first request can be cancelled.

As for the second request, to integrate resemble-enhance from the uvronline.app site, I still have a small reservation, even though the dialogue is more detailed and intelligible than the output of the Bandit v2 multi model. Two details bother me: the output is mono even when the input is stereo (what happens to the left-right placement of the voices?), and the output is in MP3 (I hope this is only to lighten transfers and computation on the uvronline.app site, and that the output in UVR 5.6, if the model is ported, will be at least 44.1 kHz, 16-bit).
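
To make the format concern concrete, here is a minimal sketch of how an output file could be checked for channel count, sample rate and bit depth. It uses only Python's standard `wave` module, so it assumes the output has been saved or converted to PCM WAV (it cannot read MP3). The function names `describe_wav` and `meets_minimum` are my own, for illustration, and are not part of UVR or resemble-enhance.

```python
import wave


def describe_wav(path):
    """Return channel count, sample rate (Hz) and bit depth of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": wav.getframerate(),
            # getsampwidth() returns bytes per sample, so multiply by 8 for bits.
            "bit_depth": wav.getsampwidth() * 8,
        }


def meets_minimum(info, channels=2, sample_rate_hz=44100, bit_depth=16):
    """True if the file is at least stereo, 44.1 kHz and 16-bit."""
    return (
        info["channels"] >= channels
        and info["sample_rate_hz"] >= sample_rate_hz
        and info["bit_depth"] >= bit_depth
    )
```

For example, `meets_minimum(describe_wav("output.wav"))` (with a hypothetical file name) would return `False` for a mono file, which is the case that worries me here.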

Best Regards
