speech isolation in noisy recording #1871
Hello, I am replying to my own post. My first request, an installable French version of Bandit v2 for UVR 5.6, can be cancelled. For the second request, integrating resemble-enhance from the uvronline.app site, I still have some reservations. Even though its dialogue output is more detailed and intelligible than the output of the Bandit v2 multi model, two details bother me: the output is mono even when the input is stereo (what happens to the left-right placement of the voices?), and it is delivered as MP3. I hope the MP3 output is only there to lighten transfers and computation on uvronline.app, and that if the model is ported to UVR 5.6, its output will be at least 44.1 kHz, 16-bit. Best regards
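If resemble-enhance is ported, one way to check the two reservations above (mono output, resolution below 44.1 kHz / 16-bit) is to inspect the exported file. Below is a minimal sketch using only Python's standard `wave` module, assuming the result has been saved as an uncompressed PCM WAV (an MP3 download would first have to be decoded to WAV with another tool). The function names `describe_wav` and `meets_target` are hypothetical helpers, not part of UVR or resemble-enhance.

```python
import wave

def describe_wav(path):
    """Return (sample_rate_hz, bit_depth, channels) for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        # getsampwidth() is in bytes per sample, so multiply by 8 for bits.
        return wf.getframerate(), wf.getsampwidth() * 8, wf.getnchannels()

def meets_target(path, min_rate=44100, min_bits=16, want_stereo=True):
    """True if the file is at least min_rate Hz / min_bits bit,
    and (optionally) keeps two or more channels instead of folding to mono."""
    rate, bits, channels = describe_wav(path)
    if rate < min_rate or bits < min_bits:
        return False
    if want_stereo and channels < 2:
        return False
    return True
```

For example, `meets_target("dialogue_enhanced.wav")` (a hypothetical file name) returning `False` would flag an export that is mono or below 44.1 kHz / 16-bit.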
  
I recovered some old recordings from cassette tapes, and they are particularly noisy.
The tape recorder's background noise is a hiss that sometimes covers the voice.
I tried many of the models available in UVR 5.6 Beta.
The one that gives the best results is the Bandit v2 multi model.
However, some parts of the speech are still inaudible.
I tried to read Karn Watcharasupat's paper on Bandit v2, but my English is not good enough, and I had difficulty understanding it.
It seems to me that in many cases, the model trained on multilingual data performs better than the monolingual models.
However, for French, the language that concerns me, the comparison table shows a difference of 0.4.
I don't know whether this difference is enough to make the barely intelligible parts of the dialogue audible.
That is why I am making the following requests:
1) Is it possible to integrate the Bandit v2 checkpoint_fra.ckpt model?
I admit I tried copying it into the MDX models directory, creating a config file xxx.yaml, and filling in both .json files, but unfortunately my knowledge of AI and of these model files was not enough to make it work.
For Faroese (FAO), the comparison table shows a significant difference of 1.1, and for English 0.3. I don't know whether integrating these models would improve the separation of voice and noise for those two languages.
I also looked for other models and visited the uvronline.app website, which seems to use UVR as its engine.
Its resemble-enhance model appears to give better results than Bandit v2 multi.
I then tried the same model on the resemble.ai website, but the result is worse than on uvronline: on the resemble site, the output is marred by artifacts and other noise.
2) Could you integrate the resemble-enhance model from uvronline.app into UVR 5.6?
Thank you in advance for considering these two proposals, and perhaps even implementing them.
I think resemble-enhance would be easier to integrate, since it already works on the uvronline.app site.
Best regards.