Training on non-vocal data, for noise removal? #554

Landeplage · 2023-05-13T11:09:58Z

I've been testing UVR for a few days, and it's really amazing what the MDX-Net process can do.

I'm a sound designer and recordist, so I immediately thought of using this method for non-musical audio as well. Is it possible to train it to remove noise from recordings? I'm thinking of sounds like birds, preamp noise, traffic, and talking. If gathering data is the issue, I know of about 2,700 recordists who might be interested in lending their recordings.

I'm not sure what the training data requirements are, but I assume one would need both mixed recordings and the corresponding already-separated recordings?

A similar tool exists for removing the sound of birds from recordings:
https://www.boomlibrary.com/sound-effects/debird/

iZotope RX is another industry-standard noise removal tool. I'm not sure whether it uses machine learning for its algorithms, but it is definitely moving in that direction:
https://www.izotope.com/en/products/rx.html

g-frega · 2023-05-15T20:24:08Z

Very interested in this too; I'm a sound designer as well.

filmgeezer · 2023-06-03T09:44:41Z

+1 from me

fufune · 2023-07-25T19:16:38Z

+1 for me

KimberleyJensen · 2023-08-01T17:23:28Z

@Landeplage @g-frega Yes, this is possible. To create a dataset for separating bird sounds, you need two kinds of recordings. First, the bird sound(s), with as little background noise as possible (ideally none). Second, the sounds you want to separate the birds from; these must not contain any bird sounds.
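
A common way to turn those two sets into training examples is to mix them on the fly: each clean target clip is added to a random excerpt of a target-free noise clip at a random signal-to-noise ratio, and the model learns to recover the target from the mixture. The sketch below is a minimal illustration of that idea, not UVR's or any specific repository's training code. It assumes mono float arrays in [-1, 1] at a common sample rate, and the function names and the SNR range are hypothetical choices.

```python
import numpy as np


def mix_at_snr(target, noise, snr_db, rng):
    """Mix a clean target clip (e.g. birdsong) with a target-free noise clip.

    Returns (mixture, target, noise) with mixture == target + noise and
    10*log10(P_target / P_noise) == snr_db. (Hypothetical helper.)
    """
    if len(noise) < len(target):
        raise ValueError("noise clip must be at least as long as the target clip")
    # Random noise excerpt with the same length as the target.
    start = rng.integers(0, len(noise) - len(target) + 1)
    noise = noise[start:start + len(target)]

    # Scale the noise so the mixture hits the requested SNR.
    p_target = np.mean(target ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12  # guard against silent noise clips
    gain = np.sqrt(p_target / (p_noise * 10 ** (snr_db / 10)))
    noise = noise * gain

    mixture = target + noise
    # If the mix would clip, scale all three by the same factor so the
    # mixture is still exactly target + noise and the SNR is unchanged.
    peak = np.max(np.abs(mixture))
    if peak > 1.0:
        mixture, target, noise = mixture / peak, target / peak, noise / peak
    return mixture, target, noise


def make_pairs(targets, noises, n_pairs, snr_range=(-5.0, 20.0), seed=0):
    """Draw random (mixture, target, noise) triples; the SNR range is an assumption."""
    rng = np.random.default_rng(seed)
    pairs = []
    for _ in range(n_pairs):
        target = targets[rng.integers(len(targets))]
        noise = noises[rng.integers(len(noises))]
        snr_db = rng.uniform(*snr_range)
        pairs.append(mix_at_snr(target, noise, snr_db, rng))
    return pairs
```

Mixing at random SNRs exposes the model to both faint and dominant targets. It also shows why the second set must be free of bird sounds: any birds left in the "noise" would be labelled as something to remove.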

I would be interested in collaborating on something like this. I can train the models if someone can provide the data. I have no idea how well it would perform, but if anyone is interested, my Discord username is kimberleyjsn, or email [email protected]

aufr33 · 2024-03-23T08:56:34Z

Collaborator

Hi, I have already created exactly such a model. It will be added in one of the upcoming UVR GUI updates. In the meantime, you can use it on my UVR Online website. You can test the premium version free for 7 days: https://www.patreon.com/uvronline

4 replies

MonolithFoundation Dec 5, 2024

Can you provide a public download link for the model? Is it open source?

aufr33 Dec 5, 2024
Collaborator

https://huggingface.co/jarredou/aufr33_MelBand_Denoise/tree/main

MonolithFoundation Dec 6, 2024

Will this be able to remove background laughter? @aufr33

Landeplage Jun 24, 2025
Author

@aufr33 Any updates on this? Would be interested in testing.

Harlequin83 · 2025-06-27T10:58:37Z

Hello all

iZotope RX 11 is based on the Demucs model for stem separation.
It is not possible to know whether they use public models like those used in UVR and on many sites.

Steinberg SpectraLayers also uses Demucs, but the developer says he trained the models to achieve better results than the original Demucs and to separate more stems.
The developer is also the one who developed TorchStudio, a project aimed at simplifying and speeding up model training by providing an ecosystem for this, such as a result comparison tool for trained models. His name is Robin Lobel.

Training models seems complex to me in many ways, given my limited knowledge of mathematics, acoustics, English, and AI. It also appears that good results require powerful hardware: an RTX 3090 seems to be the minimum, otherwise you have to rent cloud GPU time. I may be wrong; only a specialist like Aufr could confirm this.

It would indeed be great to be able to train models ourselves to remove noise from vocals (speech).
I currently face this problem with old audio cassettes I want to restore: the person being interviewed sometimes speaks very quietly, and the tape hiss is sometimes louder than the voice. The Bandit V2 multi model works quite well, but in this case it is a bit too aggressive.
I hoped that the French version of Bandit V2 would improve the situation since the interview is in French, but that is not the case.

I tried the Gabox Fv7z model, which gives the best results in my case, but some noises remain that none of the models can remove without also removing parts of the voice. Even the SpectraLayers 11 separator cannot do it (and my demo version is about to expire).

It’s clear that if I had the possibility to train a model closer to the conditions I encounter, that would be really great.

Given my limited knowledge, I would need a step-by-step procedure. Is this possible with an RTX 2070?

Happy development and training to everyone.

The work done by Anjok, Aufr, Jarredou, and many others is truly remarkable. I take this opportunity to thank them for their time and for sharing their work.

godzfire · 2025-07-01T21:36:32Z

I came from MVSep, where I discovered the denoise model by aufr33. I can't put into words how amazing it is and how much it has helped me with audio restoration since it was introduced.

I'm wondering whether the model could be updated with new training data and re-uploaded. The last update was about a year ago. It's great, but there are still areas for improvement.

It would also be amazing to train other denoising options for other situations like heavy hiss, different kinds of hiss, etc.
