Huge gradients when training

Hello @crlandsc @iver56 

I tried log-WMSE as the loss function for training an audio separation model, but I noticed that the gradients are very large. My setup was AdamW with a learning rate of 0.0001.

Do you have any advice on hyperparameter settings, such as the learning rate?

Thanks