I've noticed that this superloss implementation is similar to AlanChou's unofficial implementation (https://github.com/AlanChou/Super-Loss). Both use scipy to compute lambertw. However, AlanChou's implementation states, quoted:
The lambertw function should be implemented with PyTorch instead of using the scipy library as mentioned in AlanChou/Truncated-Loss#3 (comment).
There is a mistake because the additive regularization part doesn't have any gradients for Autograd.
Does this implementation solve the above problem?
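To illustrate the concern: if lambertw is computed with scipy, autograd cannot backpropagate through it, so the confidence term contributes no gradient. Below is a minimal sketch (my own, not from either repo) of a native Lambert W via Newton's method; the identical iteration written with torch tensor operations would be tracked by autograd end to end.

```python
import math

def lambertw(x: float, iters: int = 20) -> float:
    """Principal-branch Lambert W for x >= 0, via Newton's method.

    Solves w * exp(w) = x. This is a plain-Python sketch; replacing
    math.exp/math.log1p with torch equivalents would make the same
    iteration differentiable, avoiding the scipy detour.
    """
    w = math.log1p(x)  # simple starting guess for x >= 0
    for _ in range(iters):
        ew = math.exp(w)
        # Newton step on f(w) = w * exp(w) - x
        w -= (w * ew - x) / (ew * (w + 1.0))
    return w
```

For example, `lambertw(math.e)` converges to 1, since 1 * e^1 = e.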