Snake

Snake Activation for PyTorch

Happy Snake Year 2025!

What's Snake?

Snake is a simple activation function. It comes in three variants:

  • SnakeA is: y = tanh(x) + relu(x)
  • SnakeB is: y = tanh(x) + silu(x)
  • SnakeC is: y = erf(x) + gelu(x)
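
These formulas translate directly into PyTorch. Below is a minimal sketch of the three variants as stateless nn.Module wrappers; it follows the definitions above and may differ from the actual code in this repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SnakeA(nn.Module):
    # y = tanh(x) + relu(x)
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.tanh(x) + F.relu(x)


class SnakeB(nn.Module):
    # y = tanh(x) + silu(x)
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.tanh(x) + F.silu(x)


class SnakeC(nn.Module):
    # y = erf(x) + gelu(x)
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.erf(x) + F.gelu(x)
```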

Their graphs are as follows:

[Figure: graph of SnakeA]

[Figure: graph of SnakeB]

[Figure: graph of SnakeC]

[Figure: graph of DytSnakeB]

Reason

I've noticed that popular activations like SiLU, GeLU, ReLU, and Mish are all self-gated-style activations. They have one thing in common: their output is close to zero when the input is negative. According to the paper Searching for Activation Functions (which introduced the Swish activation), most inputs to the Swish activation fall in the negative range, which in my opinion shows that a well-trained network is "eager to learn something negative". Moreover, in units like "conv-bn-act" or "linear-act", the output of the activation becomes the input to the next layer's linear weights, not its bias. If the network is "eager to learn something negative", that input will be close to zero, and gradients will have a hard time flowing through this part.
So the Snake activations behave more like ELU: more gradient flows through the layer, and the linear layer that follows receives more information. A sketch of such a unit is shown below.
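
For illustration, here is a hypothetical conv-bn-act block that uses SnakeB (as sketched above) in place of ReLU; the helper name conv_bn_snake and the channel sizes are made up for this example, not taken from the repository.

```python
import torch
import torch.nn as nn


def conv_bn_snake(in_ch: int, out_ch: int) -> nn.Sequential:
    # "conv-bn-act" unit with Snake as the activation: for negative inputs
    # the output follows tanh(x) instead of staying near zero, so the next
    # layer's weights still receive a non-zero, informative signal.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        SnakeB(),  # SnakeB module from the sketch above
    )


x = torch.randn(1, 16, 32, 32)
y = conv_bn_snake(16, 32)(x)  # -> shape (1, 32, 32, 32)
```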





Plots made with GeoGebra.
