GPU kernels for optimizers

### Motivation and description

I'm wondering what kind of speedup could be achieved by writing dedicated GPU kernels for optimizers.

For example, take a look at @pxl-th's implementation of Adam below:

https://github.com/JuliaNeuralGraphics/NerfUtils.jl/blob/main/src/nn/adam.jl#L100-L117
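
To make the idea concrete, here is a minimal CPU sketch of the per-element work that a single fused Adam kernel would do. It is plain Python for illustration only: it follows the standard Adam update and is not a transcription of the linked implementation. The function name `fused_adam_step` and its parameter names are hypothetical.

```python
import math


def fused_adam_step(params, grads, m, v, step,
                    lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Update params, m and v in place in one pass (hypothetical helper)."""
    # Bias-correction factors depend only on the step count,
    # so they are computed once, outside the per-element loop.
    bc1 = 1.0 - beta1 ** step
    bc2 = 1.0 - beta2 ** step

    # Each iteration corresponds to the work one GPU thread
    # would do for index i in a fused kernel.
    for i in range(len(params)):
        g = grads[i]
        m[i] = beta1 * m[i] + (1.0 - beta1) * g        # first moment
        v[i] = beta2 * v[i] + (1.0 - beta2) * g * g    # second moment
        m_hat = m[i] / bc1
        v_hat = v[i] / bc2
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params
```

In a GPU version, the loop body would become the kernel and each index would be handled by its own thread. The potential gain is that every array is read and written once per step, whereas an update built from several separate array operations may launch several kernels and make several passes over memory. Whether that translates into a measurable speedup is exactly the open question here.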

### Possible Implementation

_No response_