There are a couple of issues with the old SGD differentiator:
- Too slow compared to the other differentiators
- Interface is cluttered
To resolve these issues, we need to split the SGD differentiator into separate classes, one for each kind of stochasticity, and push its ops into C++ implementations, as was done for the parameter-shift differentiator. @jaeyoo see PR #372.
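For illustration only, a minimal plain-Python sketch of what "one class per kind of stochasticity" could look like. It shows only the interface split: the real ops would live in C++ as described above. All names here (`Differentiator`, `ExactParameterShift`, `StochasticCoordinate`, `shift_derivative`, `num_samples`) are hypothetical and are not the TensorFlow Quantum API. The shift of pi/2 assumes the objective is sinusoidal in each parameter (a*cos + b*sin + c), which is when the two-term parameter-shift rule is exact.

```python
import abc
import math
import random
from typing import Callable, List, Sequence

# Hypothetical objective type: maps a parameter vector to an expectation value.
Objective = Callable[[Sequence[float]], float]


def shift_derivative(f: Objective, theta: Sequence[float], i: int,
                     shift: float = math.pi / 2) -> float:
    """Two-term parameter-shift derivative of f with respect to theta[i].

    Exact when f depends on theta[i] as a*cos(theta[i]) + b*sin(theta[i]) + c.
    """
    plus = list(theta)
    plus[i] += shift
    minus = list(theta)
    minus[i] -= shift
    return (f(plus) - f(minus)) / 2.0


class Differentiator(abc.ABC):
    """Hypothetical common interface: each subclass is one gradient strategy."""

    @abc.abstractmethod
    def gradient(self, f: Objective, theta: Sequence[float]) -> List[float]:
        ...


class ExactParameterShift(Differentiator):
    """Deterministic baseline: shifts every coordinate."""

    def gradient(self, f: Objective, theta: Sequence[float]) -> List[float]:
        return [shift_derivative(f, theta, i) for i in range(len(theta))]


class StochasticCoordinate(Differentiator):
    """One kind of stochasticity per class: samples k of n coordinates.

    Each sampled derivative is scaled by n / k, so every component of the
    estimate is unbiased (coordinate i is chosen with probability k / n).
    """

    def __init__(self, num_samples: int, seed: int = None):
        self.num_samples = num_samples
        self._rng = random.Random(seed)

    def gradient(self, f: Objective, theta: Sequence[float]) -> List[float]:
        n = len(theta)
        k = min(self.num_samples, n)
        grad = [0.0] * n  # unsampled coordinates contribute zero this call
        for i in self._rng.sample(range(n), k):
            grad[i] = shift_derivative(f, theta, i) * n / k
        return grad
```

For example, with `f = lambda t: math.cos(t[0]) + math.sin(t[1])`, `ExactParameterShift().gradient(f, [0.3, 1.1])` gives approximately `[-sin(0.3), cos(1.1)]`. Other kinds of stochasticity would each get their own subclass with only the options relevant to them, rather than sharing one constructor full of flags.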