The 'NAG' submode in sgd_cm.m

Hi @hiroyuki-kasai,

Thanks for publishing this project. It looks great, and I would like to do some research with this toolbox.

My question is about sgd_cm.m. It appears to contain two momentum schemes, the classical one ('CM') and Nesterov's ('NAG'), but in the current implementation they seem to differ only in how the momentum coefficient is set. See lines 78, 80, and 82.

My understanding is that NAG should 'look one step ahead' before computing the gradient, but in the code the gradient is evaluated only at the current point. This seems inconsistent with the original paper. See equations 3 and 4 in Ilya Sutskever, James Martens, George Dahl and Geoffrey Hinton, "On the importance of initialization and momentum in deep learning," ICML, 2013.
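
To make the difference concrete, here is a minimal sketch of what I mean. It is written in Python rather than MATLAB and is not taken from the toolbox; `momentum_step`, `lr`, `mu`, and the toy quadratic are hypothetical names chosen for illustration. It follows the update rules as I read them in the paper: both schemes share the same velocity update, except that NAG evaluates the gradient at the look-ahead point `theta + mu * v`.

```python
import numpy as np


def momentum_step(theta, v, grad, lr, mu, nesterov=False):
    """One update of classical momentum (CM) or Nesterov's accelerated gradient (NAG).

    CM : v_new = mu * v - lr * grad(theta)
    NAG: v_new = mu * v - lr * grad(theta + mu * v)
    Both then set theta_new = theta + v_new.
    """
    # NAG evaluates the gradient at the look-ahead point theta + mu * v;
    # CM evaluates it at the current point theta.
    point = theta + mu * v if nesterov else theta
    v_new = mu * v - lr * grad(point)
    return theta + v_new, v_new


# Toy problem (an assumption for illustration): f(theta) = 0.5 * theta' * A * theta
A = np.diag([1.0, 10.0])
grad = lambda theta: A @ theta

theta0 = np.array([1.0, 1.0])
v0 = np.array([0.5, -0.5])  # nonzero velocity, so the two schemes give different steps

theta_cm, v_cm = momentum_step(theta0, v0, grad, lr=0.1, mu=0.9)
theta_nag, v_nag = momentum_step(theta0, v0, grad, lr=0.1, mu=0.9, nesterov=True)
```

With the same momentum coefficient and a nonzero velocity, the two calls return different iterates, which is the behaviour I expected the 'NAG' submode to show. When the velocity is zero (for example at the very first step), the two updates coincide.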

Thank you!

