[Question] Is it necessary to substitute non_slot_dict around calling apply_gradients on a TensorFlow optimizer? #477

Description

@Nov11

This template is for generic questions that a user may have when using HugeCTR.

Note: Before filing an issue, you may want to check out our compiled Q&A list first.

code link:

I'm reading the code of the optimizer wrapper for TensorFlow graph mode.
It seems that for an optimizer like Adam used as self._optimizer, its beta1_power and beta2_power non-slot variables are never updated; instead, the Variable copies held in OptimizerWrapper are updated.
I suppose it would be fine to update the non-slot variables in self._optimizer directly and just skip the substitution of non_slot_dict.

Please correct me if I'm wrong.
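
For context, here is a minimal sketch of the kind of substitution the question refers to, assuming TF1-style optimizer internals: `_non_slot_dict` is a private attribute of `tf.compat.v1.train.Optimizer` that holds non-slot variables such as Adam's beta1_power / beta2_power, and `apply_gradients` builds its update ops against whichever Variables that dict points to at graph-construction time. The helper below and the `self._non_slot_copies` name are hypothetical, for illustration only, not HugeCTR's actual code.

```python
import contextlib


@contextlib.contextmanager
def substituted_non_slot_dict(optimizer, replacement):
    # `_non_slot_dict` is a private attribute of
    # tf.compat.v1.train.Optimizer that maps (name, graph_key) keys to
    # Variables such as Adam's beta1_power / beta2_power.
    original = optimizer._non_slot_dict
    optimizer._non_slot_dict = replacement
    try:
        yield
    finally:
        optimizer._non_slot_dict = original


# Usage sketch (names hypothetical): while the substitution is active,
# apply_gradients() wires Adam's update ops to the replacement Variables,
# so the beta1_power / beta2_power updates land on the wrapper's copies
# rather than on the wrapped optimizer's originals.
#
# with substituted_non_slot_dict(self._optimizer, self._non_slot_copies):
#     train_op = self._optimizer.apply_gradients(grads_and_vars)
```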

Labels: question