[Question] Is it necessary to substitute non_slot_dict around calling apply_gradients on a TensorFlow optimizer? #477

Description

@Nov11

This template is for generic questions that a user may have when using HugeCTR.

Note: Before filing an issue, you may want to check out our compiled Q&A list first.

code link:

I'm reading the code of the optimizer wrapper for TensorFlow graph mode.
It seems that for an optimizer like Adam used as self._optimizer, its beta1_power and beta2_power non-slot variables are never updated; instead, the Variable copies held in OptimizerWrapper are updated.
I suppose it would be fine to update the non-slot variables in self._optimizer directly and just skip the substitution of non_slot_dict.

Please correct me if I'm wrong.
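
For context, here is a minimal sketch of the kind of substitution the question refers to, assuming TF1-style optimizer internals: `_non_slot_dict` is a private attribute of `tf.compat.v1.train.Optimizer` that holds non-slot variables such as Adam's beta1_power / beta2_power, and `apply_gradients` builds its update ops against whichever Variables that dict points to at graph-construction time. The helper below and the `self._non_slot_copies` name are hypothetical, for illustration only, not HugeCTR's actual code.

```python
import contextlib


@contextlib.contextmanager
def substituted_non_slot_dict(optimizer, replacement):
    # `_non_slot_dict` is a private attribute of
    # tf.compat.v1.train.Optimizer that maps (name, graph_key) keys to
    # Variables such as Adam's beta1_power / beta2_power.
    original = optimizer._non_slot_dict
    optimizer._non_slot_dict = replacement
    try:
        yield
    finally:
        optimizer._non_slot_dict = original


# Usage sketch (names hypothetical): while the substitution is active,
# apply_gradients() wires Adam's update ops to the replacement Variables,
# so the beta1_power / beta2_power updates land on the wrapper's copies
# rather than on the wrapped optimizer's originals.
#
# with substituted_non_slot_dict(self._optimizer, self._non_slot_copies):
#     train_op = self._optimizer.apply_gradients(grads_and_vars)
```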

Labels: question