generalize metric reporting #1434

@gshennvm

Description

The metric reporting we have is hard-coded in various places, which is what caused the grad norm bug (see 9475e7b). As an example, this code in grpo.py

`for k, v in metrics.items():`

decides which metrics are averaged and which are summed, taking that control out of the policy worker's hands. That is exactly how problems like the grad norm bug creep in. I think a better solution is for the policy worker to tell grpo.py which metrics need to be summed and which need to be averaged, or perhaps to do the averaging within the worker itself so the grad norm bug can't happen again.
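To make the first option concrete, here is a minimal sketch, assuming the policy worker returns each metric together with a tag saying how it should be reduced, and grpo.py only applies whatever the worker declared. `Reduction`, `Metric`, and `aggregate` are hypothetical names for illustration, not existing APIs in this repo.

```python
from dataclasses import dataclass
from enum import Enum
from statistics import mean
from typing import Callable, Dict, List


class Reduction(Enum):
    """Hypothetical tag: how the driver should combine a metric across workers."""
    SUM = "sum"
    MEAN = "mean"
    MAX = "max"


# Maps each reduction kind to the function that applies it to a list of values.
_REDUCERS: Dict[Reduction, Callable[[List[float]], float]] = {
    Reduction.SUM: sum,
    Reduction.MEAN: mean,
    Reduction.MAX: max,
}


@dataclass
class Metric:
    """A metric value, tagged by the worker with its intended reduction."""
    value: float
    reduction: Reduction


def aggregate(worker_metrics: List[Dict[str, Metric]]) -> Dict[str, float]:
    """Combine per-worker metrics using the reduction each worker declared."""
    values: Dict[str, List[float]] = {}
    reductions: Dict[str, Reduction] = {}
    for metrics in worker_metrics:
        for name, metric in metrics.items():
            # All workers must agree on how a given metric is reduced.
            prev = reductions.setdefault(name, metric.reduction)
            if prev is not metric.reduction:
                raise ValueError(f"conflicting reductions for {name!r}")
            values.setdefault(name, []).append(metric.value)
    # The driver no longer hard-codes sum vs. mean; it follows the tags.
    return {name: _REDUCERS[reductions[name]](vals) for name, vals in values.items()}


workers = [
    {"loss": Metric(0.25, Reduction.MEAN), "num_tokens": Metric(100, Reduction.SUM)},
    {"loss": Metric(0.75, Reduction.MEAN), "num_tokens": Metric(120, Reduction.SUM)},
]
print(aggregate(workers))  # {'loss': 0.5, 'num_tokens': 220}
```

With this shape, adding a new metric means choosing its reduction in the worker, right where the value is computed, instead of remembering to update a branch in grpo.py. The second option (averaging inside the worker) would reduce to every metric being pre-reduced and tagged the same way.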

Metadata

Assignees

Labels

enhancement (New feature or request)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests