MLFlow logger: log system metrics for all nodes when rank_zero_only is True #3905

@WeichenXu123

🚀 Feature Request

When rank_zero_only and log_system_metrics are both set to True, the MLflow logger should log system metrics for all nodes. These metrics should all be logged into the MLflow run created by rank 0, with the metric keys grouped by node IP.
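To make the requested key grouping concrete, here is a minimal sketch of how rank 0 might flatten per-node system metrics into a single set of metric keys. It is an illustration only, not Composer's or MLflow's implementation: the function name group_metrics_by_node, the key layout `system/<node IP>/<metric name>`, and the sample IPs and values are all assumptions, and the step that gathers metrics from the other nodes is not shown.

```python
from typing import Dict


def group_metrics_by_node(
    per_node_metrics: Dict[str, Dict[str, float]],
    prefix: str = "system",
) -> Dict[str, float]:
    """Flatten {node_ip: {metric_name: value}} into one flat dict.

    Hypothetical helper: the "<prefix>/<node_ip>/<metric_name>" key
    layout is an assumption, not an existing Composer or MLflow format.
    """
    grouped: Dict[str, float] = {}
    # Sort node IPs so the resulting key order is deterministic.
    for node_ip in sorted(per_node_metrics):
        for name, value in per_node_metrics[node_ip].items():
            grouped[f"{prefix}/{node_ip}/{name}"] = value
    return grouped


# Made-up sample data standing in for metrics gathered from two nodes.
per_node = {
    "10.0.0.2": {"cpu_utilization_percentage": 41.5},
    "10.0.0.1": {"cpu_utilization_percentage": 35.0, "gpu_0_utilization_percentage": 88.0},
}
flat = group_metrics_by_node(per_node)
```

Rank 0 could then pass a dict like `flat` to its own run, for example through `mlflow.log_metrics`, so that every node's metrics appear in one run under a per-node key prefix.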

Motivation

In the current Composer MLflow logger implementation, if rank_zero_only and log_system_metrics are both set to True, system metrics are logged only for the first node, so users can't view the system metrics of the other nodes.

[Optional] Implementation

Additional context

Labels: enhancement
