Dashboards for training and evaluation loss #104

Description

@mseeger

Is your feature request related to a problem? Please describe.
There is currently no end-to-end support for TensorBoard or other dashboards showing training and evaluation loss.

Describe the solution you'd like
The solution should show real-time metrics for running training jobs:

  • Basic metrics (training loss, validation loss)
  • Job health (possibly including an email warning if the job fails)
  • GPU memory usage
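As an illustration only, here is a minimal stdlib-only Python sketch of the logging side such a dashboard could read from. Each training step appends one JSON line with the training loss, an optional validation loss and an optional GPU memory figure. A simple health check then reports whether the job is running, failed, completed, or stalled, where stalled means no record arrived within a timeout. The class `MetricsLogger`, its methods, the JSON-lines file format and the 300-second heartbeat timeout are all hypothetical and not part of any existing API in this project. GPU memory is assumed to be supplied by the caller, for example from `torch.cuda.max_memory_allocated()` if PyTorch is used. A TensorBoard integration would more likely go through `SummaryWriter.add_scalar` instead.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Optional


class MetricsLogger:
    """Hypothetical append-only metrics log that a dashboard could poll.

    Each record is written as one JSON line, so a reader can tail the file
    and plot new points in near real time.
    """

    def __init__(self, path: str, heartbeat_timeout_s: float = 300.0):
        self.path = Path(path)
        # Assumed threshold: no record for this long means the job looks stalled.
        self.heartbeat_timeout_s = heartbeat_timeout_s

    def _append(self, record: dict) -> None:
        # Timestamp every record; the health check uses it as a heartbeat.
        record["time"] = time.time()
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    def log_metrics(self, step: int, train_loss: float,
                    val_loss: Optional[float] = None,
                    gpu_mem_mb: Optional[float] = None) -> None:
        record = {"type": "metrics", "step": step, "train_loss": train_loss}
        if val_loss is not None:
            record["val_loss"] = val_loss
        if gpu_mem_mb is not None:  # supplied by the caller, not measured here
            record["gpu_mem_mb"] = gpu_mem_mb
        self._append(record)

    def log_status(self, status: str) -> None:
        # e.g. "completed" or "failed", written by the training script
        self._append({"type": "status", "status": status})

    def read(self) -> list:
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def health(self, now: Optional[float] = None) -> str:
        records = self.read()
        if not records:
            return "unknown"
        statuses = [r for r in records if r["type"] == "status"]
        if statuses and statuses[-1]["status"] in ("failed", "completed"):
            return statuses[-1]["status"]
        now = time.time() if now is None else now
        if now - records[-1]["time"] > self.heartbeat_timeout_s:
            return "stalled"
        return "running"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        logger = MetricsLogger(str(Path(tmp) / "metrics.jsonl"))
        for step in range(3):
            # Dummy losses purely for demonstration.
            logger.log_metrics(step, train_loss=1.0 / (step + 1),
                               val_loss=1.2 / (step + 1))
        print(logger.health())  # running
        logger.log_status("failed")
        print(logger.health())  # failed
```

An email warning could hang off `health()`: a watcher process polling the file could send a message when the result switches to `"failed"` or `"stalled"`. Appending JSON lines keeps the writer trivial and lets a dashboard tail the file, at the cost of the reader re-parsing it on each poll.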

Metadata

Assignees

Labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests