google-cloud-mldiagnostics

Overview

The google-cloud-mldiagnostics library is a Python package that helps engineers and researchers monitor and diagnose machine learning training runs using the GCP suite of diagnostic tools. It provides tools for tracking workload progress, collecting metrics, and profiling performance.

Supported Frameworks

JAX
- any version
Support for other frameworks is in progress.

How to install

Install

Install the package from PyPI:

pip install google-cloud-mldiagnostics

This package does not install libtpu, jax, or xprof; it expects them to be installed separately.
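
Because these dependencies are not pulled in automatically, it can help to confirm they are present before starting a run. The following is a minimal sketch using only the standard library; it assumes the import names match the package names listed above (`jax`, `libtpu`, `xprof`), which may not hold in every environment, and `missing_prerequisites` is a hypothetical helper, not part of this library.

```python
import importlib.util

# Assumption: each dependency is importable under the same name as its package.
PREREQUISITES = ("jax", "libtpu", "xprof")


def missing_prerequisites(names=PREREQUISITES):
    """Return the names that cannot be located on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Install these separately before starting a run:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```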

How to use

Monitor training

At the beginning of the training script, create a machine learning run:

from google_cloud_mldiagnostics import machinelearning_run

machinelearning_run(
  name=<run-name>,
  gcs_path="gs://<bucket>",
)

Monitor with on-demand profiling

from google_cloud_mldiagnostics import machinelearning_run

machinelearning_run(
  name=<run-name>,
  gcs_path="gs://<bucket>",
  on_demand_xprof=True
)

Monitor with programmatic profiling

from google_cloud_mldiagnostics import machinelearning_run
from google_cloud_mldiagnostics import xprof

machinelearning_run(
  name=<run-name>,
  gcs_path="gs://<bucket>",
)

# Use a distinct variable name so the imported xprof is not shadowed.
prof = xprof()
prof.start()
# some code
prof.stop()
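
If the profiled code raises an exception, `stop()` is never reached. One way to guard against that is a small context manager around the start/stop pair. This is a sketch, not part of the library: `profiled` is a hypothetical helper, and it assumes only that the object returned by `xprof()` exposes `start()` and `stop()` as shown above. A stub profiler stands in so the block runs without the library installed.

```python
from contextlib import contextmanager


@contextmanager
def profiled(profiler):
    """Start the profiler, then always stop it, even if the body raises."""
    profiler.start()
    try:
        yield profiler
    finally:
        profiler.stop()


class StubProfiler:
    """Stand-in with the same start()/stop() shape, for illustration only."""

    def __init__(self):
        self.calls = []

    def start(self):
        self.calls.append("start")

    def stop(self):
        self.calls.append("stop")


stub = StubProfiler()
with profiled(stub):
    pass  # the code to profile goes here
```

In a real training script, the stub would be replaced by the object created with `xprof()`.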

Monitor with predefined metrics

from google_cloud_mldiagnostics import machinelearning_run
from google_cloud_mldiagnostics import metrics
from google_cloud_mldiagnostics import metric_types

machinelearning_run(
  name=<run-name>,
  gcs_path="gs://<bucket>",
)

metrics.record(metric_types.MetricType.LOSS, <value>)

To pair the metric value with the current step:

metrics.record(metric_types.MetricType.LOSS, <value>, step=<step>)
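
To show where a step-paired call might sit in a training loop, here is a minimal sketch. The recorder is passed in as a parameter so the example runs without the library; `train_and_record` and `fake_record` are hypothetical names, and the decaying loss is a placeholder for a real training step.

```python
from typing import Any, Callable


def train_and_record(num_steps: int, record: Callable[..., Any], loss_key: Any) -> float:
    """Run a toy loop and report the loss together with its step index."""
    loss = 1.0
    for step in range(num_steps):
        loss *= 0.9  # placeholder for a real optimizer update
        record(loss_key, loss, step=step)
    return loss


# Stand-in recorder that stores each call, for illustration only.
recorded = []


def fake_record(key, value, step=None):
    recorded.append((key, value, step))


final_loss = train_and_record(3, fake_record, "loss")
```

In a real script, `metrics.record` would be passed as `record` and the predefined loss metric type as `loss_key`.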

Monitor with custom metrics

from google_cloud_mldiagnostics import machinelearning_run
from google_cloud_mldiagnostics import metrics

machinelearning_run(
  name=<run-name>,
  gcs_path="gs://<bucket>",
)

metrics.record("<my-metric>", <value>)

To pair the metric value with the current step:

metrics.record("<my-metric>", <value>, step=<step>)
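
When several custom metrics are produced at the same step, a small wrapper can keep the call sites tidy. The sketch below is an illustration under assumptions: `record_all` is a hypothetical helper, the metric names are made up, and the recorder is injected so the block runs without the library.

```python
from typing import Any, Callable, Mapping


def record_all(record: Callable[..., Any], values: Mapping[str, float], step: int) -> None:
    """Record every named value in the mapping against the same step."""
    for name, value in values.items():
        record(name, value, step=step)


# Stand-in recorder that stores each call, for illustration only.
calls = []


def fake_record(name, value, step=None):
    calls.append((name, value, step))


# Hypothetical custom metric names.
record_all(fake_record, {"grad_norm": 1.5, "tokens_per_second": 2048.0}, step=10)
```

In a real script, `metrics.record` would take the place of `fake_record`.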
