3 files changed: +18 -1 lines changed
@@ -20,6 +20,10 @@ To release a new version (e.g. from `1.0.0` -> `2.0.0`):
 * Update the `[Unreleased]` url: `v1.0.0...HEAD` -> `v2.0.0...HEAD`

 -->
+## [0.0.13] - 2025-07-25
+
+* Fix occasional gaps in cumulative Goodput Monitor dashboard.
+
 ## [0.0.12] - 2025-06-25

 * Support monitoring disruption badput due to infrastructure recovery.
@@ -97,6 +101,7 @@ To release a new version (e.g. from `1.0.0` -> `2.0.0`):
 * Initial release of ML Goodput Measurement PyPi package
 * Feature: Contains the Goodput module which allows logging and retrieval of training job's overall productive Goodput

+[0.0.13]: https://github.com/AI-Hypercomputer/ml-goodput-measurement/compare/v0.0.12...v0.0.13
 [0.0.12]: https://github.com/AI-Hypercomputer/ml-goodput-measurement/compare/v0.0.11...v0.0.12
 [0.0.11]: https://github.com/AI-Hypercomputer/ml-goodput-measurement/compare/v0.0.10...v0.0.11
 [0.0.10]: https://github.com/AI-Hypercomputer/ml-goodput-measurement/compare/v0.0.8...v0.0.10
@@ -410,6 +410,18 @@ class BadputType(enum.Enum):
 detection of a planned disruption signal. The save operation in type of
 checkpointing is synchronous resulting in time lost to Badput.

+> **_NOTE:_** This type of Badput is reported *only* when Orbax is used for
+checkpointing and requires the Orbax structured logger to be configured.
+To compute checkpointing Badput for other types of checkpointers (Non-Orbax),
+please use the Custom Badput Recorder API (instructions in
+[Record Custom Badput Events (e.g., Evaluation, SDC Checks)](#record-custom-badput-events-eg-evaluation-sdc-checks))
+with an appropriate Custom Badput event type and wrap the blocking operation
+around the `start` and the `stop` API calls.
+
+> **_NOTE:_** Do **NOT** use the Custom Badput APIs for blocking checkpoint save
+operations if you are using Orbax. Either use Orbax's structured checkpoint
+logger **OR** the Custom Badput API for any other type of checkpointing.
+
 - Wasted Progress due to Disruption (WASTED_PROGRESS_FROM_DISRUPTION)

 Based on checkpointing frequency, a disruption may result in time lost in the
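
The note added in this hunk asks users of non-Orbax checkpointers to bracket the blocking save between a `start` and a `stop` call of the Custom Badput Recorder API. The pattern might look roughly like the sketch below. Everything in it is an assumption made for illustration: `StubBadputRecorder`, its `start`/`stop` methods, the `custom_badput` helper, the `"custom_checkpoint_save"` event type, and `blocking_checkpoint_save` are hypothetical stand-ins, not the package's real names. The actual recorder methods and event types are the ones documented in the README section the note links to.

```python
import contextlib
import time


class StubBadputRecorder:
    """Hypothetical stand-in for a recorder exposing start/stop calls."""

    def __init__(self):
        # Each entry is (event_type, "start" or "stop", monotonic timestamp).
        self.events = []

    def start(self, event_type):
        self.events.append((event_type, "start", time.monotonic()))

    def stop(self, event_type):
        self.events.append((event_type, "stop", time.monotonic()))


@contextlib.contextmanager
def custom_badput(recorder, event_type):
    """Bracket a blocking operation with start/stop, even if it raises."""
    recorder.start(event_type)
    try:
        yield
    finally:
        recorder.stop(event_type)


def blocking_checkpoint_save(state, store):
    """Placeholder for a synchronous, non-Orbax checkpoint save."""
    store.append(dict(state))


recorder = StubBadputRecorder()
store = []

# Only the blocking save sits inside the bracket, so only that time is
# attributed to the custom Badput event.
with custom_badput(recorder, "custom_checkpoint_save"):
    blocking_checkpoint_save({"step": 100}, store)
```

Using a context manager keeps the `stop` call paired with the `start` call even when the save raises, which avoids leaving an event open. Per the second note, this bracket should not be combined with Orbax's structured checkpoint logger for the same save.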
 [project]
 name = "ml_goodput_measurement"
-version = "0.0.12"
+version = "0.0.13"
 authors = [
   {name = "Cloud TPU Team", email = "[email protected]"},
 ]