Commit 4268bd2

dipannita08 authored and copybara-github committed
Update ml-goodput-measurement package for prod PyPi release v0.0.13.
PiperOrigin-RevId: 787142637
1 parent 75f11a0 commit 4268bd2

3 files changed: 18 additions, 1 deletion

CHANGELOG.md

Lines changed: 5 additions & 0 deletions
@@ -20,6 +20,10 @@ To release a new version (e.g. from `1.0.0` -> `2.0.0`):
 * Update the `[Unreleased]` url: `v1.0.0...HEAD` -> `v2.0.0...HEAD`
 
 -->
+## [0.0.13] - 2025-07-25
+
+* Fix occasional gaps in cumulative Goodput Monitor dashboard.
+
 ## [0.0.12] - 2025-06-25
 
 * Support monitoring disruption badput due to infrastructure recovery.
@@ -97,6 +101,7 @@ To release a new version (e.g. from `1.0.0` -> `2.0.0`):
 * Initial release of ML Goodput Measurement PyPi package
 * Feature: Contains the Goodput module which allows logging and retrieval of training job's overall productive Goodput
 
+[0.0.13]: https://github.com/AI-Hypercomputer/ml-goodput-measurement/compare/v0.0.12...v0.0.13
 [0.0.12]: https://github.com/AI-Hypercomputer/ml-goodput-measurement/compare/v0.0.11...v0.0.12
 [0.0.11]: https://github.com/AI-Hypercomputer/ml-goodput-measurement/compare/v0.0.10...v0.0.11
 [0.0.10]: https://github.com/AI-Hypercomputer/ml-goodput-measurement/compare/v0.0.8...v0.0.10

README.md

Lines changed: 12 additions & 0 deletions
@@ -410,6 +410,18 @@ class BadputType(enum.Enum):
 detection of a planned disruption signal. The save operation in type of
 checkpointing is synchronous resulting in time lost to Badput.
 
+> **_NOTE:_** This type of Badput is reported *only* when Orbax is used for
+checkpointing and requires the Orbax structured logger to be configured.
+To compute checkpointing Badput for other types of checkpointers (Non-Orbax),
+please use the Custom Badput Recorder API (instructions in
+[Record Custom Badput Events (e.g., Evaluation, SDC Checks)](#record-custom-badput-events-eg-evaluation-sdc-checks))
+with an appropriate Custom Badput event type and wrap the blocking operation
+around the `start` and the `stop` API calls.
+
+> **_NOTE:_** Do **NOT** use the Custom Badput APIs for blocking checkpoint save
+operations if you are using Orbax. Either use Orbax's structured checkpoint
+logger **OR** the Custom Badput API for any other type of checkpointing.
+
 - Wasted Progress due to Disruption (WASTED_PROGRESS_FROM_DISRUPTION)
 
 Based on checkpointing frequency, a disruption may result in time lost in the
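
The README note added above asks users of non-Orbax checkpointers to bracket the blocking save with the Custom Badput `start` and `stop` calls. A minimal sketch of that pattern follows. It is an illustration under stated assumptions, not the package's documented usage: `FakeRecorder` is a hypothetical in-memory stand-in so the example runs without the package installed, its two method names are assumed to mirror the recorder's custom Badput calls, and `blocking_save` and the event type string `"custom_checkpoint_save"` are made up for the example.

```python
import contextlib
import time


class FakeRecorder:
    """Hypothetical stand-in for a Goodput recorder; keeps events in memory.

    The method names below are assumptions modeled on a start/stop pair for
    custom Badput events. Check the package README for the real signatures.
    """

    def __init__(self):
        self.events = []

    def record_custom_badput_event_start_time(self, custom_badput_event_type):
        self.events.append(("start", custom_badput_event_type, time.monotonic()))

    def record_custom_badput_event_end_time(self, custom_badput_event_type):
        self.events.append(("end", custom_badput_event_type, time.monotonic()))


@contextlib.contextmanager
def custom_badput(recorder, event_type):
    """Record a start event, run the wrapped block, then always record the end."""
    recorder.record_custom_badput_event_start_time(
        custom_badput_event_type=event_type
    )
    try:
        yield
    finally:
        # Runs even if the save raises, so the start event is never left open.
        recorder.record_custom_badput_event_end_time(
            custom_badput_event_type=event_type
        )


def blocking_save(state):
    """Placeholder for a synchronous, non-Orbax checkpoint save."""
    time.sleep(0.01)
    return dict(state)


recorder = FakeRecorder()
with custom_badput(recorder, "custom_checkpoint_save"):
    saved = blocking_save({"step": 100})
```

Wrapping the pair in a context manager keeps the start and stop calls matched when the save fails partway. Per the second note in the diff, this pattern is meant only for non-Orbax checkpointing; with Orbax, the structured checkpoint logger should be used instead.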

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@
 
 [project]
 name = "ml_goodput_measurement"
-version = "0.0.12"
+version = "0.0.13"
 authors = [
 { name="Cloud TPU Team", email="[email protected]" },
 ]
