Releases: pytorch/test-infra

v20250627-200622

27 Jun 20:08
0758ff2

runners: make ssm policy an array (#6858)

Fixes an issue where the SSM parameter policies were not being set
correctly, which resulted in errors like:

ValidationException: Invalid policies input:
{"Type":"Expiration","Version":"1.0","Attributes":{"Timestamp":"2025-06-27T19:11:55.437Z"}}.
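The rejected input above is a single JSON object, whereas the SSM `PutParameter` API expects `Policies` to be a JSON array of policy objects. A minimal sketch of that serialization follows, assuming a hypothetical helper named `serialize_policies`. It is illustrative only and is not the code changed in the PR.

```python
import json


def serialize_policies(policies):
    """Serialize one policy or a list of policies as a JSON array string."""
    # Hypothetical helper: wrap a lone policy dict so the output is always an array.
    if isinstance(policies, dict):
        policies = [policies]
    return json.dumps(list(policies), separators=(",", ":"))


expiration = {
    "Type": "Expiration",
    "Version": "1.0",
    "Attributes": {"Timestamp": "2025-06-27T19:11:55.437Z"},
}

policies_json = serialize_policies(expiration)
print(policies_json)
# [{"Type":"Expiration","Version":"1.0","Attributes":{"Timestamp":"2025-06-27T19:11:55.437Z"}}]
```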

Signed-off-by: Eli Uriegas <[email protected]>

v20250627-185904

27 Jun 19:01
66a282f

[log classifier] Rule for graph break registry check (#6837)

For failures like [GH job
link](https://github.com/pytorch/pytorch/actions/runs/15859789097/job/44714997710)
[HUD commit
link](https://hud.pytorch.org/pytorch/pytorch/commit/c1ad4b8e7a16f54c35a3908b56ed7d9f95eef586)

The classifier currently matches ` ##[error]Process completed with exit code 1.`,
but there is a more informative line to match:
`Found the unimplemented_v2 or unimplemented_v2_with_warning calls below
that don't match the registry in graph_break_registry.json.`
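The idea of preferring the more specific line can be sketched as an ordered rule list where the first matching rule wins. The rule names, the `RULES` structure, and the `classify` function below are assumptions made for illustration. They do not reflect the log classifier's actual rule format or matching logic.

```python
import re

# Hypothetical rules, ordered from most to least specific; the first match wins.
RULES = [
    (
        "graph break registry check",
        re.compile(r"Found the unimplemented_v2 or unimplemented_v2_with_warning calls below"),
    ),
    (
        "generic exit code",
        re.compile(r"##\[error\]Process completed with exit code \d+"),
    ),
]


def classify(log_lines):
    """Return (rule_name, line) for the highest-priority rule matching any line."""
    for name, pattern in RULES:
        for line in log_lines:
            if pattern.search(line):
                return name, line
    return None


log = [
    "Found the unimplemented_v2 or unimplemented_v2_with_warning calls below "
    "that don't match the registry in graph_break_registry.json.",
    "##[error]Process completed with exit code 1.",
]
result = classify(log)
print(result[0])
```

With both lines present in a log, the specific rule is reported instead of the generic exit-code line, which is the behavior the new rule is after.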

v20250627-183532

27 Jun 18:37
99c977d

runners: Add expiration policy to SSM parameters (#6855)

Instead of doing expensive cleanups of SSM parameters, we can rely on
SSM parameter policies to do the cleanup for us! This is a workaround
rather than a fix for the cleanup cost itself.
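As a rough illustration of what an expiration policy looks like, the sketch below builds the `Policies` value for a parameter that should expire after a given time-to-live. The helper name `expiration_policies` and the one-hour TTL are assumptions, not values taken from the PR. The output is a JSON array, which is the shape the SSM `PutParameter` API expects for `Policies`.

```python
import json
from datetime import datetime, timedelta, timezone


def expiration_policies(ttl, now=None):
    """Build a Policies JSON string with one Expiration policy (hypothetical helper)."""
    # `now` must be timezone-aware; it defaults to the current UTC time.
    now = now or datetime.now(timezone.utc)
    expires_at = (now + ttl).astimezone(timezone.utc)
    # ISO 8601 with millisecond precision and a trailing "Z" for UTC.
    timestamp = expires_at.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    policy = {
        "Type": "Expiration",
        "Version": "1.0",
        "Attributes": {"Timestamp": timestamp},
    }
    return json.dumps([policy])


# Fixed clock so the example is reproducible; the TTL is an assumed value.
fixed_now = datetime(2025, 6, 27, 18, 11, 55, 437000, tzinfo=timezone.utc)
policies = expiration_policies(timedelta(hours=1), now=fixed_now)
print(policies)
```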

Signed-off-by: Eli Uriegas <[email protected]>

v20250627-181625

27 Jun 18:18
4556a13

runners: More scale-down perf improvements (#6854)

Does the following:
* Removes SSM parameter cleanup from terminateInstances (a follow-up PR
will add a termination policy to parameters)
* Removes the double check for ghRunner calls (it was causing performance
bottlenecks)
* NOTE: We will need to monitor removeGHRunnerOrg calls to see if those
introduce another performance bottleneck + job cancellations (if they
rise then we revert, dashboard:
https://hud.pytorch.org/job_cancellation_dashboard)

Signed-off-by: Eli Uriegas <[email protected]>

v20250627-162018

27 Jun 16:22
003bee0

runners: Add batching for terminateRunner (#6852)

I noticed that we were terminating instances one by one in the original
code, so this adds batching for terminateRunner calls to fix that
performance bottleneck.

During termination we were also deleting SSM parameters one by one, so
this adds batching to the SSM parameter deletion as well.

Goal here was to implement the performance improvements with minimal
changes.
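The batching described above can be sketched generically: split the list of IDs into fixed-size chunks and issue one call per chunk instead of one call per item. The names `chunks` and `terminate_in_batches`, the `terminate` callable, and the batch size of 100 are all assumptions for illustration, not the runner code or its real limits.

```python
from typing import Callable, Iterator, List, Sequence


def chunks(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def terminate_in_batches(
    instance_ids: Sequence[str],
    terminate: Callable[[List[str]], None],
    batch_size: int = 100,  # assumed value, not a documented limit
) -> int:
    """Call `terminate` once per batch and return the number of calls made."""
    calls = 0
    for batch in chunks(instance_ids, batch_size):
        terminate(batch)  # stand-in for a single bulk API request
        calls += 1
    return calls


# Record each batch instead of calling a real cloud API.
recorded: List[List[str]] = []
ids = [f"i-{n:04d}" for n in range(250)]
calls = terminate_in_batches(ids, recorded.append, batch_size=100)
print(calls, [len(batch) for batch in recorded])
```

The same helper shape applies to the parameter deletion: only the callable and the batch size would differ.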

This PR supersedes #6725

---------

Signed-off-by: Eli Uriegas <[email protected]>

v20250625-172401-custom

25 Jun 17:26
83fa1c2

[Cleanup] Move to existing misc instead of fortesting db table (#6829)

Cleanup to sync everything in misc instead of fortesting tables

---------

Signed-off-by: Yang Wang <[email protected]>

v20250625-151032-custom

25 Jun 15:12
43fba2f

Merge 02f5375f7040958338a9fe5a4bd92785feafbb76 into 605d7e413935b5d39…

v20250624-184555-custom

24 Jun 18:48
fed2a6d

Merge 527371121b48dfc044c5d64f50085e858f2a7ba9 into 2fb8b30f965a04756…

v20250624-184047-custom

24 Jun 18:42
2fb8b30

Keep going display on HUD: lambda to trigger log classifier when temp…

v20250624-174344

24 Jun 17:45
