Releases: pytorch/test-infra
v20250627-200622
runners: make ssm policy an array (#6858)
Fixes an issue where the SSM parameter policies were not being set
correctly.
Resulted in errors like:
ValidationException: Invalid policies input:
{"Type":"Expiration","Version":"1.0","Attributes":{"Timestamp":"2025-06-27T19:11:55.437Z"}}.
Signed-off-by: Eli Uriegas <[email protected]>
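The error above shows a single policy object being passed where SSM expects a JSON array of policies. A minimal sketch of building that value, assuming the policies are passed as a JSON string; the helper name `expiration_policies` and the 30-minute TTL are hypothetical and not taken from the PR:

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional


def expiration_policies(ttl: timedelta, now: Optional[datetime] = None) -> str:
    """Return a JSON string holding an array with one Expiration policy.

    Hypothetical helper: the name and the TTL argument are illustrative only.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    expires_at = now + ttl
    # ISO 8601 with millisecond precision and a trailing Z, as in the error above.
    timestamp = (
        expires_at.strftime("%Y-%m-%dT%H:%M:%S.")
        + f"{expires_at.microsecond // 1000:03d}Z"
    )
    policy = {
        "Type": "Expiration",
        "Version": "1.0",
        "Attributes": {"Timestamp": timestamp},
    }
    # The fix: wrap the policy in a list so the serialized value is an array.
    return json.dumps([policy])


if __name__ == "__main__":
    print(expiration_policies(timedelta(minutes=30)))
```

Serializing the bare `policy` dict instead of `[policy]` would reproduce the shape shown in the `ValidationException` message.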
v20250627-185904
[log classifier] Rule for graph break registry check (#6837)
For failures like [GH job link](https://github.com/pytorch/pytorch/actions/runs/15859789097/job/44714997710) [HUD commit link](https://hud.pytorch.org/pytorch/pytorch/commit/c1ad4b8e7a16f54c35a3908b56ed7d9f95eef586)
Currently matches `##[error]Process completed with exit code 1.` but there is a better line: `Found the unimplemented_v2 or unimplemented_v2_with_warning calls below that don't match the registry in graph_break_registry.json.`
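To illustrate why a more specific rule gives a better match than the generic exit-code line, here is a rough sketch of priority-based rule matching. It is not the log classifier's actual implementation or rule format; the `Rule` class, the rule names, and the priority values are all assumptions made for illustration:

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    name: str            # hypothetical rule name
    pattern: "re.Pattern[str]"
    priority: int        # higher wins; values here are made up


RULES = [
    Rule(
        "GHA error",
        re.compile(r"##\[error\]Process completed with exit code \d+\."),
        1,
    ),
    Rule(
        "graph break registry check",
        re.compile(
            r"Found the unimplemented_v2 or unimplemented_v2_with_warning calls "
            r"below that don't match the registry in graph_break_registry\.json\."
        ),
        10,
    ),
]


def classify(log_lines: List[str]) -> Optional[Tuple[Rule, int, str]]:
    """Return (rule, 1-based line number, line) for the highest-priority match."""
    best: Optional[Tuple[Rule, int, str]] = None
    for line_number, line in enumerate(log_lines, start=1):
        for rule in RULES:
            if rule.pattern.search(line) and (
                best is None or rule.priority > best[0].priority
            ):
                best = (rule, line_number, line)
    return best
```

With both lines present in a log, the specific registry rule is selected over the generic exit-code rule, so the reported failure line names the actual cause.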
v20250627-183532
runners: Add expiration policy to SSM parameters (#6855)
Instead of doing expensive cleanups we can rely on SSM parameter policies to do the cleanup for us!
This is a workaround to avoid the need to do expensive cleanup of SSM parameters.
Signed-off-by: Eli Uriegas <[email protected]>
v20250627-181625
runners: More scale-down perf improvements (#6854)
Does the following:
* Removes SSM parameter cleanup from terminateInstances (a follow-up PR will add a termination policy to parameters)
* Removes the double check for ghRunner calls (it was causing performance bottlenecks)
* NOTE: We will need to monitor removeGHRunnerOrg calls to see if those introduce another performance bottleneck or more job cancellations (if they rise then we revert; dashboard: https://hud.pytorch.org/job_cancellation_dashboard)
Signed-off-by: Eli Uriegas <[email protected]>
v20250627-162018
runners: Add batching for terminateRunner (#6852)
I noticed that we were terminating instances one by one in the original code, so this adds batching for terminateRunner calls to fix those performance bottlenecks.
During termination we were also deleting SSM parameters one by one, so this adds batching to the SSM parameter deletion as well.
The goal here was to implement the performance improvements with minimal changes.
This PR supersedes #6725
Signed-off-by: Eli Uriegas <[email protected]>
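The batching idea can be sketched independently of the AWS SDK. The example below is only an illustration, not the PR's code: `chunked` and `delete_in_batches` are hypothetical helpers, the `delete_batch` callback stands in for whatever bulk-delete call is used, and the default batch size of 10 is an assumption that should be checked against the service's per-call limit.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def delete_in_batches(
    names: Sequence[str],
    delete_batch: Callable[[List[str]], None],
    batch_size: int = 10,  # assumed limit; verify against the real API
) -> int:
    """Call `delete_batch` once per chunk and return the number of calls made."""
    calls = 0
    for batch in chunked(names, batch_size):
        delete_batch(batch)
        calls += 1
    return calls


if __name__ == "__main__":
    recorded: List[List[str]] = []
    params = [f"/runners/param-{i}" for i in range(23)]  # made-up names
    n_calls = delete_in_batches(params, recorded.append)
    print(n_calls, [len(batch) for batch in recorded])
```

For 23 items this issues three calls instead of 23, which is the kind of reduction the change is after.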
v20250625-172401-custom
[Cleanup]move to existing misc instead of fortesting db table (#6829)
Cleanup to sync everything in misc instead of the fortesting tables.
Signed-off-by: Yang Wang <[email protected]>
v20250625-151032-custom
Merge 02f5375f7040958338a9fe5a4bd92785feafbb76 into 605d7e413935b5d39…
v20250624-184555-custom
Merge 527371121b48dfc044c5d64f50085e858f2a7ba9 into 2fb8b30f965a04756…
v20250624-184047-custom
Keep going display on HUD: lambda to trigger log classifier when temp…