
Releases: pytorch/test-infra

v20250715-232104

15 Jul 23:23
6dd711f


runners: Elevate some debug logs to info logs (#6933)

This was annoying me when reading through the logs: some of these
messages were getting swallowed when I was filtering out debug logs.
Hopefully with these as info logs we'll be able to debug things more easily.

Signed-off-by: Eli Uriegas <[email protected]>

v20250715-231438

15 Jul 23:16
073cd99


runners: Remove debug logging for listRunners (#6932)

This log message was driving me insane and causing a lot of useless
noise in the logs. Removing it so I can preserve my sanity.

Signed-off-by: Eli Uriegas <[email protected]>

v20250715-210301

15 Jul 21:05
601e6f4


runners: Move runner removal logic up (#6930)

This is a refactor of scale-down that moves the runner removal logic up
to the top of the loop, to avoid long wait times between determining
that a runner should be removed and actually removing it.

In practice we were observing wait times of 7 to 10 minutes.

This might only actually be testable with production traffic / rate
limits.
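
A conceptual sketch of the restructuring (the actual scale-down code is the TypeScript runners lambda; the function and attribute names below are hypothetical stand-ins): removal now happens as soon as a runner is judged removable, instead of being deferred until after the rest of the loop's work.

```python
# Conceptual sketch only -- the real scale-down logic lives in the TypeScript
# runners lambda. The helpers and runner attributes here are hypothetical.

def scale_down_old(runners, terminate_runner):
    # Old shape: decide in one pass, terminate afterwards. By the time the
    # second loop runs, minutes of other work may have passed, so the
    # eligibility decision can be stale.
    to_remove = [r for r in runners if r.is_idle and r.past_minimum_time]
    # ... other per-runner bookkeeping here ...
    for runner in to_remove:
        terminate_runner(runner)

def scale_down_new(runners, terminate_runner):
    # New shape: the removal check sits at the top of the loop and the
    # runner is terminated immediately, closing the gap between deciding
    # to remove and actually removing.
    for runner in runners:
        if runner.is_idle and runner.past_minimum_time:
            terminate_runner(runner)
            continue
        # ... other per-runner bookkeeping here ...
```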

---------

Signed-off-by: Eli Uriegas <[email protected]>

v20250709-181311

09 Jul 18:15
26f0013


[ez][CH] Fix infra_metrics.cloudwatch_metrics schema: use DateTime64…

v20250708-173352

08 Jul 17:36
3b952a3


[ghinfra] Set up ingestion from s3 -> clickhouse for cloudwatch (#6898)

Path: cloudwatch metrics -> firehose -> s3 (new bucket
fbossci-cloudwatch-metrics) -> clickhouse

This is the s3 -> clickhouse part. I think ClickHouse has some built-in
ingestion for Kinesis, but I'm lazy...

Requires https://github.com/pytorch-labs/pytorch-gha-infra/pull/751

Testing: ran the python code via
`python tools/rockset_migration/s32ch.py --clickhouse-table
"infra_metrics.cloudwatch_metrics" --stored-data t.json --s3-bucket
fbossci-cloudwatch-metrics --s3-prefix ghci-related`
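
For orientation, a minimal sketch of the s3 -> clickhouse leg, assuming boto3 and clickhouse-connect: the bucket, prefix, and table names come from the command above, while the connection details and the newline-delimited-JSON record format of the Firehose objects are assumptions (the real logic lives in tools/rockset_migration/s32ch.py).

```python
# Minimal sketch of the s3 -> clickhouse leg, assuming boto3 and
# clickhouse-connect. Bucket, prefix, and table come from the command above;
# the record format inside the Firehose objects is an assumption.
import json

import boto3
import clickhouse_connect

s3 = boto3.client("s3")
ch = clickhouse_connect.get_client(host="localhost")  # assumed connection details

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(
    Bucket="fbossci-cloudwatch-metrics", Prefix="ghci-related"
):
    for obj in page.get("Contents", []):
        body = s3.get_object(
            Bucket="fbossci-cloudwatch-metrics", Key=obj["Key"]
        )["Body"].read()
        # Firehose typically writes newline-delimited JSON records.
        rows = [json.loads(line) for line in body.splitlines() if line]
        if rows:
            # Assumes the table columns match the JSON keys.
            ch.insert(
                "infra_metrics.cloudwatch_metrics",
                [list(r.values()) for r in rows],
                column_names=list(rows[0].keys()),
            )
```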

v20250703-021349

03 Jul 02:16
7d5d073


Add revert category extraction and exclude `ghfirst` reverts from stat…
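
A hedged sketch of what that extraction could look like, assuming each revert record carries the text of its `@pytorchbot revert ... -c <category>` command; the record shape and the exact category list are assumptions.

```python
# Hedged sketch of extracting a revert classification and filtering out
# `ghfirst` reverts. Assumes each revert record carries the text of the
# "@pytorchbot revert ... -c <category>" command; the record shape is made up.
import re

CATEGORY_RE = re.compile(r"-c\s+(ghfirst|ignoredsignal|landrace|nosignal|weird)")

def revert_category(comment: str) -> str | None:
    match = CATEGORY_RE.search(comment)
    return match.group(1) if match else None

def exclude_ghfirst(reverts: list[dict]) -> list[dict]:
    # ghfirst reverts are internal/GHF-driven, so they say nothing about
    # CI signal quality and are dropped from the stats.
    return [r for r in reverts if revert_category(r.get("comment", "")) != "ghfirst"]
```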

v20250630-183403

30 Jun 18:36
5f86d76


[Pytorch AutoRevert] - Improves autorevert check heuristics  (#6853)

Makes some improvements to the back-analysis for the revert logic, with
the goal of improving precision and recall and validating autorevert as
a viable strategy.

Checked against the workflows: pull, trunk, inductor, linux-binary-manywheel

Old code:
```
Timeframe: 720 hours
Commits checked: 6177
Auto revert patterns detected: 188
Actual reverts inside auto revert patterns detected: 24 (12.8%)
Total revert commits in period: 115
Reverts that dont match any auto revert pattern detected: 91
```

Newer code:
```
Workflow(s): pull, trunk, inductor, linux-binary-manywheel
Timeframe: 720 hours
Commits checked: 5403
Auto revert patterns detected: 442
Actual reverts inside auto revert patterns detected (precision): 48 (10.9%)
Total revert commits in period: 115
Reverts that dont match any auto revert pattern detected (recall): 67 (58.3%)
Per workflow precision:
  pull: 45 reverts out of 411 patterns (10.9%)
  trunk: 1 reverts out of 8 patterns (12.5%)
  inductor: 2 reverts out of 20 patterns (10.0%)
  linux-binary-manywheel: 0 reverts out of 3 patterns (0.0%)
```

Critical implemented changes:
* Look forward and back for the first commit that actually ran the
failed job, instead of assuming it is always the one right before or
right after.
* Job names have parts we don't care about, like shard indices. Since a
failure could happen in any shard, we want to match the same failure in
any shard (a sketch of this normalization follows the list).
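
As a rough illustration of the second point, here is a minimal sketch of normalizing job names so that the same failure on different shards compares equal; the job-name format and the pattern are assumptions, not the checker's actual code.

```python
# Minimal sketch of job-name normalization, assuming job names like
# "pull / linux-jammy-py3.9-gcc11 / test (default, 2, 5)". The real
# autorevert checker's patterns may differ.
import re

# Strips trailing shard indices (and any runner label after them) from the
# parenthesized config, e.g. "(default, 2, 5)" -> "(default)".
SHARD_RE = re.compile(r",\s*\d+,\s*\d+(,[^)]*)?\)")

def normalize_job_name(name: str) -> str:
    return SHARD_RE.sub(")", name)

assert (
    normalize_job_name("pull / linux-jammy / test (default, 2, 5)")
    == normalize_job_name("pull / linux-jammy / test (default, 1, 5)")
)
```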

Things I tried that didn't lead to great results:
* ignoring error classification - too low precision, no significant
increase in recall
* not requiring error repetition - too low precision, no significant
increase in recall

My take:
A precision of 10% justifies the cost of re-running jobs in order to
confirm redness status. Even though it is not possible to test this
directly, I suspect that requiring the same output twice for all 3
signals should elevate the precision to a very high standard.
Unfortunately, the only way to verify is to run this in shadow mode.

A recall of 55% suggests we can capture **most** of the introduced
trunk redness errors. Many reverts are not caused by CI redness,
especially not in the workflows we are analyzing (they could be due to
performance degradation, GHF/internal reasons, and many others). This
number seems high enough to provide a substantial gain in CI quality.

v20250630-164255

30 Jun 16:45
3d3500e


runners: Revert things related to batch termination (#6868)

This reverts the following PRs:
* #6859 
* #6858 
* #6855 
* #6854
* #6852

These were causing issues where scale-down was scaling down instances
too aggressively, leading to runners not being refreshed by scale-up.

I do think the SSM expiration work is worth redoing, but there were
merge conflicts, so I had to revert the entire thing.

v20250627-203612

27 Jun 20:38
9665a59


runners: Fix lint (#6859)

There were some outstanding lint issues from previous PRs.

Fixes the lint and formatting.

Signed-off-by: Eli Uriegas <[email protected]>

v20250627-202541

27 Jun 20:27
fd736eb


[ez][docs] Add wiki maintenance magic strings to aws/lambda/readme (#…