Releases: pytorch/test-infra

v20260209-232704

09 Feb 23:28
a79eb7c

Pin setuptools<82 to fix pkg_resources removal breakage (#7747)

setuptools 82.0.0 (released Feb 8, 2026) removed pkg_resources, which
breaks transitive dependencies that import it. Pin setuptools<82 in
build-system requires and dev requirements.

---------

Co-authored-by: Nikita Shulga <[email protected]>

v20260204-221009

04 Feb 22:11
9b2fc27

Add token usage and model tracking to Claude billing (#7729)

## Summary
- Adds token tracking fields to `misc.claude_code_usage` schema:
  - `input_tokens`, `output_tokens`
  - `cache_read_input_tokens`, `cache_creation_input_tokens`
  - `model`
- Updates `upload-claude-usage` action to extract these fields from
Claude output
- Updates S3 replicator lambda to handle new fields
- Creates v2 Grafana dashboard with token metrics

## Dashboard v2

https://pytorchci.grafana.net/public-dashboards/83058a8d65d44a099eb8d9ac2916f411

New metrics:
- Total input/output tokens
- Cache hit rate
- Cost by model breakdown
- Token usage by workflow
- Daily cache performance
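
As a rough illustration of how a metric such as cache hit rate could be derived from the new fields, the sketch below computes it for a single row. The formula (cache reads divided by all input-side tokens) and the `UsageRow` class are assumptions made for illustration; the dashboard's actual query is not shown in this description and may define the metric differently.

```python
from dataclasses import dataclass


@dataclass
class UsageRow:
    """Hypothetical in-memory view of one `misc.claude_code_usage` row."""
    input_tokens: int
    output_tokens: int
    cache_read_input_tokens: int
    cache_creation_input_tokens: int
    model: str


def cache_hit_rate(row: UsageRow) -> float:
    """Share of input-side tokens served from cache (assumed definition)."""
    total_input = (
        row.input_tokens
        + row.cache_read_input_tokens
        + row.cache_creation_input_tokens
    )
    if total_input == 0:
        return 0.0  # avoid division by zero for empty rows
    return row.cache_read_input_tokens / total_input


# Made-up numbers, not real usage data.
row = UsageRow(
    input_tokens=200,
    output_tokens=50,
    cache_read_input_tokens=600,
    cache_creation_input_tokens=200,
    model="example-model",
)
rate = cache_hit_rate(row)
print(f"cache hit rate: {rate:.2f}")
```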

## Test plan
- [ ] Deploy schema changes (ALTER TABLE to add columns)
- [ ] Deploy lambda changes
- [ ] Verify new data flows with token fields populated
- [ ] Verify v2 dashboard shows token metrics

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <[email protected]>

v20260128-210015

28 Jan 21:02
165c8ed

set up PT x vLLM regression config (#7684)

Summary:
As the title says, this should connect regressions to the newly created GH issue.
The thresholds are 1.20 and 0.8.

Test Plan:
Local run with the following:
```
python aws/lambda/benchmark_regression_summary_report/lambda_function.py --clickhouse-endpoint ${CLICKHOUSE_ENDPOINT} --clickhouse-username ${DEV_USERNAME} --clickhouse-password ${CLICKHOUSE_PASSWORD} --config-id pytorch_x_vllm_benchmark
```
Ran it both yesterday and today; today's run was enough to trigger the
regression thresholds, so 20% seems appropriate here.
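
To illustrate how the 1.20 and 0.8 thresholds might be applied, here is a minimal sketch that compares a current value with a baseline by ratio. The function name, the ratio definition, and the returned labels are hypothetical; the real logic lives in `lambda_function.py` and may differ, and which side counts as a regression depends on whether higher or lower is better for the metric.

```python
def classify_ratio(baseline: float, current: float,
                   upper: float = 1.20, lower: float = 0.8) -> str:
    """Label a measurement by its ratio to the baseline (assumed scheme)."""
    if baseline <= 0:
        return "insufficient_data"  # a ratio is not meaningful here
    ratio = current / baseline
    if ratio > upper:
        return "above_upper"
    if ratio < lower:
        return "below_lower"
    return "within_range"


# Made-up values for illustration only.
print(classify_ratio(100.0, 125.0))  # ratio 1.25
print(classify_ratio(100.0, 75.0))   # ratio 0.75
print(classify_ratio(100.0, 110.0))  # ratio 1.10
```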

v20260128-164443

28 Jan 16:46
6232f61

[helion][Benchmark] Increase the speedup threshold based on request (…

v20260126-220349

26 Jan 22:05
86c7370

[AUTOREVERT] Ask for claude bot to provide user guidance (#7692)

Known limitation: this does not work with forked PRs, but Anthropic [seems
to be
working](https://github.com/anthropics/claude-code-action/issues/821) on
it.

The output [looks
nice](https://github.com/pytorch/pytorch/pull/173119#issuecomment-3801829468), though.

Signed-off-by: Jean Schmidt <[email protected]>

v20260122-234401

22 Jan 23:45
b3cea9a

Add Claude Code usage metrics upload action and database schema (#7675)

This pull request introduces a new pipeline for collecting and ingesting
Claude Code usage metrics into ClickHouse for analytics. The changes
span GitHub Actions, AWS Lambda ingestion logic, and database schema
additions to support this new data flow.


* Reusable action to upload Claude Code metrics to S3
* ClickHouse schema for the metrics
* Turns on ingestion

v20260113-193414

13 Jan 19:36
479ee76

Change compiler regression report to median (#7648)

This changes the logic that finds the baseline for the compiler regression report.

Previously, it took the max metric value from the past 3-6 days as the baseline
value and compared it to the metric values from the latest 3 days.
However, this created noisy alerts. Change the baseline to the median value to
avoid scenarios such as the one shown below.


<img width="393" height="201" alt="image"
src="https://github.com/user-attachments/assets/edeff7f8-09a8-494e-8127-b3b0984a297f"
/>
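
The effect of switching from max to median can be sketched with a small example. The numbers are made up, the metric is assumed to be one where higher is better, and summarizing the latest window by its median is also an assumption; none of this is taken from the report's actual code.

```python
from statistics import median

# Hypothetical metric values (higher is better, by assumption).
baseline_window = [100.0, 101.0, 99.0, 140.0]  # contains one outlier spike
latest_window = [100.0, 102.0, 101.0]

max_baseline = max(baseline_window)        # dominated by the spike
median_baseline = median(baseline_window)  # robust to the spike
latest = median(latest_window)

ratio_vs_max = latest / max_baseline
ratio_vs_median = latest / median_baseline

# Against the max baseline, normal values look like a large drop;
# against the median baseline, they look unchanged.
print(f"vs max baseline:    {ratio_vs_max:.3f}")
print(f"vs median baseline: {ratio_vs_median:.3f}")
```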

v20260109-231750

09 Jan 23:19
0d43574

Bump form-data from 4.0.1 to 4.0.4 in /terraform-aws-github-runner/mo…

v20260107-222051

07 Jan 22:22
8e06df0

only send notification for compiler regression when it's h100 (#7634)

This PR still tracks regressions for b200 but silences the notification when
a regression exists.

An alternative is to not query the b200 data at all, so that no regression
result is generated.
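
The behavior described above (keep tracking every device, notify only for h100) could look roughly like the following sketch. The `NOTIFY_DEVICES` set, the `should_notify` helper, and the result dictionaries are all hypothetical and are not taken from the PR.

```python
# Devices whose regressions trigger a notification (assumption for illustration).
NOTIFY_DEVICES = {"h100"}


def should_notify(device: str, has_regression: bool) -> bool:
    """Notify only when a regression exists on an allow-listed device."""
    return has_regression and device.lower() in NOTIFY_DEVICES


# Made-up regression results; all of them are still tracked.
results = [
    {"device": "h100", "has_regression": True},
    {"device": "b200", "has_regression": True},
    {"device": "h100", "has_regression": False},
]

to_notify = [
    r["device"] for r in results
    if should_notify(r["device"], r["has_regression"])
]
print(f"tracked: {len(results)}, notified: {to_notify}")
```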

v20260107-193807

07 Jan 19:39
63ff3fa

Fix dynamo key queries to use trailing slashes for accurate repo matc…