Releases: pytorch/test-infra
v20260209-232704
Pin setuptools<82 to fix pkg_resources removal breakage (#7747)

setuptools 82.0.0 (released Feb 8, 2026) removed pkg_resources, which breaks transitive dependencies that import it. Pin setuptools<82 in build-system requires and dev requirements.

---------

Co-authored-by: Nikita Shulga <[email protected]>
v20260204-221009
Add token usage and model tracking to Claude billing (#7729)

## Summary
- Adds token tracking fields to `misc.claude_code_usage` schema:
  - `input_tokens`, `output_tokens`
  - `cache_read_input_tokens`, `cache_creation_input_tokens`
  - `model`
- Updates `upload-claude-usage` action to extract these fields from Claude output
- Updates S3 replicator lambda to handle new fields
- Creates v2 Grafana dashboard with token metrics

## Dashboard v2
https://pytorchci.grafana.net/public-dashboards/83058a8d65d44a099eb8d9ac2916f411

New metrics:
- Total input/output tokens
- Cache hit rate
- Cost by model breakdown
- Token usage by workflow
- Daily cache performance

## Test plan
- [ ] Deploy schema changes (ALTER TABLE to add columns)
- [ ] Deploy lambda changes
- [ ] Verify new data flows with token fields populated
- [ ] Verify v2 dashboard shows token metrics

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <[email protected]>
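As a rough illustration of how the new token fields could feed metrics such as the cache hit rate, here is a minimal Python sketch. The `ClaudeUsageRow` class, both helper functions, and the hit-rate definition (cache reads divided by all prompt-side tokens) are assumptions made for illustration; the PR text does not say how the dashboard computes these values. Only the field names come from the summary above.

```python
from dataclasses import dataclass


@dataclass
class ClaudeUsageRow:
    """Hypothetical in-memory view of one usage row; field names follow the PR."""
    input_tokens: int
    output_tokens: int
    cache_read_input_tokens: int
    cache_creation_input_tokens: int
    model: str


def cache_hit_rate(rows):
    """Share of prompt-side tokens served from cache (assumed definition)."""
    read = sum(r.cache_read_input_tokens for r in rows)
    total = sum(
        r.input_tokens + r.cache_read_input_tokens + r.cache_creation_input_tokens
        for r in rows
    )
    # Avoid dividing by zero when there is no usage at all.
    return read / total if total else 0.0


def output_tokens_by_model(rows):
    """Sum output tokens per model name."""
    totals = {}
    for r in rows:
        totals[r.model] = totals.get(r.model, 0) + r.output_tokens
    return totals
```

The model names passed in are whatever the upload action records; the sketch makes no assumption about their format.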
v20260128-210015
set up PT x vLLM regression config (#7684)
Summary:
As the title says, this should connect regressions to the newly created GH issue.
It uses 1.20 and 0.8 as thresholds.
Test Plan:
Local run with the following:
```
python aws/lambda/benchmark_regression_summary_report/lambda_function.py --clickhouse-endpoint ${CLICKHOUSE_ENDPOINT} --clickhouse-username ${DEV_USERNAME} --clickhouse-password ${CLICKHOUSE_PASSWORD} --config-id pytorch_x_vllm_benchmark
```
Ran it both yesterday and today; today's run was enough to trigger the
regression thresholds, so 20% seems appropriate here.
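One way to read the 1.20 and 0.8 thresholds is as bounds on the ratio of the latest value to a baseline value. The sketch below assumes that reading; the `classify` function and its labels are hypothetical and are not taken from the lambda's code. Which side counts as a regression depends on the metric (for example, higher latency versus lower throughput), so the sketch only reports which bound was crossed.

```python
def classify(baseline, latest, upper=1.20, lower=0.8):
    """Compare latest/baseline against the two thresholds (assumed semantics)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    ratio = latest / baseline
    if ratio > upper:
        return "above_upper"
    if ratio < lower:
        return "below_lower"
    return "within_range"
```

For instance, under these assumptions `classify(100, 125)` reports a crossing of the upper bound, while `classify(100, 110)` stays within range.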
v20260128-164443
[helion][Benchmark] Increase the speedup threshold based on request (…
v20260126-220349
[AUTOREVERT] Ask for claude bot to provide user guidance (#7692)

Known limitations:
- Does not work with forked PRs, but Anthropic [seems to be working](https://github.com/anthropics/claude-code-action/issues/821) on it.

The output [looks nice](https://github.com/pytorch/pytorch/pull/173119#issuecomment-3801829468).

Signed-off-by: Jean Schmidt <[email protected]>
v20260122-234401
Add Claude Code usage metrics upload action and database schema (#7675)

This pull request introduces a new pipeline for collecting and ingesting Claude Code usage metrics into ClickHouse for analytics. The changes span GitHub Actions, AWS Lambda ingestion logic, and database schema additions to support this new data flow.

* reusable action to upload claude code metrics to s3
* clickhouse schema for the metrics
* turns on ingestion
v20260113-193414
Change compiler regression report to median (#7648)

This changes the logic for finding the baseline of the compiler regression. Previously, it took the max metric value from the past 3-6 days as the baseline and compared it to the metric values from the latest 3 days. However, this created some noisy alerts. Change it to the median value to avoid scenarios such as the one shown below.

<img width="393" height="201" alt="image" src="https://github.com/user-attachments/assets/edeff7f8-09a8-494e-8127-b3b0984a297f" />
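The effect of switching from max to median can be sketched in a few lines of Python. This is a sketch under assumptions, not the report's actual code: `baseline_value` and the sample numbers are hypothetical, chosen only to show how a single spike in the baseline window inflates a max-based baseline.

```python
from statistics import median


def baseline_value(history, method="median"):
    """Pick a baseline from the baseline-window values (hypothetical helper)."""
    if not history:
        raise ValueError("history must not be empty")
    return max(history) if method == "max" else median(history)


# Made-up baseline window with one noisy spike.
history = [100.0, 101.0, 99.0, 140.0]
old_baseline = baseline_value(history, method="max")  # dominated by the spike
new_baseline = baseline_value(history)                # close to typical values
```

With the max, one outlier sets the baseline, so ordinary recent values can look like a large change; the median stays close to the typical value.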
v20260109-231750
Bump form-data from 4.0.1 to 4.0.4 in /terraform-aws-github-runner/mo…
v20260107-222051
only send notification for compiler regression when it's h100 (#7634)

This PR still tracks regressions for b200, but silences the notification if a regression exists. An alternative is to not query the b200 data at all, so that no regression result is generated.
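The "track but do not notify" behavior described here can be sketched as a filter applied after regression detection. The `RegressionResult` class, the `NOTIFY_DEVICES` set, and `should_notify` are hypothetical names for illustration; the PR text does not show how the actual check is written.

```python
from dataclasses import dataclass


@dataclass
class RegressionResult:
    """Hypothetical result record for one device."""
    device: str
    has_regression: bool


# Assumption: only devices listed here trigger notifications.
NOTIFY_DEVICES = {"h100"}


def should_notify(result):
    """Results are still produced for every device; only some notify."""
    return result.has_regression and result.device.lower() in NOTIFY_DEVICES
```

Filtering at notification time keeps the b200 regression results available for inspection, which is the trade-off the PR chooses over not querying b200 data at all.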
v20260107-193807
Fix dynamo key queries to use trailing slashes for accurate repo matc…
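The release title above is truncated, so the exact change is not visible here. As a general illustration of why a trailing slash matters in prefix-based matching, the following Python sketch uses a hypothetical key layout (`<org>/<repo>/<id>`) and a hypothetical helper, `keys_for_repo`; neither is taken from the repository.

```python
def keys_for_repo(keys, repo):
    """Return keys belonging to exactly this repo (hypothetical key layout)."""
    # Without the trailing slash, "org/repo" would also match "org/repo-other".
    prefix = repo.rstrip("/") + "/"
    return [k for k in keys if k.startswith(prefix)]


# Made-up keys for illustration only.
sample_keys = ["pytorch/pytorch/123", "pytorch/pytorch-canary/456"]
matched = keys_for_repo(sample_keys, "pytorch/pytorch")
```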