Add prStatusCheck.py for automated PR status updates
This script automates checking and updating PR statuses in the iDFlakies dataset ({gr/pr/py}-data.csv). It queries PR status via the GitHub API, updates a test's status from "Opened" to "Accepted" when its PR is merged, and flags closed-but-not-merged PRs as "Unknown" for manual review.
Features:
- Per-file row-range filtering (--prrange, --grrange, --pyrange)
- Ignore list support via ignore.csv
- PR status caching to avoid redundant API calls
- Handles CSV rows with embedded quotes correctly
- Reads GitHub token from .env file at repo root
For detailed documentation, see auto-update-dataset/python/README.md#prstatuscheckpy
`prStatusCheck.py` is a script for automatically checking and updating pull request statuses in the IDoFT dataset. It queries the GitHub API to retrieve PR status information and updates the corresponding test records.
```
  --prrange PRRANGE  Range of CSV rows for pr-data.csv (e.g., 100-200). If not specified, processes all rows. Uses actual CSV row
                     numbers (header=row 1, first data=row 2). Inclusive.
  --grrange GRRANGE  Range of CSV rows for gr-data.csv (e.g., 100-200). If not specified, processes all rows. Uses actual CSV row
                     numbers (header=row 1, first data=row 2). Inclusive.
  --pyrange PYRANGE  Range of CSV rows for py-data.csv (e.g., 100-200). If not specified, processes all rows. Uses actual CSV row
                     numbers (header=row 1, first data=row 2). Inclusive.
  --threads THREADS  Number of threads to use for parallel processing
```
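
The row-numbering convention in the help text can be sketched as follows. This is a minimal illustration, not the script's actual code: `parse_row_range` and `rows_in_range` are hypothetical helper names, and the sketch assumes the data rows are held in a list whose first element is CSV row 2.

```python
from typing import List, Tuple


def parse_row_range(spec: str) -> Tuple[int, int]:
    """Parse 'START-END' into an inclusive pair of CSV row numbers."""
    start_text, sep, end_text = spec.partition("-")
    if not sep:
        raise ValueError(f"expected START-END, got {spec!r}")
    start, end = int(start_text), int(end_text)
    # Row 1 is the header, so the first selectable row is row 2.
    if start < 2 or end < start:
        raise ValueError(f"invalid row range {spec!r}")
    return start, end


def rows_in_range(data_rows: List, spec: str) -> List:
    """Select data rows by CSV row number (data_rows[0] is CSV row 2)."""
    start, end = parse_row_range(spec)
    # CSV row n lives at index n - 2; the slice end is exclusive,
    # so end - 1 keeps the END row in the result.
    return data_rows[start - 2 : end - 1]
```

With this mapping, a spec such as `43-43` selects exactly one row, and `4-5` selects the third and fourth data rows of the file.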

### Setup

```
cd ~/idoft
cd auto-update-dataset/python
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

### Features

#### 1. Query by GitHub API

The script reads the GitHub token from a `.env` file. To use your own token, create one at https://github.com/settings/tokens and paste it into a file named `.env` in the `idoft` project root directory. Example:

```
GITHUB_TOKEN=<Your github token here>
```
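
As a rough illustration of this step, the sketch below reads `GITHUB_TOKEN` from a `KEY=VALUE` file and prepares a GitHub REST API request for a pull request URL of the form stored in the dataset. The helper names (`read_github_token`, `auth_headers`, `pr_api_url`) are hypothetical and the parsing is deliberately simple; the script itself may load the token differently, for example through a dotenv library.

```python
import re
from pathlib import Path
from typing import Dict, Optional

# Matches pull request page URLs such as https://github.com/OWNER/REPO/pull/123
_PR_URL = re.compile(r"^https://github\.com/([^/]+)/([^/]+)/pull/(\d+)/?$")


def read_github_token(env_path: Path) -> Optional[str]:
    """Return the GITHUB_TOKEN value from a KEY=VALUE file, or None."""
    if not env_path.is_file():
        return None
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "GITHUB_TOKEN":
            return value.strip().strip("\"'") or None
    return None


def auth_headers(token: str) -> Dict[str, str]:
    """Build headers for an authenticated GitHub REST API request."""
    return {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    }


def pr_api_url(html_url: str) -> str:
    """Convert a pull request page URL into its REST API endpoint."""
    match = _PR_URL.match(html_url.strip())
    if match is None:
        raise ValueError(f"not a GitHub pull request URL: {html_url!r}")
    owner, repo, number = match.groups()
    return f"https://api.github.com/repos/{owner}/{repo}/pulls/{number}"
```

The resulting URL and headers can be passed to any HTTP client; authenticated requests get a much higher rate limit than anonymous ones, which matters when many rows are checked in one run.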

#### 2. Per-File Independent Row Range Processing

Specify row ranges for each CSV file via command-line arguments.

- **Arguments**:

- `--prrange`: Row range for pr-data.csv (e.g., `3802-3804`)

```bash
python prStatusCheck.py --prrange 3802-3804
2025-12-09 20:44:18,978 - INFO - --- Processing pr-data.csv ---
2025-12-09 20:44:18,978 - INFO - Loading data from local file: $HOME/idoft/pr-data.csv
2025-12-09 20:44:18,991 - INFO - Queued 3 tasks for pr-data.csv.
2025-12-09 20:44:19,397 - INFO - [pr-data.csv] Row 3802: Status changed but could not be determined, remains Opened (https://github.com/apache/tinkerpop/pull/1658. Please check manually.)
2025-12-09 20:44:19,410 - INFO - [pr-data.csv] Row 3803: Status changed but could not be determined, remains Opened (https://github.com/apache/tinkerpop/pull/1658. Please check manually.)
2025-12-09 20:44:19,968 - INFO - [pr-data.csv] Row 3804: Status changed but could not be determined, remains Opened (https://github.com/apache/tinkerpop/pull/1658. Please check manually.)
2025-12-09 20:44:19,968 - INFO - summary for pr-data.csv: 0 statuses updated, 3 changed but need manual check, 0 still open.
2025-12-09 20:44:19,969 - INFO - Manual check log updated for pr-data.csv
```

- `--grrange`: Row range for gr-data.csv (e.g., `107-108`)

```bash
python prStatusCheck.py --grrange 107-108
2025-12-09 20:43:43,978 - INFO - --- Processing gr-data.csv ---
2025-12-09 20:43:43,978 - INFO - Loading data from local file: $HOME/idoft/gr-data.csv
2025-12-09 20:43:43,983 - INFO - Queued 2 tasks for gr-data.csv.
2025-12-09 20:43:44,574 - INFO - [gr-data.csv] Row 107: Status changed but could not be determined, remains Opened (https://github.com/apache/ignite-3/pull/4557. Please check manually.)
2025-12-09 20:43:44,726 - INFO - [gr-data.csv] Row 108: Status remained Opened (https://github.com/apache/ignite-3/pull/4836)
2025-12-09 20:43:44,726 - INFO - summary for gr-data.csv: 0 statuses updated, 1 changed but need manual check, 1 still open.
2025-12-09 20:43:44,726 - INFO - Manual check log updated for gr-data.csv
```

- `--pyrange`: Row range for py-data.csv (e.g., `43-43`)

```bash
python prStatusCheck.py --pyrange 43-43
2025-12-09 20:43:21,857 - INFO - --- Processing py-data.csv ---
2025-12-09 20:43:21,857 - INFO - Loading data from local file: $HOME/idoft/py-data.csv
2025-12-09 20:43:21,862 - INFO - Queued 1 tasks for py-data.csv.
2025-12-09 20:43:22,394 - INFO - [py-data.csv] Row 43: Status changed Opened -> Accepted (https://github.com/jazzband/docopt-ng/pull/20)
2025-12-09 20:43:22,395 - INFO - summary for py-data.csv: 1 statuses updated, 0 changed but need manual check, 0 still open.
2025-12-09 20:43:22,404 - INFO - Accepted log updated for py-data.csv
```

Log output from a run in which all three files are processed:

```bash
2025-12-09 20:44:54,560 - INFO - --- Processing py-data.csv ---
2025-12-09 20:44:54,560 - INFO - Loading data from local file: $HOME/idoft/py-data.csv
2025-12-09 20:44:54,564 - INFO - Queued 1 tasks for py-data.csv.
2025-12-09 20:44:55,044 - INFO - [py-data.csv] Row 43: Status changed Opened -> Accepted (https://github.com/jazzband/docopt-ng/pull/20)
2025-12-09 20:44:55,044 - INFO - summary for py-data.csv: 1 statuses updated, 0 changed but need manual check, 0 still open.
2025-12-09 20:44:55,050 - INFO - Accepted log updated for py-data.csv
2025-12-09 20:44:55,050 - INFO - --- Processing pr-data.csv ---
2025-12-09 20:44:55,050 - INFO - Loading data from local file: $HOME/idoft/pr-data.csv
2025-12-09 20:44:55,081 - INFO - Queued 3 tasks for pr-data.csv.
2025-12-09 20:44:55,493 - INFO - [pr-data.csv] Row 3802: Status changed but could not be determined, remains Opened (https://github.com/apache/tinkerpop/pull/1658. Please check manually.)
2025-12-09 20:44:55,516 - INFO - [pr-data.csv] Row 3804: Status changed but could not be determined, remains Opened (https://github.com/apache/tinkerpop/pull/1658. Please check manually.)
2025-12-09 20:44:55,584 - INFO - [pr-data.csv] Row 3803: Status changed but could not be determined, remains Opened (https://github.com/apache/tinkerpop/pull/1658. Please check manually.)
2025-12-09 20:44:55,584 - INFO - summary for pr-data.csv: 0 statuses updated, 3 changed but need manual check, 0 still open.
2025-12-09 20:44:55,584 - INFO - Manual check log updated for pr-data.csv
2025-12-09 20:44:55,584 - INFO - --- Processing gr-data.csv ---
2025-12-09 20:44:55,584 - INFO - Loading data from local file: $HOME/idoft/gr-data.csv
2025-12-09 20:44:55,595 - INFO - Queued 2 tasks for gr-data.csv.
2025-12-09 20:44:56,177 - INFO - [gr-data.csv] Row 107: Status changed but could not be determined, remains Opened (https://github.com/apache/ignite-3/pull/4557. Please check manually.)
2025-12-09 20:44:56,416 - INFO - [gr-data.csv] Row 108: Status remained Opened (https://github.com/apache/ignite-3/pull/4836)
2025-12-09 20:44:56,417 - INFO - summary for gr-data.csv: 0 statuses updated, 1 changed but need manual check, 1 still open.
2025-12-09 20:44:56,417 - INFO - Manual check log updated for gr-data.csv
```

- **Behavior**:

- If any range is specified: Only files with specified ranges are processed; others are skipped
- If no range is specified: All three files are processed in full
- Row numbers use actual CSV row numbers (header = row 1, first data = row 2)
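
The file-selection behavior described above can be sketched as a small planning function. This is an illustration under assumptions rather than the script's own logic: `plan_files` and `DATA_FILES` are hypothetical names, and each range is represented as the raw `START-END` string or `None` when the option was not passed.

```python
from typing import Dict, Optional

# The three dataset files named in this README.
DATA_FILES = ("pr-data.csv", "gr-data.csv", "py-data.csv")


def plan_files(ranges: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Map each file to process to its range spec (None means all rows)."""
    given = {name: spec for name, spec in ranges.items() if spec is not None}
    if given:
        # At least one range was passed: files without a range are skipped.
        return given
    # No range at all: every file is processed in full.
    return {name: None for name in DATA_FILES}
```

For example, passing only a range for `pr-data.csv` yields a plan containing that single file, while passing no ranges yields all three files with no row restriction.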

#### 3. PR Status Query

There are three status mappings defined in this script:

A pull request can be closed without being merged for various reasons. For example, it may be marked as *DeveloperFixed*, *Rejected*, or fall into one of the other flaky test statuses defined in IDoFT. In some cases, the changes are actually merged through an alternative workflow. Since these situations cannot be reliably distinguished automatically, such pull requests are classified as unknown and logged to `manual-check.log` for further inspection.
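
A minimal sketch of this classification is shown below. It assumes the input is the JSON object returned by the GitHub REST API for a single pull request, which carries a `state` field (`"open"` or `"closed"`) and `merged` / `merged_at` fields; `classify_pull_request` is a hypothetical name, and the returned labels follow the statuses used in this README.

```python
def classify_pull_request(pr: dict) -> str:
    """Map a GitHub pull request payload to a dataset status label."""
    if pr.get("merged") or pr.get("merged_at"):
        # Merged pull requests count as accepted fixes.
        return "Accepted"
    if pr.get("state") == "closed":
        # Closed without a merge: the real outcome (DeveloperFixed, Rejected,
        # merged elsewhere, ...) cannot be decided automatically.
        return "Unknown"
    # Still open: nothing to update.
    return "Opened"
```

Only the first outcome would change a row in the CSV; the second is the case that ends up in `manual-check.log`.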

Assume that the 4th and 5th lines of `pr-data.csv` are as follows:

```csv
https://github.com/abel533/Mapper,1764748eedb2f320a0d1c43cb4f928c4ccb1f2f5,core,tk.mybatis.mapper.mapperhelper.FieldHelperTest.testComplex,ID,Accepted,https://github.com/abel533/Mapper/pull/896,Accepted in the PR https://github.com/abel533/Mapper/pull/666 but later reverted in the commit https://github.com/abel533/Mapper/commit/79d313a7ca6cba6c5d5323746fb83ed5744180a1
https://github.com/abel533/Mapper,1764748eedb2f320a0d1c43cb4f928c4ccb1f2f5,core,tk.mybatis.mapper.mapperhelper.FieldHelperTest.testUser,ID,Opened,https://github.com/abel533/Mapper/pull/896,Accepted in the PR https://github.com/abel533/Mapper/pull/666 but later reverted in the commit https://github.com/abel533/Mapper/commit/79d313a7ca6cba6c5d5323746fb83ed5744180a1
```

The output should match the following. The test on the 5th line is not processed, so only one task is queued.

```bash
python prStatusCheck.py --prrange 4-5
2025-12-09 21:07:20,243 - INFO - Loading ignore list from $HOME/idoft/auto-update-dataset/ignore.csv
2025-12-09 21:07:20,245 - INFO - --- Processing pr-data.csv ---
2025-12-09 21:07:20,245 - INFO - Loading data from local file: $HOME/idoft/pr-data.csv
2025-12-09 21:07:20,255 - INFO - Processing CSV rows 4-5
2025-12-09 21:07:20,256 - INFO - Queued 1 tasks for pr-data.csv.
2025-12-09 21:07:20,776 - INFO - [pr-data.csv] Row 4: Status changed Opened -> Accepted (https://github.com/abel533/Mapper/pull/896)
2025-12-09 21:07:20,777 - INFO - summary for pr-data.csv: 1 statuses updated, 0 changed but need manual check, 0 still open.
2025-12-09 21:07:20,796 - INFO - Accepted log updated for pr-data.csv
```
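
The effect of the ignore list in this example can be sketched roughly as follows. The layout of `ignore.csv` is not shown here, so this sketch assumes, purely for illustration, that each row starts with a fully qualified test name; `load_ignored_tests` and `rows_to_process` are hypothetical helpers, and the test name is taken from the fourth column as in the CSV excerpt above.

```python
import csv
import io
from typing import Iterable, List, Set, Tuple


def load_ignored_tests(ignore_file: Iterable[str]) -> Set[str]:
    """Collect the first column of every non-empty row (assumed layout)."""
    return {
        row[0].strip()
        for row in csv.reader(ignore_file)
        if row and row[0].strip()
    }


def rows_to_process(
    rows: List[Tuple[int, List[str]]],
    ignored: Set[str],
    test_column: int = 3,
) -> List[Tuple[int, List[str]]]:
    """Keep (csv_row_number, row) pairs whose test name is not ignored."""
    return [(number, row) for number, row in rows if row[test_column] not in ignored]


# Example: the 5th CSV row is skipped because its test is on the ignore list.
ignore_list = io.StringIO("tk.mybatis.mapper.mapperhelper.FieldHelperTest.testUser\n")
candidate_rows = [
    (4, ["repo", "sha", "core", "tk.mybatis.mapper.mapperhelper.FieldHelperTest.testComplex"]),
    (5, ["repo", "sha", "core", "tk.mybatis.mapper.mapperhelper.FieldHelperTest.testUser"]),
]
queued = rows_to_process(candidate_rows, load_ignored_tests(ignore_list))
```

Under these assumptions `queued` holds only the entry for row 4, which corresponds to the single queued task in the log above.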

##### 4.2 Example for Python

Assume that rows 1022 through 1024 of `pr-data.csv` are as follows: