Commit 1538e9f
Serialize cluster image builds with file lock
The `AlreadyExists` fix in #337 handled one symptom of parallel xdist
workers racing to build the same cluster image, but there's a second
failure mode showing up in CI:
https://github.com/chrisguidry/docket/actions/runs/22025132964/job/63640478732
When concurrent builds target the same tag, the Docker SDK's `build()`
completes successfully in the daemon, then tries to inspect the resulting
image by its short ID. If another worker's build re-tagged the image in
the meantime, the first image ID gets orphaned and the inspect 404s.
This knocked out 485 of 686 tests in the cluster job.
Rather than catching yet another exception type, this serializes the
builds with `fcntl.flock` so only one worker builds at a time. The
others wait and find it already built. Eliminates both the
`AlreadyExists` and `ImageNotFound` races structurally.
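
For illustration, a minimal sketch of the lock-then-check pattern described
above. This is not the code from this commit: `exclusive_file_lock`,
`ensure_built`, and the `image_exists` / `build_image` callables are
hypothetical names standing in for the real Docker SDK calls. Only
`fcntl.flock` comes from the message above.

```python
import fcntl
from contextlib import contextmanager
from pathlib import Path
from typing import Callable, Iterator


@contextmanager
def exclusive_file_lock(lock_path: Path) -> Iterator[None]:
    """Hold an exclusive advisory lock on lock_path while the block runs."""
    with open(lock_path, "w") as lock_file:
        # Blocks until every other holder of the lock has released it.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def ensure_built(
    lock_path: Path,
    image_exists: Callable[[], bool],  # hypothetical: "is the tag already there?"
    build_image: Callable[[], None],  # hypothetical: wraps the actual build
) -> bool:
    """Build at most once across workers; return True if this caller built."""
    with exclusive_file_lock(lock_path):
        # Re-check under the lock: a worker that waited finds the image
        # already built and skips the build entirely.
        if image_exists():
            return False
        build_image()
        return True
```

Each worker would call `ensure_built` with the same lock path, for example a
file in a temp directory shared by all xdist workers. `fcntl` is Unix-only,
so this sketch assumes the tests run on Linux or macOS.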
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

1 parent aace0e1, commit 1538e9f
1 file changed, 21 additions and 8 deletions.