Skip to content
This repository was archived by the owner on Jul 2, 2026. It is now read-only.

Commit da47bf4

Browse files
committed
fix(download): use staging + atomic rename in extract_archive_atomic
v1.0.6 serialised concurrent tar extractions via an mkdir-based lock, but the polling loop's early-exit on \`[[ -x bin_path ]]\` observed the file the moment tar created the inode with its archived perms — well before tar finished writing it. The early-exiting process then went on to \`exec\` the partially-written binary, surfacing as \`Text file busy\` (ETXTBSY) in CI logs. Replace the lock with the same pattern \`download_tool_from_archive\` already uses: extract into a per-invocation staging dir on the same filesystem, then atomically rename the top-level entry into place. Parallel callers now see the binary as either fully-absent or fully-present — never mid-write.
1 parent 5a37489 commit da47bf4

2 files changed

Lines changed: 41 additions & 18 deletions

File tree

CHANGELOG.md

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -30,8 +30,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
3030
`mkdir -p` + `tar -x` in the run script, which raced when prek dispatched
3131
the hook with `concurrency > 1` — multiple processes saw a missing
3232
`bin_path`, both ran `tar`, and the second one died with
33-
`tar: <path>: Cannot open: File exists`. The shared helper serialises
34-
extraction via an atomic mkdir-based lock.
33+
`tar: <path>: Cannot open: File exists`. The shared helper publishes the
34+
extracted tree via tar-to-staging + atomic `rename(2)`, so parallel callers
35+
observe the binary as either fully-absent or fully-present — never the
36+
in-progress mid-write state that surfaces as ETXTBSY on `exec`.
3537

3638
### Added
3739

lib/download.sh

Lines changed: 37 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -201,28 +201,49 @@ download_tool_from_archive() {
201201
}
202202

203203
# extract_archive_atomic <archive_path> <extract_dir> <bin_path> <tar_flags...>
204-
# Extract an archive into ${extract_dir}, serializing concurrent callers via an
205-
# atomic mkdir-based lock so prek's parallel hook execution does not race on
206-
# overlapping `tar -x` calls (manifests as "tar: …: Cannot open: File exists"
207-
# when two procs touch the same files). Idempotent: returns immediately if the
208-
# expected binary is already present.
204+
# Extract an archive into ${extract_dir} concurrency-safely:
205+
#
206+
# 1. tar into a per-invocation staging dir under ${extract_dir} (so the
207+
# eventual `mv` stays on the same filesystem and `rename(2)` is atomic).
208+
# 2. atomically `mv` the relevant subtree into place. If another concurrent
209+
# caller beat us to it, the target already exists and we drop our staged
210+
# copy. The published binary therefore goes from "doesn't exist" → "fully
211+
# written" in a single rename, so `exec ${bin_path}` from a parallel
212+
# caller cannot observe a partially-written file (no ETXTBSY).
213+
#
214+
# bin_path is expected to live one directory under extract_dir (most upstream
215+
# tarballs ship as `tool-vX/<binary>`); for archives that drop the binary
216+
# directly at the top level, pass extract_dir == dirname(bin_path) and the
217+
# helper still does the right thing.
209218
extract_archive_atomic() {
210219
local archive_path="$1" extract_dir="$2" bin_path="$3"
211220
shift 3
212221
if [[ -x "${bin_path}" ]]; then
213222
return 0
214223
fi
215224
mkdir -p "${extract_dir}"
216-
local lock="${extract_dir}/.extract.lock"
217-
while ! mkdir "${lock}" 2>/dev/null; do
218-
if [[ -x "${bin_path}" ]]; then
219-
return 0
220-
fi
221-
sleep 0.1
222-
done
223-
if [[ ! -x "${bin_path}" ]]; then
224-
tar "$@" -f "${archive_path}" -C "${extract_dir}"
225-
chmod +x "${bin_path}"
225+
local stage
226+
stage="$(mktemp -d "${extract_dir}/.stage.XXXXXX")"
227+
tar "$@" -f "${archive_path}" -C "${stage}"
228+
# Recompute bin_path within stage by swapping the extract_dir prefix. This
229+
# works whether the archive ships a subdir wrapper or drops the binary at
230+
# the top level.
231+
local rel="${bin_path#"${extract_dir}/"}"
232+
local staged_bin="${stage}/${rel}"
233+
if [[ ! -e "${staged_bin}" ]]; then
234+
rm -rf "${stage}"
235+
printf 'extract_archive_atomic: expected %s inside archive %s, not found after extraction\n' \
236+
"${rel}" "${archive_path}" >&2
237+
return 1
238+
fi
239+
chmod 0755 "${staged_bin}"
240+
# Compute the top-level entry inside stage that we need to promote into
241+
# extract_dir. For a nested layout (`shellcheck-vX/shellcheck`), this is
242+
# `shellcheck-vX`. For a flat layout (`kubeconform`), this is the binary
243+
# itself. Either way `rename(2)` is atomic when the target doesn't exist.
244+
local top="${rel%%/*}"
245+
if [[ ! -e "${extract_dir}/${top}" ]]; then
246+
mv "${stage}/${top}" "${extract_dir}/${top}" 2>/dev/null || true
226247
fi
227-
rmdir "${lock}"
248+
rm -rf "${stage}"
228249
}

0 commit comments

Comments
 (0)