fsext: serialize first-fill in CacheOnReadFs to fix concurrent-open race #6125
Open
mstoykov wants to merge 1 commit into
afero.CacheOnReadFs.Open on a cacheMiss runs base.Stat + copyToLayer +
layer.Open. A second goroutine entering Open during copyToLayer classifies
the mid-copy layer file as cacheHit and returns layer.Open(name) on a
partially written file, reading zero or truncated bytes.
The wrapper now serializes the cacheMiss -> copyToLayer -> layer.Open
sequence per path via a fill mutex, and remembers paths whose first open
has completed so subsequent opens take a fast path with no extra locking.
OpenFile shares the same path since it has the same shape in afero.
Queued waiters that wake after the first filler finished release the
fill mutex before running openFn so they can fan out through afero's
cacheHit branch in parallel.
First observed as a flake in TestRunCSVParseConcurrentFromMultipleModules
where three top-level-await modules each call fs.open("data.csv") +
csv.parse(file) and one ends up reading 0 bytes. The test passes
-race -count=500 cleanly after this change; before it produced multiple
failures at the same count.
What?
Serialize the first cacheMiss -> copyToLayer -> layer.Open sequence in
fsext.CacheOnReadFs per path via a fill mutex, and remember paths whose
first open has completed so subsequent opens take a fast path with no
extra locking. OpenFile shares the same code path since it has the same
shape in afero. Queued waiters that wake after the first filler finished
release the fill mutex before running the underlying open so they can fan
out through afero's cacheHit branch in parallel.
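As a rough illustration of the locking scheme described above, here is a minimal sketch using only the standard library. It is not the code in lib/fsext/cacheonread.go: the fillGuard type, its fields, and the demo function are hypothetical names chosen for this example, and openFn stands in for the underlying afero open.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// fillGuard is a hypothetical helper (not the type used in lib/fsext).
// It serializes the first open of each path and lets later opens skip locking.
type fillGuard struct {
	filled sync.Map               // path -> struct{}: first open has completed
	mu     sync.Mutex             // protects fills
	fills  map[string]*sync.Mutex // path -> fill mutex
}

func newFillGuard() *fillGuard {
	return &fillGuard{fills: make(map[string]*sync.Mutex)}
}

// fillMutex returns the per-path fill mutex, creating it on first use.
func (g *fillGuard) fillMutex(path string) *sync.Mutex {
	g.mu.Lock()
	defer g.mu.Unlock()
	m, ok := g.fills[path]
	if !ok {
		m = &sync.Mutex{}
		g.fills[path] = m
	}
	return m
}

// open runs openFn for path, serializing only the first successful open.
func (g *fillGuard) open(path string, openFn func() ([]byte, error)) ([]byte, error) {
	// Fast path: the first open already completed, no extra locking.
	if _, ok := g.filled.Load(path); ok {
		return openFn()
	}
	m := g.fillMutex(path)
	m.Lock()
	if _, ok := g.filled.Load(path); ok {
		// Queued waiter: the filler finished while we waited, so release
		// the fill mutex before opening to let waiters run in parallel.
		m.Unlock()
		return openFn()
	}
	defer m.Unlock()
	data, err := openFn()
	if err == nil {
		g.filled.Store(path, struct{}{})
	}
	return data, err
}

// demo races n goroutines through the guard and reports how many times the
// simulated copy ran and how many callers read unexpected content.
func demo(n int) (copies int32, short int32) {
	const want = "a,b\n1,2\n"
	var cache sync.Map
	g := newFillGuard()
	openFn := func() ([]byte, error) {
		if b, ok := cache.Load("data.csv"); ok {
			return b.([]byte), nil
		}
		atomic.AddInt32(&copies, 1) // simulated copyToLayer
		cache.Store("data.csv", []byte(want))
		return []byte(want), nil
	}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			b, err := g.open("data.csv", openFn)
			if err != nil || string(b) != want {
				atomic.AddInt32(&short, 1)
			}
		}()
	}
	wg.Wait()
	return copies, short
}

func main() {
	copies, short := demo(16)
	fmt.Println("copies:", copies, "short reads:", short)
}
```

In this sketch a failed first open does not mark the path as filled, so the next waiter retries as the filler. Whether the actual wrapper behaves the same way on errors is not stated in this description.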
Why?
afero.CacheOnReadFs.Open on a cacheMiss runs base.Stat + copyToLayer +
layer.Open. A second goroutine entering Open during that copyToLayer
classifies the still-being-copied layer file as cacheHit and returns
layer.Open(name) on a partially written file, reading zero or truncated
bytes.
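The interleaving can be reproduced deterministically in a toy model. The sketch below is not afero code: toyLayer, naiveOpen, and the midCopy hook are hypothetical, and the hook simply marks the point where a second goroutine could be scheduled while the copy is half done.

```go
package main

import "fmt"

// toyLayer stands in for the cache layer. A name counts as "cached" as soon
// as it has an entry, even while the copy is still in progress.
type toyLayer map[string][]byte

// naiveOpen is a hypothetical, simplified check-then-copy open. midCopy runs
// at the point where another caller could enter while the copy is unfinished.
func naiveOpen(layer toyLayer, base map[string][]byte, name string, midCopy func()) []byte {
	if b, ok := layer[name]; ok {
		return b // treated as a cache hit, even if the copy is unfinished
	}
	src := base[name]
	half := len(src) / 2
	layer[name] = append([]byte{}, src[:half]...) // file exists, half written
	if midCopy != nil {
		midCopy()
	}
	layer[name] = append(layer[name], src[half:]...)
	return layer[name]
}

// interleave runs a second open in the middle of the first one's copy and
// returns what each caller read.
func interleave(content string) (first, second []byte) {
	base := map[string][]byte{"data.csv": []byte(content)}
	layer := toyLayer{}
	first = naiveOpen(layer, base, "data.csv", func() {
		second = naiveOpen(layer, base, "data.csv", nil)
	})
	return first, second
}

func main() {
	first, second := interleave("a,b\n1,2\n")
	// prints "first read 8 bytes, second read 4 bytes"
	fmt.Printf("first read %d bytes, second read %d bytes\n", len(first), len(second))
}
```

The second caller sees an existing entry, takes the hit branch, and returns only the bytes written so far, which mirrors the truncated read described above.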
First observed as a flake in TestRunCSVParseConcurrentFromMultipleModules
in internal/cmd/tests: three top-level-await modules each call
fs.open("data.csv") + csv.parse(file), one ends up reading 0 bytes, and the
run exits with code 107. After this change the test passes
-race -count=500 cleanly; before it produced multiple failures at the
same count.
The fix lives in the k6 wrapper (lib/fsext/cacheonread.go); there are no
changes to vendored afero. A regression test in
lib/fsext/cacheonread_test.go exercises both the Open and OpenFile entry
points via a shared harness that races N goroutines through a first open
and asserts every one reads the full content.
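A shared harness of that shape might look roughly like the sketch below. It is an assumption about the general pattern, not the code in lib/fsext/cacheonread_test.go: raceFirstOpen and its open parameter are hypothetical, with open being the seam where an Open-based or OpenFile-based reader would be plugged in.

```go
package main

import (
	"fmt"
	"sync"
)

// raceFirstOpen is a hypothetical harness. It releases n goroutines at once
// through open and returns the first failure: an error from open, or content
// that differs from want.
func raceFirstOpen(n int, want string, open func() ([]byte, error)) error {
	start := make(chan struct{})
	errs := make([]error, n) // one slot per goroutine, no shared writes
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			<-start // block until every goroutine has been started
			b, err := open()
			switch {
			case err != nil:
				errs[i] = fmt.Errorf("goroutine %d: %w", i, err)
			case string(b) != want:
				errs[i] = fmt.Errorf("goroutine %d: read %d bytes, want %d", i, len(b), len(want))
			}
		}(i)
	}
	close(start)
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return err
		}
	}
	return nil
}

func main() {
	const content = "a,b\n1,2\n"
	full := func() ([]byte, error) { return []byte(content), nil }
	fmt.Println("full reader:", raceFirstOpen(8, content, full))

	empty := func() ([]byte, error) { return nil, nil }
	fmt.Println("empty reader:", raceFirstOpen(8, content, empty))
}
```

Closing the start channel releases all goroutines together, which makes it more likely that several callers hit the first open at once. In a real test, open would wrap the filesystem under test and read the returned file to the end.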
Checklist
make check) and all pass.
Checklist: Documentation (only for k6 maintainers and if relevant)
Related PR(s)/Issue(s)
None.