fsext: serialize first-fill in CacheOnReadFs to fix concurrent-open race #6125

Open
mstoykov wants to merge 1 commit into master from fix-fsext-cacheonread-race

Conversation

mstoykov (Contributor) commented Jul 2, 2026

What?

Serialize the first cacheMiss -> copyToLayer -> layer.Open sequence in
fsext.CacheOnReadFs per path via a fill mutex, and remember paths whose
first open has completed so subsequent opens take a fast path with no
extra locking. OpenFile shares the same code path since it has the same
shape in afero. Queued waiters that wake after the first filler has
finished release the fill mutex before running the underlying open, so
they can fan out through afero's cacheHit branch in parallel.
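
For readers who want the shape of the locking without opening the diff, here is a minimal sketch. It is not the code in this PR: fillGuard, slowCache, and shortReads are hypothetical names, the toy cache stands in for afero, and marking a path as filled only when the open succeeds is an assumption.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// fillGuard is a hypothetical helper, not the type added by the PR.
type fillGuard struct {
	filled sync.Map // paths whose first open has completed

	mu    sync.Mutex
	fills map[string]*sync.Mutex // one fill mutex per path
}

// fillMutex returns the fill mutex for path, creating it on first use.
func (g *fillGuard) fillMutex(path string) *sync.Mutex {
	g.mu.Lock()
	defer g.mu.Unlock()
	m, ok := g.fills[path]
	if !ok {
		m = new(sync.Mutex)
		g.fills[path] = m
	}
	return m
}

// open serializes the first openFn call per path; later calls skip the lock.
func (g *fillGuard) open(path string, openFn func() ([]byte, error)) ([]byte, error) {
	if _, done := g.filled.Load(path); done {
		return openFn() // fast path: first fill already completed
	}
	m := g.fillMutex(path)
	m.Lock()
	if _, done := g.filled.Load(path); done {
		m.Unlock() // queued waiter: release first so waiters run in parallel
		return openFn()
	}
	defer m.Unlock()
	data, err := openFn() // first filler: runs while holding the fill mutex
	if err == nil {       // assumption: only a successful open marks the path
		g.filled.Store(path, struct{}{})
	}
	return data, err
}

// slowCache is a toy cache whose first open copies one byte at a time, so an
// unguarded concurrent open could observe a partially copied entry.
type slowCache struct {
	mu    sync.Mutex
	layer map[string][]byte
}

func (c *slowCache) open(path string, base []byte) ([]byte, error) {
	c.mu.Lock()
	if cached, hit := c.layer[path]; hit {
		snapshot := append([]byte(nil), cached...)
		c.mu.Unlock()
		return snapshot, nil
	}
	c.layer[path] = []byte{} // entry exists, nothing copied yet
	c.mu.Unlock()
	for _, b := range base {
		runtime.Gosched() // give other goroutines a chance to interleave
		c.mu.Lock()
		c.layer[path] = append(c.layer[path], b)
		c.mu.Unlock()
	}
	return append([]byte(nil), base...), nil
}

// shortReads races n guarded opens and counts reads shorter than the content.
func shortReads(n int) int {
	base := []byte("a,b\n1,2\n")
	guard := &fillGuard{fills: make(map[string]*sync.Mutex)}
	cache := &slowCache{layer: make(map[string][]byte)}
	lens := make([]int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			data, _ := guard.open("data.csv", func() ([]byte, error) {
				return cache.open("data.csv", base)
			})
			lens[i] = len(data)
		}(i)
	}
	wg.Wait()
	short := 0
	for _, l := range lens {
		if l != len(base) {
			short++
		}
	}
	return short
}

func main() {
	fmt.Println("short reads:", shortReads(50))
}
```

The second filled check after taking the fill mutex is what lets queued waiters drop the lock early instead of running one at a time.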

Why?

On a cacheMiss, afero.CacheOnReadFs.Open runs base.Stat + copyToLayer +
layer.Open. A second goroutine that enters Open while copyToLayer is
still running classifies the still-being-copied layer file as cacheHit
and returns layer.Open(name) on a partially written file, so it reads
zero or truncated bytes.
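
The interleaving can be replayed deterministically in a toy model, with no goroutines involved. This is only an illustration, not afero's code: model, beginCopy, finishCopy, and replay are hypothetical names, and it assumes the layer entry becomes visible (empty) as soon as the copy starts.

```go
package main

import "fmt"

// model is a hypothetical stand-in for the base and cache-layer filesystems.
type model struct {
	base  map[string]string
	layer map[string]string
}

// beginCopy models the start of the copy: the layer entry exists but is empty.
func (m *model) beginCopy(name string) { m.layer[name] = "" }

// finishCopy models the end of the copy: the layer entry holds the full content.
func (m *model) finishCopy(name string) { m.layer[name] = m.base[name] }

// open models the second goroutine's view: any existing layer entry counts as
// a hit, whether or not the copy has finished.
func (m *model) open(name string) (content string, hit bool) {
	content, hit = m.layer[name]
	return content, hit
}

// replay runs the problematic ordering step by step and returns what an open
// sees in the middle of the copy and after it.
func replay() (during, after string) {
	m := &model{
		base:  map[string]string{"data.csv": "a,b\n1,2\n"},
		layer: map[string]string{},
	}
	m.beginCopy("data.csv")        // goroutine A: cache miss, copy starts
	during, _ = m.open("data.csv") // goroutine B: hit on the unfinished file
	m.finishCopy("data.csv")       // goroutine A: copy completes
	after, _ = m.open("data.csv")  // any later open: hit on the full file
	return during, after
}

func main() {
	during, after := replay()
	fmt.Printf("opened mid-copy: %d bytes\n", len(during))
	fmt.Printf("opened after copy: %d bytes\n", len(after))
}
```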

First observed as a flake in TestRunCSVParseConcurrentFromMultipleModules
in internal/cmd/tests: three top-level-await modules each call
fs.open("data.csv") + csv.parse(file), one of them ends up reading 0
bytes, and the run exits with code 107. After this change the test passes
cleanly with -race -count=500; before it, the same count produced
multiple failures.

The fix lives in the k6 wrapper (lib/fsext/cacheonread.go), with no changes
to vendored afero. A regression test in lib/fsext/cacheonread_test.go
exercises both the Open and OpenFile entry points via a shared
harness that races N goroutines through a first open and asserts every
one reads the full content.
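
A rough sketch of what such a harness can look like, for illustration only. It is not the test in lib/fsext/cacheonread_test.go: opener, raceFirstOpen, and onceOpener are hypothetical names, and onceOpener is a sync.Once stand-in for a correctly serialized filesystem rather than the real Open or OpenFile.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// opener is a hypothetical signature for "open name and read everything".
type opener func(name string) ([]byte, error)

// raceFirstOpen releases n goroutines through open at the same moment and
// returns how many of them failed or read something other than want.
func raceFirstOpen(n int, name string, want []byte, open opener) int {
	start := make(chan struct{})
	results := make([][]byte, n)
	errs := make([]error, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			<-start // hold every goroutine until all are ready
			results[i], errs[i] = open(name)
		}(i)
	}
	close(start) // release them together to maximize contention
	wg.Wait()

	bad := 0
	for i := range results {
		if errs[i] != nil || !bytes.Equal(results[i], want) {
			bad++
		}
	}
	return bad
}

// onceOpener stands in for a filesystem whose first fill is serialized:
// the content is copied exactly once and every caller sees the full copy.
func onceOpener(content []byte) opener {
	var once sync.Once
	var cached []byte
	return func(string) ([]byte, error) {
		once.Do(func() { cached = append([]byte(nil), content...) })
		return cached, nil
	}
}

func main() {
	want := []byte("a,b\n1,2\n")
	// One fresh opener per entry point, mirroring a shared harness that is
	// run once for Open and once for OpenFile.
	for _, entry := range []string{"Open", "OpenFile"} {
		bad := raceFirstOpen(32, "data.csv", want, onceOpener(want))
		fmt.Printf("%s: %d bad reads\n", entry, bad)
	}
}
```

Holding the goroutines on a shared start channel makes it more likely that several of them are inside the first open at the same time.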

Checklist

  • I have performed a self-review of my code.
  • I have commented on my code, particularly in hard-to-understand areas.
  • I have added tests for my changes.
  • I have run linter and tests locally (make check) and all pass.

Checklist: Documentation (only for k6 maintainers and if relevant)

  • I have added the correct milestone and labels to the PR.
  • I have updated the release notes: link
  • I have updated or added an issue to the k6-documentation: grafana/k6-docs#NUMBER if applicable
  • I have updated or added an issue to the TypeScript definitions: grafana/k6-DefinitelyTyped#NUMBER if applicable

Related PR(s)/Issue(s)

None.

afero.CacheOnReadFs.Open on a cacheMiss runs base.Stat + copyToLayer +
layer.Open. A second goroutine entering Open during copyToLayer classifies
the mid-copy layer file as cacheHit and returns layer.Open(name) on a
partially written file, reading zero or truncated bytes.

The wrapper now serializes the cacheMiss -> copyToLayer -> layer.Open
sequence per path via a fill mutex, and remembers paths whose first open
has completed so subsequent opens take a fast path with no extra locking.
OpenFile shares the same path since it has the same shape in afero.
Queued waiters that wake after the first filler has finished release the
fill mutex before running openFn, so they can fan out through afero's
cacheHit branch in parallel.

First observed as a flake in TestRunCSVParseConcurrentFromMultipleModules
where three top-level-await modules each call fs.open("data.csv") +
csv.parse(file) and one ends up reading 0 bytes. The test passes cleanly
with -race -count=500 after this change; before it, the same count
produced multiple failures.
@mstoykov mstoykov requested a review from a team as a code owner July 2, 2026 15:44
@mstoykov mstoykov requested review from ankur22 and inancgumus and removed request for a team July 2, 2026 15:44
@mstoykov mstoykov temporarily deployed to azure-trusted-signing July 2, 2026 15:50 — with GitHub Actions Inactive
@mstoykov mstoykov temporarily deployed to azure-trusted-signing July 2, 2026 15:52 — with GitHub Actions Inactive