fix: ensure csv.parse generates a single shared array per file+options #5503
Conversation
joanlopez
left a comment
Great job @oleiade!
Note that none of the comments I left are blocking, but I'd really appreciate it if we could "merge" the two "duplicated" methods 🙏🏻
I've moved this to the 1.6.0 milestone, as we're in the release process.
I've made a pass on the underlying SharedArray code as per your concerns @codebien and found that it's already pretty well tested. However, I've added a bunch of further tests to the data package, in order to limit the risk of breaking anything with our internal changes in this PR. Hope that's satisfying 🙇🏻
codebien
left a comment
Hey @oleiade,
thanks for letting me know.
If I understand the changes correctly, the ones in the data module are orthogonal to the csv part and not really required for the fix of the specific related issue.
The data module is very sensitive, so I'd appreciate it if you could split the pull requests by their domains, to keep things clean and easy to review in the future without too much effort. I mean having one pull request for the csv fix and one for the data module.
Happy to split the PR. However, most of the changes in the data module are dependencies of the CSV fix. The standalone parts (mostly increased test coverage in data_test.go and share_test.go to safeguard SharedArray) are isolated in my last commit. An alternative to the current approach I've been tinkering with might be to extract the 'shared memory' logic into a dedicated package that provides the shared data structures. Plus, we could, if we so choose, migrate the data module to the new package down the road too. I think it could make sense; what do you folks think @codebien @joanlopez, any other alternatives?
Replace map-based JSON marshaling with anonymous structs in buildSharedArrayName to guarantee deterministic field ordering across all Go versions. While encoding/json currently sorts map keys alphabetically, this behavior is an implementation detail not guaranteed by the Go specification. Using structs with explicit field ordering ensures consistent SHA256 hashes for identical file+options combinations, preventing potential SharedArray duplication across VUs in future Go versions.
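The struct-based approach described above can be sketched as follows. This is a minimal illustration, not the exact k6 code: the helper name `buildSharedArrayName`, the JSON field tags, and the option set are assumptions based on the options listed in this PR (delimiter, skipFirstLine, fromLine, toLine, asObjects).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// buildSharedArrayName derives a deterministic SharedArray name from the
// file path and parser options. An anonymous struct guarantees the field
// order in the JSON output; a map's key ordering is only an implementation
// detail of encoding/json and not guaranteed by the language spec.
func buildSharedArrayName(
	filePath, delimiter string,
	skipFirstLine bool,
	fromLine, toLine int64,
	asObjects bool,
) (string, error) {
	payload := struct {
		FilePath      string `json:"filePath"`
		Delimiter     string `json:"delimiter"`
		SkipFirstLine bool   `json:"skipFirstLine"`
		FromLine      int64  `json:"fromLine"`
		ToLine        int64  `json:"toLine"`
		AsObjects     bool   `json:"asObjects"`
	}{filePath, delimiter, skipFirstLine, fromLine, toLine, asObjects}

	b, err := json.Marshal(payload)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return "csv-" + hex.EncodeToString(sum[:]), nil
}

func main() {
	a, _ := buildSharedArrayName("data.csv", ",", true, 0, -1, false)
	b, _ := buildSharedArrayName("data.csv", ",", true, 0, -1, false)
	fmt.Println(a == b) // identical file+options yield the identical name
}
```

With this scheme, two VUs parsing the same file with the same options compute the same name and therefore resolve to the same SharedArray, while any change to an option changes the hash.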
Force-pushed from a372774 to e7e434b
Hey folks 👋🏻 Thanks a lot for the constructive feedback. I went back to a less intrusive set of changes, leveraging the mutex as opposed to introducing a new package. I took the liberty to rewrite the history for clarity too, as I think it serves the review in this specific case. Also removed the commit adding test coverage to the data module.
The csv.parse function was creating a new SharedArray with a random name
(based on time.Now().Nanosecond()) for each VU, causing memory usage to
scale linearly with VU count instead of sharing data across VUs.
For a 40MB CSV file with 100 VUs, this resulted in ~6.5GB memory usage
instead of the expected ~200MB.
This commit fixes the issue by:
1. Replacing random SharedArray name generation with deterministic
hash-based naming. The name is now derived from the file path and
parser options (delimiter, skipFirstLine, fromLine, toLine,
asObjects) using SHA256, ensuring the same file+options combination
always produces the same SharedArray name.
2. Refactoring the internal sharedArrays structure to use a slot-based
system with sync.Once guarantees. This ensures the underlying data
is initialized exactly once per unique name, even when multiple VUs
call csv.parse concurrently.
3. Adding loadOrStore method that uses double-checked locking to
minimize contention while ensuring thread-safe single initialization
of shared arrays.
Closes #5493
Force-pushed from e7e434b to dd5943b
codebien
left a comment
LGTM 🚀 I've just one request.
mstoykov
left a comment
LGTM, apart from what @codebien pointed out
Co-authored-by: Ivan <[email protected]>
What
The csv.parse function was creating a new SharedArray with a random name (based on time.Now().Nanosecond()) for each VU, causing memory usage to scale linearly with VU count instead of sharing data across VUs. For a 40MB CSV file with 100 VUs, this resulted in ~6.5GB memory usage instead of the expected ~200MB.
This commit fixes the issue by replacing the random SharedArray name generation with deterministic hash-based naming, and by refactoring the internal sharedArrays structure so the data is initialized exactly once per unique name (see the commit message above for details).
Closes #5493
Demo
Given a 40MB csv file, and the following script:
Before this PR
After this PR
Checklist
make check) and all pass.
Checklist: Documentation (only for k6 maintainers and if relevant)
Please do not merge this PR until the following items are filled out.
Related PR(s)/Issue(s)