@oleiade oleiade commented Dec 16, 2025

What

The csv.parse function was creating a new SharedArray with a random name (based on time.Now().Nanosecond()) for each VU, causing memory usage to scale linearly with VU count instead of sharing data across VUs. For a 40MB CSV file with 100 VUs, this resulted in ~6.5GB memory usage instead of the expected ~200MB.

This commit fixes the issue by:

  1. Replacing random SharedArray name generation with deterministic hash-based naming. The name is now derived from the file path and parser options (delimiter, skipFirstLine, fromLine, toLine, asObjects) using SHA256, ensuring the same file+options combination always produces the same SharedArray name.
  2. Refactoring the internal sharedArrays structure to use a slot-based system with sync.Once guarantees. This ensures the underlying data is initialized exactly once per unique name, even when multiple VUs call csv.parse concurrently.
  3. Adding a loadOrStore method that uses double-checked locking to minimize contention while ensuring thread-safe, single initialization of shared arrays.
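
Points 2 and 3 above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `registry` and `slot` types, the `[]string` payload, and the `countBuilds` helper are assumptions made up for the example; only the `loadOrStore` name and the sync.Once plus double-checked locking idea come from the description.

```go
package main

import (
	"fmt"
	"sync"
)

// slot holds one shared value plus the sync.Once guarding its initialization.
type slot struct {
	once sync.Once
	data []string
}

// registry is a hypothetical stand-in for the internal sharedArrays structure.
type registry struct {
	mu    sync.RWMutex
	slots map[string]*slot
}

func newRegistry() *registry {
	return &registry{slots: make(map[string]*slot)}
}

// loadOrStore returns the data registered under name, calling build at most
// once per name even when many goroutines race on the same name.
func (r *registry) loadOrStore(name string, build func() []string) []string {
	// First check: the common case only needs the read lock.
	r.mu.RLock()
	s, ok := r.slots[name]
	r.mu.RUnlock()

	if !ok {
		// Second check under the write lock: another goroutine may have
		// created the slot between the two lock acquisitions.
		r.mu.Lock()
		if s, ok = r.slots[name]; !ok {
			s = &slot{}
			r.slots[name] = s
		}
		r.mu.Unlock()
	}

	// The potentially slow build runs outside the registry lock; sync.Once
	// makes concurrent callers wait until the first one has finished.
	s.once.Do(func() { s.data = build() })
	return s.data
}

// countBuilds simulates vus concurrent callers using the same name and
// reports how many times the build function actually ran.
func countBuilds(vus int) int {
	r := newRegistry()
	var (
		mu     sync.Mutex
		builds int
		wg     sync.WaitGroup
	)
	for i := 0; i < vus; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			r.loadOrStore("csv-data", func() []string {
				mu.Lock()
				builds++
				mu.Unlock()
				return []string{"a", "b"}
			})
		}()
	}
	wg.Wait()
	return builds
}

func main() {
	fmt.Println("builds:", countBuilds(100)) // prints "builds: 1"
}
```

Keeping the build call outside the registry lock means that a slow parse for one name does not block callers asking for a different name.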

Closes #5493

Demo

Given a 40MB csv file, and the following script:

import { open } from 'k6/experimental/fs'
import csv from 'k6/experimental/csv'
import { scenario } from 'k6/execution'

export const options = {
	vus: 20,
	duration: '10m',
}

// Open the csv file, and parse it ahead of time.
const file = await open('data.csv');
// The `csv.parse` function consumes the entire file at once, and returns
// the parsed records as a SharedArray object.
const csvRecords = await csv.parse(file, { delimiter: ',' })

export default async function() {
	// csvRecords is a SharedArray. Each element is a record from the CSV file, represented as an array
	// where each element is a field from the CSV record.
	//
	// Thus, `csvRecords[scenario.iterationInTest]` will give us the record for the current iteration.
	console.log(csvRecords[scenario.iterationInTest])
}

Before this PR

[Screenshot: CleanShot 2025-12-16 at 11 00 37]

After this PR

[Screenshot: CleanShot 2025-12-16 at 10 58 23]

Checklist

  • I have performed a self-review of my code.
  • I have commented on my code, particularly in hard-to-understand areas.
  • I have added tests for my changes.
  • I have run linter and tests locally (make check) and all pass.

Checklist: Documentation (only for k6 maintainers and if relevant)

Please do not merge this PR until the following items are filled out.

  • I have added the correct milestone and labels to the PR.
  • I have updated the release notes: link
  • I have updated or added an issue to the k6-documentation: grafana/k6-docs#NUMBER if applicable
  • I have updated or added an issue to the TypeScript definitions: grafana/k6-DefinitelyTyped#NUMBER if applicable

Related PR(s)/Issue(s)

@oleiade oleiade self-assigned this Dec 16, 2025
@oleiade oleiade requested a review from a team as a code owner December 16, 2025 13:18
@oleiade oleiade added the bug label Dec 16, 2025
@oleiade oleiade requested review from codebien, inancgumus and joanlopez and removed request for a team December 16, 2025 13:18
@oleiade oleiade removed the request for review from inancgumus December 17, 2025 08:25
joanlopez previously approved these changes Jan 2, 2026

@joanlopez joanlopez left a comment

Great job @oleiade!

Note that none of the comments I left are blocking, but I'd really appreciate it if we could "merge" the two "duplicated" methods 🙏🏻

@inancgumus inancgumus added this to the v1.6.0 milestone Jan 5, 2026
@inancgumus

I've moved this to the 1.6.0 milestone, as we're in the release process.


oleiade commented Jan 14, 2026

I've made a pass on the underlying SharedArray code as per your concerns @codebien and found that it's already pretty well tested. However, I've added a bunch of further tests to the data package, in order to limit the risk of breaking anything with our internal changes in this PR. Hope that's satisfying 🙇🏻

@oleiade oleiade requested a review from joanlopez January 14, 2026 14:53

@codebien codebien left a comment

Hey @oleiade,
thanks for letting me know.

If I understand the changes correctly, the ones in the data module are orthogonal to the csv part but not really required for the fix of the specific related issue.

The data module is very sensitive, so I'd appreciate it if you could split the pull request by domain, to keep things clean and easy to review in the future: one pull request for the csv fix and one for the data module.

oleiade commented Jan 16, 2026

Happy to split the PR.

However, most of the changes in the data module are dependencies for the CSV fix. The standalone parts (mostly increased test coverage in data_test.go and share_test.go to safeguard SharedArray) are isolated in my last commit.

I realize the data module is sensitive. But this was the best solution I could come up with at the time.

An alternative I've been tinkering with would be to extract the 'shared memory' logic into a dedicated package that provides a shared data structure used by csv, and that we could later extend to other modules that might need it (fs). It would address the same need we tried to tackle here: initialize once, reuse multiple times. It would also decouple that logic from the data module's specific implementation.

Plus, we could, if we so choose, migrate the data module to the new package down the road too. I think it could make sense. What do you folks think, @codebien @joanlopez? Any other alternatives?

Replace map-based JSON marshaling with anonymous structs in
buildSharedArrayName to guarantee deterministic field ordering
across all Go versions.

While encoding/json currently sorts map keys alphabetically, this
behavior is an implementation detail not guaranteed by the Go
specification. Using structs with explicit field ordering ensures
consistent SHA256 hashes for identical file+options combinations,
preventing potential SharedArray duplication across VUs in future
Go versions.
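
A rough sketch of what struct-based naming along these lines might look like. It is not the actual buildSharedArrayName from this PR: the `sharedArrayName` function, the `parseOptions` type, its Go field types, the JSON tags, and the "csv-" prefix are all assumptions for illustration. What it relies on is that encoding/json marshals struct fields in declaration order, so the hashed bytes are the same for identical inputs.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// parseOptions mirrors the option names listed in the PR description.
// The Go types chosen here are assumptions.
type parseOptions struct {
	Delimiter     string `json:"delimiter"`
	SkipFirstLine bool   `json:"skipFirstLine"`
	FromLine      int64  `json:"fromLine"`
	ToLine        int64  `json:"toLine"`
	AsObjects     bool   `json:"asObjects"`
}

// sharedArrayName derives a deterministic name from the file path and the
// parser options. The "csv-" prefix is a hypothetical naming choice.
func sharedArrayName(path string, opts parseOptions) (string, error) {
	// An anonymous struct fixes the field order of the serialized key.
	key := struct {
		Path    string       `json:"path"`
		Options parseOptions `json:"options"`
	}{Path: path, Options: opts}

	encoded, err := json.Marshal(key)
	if err != nil {
		return "", err
	}

	sum := sha256.Sum256(encoded)
	return "csv-" + hex.EncodeToString(sum[:]), nil
}

func main() {
	name, err := sharedArrayName("data.csv", parseOptions{Delimiter: ","})
	if err != nil {
		panic(err)
	}
	fmt.Println(name)
}
```

Two VUs that open the same file with the same options compute the same name, so they resolve to the same SharedArray; changing any option yields a different name and therefore a separate array.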
@oleiade oleiade force-pushed the fix/csv-parse-sharedarray-naming branch from a372774 to e7e434b Compare January 20, 2026 11:26
oleiade commented Jan 20, 2026

Hey folks 👋🏻

Thanks a lot for the constructive feedback. I went back to a less intrusive set of changes, leveraging the mutex as opposed to introducing a new sync.Once+slots based system.
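
For comparison with the sync.Once+slots design, a mutex-only get-or-create can be sketched like this. It is an assumption-laden illustration rather than the code that was merged: the `sharedArrays` type here, its `get` method, and the `[]string` payload are simplified stand-ins.

```go
package main

import (
	"fmt"
	"sync"
)

// sharedArrays is a simplified, hypothetical stand-in for the registry in the
// data module: a single mutex protects a name -> data map.
type sharedArrays struct {
	mu   sync.Mutex
	data map[string][]string
}

func newSharedArrays() *sharedArrays {
	return &sharedArrays{data: make(map[string][]string)}
}

// get returns the array stored under name, building it on first use.
// The mutex is held while build runs, so build executes once per name.
func (s *sharedArrays) get(name string, build func() []string) []string {
	s.mu.Lock()
	defer s.mu.Unlock()

	if arr, ok := s.data[name]; ok {
		return arr
	}
	arr := build()
	s.data[name] = arr
	return arr
}

func main() {
	arrays := newSharedArrays()
	builds := 0
	build := func() []string {
		builds++
		return []string{"row1", "row2"}
	}

	arrays.get("csv-data", build)
	arrays.get("csv-data", build)
	fmt.Println("builds:", builds) // prints "builds: 1"
}
```

The trade-off in this sketch is that initializations for different names are serialized behind the same lock, in exchange for a much smaller change than a per-name sync.Once.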

I took the liberty to rewrite the history for clarity too, as I think it serves the review in this specific case.

Also removed the commit adding test coverage to SharedArray, and will open a separate PR 🙇🏻

@oleiade oleiade requested review from codebien and mstoykov January 20, 2026 11:28
@oleiade oleiade force-pushed the fix/csv-parse-sharedarray-naming branch from e7e434b to dd5943b Compare January 20, 2026 13:37
@codebien codebien left a comment

LGTM 🚀 I've just one request.

mstoykov previously approved these changes Jan 28, 2026
@mstoykov mstoykov left a comment

LGTM, apart from the if that @codebien pointed out.

@codebien codebien merged commit 5f3aed5 into master Jan 28, 2026
49 checks passed
@codebien codebien deleted the fix/csv-parse-sharedarray-naming branch January 28, 2026 16:17
Development

Successfully merging this pull request may close these issues.

csv.parse do not share the array between VUs