fix(csv): strip leading byte-order mark in CsvParseStream #7183

Open
LeSingh1 wants to merge 1 commit into denoland:main from LeSingh1:fix/csv-parse-stream-bom

@LeSingh1

The synchronous parse() function already strips a leading UTF-8
byte-order mark (U+FEFF) from its input, but CsvParseStream did not.

When a CSV file begins with a BOM -- common output from Excel and other
Windows tools -- the first field name arrives as "\uFEFFname" instead
of "name". That silently corrupts header-based lookups:

import { CsvParseStream } from "@std/csv";

// The first chunk starts with a byte-order mark (U+FEFF).
const source = ReadableStream.from(["\uFEFFname,age\n", "Alice,34\n"]);
const records = await Array.fromAsync(
  source.pipeThrough(new CsvParseStream({ skipFirstRow: true })),
);
// before this fix:
// [{ "\uFEFFname": "Alice", age: "34" }]   -- BOM leaks into key
// after:
// [{ name: "Alice", age: "34" }]
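
To make the "silent" part concrete, here is a small standalone sketch that does not use CsvParseStream at all. It builds by hand the kind of record the unfixed stream would produce (the `record` literal is an assumption for illustration) and shows that a lookup by the plain key returns `undefined` without throwing:

```typescript
// Hand-built record mimicking the pre-fix output; not produced by CsvParseStream here.
const BOM = "\uFEFF";
const record: Record<string, string> = { [BOM + "name"]: "Alice", age: "34" };

// The plain key does not exist, so the lookup quietly yields undefined.
const plainLookup = record["name"];

// Only the BOM-prefixed key finds the value.
const bomLookup = record[BOM + "name"];

// The header key is one code unit longer than it looks when printed.
const keyLengths = Object.keys(record).map((key) => key.length);
```

Because U+FEFF renders as nothing in most terminals and editors, the two keys look identical when logged, which is why the mismatch is easy to miss.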

The fix adds a #firstLine flag to StreamLineReader and strips the
BOM from the first line it reads, exactly matching what parse() does
via its BYTE_ORDER_MARK constant.
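
The idea can be sketched in isolation. The following is a simplified illustration, not the code from this PR: `FirstLineBomStripper` and `BOM_CODE_UNIT` are hypothetical names, and the real StreamLineReader reads lines from a stream rather than taking them as arguments.

```typescript
// Hypothetical stand-in for the BOM check; not the actual StreamLineReader code.
const BOM_CODE_UNIT = 0xfeff;

class FirstLineBomStripper {
  // Plays the role of the #firstLine flag described above.
  private firstLine = true;

  strip(line: string): string {
    // Every line after the first passes through untouched.
    if (!this.firstLine) return line;
    this.firstLine = false;
    // Drop a single leading U+FEFF if present.
    return line.charCodeAt(0) === BOM_CODE_UNIT ? line.slice(1) : line;
  }
}

const stripper = new FirstLineBomStripper();
const strippedLines = ["\uFEFFname,age", "Alice,34", "\uFEFFBob,41"].map(
  (line) => stripper.strip(line),
);
```

Only the first line is inspected, so a U+FEFF that appears at the start of a later line (as in the third entry) is left alone, in keeping with stripping only a leading BOM.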

Two new tests cover the regression: one for plain string[][] output and
one for skipFirstRow: true (object output, where the BOM corrupts the
header key).

parse() already strips a leading BOM from its input string, but
CsvParseStream left it intact. When a UTF-8 CSV file starts with a BOM
(common output of tools like Excel), the first field name would arrive as
"\uFEFFname" instead of "name", corrupting headers and key lookups.

StreamLineReader now strips the BOM from the first line it reads,
matching the existing parse() behaviour exactly.
@github-actions github-actions Bot added the csv label Jun 12, 2026
@codecov

codecov Bot commented Jun 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.57%. Comparing base (cdf74a8) to head (9e34775).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7183      +/-   ##
==========================================
- Coverage   94.57%   94.57%   -0.01%     
==========================================
  Files         636      637       +1     
  Lines       52142    52159      +17     
  Branches     9401     9403       +2     
==========================================
+ Hits        49315    49328      +13     
- Misses       2249     2254       +5     
+ Partials      578      577       -1     

greymoth-jp added a commit to greymoth-jp/cjk-failure-corpus that referenced this pull request Jun 29, 2026
Add four greymoth PRs and one cited upstream PR, all verified open via the
GitHub API:
- BasedHardware/omi#8601 — onboarding answer gate counts a spaceless CJK
  answer as one word (segmentation)
- emdash-cms/emdash#1661 — editor footer word count / reading time splits
  on spaces, so a CJK paragraph reads as 1 word (segmentation)
- validatorjs/validator.js#2789 — isAlphanumeric el-GR range omits accented
  Greek that isAlpha accepts (unicode-range)
- date-fns/date-fns#4231 — Galician formats June as the wide form but cannot
  parse it back; the pattern stops before the tilde (locale-data)
- denoland/std#7183 — CsvParseStream leaves a leading BOM on the first header
  key while sync parse() strips it (encoding; cited, not greymoth-authored)

Three new categories: segmentation, unicode-range, encoding.
Status re-sync against the API: zag color-picker channel IME guard merged.