[BUG] parseStream() does not work as expected when there are invalid rows #1095

Open
@ajay-psd

Describe the bug
When parsing a CSV data file streamed from an S3 bucket, parsing stops immediately as soon as a chunk of the stream contains an invalid row. The expected behavior is for parsing to continue after encountering invalid data.

Sample invalid data entry in the file:

"xxhTT5lwV","[email protected]","919582162103","62103"22","621","Samsung"

Parsing works with data files that do not contain invalid data, but it fails when invalid data is present.

When handling invalid data, the parser does not appear to honor the options passed:
{ header: true, strictColumnHandling: true, discardUnmappedColumns: true }

Parsing or Formatting?

  • Parsing

The code snippet that I have used.

// Assumed setup (not shown in the original snippet): AWS SDK v2 and fast-csv
const AWS = require('aws-sdk');
const csv = require('fast-csv');
const s3 = new AWS.S3();

const filename = 'uploads/FYAP6BA7/my_file-1738837410345-372923559.csv';
const bucket = 'test-app';
const params = { Bucket: bucket, Key: filename };
const stream = s3.getObject(params).createReadStream();

csv.parseStream(stream, { header: true, strictColumnHandling: true, discardUnmappedColumns: true })
  .on('error', error => console.error(error))
  .on('data', row => console.log(JSON.stringify(row)))
  .on('end', rowCount => console.log(rowCount))
  .on('data-invalid', row => console.log(JSON.stringify(row)));

Sample data in the S3 file:
"name","email","sms","token","code","label"
"xxhTT5lwV","[email protected]","919582162103","62103"22","621","Samsung"
"0ENexk5cj","[email protected]","919194880301","621034"444","803","Testing"
"NOG1TQ8Cz","[email protected]","919619960375","621035555","603","Sony"
"gcscciBK8","[email protected]","919037631672","621036666","316","LG"

The first and second rows contain invalid data: an extra quote inside the token field.

Expected behavior
The expected behavior is to continue parsing after encountering invalid data. According to the documentation, invalid data should be handled in a data-invalid event and parsing should continue.

Image

Screenshots

Image

Desktop (please complete the following information):

  • OS: MacOS
  • OS Version: Sequoia 15.2
  • Node Version: v22.12.0
  • fast-csv version: ^5.0.0 and ^5.0.2

Is there anything I am doing wrong or missing in the code?
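
A possible workaround, offered as a sketch rather than a confirmed fix: if the stray quote is being surfaced through the 'error' event (an assumption; the error output is not included in the text above), the malformed lines could be filtered out before they reach the parser. The helper names `isWellQuoted` and `createQuoteFilter` are hypothetical and not part of fast-csv. The code uses only Node's built-in modules and assumes one record per line, so it would not suit files with newlines inside quoted fields.

```javascript
const { Transform } = require('node:stream');
const { StringDecoder } = require('node:string_decoder');

// Hypothetical helper (not a fast-csv API): returns true when every field on
// the line is either unquoted, or fully quoted with inner quotes doubled ("").
function isWellQuoted(line, delimiter = ',', quote = '"') {
  const n = line.length;
  let i = 0;
  while (i <= n) {
    if (line[i] === quote) {
      i += 1; // step past the opening quote
      for (;;) {
        if (i >= n) return false; // quoted field never closed
        if (line[i] === quote) {
          if (line[i + 1] === quote) { i += 2; continue; } // escaped quote
          i += 1; // closing quote
          break;
        }
        i += 1;
      }
      // Only a delimiter or the end of the line may follow a closing quote.
      if (i < n && line[i] !== delimiter) return false;
    } else {
      // Unquoted field: a quote character inside it is treated as malformed.
      while (i < n && line[i] !== delimiter) {
        if (line[i] === quote) return false;
        i += 1;
      }
    }
    i += 1; // skip the delimiter, or step past the end of the line
  }
  return true;
}

// Hypothetical Transform: forwards well-quoted lines and reports the rest
// through onInvalid(line, lineNumber) instead of passing them downstream.
function createQuoteFilter(onInvalid = () => {}) {
  const decoder = new StringDecoder('utf8');
  let buffered = '';
  let lineNumber = 0;
  const handle = (out, line) => {
    lineNumber += 1;
    if (isWellQuoted(line)) out.push(line + '\n');
    else onInvalid(line, lineNumber);
  };
  return new Transform({
    transform(chunk, _encoding, callback) {
      const lines = (buffered + decoder.write(chunk)).split(/\r?\n/);
      buffered = lines.pop(); // keep the trailing partial line for the next chunk
      for (const line of lines) handle(this, line);
      callback();
    },
    flush(callback) {
      buffered += decoder.end();
      if (buffered.length > 0) handle(this, buffered);
      callback();
    },
  });
}
```

Under those assumptions, the filter would sit between the S3 stream and the parser, for example `csv.parseStream(stream.pipe(createQuoteFilter((line, n) => console.warn('skipped line', n))), options)`. Separately, the fast-csv documentation spells the option `headers` rather than `header`, which may be worth checking.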
