Broken CSV file #12284

@diesieben07

What

When I parse the Open Food Facts CSV export (https://static.openfoodfacts.org/data/en.openfoodfacts.org.products.csv.gz) with pandas, I get the following warning:

Skipping line 1803055: expected 214 fields, saw 244
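
For context, this warning means the parser counted more tab-separated fields on one row than the header has. One possible cause, which is only an assumption here and not confirmed for this export, is a field that starts with an unbalanced double quote: with default quoting the parser keeps reading across the line break and merges rows. The sketch below reproduces that effect on made-up data with Python's standard csv module (not pandas), and shows that csv.QUOTE_NONE keeps the rows separate.

```python
import csv
import io

# Made-up tab-separated data (not taken from the real export): the third
# field on the second line starts with a double quote that is only closed
# on the following line.
SAMPLE = 'a\tb\tc\n1\t2\t"x\n4"\t5\t6\n'


def field_counts(text, quoting):
    """Return the number of fields csv.reader sees on each row."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t", quoting=quoting)
    return [len(row) for row in reader]


# Default quoting treats the quote as the start of a quoted field.
default_counts = field_counts(SAMPLE, csv.QUOTE_MINIMAL)
# QUOTE_NONE treats the quote as an ordinary character.
raw_counts = field_counts(SAMPLE, csv.QUOTE_NONE)

print(default_counts)
print(raw_counts)
```

If the real file behaves the same way, passing quoting=csv.QUOTE_NONE to pd.read_csv might be worth trying, but that is untested against this export.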

Steps to reproduce the behavior

  1. Download the Open Food Facts CSV
  2. Parse it using Pandas (code below)

Expected behavior

The CSV export should parse without errors and without any skipped lines.

Additional context

My code:

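from typing import Iterable

import pandas as pd
from pandas import DataFrame
from tqdm import tqdm


def try_pandas_with_errors():
    # Read the gzipped, tab-separated export in chunks of 2000 rows.
    df_it: Iterable[DataFrame] = pd.read_csv(
        "/home/take/Downloads/en.openfoodfacts.org.products.csv.gz",
        compression='gzip',
        sep='\t',
        chunksize=2000,
        header=0,
        dtype='string',
        engine='c',
        on_bad_lines='warn',  # warn about and skip malformed lines
    )
    # Iterate over every chunk so that the whole file gets parsed.
    for df in tqdm(df_it):
        pass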
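
To narrow down which rows are affected independently of pandas, a minimal sketch like the following could scan the raw file and compare each line's tab count with the header. The function name find_ragged_lines and its limit parameter are hypothetical, and the line numbers it reports may not match the ones pandas prints, since pandas also interprets quotes and other line terminators.

```python
import gzip


def find_ragged_lines(path, sep="\t", limit=10):
    """Return (expected, ragged) for a gzipped, delimited text file.

    expected is the header's field count; ragged is a list of
    (line_number, field_count) pairs for lines that differ from it.
    """
    ragged = []
    # newline="\n" splits lines on "\n" only, with no newline translation.
    with gzip.open(path, "rt", encoding="utf-8", newline="\n") as fh:
        expected = len(fh.readline().rstrip("\n").split(sep))
        # The header is line 1, so data lines start at 2.
        for line_number, line in enumerate(fh, start=2):
            count = len(line.rstrip("\n").split(sep))
            if count != expected:
                ragged.append((line_number, count))
                if len(ragged) >= limit:
                    break
    return expected, ragged
```

Calling find_ragged_lines on the downloaded export and printing the result would show whether the raw data itself has rows with extra tabs, or whether the mismatch only appears once quoting is applied.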
