Labels
Data export: We export data nightly as CSV, MongoDB… See: https://world.openfoodfacts.org/data
🎯 P1
Description
What
When I parse the Open Food Facts CSV export (https://static.openfoodfacts.org/data/en.openfoodfacts.org.products.csv.gz) with pandas, I get the following warning:
Skipping line 1803055: expected 214 fields, saw 244
Steps to reproduce the behavior
- Download the Open Food Facts CSV
- Parse it using Pandas (code below)
Expected behavior
The CSV should parse without errors, warnings, or skipped lines.
Additional context
My code:
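from typing import Iterable

import pandas as pd
from pandas import DataFrame
from tqdm import tqdm


def try_pandas_with_errors():
    # Read the gzipped, tab-separated export in chunks of 2000 rows.
    df_it: Iterable[DataFrame] = pd.read_csv(
        "/home/take/Downloads/en.openfoodfacts.org.products.csv.gz",
        compression='gzip',
        sep='\t',
        chunksize=2000,
        header=0,
        dtype='string',
        engine='c',
        on_bad_lines='warn',  # warn about malformed lines and skip them
    )
    # Iterate over every chunk so that the whole file gets parsed.
    for df in tqdm(df_it):
        pass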
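A minimal diagnostic sketch that may help narrow the warning down: it counts raw tab-separated fields per line using only the standard library and reports lines whose count differs from the header's. The helper name `find_irregular_lines` is hypothetical, and the sketch assumes the export is UTF-8 encoded and that fields are separated by literal tabs with no quoting applied. That is deliberately different from how the pandas C parser treats quote characters, so the line numbers reported here may not match the ones in the pandas warning.

```python
import gzip
from typing import Iterator, Tuple


def find_irregular_lines(path: str, sep: str = "\t") -> Iterator[Tuple[int, int, int]]:
    """Yield (line_number, expected_fields, seen_fields) for every line whose
    raw separator count differs from the header line's.

    Hypothetical helper: it ignores quoting entirely and assumes UTF-8 input.
    """
    # newline="\n" keeps a bare "\r" inside a field from being read as a line break.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="\n") as handle:
        header = handle.readline().rstrip("\n")
        expected = header.count(sep) + 1
        # The header is line 1, so data lines start at line 2.
        for line_number, line in enumerate(handle, start=2):
            seen = line.rstrip("\n").count(sep) + 1
            if seen != expected:
                yield line_number, expected, seen
```

Iterating over `find_irregular_lines(...)` with the same path as in the code above lists the irregular lines, if any. If this raw count turns out to be consistent on every line while pandas still reports mismatches, one possible explanation is quote handling in the parser rather than extra tabs in the data; that is an assumption to verify, not a confirmed cause.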