We've been using python-dwca-reader with no problems loading about 13k occurrences. We now need to scale it up to load about 3.25 million occurrences.
Changing the code from:
core_df = dwca.pd_read('occurrence.txt', parse_dates=True)
to:
for chunk in dwca.pd_read('occurrence.txt', parse_dates=True, chunksize=10):
...
causes the error:
...
for chunk in dwca.pd_read('occurrence.txt', parse_dates=True, chunksize=10):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/opt/asdf/installs/python/3.11.7/lib/python3.11/site-packages/dwca/read.py", line 209, in pd_read
df[shorten_term(field['term'])] = field_default_value
~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'TextFileReader' object does not support item assignment
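The traceback suggests that pd_read passes chunksize through to pandas and then tries to add a default-value column to the result as if it were a DataFrame. With chunksize set, pandas' read_csv returns a TextFileReader (an iterator of DataFrames), which does not support item assignment. Below is a minimal sketch that reproduces this with plain pandas and shows one possible workaround: adding the column to each chunk instead. The sample data, the column names, and the default value are hypothetical stand-ins, and this is not python-dwca-reader's actual code.

```python
import io

import pandas as pd

# Hypothetical stand-in for occurrence.txt: tab-separated, no header row.
raw = "1\tPuma concolor\n2\tLynx lynx\n3\tVulpes vulpes\n"
names = ["id", "scientificName"]

# With chunksize set, read_csv returns a TextFileReader, not a DataFrame.
reader = pd.read_csv(
    io.StringIO(raw), sep="\t", header=None, names=names, chunksize=2
)

# Assigning a column on the reader itself fails, as in the traceback above.
try:
    reader["basisOfRecord"] = "HumanObservation"  # hypothetical default value
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Possible workaround: apply the default to each chunk, which is a DataFrame.
chunks = []
for chunk in reader:
    chunk["basisOfRecord"] = "HumanObservation"
    chunks.append(chunk)
```

Each chunk here is an ordinary DataFrame, so any per-chunk processing can go inside the loop without holding the whole file in memory.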
Looking at gbif-alert, I see that you're using enumerate(dwca) rather than reading it in chunks, so I'll give that a try.