Open
I'm diffing CSV/TSV files that contain nucleotide sequences in some columns, and those fields can get rather long, e.g. ~200k characters.
This causes csv-diff to crash with an uncaught exception:
csv-diff results/mpox_prod.invariant.seq.tsv results/mpox_staging.invariant.seq.tsv --key=submissionId --format=tsv
Traceback (most recent call last):
  File "/Users/corneliusromer/micromamba/envs/pp-integrity/bin/csv-diff", line 10, in <module>
    sys.exit(cli())
             ^^^^^
  File "/Users/corneliusromer/micromamba/envs/pp-integrity/lib/python3.12/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/corneliusromer/micromamba/envs/pp-integrity/lib/python3.12/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/corneliusromer/micromamba/envs/pp-integrity/lib/python3.12/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/corneliusromer/micromamba/envs/pp-integrity/lib/python3.12/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/corneliusromer/micromamba/envs/pp-integrity/lib/python3.12/site-packages/csv_diff/cli.py", line 73, in cli
    previous_data = load(previous)
                    ^^^^^^^^^^^^^^
  File "/Users/corneliusromer/micromamba/envs/pp-integrity/lib/python3.12/site-packages/csv_diff/cli.py", line 69, in load
    return load_csv(
           ^^^^^^^^^
  File "/Users/corneliusromer/micromamba/envs/pp-integrity/lib/python3.12/site-packages/csv_diff/__init__.py", line 19, in load_csv
    rows = [dict(zip(headings, line)) for line in fp]
                                                  ^^
_csv.Error: field larger than field limit (131072)
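You could allow us to work with columns of ~200k characters by raising the csv module's field size limit before the files are loaded, e.g. (with ctypes imported as ct):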
csv.field_size_limit(int(ct.c_ulong(-1).value // 2))
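For reference, here is a minimal standalone sketch of that idea. It is not csv-diff's actual code: `raise_field_size_limit`, `read_rows` and `SAMPLE_TSV` are hypothetical names made up for illustration, and the sample data is synthetic. It assumes the limit can simply be raised once, process-wide, before any file is parsed.

```python
import csv
import ctypes
import io


def raise_field_size_limit():
    """Raise the csv module's per-field limit and return the previous limit.

    csv.field_size_limit() stores its value in a C long, so the largest
    value that is safe on every platform is the maximum of a signed long.
    c_ulong(-1) wraps around to the maximum unsigned long; halving it
    gives the maximum signed long.
    """
    max_long = ctypes.c_ulong(-1).value // 2
    return csv.field_size_limit(max_long)


def read_rows(text):
    """Parse tab-separated text into a list of rows (hypothetical helper)."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


# Synthetic two-column TSV with a single 200,000-character sequence field,
# which is longer than the default limit of 131072.
SAMPLE_TSV = "submissionId\tsequence\nid1\t" + "ACGT" * 50_000 + "\n"
```

With the default limit, `read_rows(SAMPLE_TSV)` should fail with the same `_csv.Error` as above; after calling `raise_field_size_limit()` once it should parse. Note that the limit is global to the process, so a library might prefer to make this opt-in rather than change it unconditionally.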