
Bug: Uncaught exception "_csv.Error: field larger than field limit (131072)" #41

Open
@corneliusroemer

Description

I'm diffing CSV/TSV files that contain nucleotide sequences in some columns, and these fields can get rather long, e.g. ~200k characters.

This causes csv-diff to crash with an uncaught exception:

csv-diff results/mpox_prod.invariant.seq.tsv results/mpox_staging.invariant.seq.tsv --key=submissionId --format=tsv
Traceback (most recent call last):
  File "/Users/corneliusromer/micromamba/envs/pp-integrity/bin/csv-diff", line 10, in <module>
    sys.exit(cli())
             ^^^^^
  File "/Users/corneliusromer/micromamba/envs/pp-integrity/lib/python3.12/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/corneliusromer/micromamba/envs/pp-integrity/lib/python3.12/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/corneliusromer/micromamba/envs/pp-integrity/lib/python3.12/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/corneliusromer/micromamba/envs/pp-integrity/lib/python3.12/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/corneliusromer/micromamba/envs/pp-integrity/lib/python3.12/site-packages/csv_diff/cli.py", line 73, in cli
    previous_data = load(previous)
                    ^^^^^^^^^^^^^^
  File "/Users/corneliusromer/micromamba/envs/pp-integrity/lib/python3.12/site-packages/csv_diff/cli.py", line 69, in load
    return load_csv(
           ^^^^^^^^^
  File "/Users/corneliusromer/micromamba/envs/pp-integrity/lib/python3.12/site-packages/csv_diff/__init__.py", line 19, in load_csv
    rows = [dict(zip(headings, line)) for line in fp]
                                                  ^^
_csv.Error: field larger than field limit (131072)
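Here's a minimal, stdlib-only sketch that reproduces the same error without csv-diff. `read_rows` is a hypothetical stand-in that mirrors only the `rows = [dict(zip(headings, line)) for line in fp]` line from the traceback, not csv-diff's actual `load_csv`, and the 200k-character field is synthetic test data.

```python
import csv
import io

# Default per-field limit of the csv module (131072 characters in CPython).
print(csv.field_size_limit())

# Synthetic TSV with one ~200k-character "sequence" field.
long_seq = "ACGT" * 50_000  # 200,000 characters
tsv = "submissionId\tsequence\nS1\t" + long_seq + "\n"


def read_rows(fp):
    """Hypothetical stand-in for csv_diff.load_csv, reduced to the failing line."""
    reader = csv.reader(fp, delimiter="\t")
    headings = next(reader)
    return [dict(zip(headings, line)) for line in reader]


error_message = None
try:
    read_rows(io.StringIO(tsv))
except csv.Error as exc:
    error_message = str(exc)

print(error_message)  # field larger than field limit (131072)
```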

You could support fields this long (~200k characters) by raising the csv module's field size limit:

import csv, ctypes as ct
csv.field_size_limit(int(ct.c_ulong(-1).value // 2))

(from https://stackoverflow.com/a/54517228/7483211)
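A sketch of how the limit could be raised only while parsing, assuming a loader shaped like the `load_csv` frame in the traceback. `MAX_FIELD_SIZE`, `unlimited_field_size` and `load_rows` are hypothetical names, not part of csv-diff; the limit expression is the one from the Stack Overflow answer above.

```python
import contextlib
import csv
import ctypes
import io

# csv.field_size_limit() stores the limit in a C long. c_ulong(-1) wraps to the
# maximum unsigned long; halving it gives the maximum signed long on both
# 64-bit Linux/macOS and Windows (where long is 32-bit), so this never overflows.
MAX_FIELD_SIZE = int(ctypes.c_ulong(-1).value // 2)


@contextlib.contextmanager
def unlimited_field_size():
    """Hypothetical helper: raise the csv field limit, restore it afterwards."""
    previous = csv.field_size_limit(MAX_FIELD_SIZE)  # returns the old limit
    try:
        yield
    finally:
        csv.field_size_limit(previous)


def load_rows(fp, delimiter="\t"):
    """Hypothetical stand-in for csv_diff.load_csv, reduced to the failing step."""
    with unlimited_field_size():
        reader = csv.reader(fp, delimiter=delimiter)
        headings = next(reader)
        # Parsing happens during iteration, so rows are built inside the block.
        return [dict(zip(headings, line)) for line in reader]


long_seq = "ACGT" * 50_000  # synthetic 200,000-character field
tsv = "submissionId\tsequence\nS1\t" + long_seq + "\n"
rows = load_rows(io.StringIO(tsv))
print(rows[0]["submissionId"], len(rows[0]["sequence"]))  # S1 200000
print(csv.field_size_limit())  # 131072 again after the block
```

Restoring the previous limit in `finally` keeps other csv users in the same process unaffected, but the limit is module-global, so this is not thread-safe. Calling `csv.field_size_limit(...)` once at import time, as suggested above, is the simpler alternative.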

Metadata

Assignees: No one assigned
Labels: No labels
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests