Too many columns specified error with USCMed dataset's annotations #7

@Sabah98

Description

Running ContourUSV on the USCMed dataset currently fails in the "generate annotations" step. This step reformats the dataset's existing ground truth CSV files so that their column headers match what the evaluation step expects. The error traceback is shown below:

Traceback (most recent call last):
  File "/Users/evana_anis/Desktop/VSCode/github_tests/contourusv/contourusv/main.py", line 375, in <module>
    generate_annotations(experiment, trial, root_path, file_ext)
  File "/Users/evana_anis/Desktop/VSCode/github_tests/contourusv/contourusv/generate_annotation.py", line 189, in generate_annotations
    save_annotations(matched_csv, audio_file, output_path, file_ext)
  File "/Users/evana_anis/Desktop/VSCode/github_tests/contourusv/contourusv/generate_annotation.py", line 109, in save_annotations
    usv_data = loaders[file_ext](f)
               ^^^^^^^^^^^^^^^^^^^^
  File "/Users/evana_anis/Desktop/VSCode/github_tests/contourusv/contourusv/generate_annotation.py", line 80, in load_csv_usv
    data = pd.read_csv(file_name, header=None, names=['begin_time', 'end_time'], usecols=[0, 1])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/evana_anis/anaconda3/envs/research/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
    return _read(filepath_or_buffer, kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/evana_anis/anaconda3/envs/research/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 626, in _read
    return parser.read(nrows)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/evana_anis/anaconda3/envs/research/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1923, in read
    ) = self._engine.read(  # type: ignore[attr-defined]
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/evana_anis/anaconda3/envs/research/lib/python3.11/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 234, in read
    chunks = self._reader.read_low_memory(nrows)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "parsers.pyx", line 838, in pandas._libs.parsers.TextReader.read_low_memory
  File "parsers.pyx", line 921, in pandas._libs.parsers.TextReader._read_rows
  File "parsers.pyx", line 983, in pandas._libs.parsers.TextReader._convert_column_data
pandas.errors.ParserError: Too many columns specified: expected 2 and found 1

The traceback shows that the failing call is on line 80 of generate_annotation.py:
data = pd.read_csv(file_name, header=None, names=['begin_time', 'end_time'], usecols=[0, 1])

The issue can be resolved by replacing that line with:
data = pd.read_csv(file_name, header=None, skiprows=1, names=['begin_time', 'end_time'], sep='\t', usecols=[1, 2])

This fix is specific to the USCMed dataset's ground truth annotations and depends on their current column structure, so it may break loading for CSV files laid out differently.
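One way to avoid hard-coding the USCMed layout is to select the read options per dataset. The following is a minimal sketch, not the project's actual implementation: the `READ_OPTIONS` table, the `dataset` parameter, and the sample data are hypothetical, and only the `uscmed` values (`sep='\t'`, `skiprows=1`, `usecols=[1, 2]`) come from the fix above. Those values read a tab-separated file, skip its first row, and take the second and third columns as begin and end times.

```python
import io

import pandas as pd

# Hypothetical per-dataset options. Only the "uscmed" entry is taken from
# the fix in this issue; "default" mirrors the original line 80 call.
READ_OPTIONS = {
    "default": {"sep": ",", "skiprows": 0, "usecols": [0, 1]},
    "uscmed": {"sep": "\t", "skiprows": 1, "usecols": [1, 2]},
}


def load_csv_usv(file_name, dataset="default"):
    """Load begin/end times using the read options for the given dataset."""
    opts = READ_OPTIONS[dataset]
    return pd.read_csv(
        file_name,
        header=None,
        names=["begin_time", "end_time"],
        **opts,
    )


# Made-up sample data, for illustration only (not real USCMed annotations).
uscmed_sample = "id\tbegin\tend\n1\t0.5\t1.25\n2\t2.0\t2.75\n"
default_sample = "0.5,1.25\n2.0,2.75\n"

uscmed_data = load_csv_usv(io.StringIO(uscmed_sample), dataset="uscmed")
default_data = load_csv_usv(io.StringIO(default_sample))
print(uscmed_data)
```

With a lookup like this, both layouts yield the same `begin_time` and `end_time` columns, and the original comma-separated behaviour stays available for other datasets.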
