Too many columns specified error with USCMed dataset's annotations

Currently, running ContourUSV on the USCMed dataset fails at the "generate annotations" step. This step converts the ground-truth CSV files shipped with the dataset into the format (column headers) that the evaluation step expects. The traceback is shown below:

```
Traceback (most recent call last):
  File "/Users/evana_anis/Desktop/VSCode/github_tests/contourusv/contourusv/main.py", line 375, in <module>
    generate_annotations(experiment, trial, root_path, file_ext)
  File "/Users/evana_anis/Desktop/VSCode/github_tests/contourusv/contourusv/generate_annotation.py", line 189, in generate_annotations
    save_annotations(matched_csv, audio_file, output_path, file_ext)
  File "/Users/evana_anis/Desktop/VSCode/github_tests/contourusv/contourusv/generate_annotation.py", line 109, in save_annotations
    usv_data = loaders[file_ext](f)
               ^^^^^^^^^^^^^^^^^^^^
  File "/Users/evana_anis/Desktop/VSCode/github_tests/contourusv/contourusv/generate_annotation.py", line 80, in load_csv_usv
    data = pd.read_csv(file_name, header=None, names=['begin_time', 'end_time'], usecols=[0, 1])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/evana_anis/anaconda3/envs/research/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
    return _read(filepath_or_buffer, kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/evana_anis/anaconda3/envs/research/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 626, in _read
    return parser.read(nrows)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/evana_anis/anaconda3/envs/research/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1923, in read
    ) = self._engine.read(  # type: ignore[attr-defined]
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/evana_anis/anaconda3/envs/research/lib/python3.11/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 234, in read
    chunks = self._reader.read_low_memory(nrows)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "parsers.pyx", line 838, in pandas._libs.parsers.TextReader.read_low_memory
  File "parsers.pyx", line 921, in pandas._libs.parsers.TextReader._read_rows
  File "parsers.pyx", line 983, in pandas._libs.parsers.TextReader._convert_column_data
pandas.errors.ParserError: Too many columns specified: expected 2 and found 1
```

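From the traceback, the failing call is on line 80 of `generate_annotation.py`, inside `load_csv_usv`:
`data = pd.read_csv(file_name, header=None, names=['begin_time', 'end_time'], usecols=[0, 1])`
This call reads the file with pandas' default comma separator and requests two columns (`names` and `usecols` both have two entries), but each row of the USCMed annotation files parses as a single column, hence "expected 2 and found 1".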
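
A minimal, self-contained reproduction of the failure is sketched below. The sample text is made up and only assumes what the fix implies about the real files (a first header line followed by tab-separated rows); it is not actual USCMed data. Because the sample contains no commas, every row becomes a single field under the default separator, so on the pandas version in the traceback this should raise the same `ParserError`.

```python
import io

import pandas as pd

# Made-up stand-in for a USCMed ground-truth file: a header line followed by
# tab-separated rows. Column names and values are illustrative only.
SAMPLE = "id\tbegin\tend\n1\t0.512\t0.634\n2\t1.020\t1.187\n"


def read_like_original(source):
    """Same read_csv arguments as line 80 of generate_annotation.py."""
    return pd.read_csv(source, header=None, names=['begin_time', 'end_time'], usecols=[0, 1])


try:
    read_like_original(io.StringIO(SAMPLE))
    error_message = None
except pd.errors.ParserError as err:
    # With the default sep=',' there are no delimiters in the sample,
    # so each row is parsed as one field while two columns are requested.
    error_message = str(err)

print(error_message)
```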


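The issue can be resolved by replacing that line with:
`data = pd.read_csv(file_name, header=None, skiprows=1, names=['begin_time', 'end_time'], sep='\t', usecols=[1, 2])`
Here `sep='\t'` reads the file as tab-delimited, `skiprows=1` skips the first line (the file's own header row), and `usecols=[1, 2]` takes the second and third columns as `begin_time` and `end_time`.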
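
For reference, the patched reader might look like the sketch below. Only the `read_csv` call is taken from this issue; the rest of `load_csv_usv` in `generate_annotation.py` is not shown here, so the surrounding function body is an assumption, and the in-memory sample file is made up.

```python
import io

import pandas as pd


def load_csv_usv(file_name):
    """Sketch of the patched read (USCMed layout only).

    Only the read_csv call comes from the fix; any further processing the real
    function does after line 80 is omitted.
    """
    data = pd.read_csv(
        file_name,
        header=None,
        skiprows=1,                        # skip the file's own header line
        names=['begin_time', 'end_time'],  # applied to the two selected columns
        sep='\t',                          # tab-delimited, per the fix above
        usecols=[1, 2],                    # 2nd and 3rd columns hold begin/end times
    )
    return data


# Made-up sample in the assumed USCMed layout.
SAMPLE = "id\tbegin\tend\n1\t0.512\t0.634\n2\t1.020\t1.187\n"
df = load_csv_usv(io.StringIO(SAMPLE))
print(df)
```

Because `names` has the same length as `usecols`, pandas assigns the names to the selected columns in order rather than to columns 0 and 1.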

This fix is specific to the USCMed dataset's ground-truth annotations, since it hard-codes their current column layout.
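
If other datasets turn out to need different layouts, one option is to keep the layout as per-dataset configuration instead of editing line 80 for each one. The `ANNOTATION_LAYOUTS` table and `load_annotation_times` helper below are hypothetical (they are not part of ContourUSV); the two entries simply mirror the original call and the USCMed fix.

```python
import io

import pandas as pd

# Hypothetical per-dataset layout table. "default" mirrors the original
# line 80 call; "USCMed" mirrors the fix from this issue.
ANNOTATION_LAYOUTS = {
    "default": {"sep": ",", "skiprows": 0, "usecols": [0, 1]},
    "USCMed": {"sep": "\t", "skiprows": 1, "usecols": [1, 2]},
}


def load_annotation_times(file_name, dataset="default"):
    """Hypothetical helper: read begin/end times using the layout for `dataset`."""
    layout = ANNOTATION_LAYOUTS[dataset]
    return pd.read_csv(
        file_name,
        header=None,
        names=["begin_time", "end_time"],
        sep=layout["sep"],
        skiprows=layout["skiprows"],
        usecols=layout["usecols"],
    )


# Made-up samples in each layout.
comma_df = load_annotation_times(io.StringIO("0.512,0.634\n1.020,1.187\n"))
tab_df = load_annotation_times(io.StringIO("id\tbegin\tend\n1\t0.512\t0.634\n"), dataset="USCMed")
print(comma_df)
print(tab_df)
```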
