ValueError raised when audio file has no voice activity #3

@a-n-rose

First of all, thank you for all of your work. This package is proving to be very helpful.

I have come across what appears to be a bug. If I supply an audio file to Voxseg and no voice activity is identified, this ValueError is thrown:

------------------- Running VAD -------------------
2021-06-08 18:15:50.236547: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-06-08 18:15:50.236961: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2299965000 Hz
Traceback (most recent call last):
  File "voxseg/main.py", line 58, in <module>
    endpoints = run_cnnlstm.decode(targets, speech_thresh, speech_w_music_thresh, filt)
  File "../voxseg/env/lib/python3.8/site-packages/voxseg/run_cnnlstm.py", line 57, in decode
    ((targets['start'] * 100).astype(int)).astype(str).str.zfill(7) + '_' + \
  File "../voxseg/env/lib/python3.8/site-packages/pandas/core/generic.py", line 5874, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
  File "../voxseg/env/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 631, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "../voxseg/env/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 427, in apply
    applied = getattr(b, f)(**kwargs)
  File "../voxseg/env/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 673, in astype
    values = astype_nansafe(vals1d, dtype, copy=True)
  File "../voxseg/env/lib/python3.8/site-packages/pandas/core/dtypes/cast.py", line 1074, in astype_nansafe
    return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
  File "pandas/_libs/lib.pyx", line 619, in pandas._libs.lib.astype_intsafe
ValueError: cannot convert float NaN to integer

I suspect that, because no voice activity was identified, the time points (targets['start'] and targets['end']) are either missing or NaN, which causes the following code to fail:

From voxseg.run_cnnlstm.decode

    targets['utterance-id'] = targets['recording-id'].astype(str) + '_' + \
                        ((targets['start'] * 100).astype(int)).astype(str).str.zfill(7) + '_' + \
                        ((targets['end'] * 100).astype(int)).astype(str).str.zfill(7)
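
To illustrate the suspected failure mode in isolation, here is a minimal sketch. The `targets` frame below is a hypothetical stand-in, not the actual data Voxseg produces; it simply assumes that a recording with no detected speech ends up with NaN in `start` and `end`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the `targets` frame inside decode():
# one row whose start/end are NaN (assumed to be what happens
# when no voice activity is detected).
targets = pd.DataFrame({
    "recording-id": ["rec1"],
    "start": [np.nan],
    "end": [np.nan],
})

try:
    # Same cast as in the utterance-id expression above.
    (targets["start"] * 100).astype(int)
    raised = False
except ValueError:
    # pandas refuses to cast NaN to an integer dtype.
    raised = True

print("ValueError raised:", raised)
```

If the assumption holds, this would explain why the traceback ends in `astype` rather than in the model code.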

I have put together a workaround, but I figured others will likely run into this bug at some point. I would also like to know whether it could have a cause other than the lack of voice activity.
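
For anyone hitting the same error, one possible guard is sketched below. It is only an illustration under the NaN assumption above: `add_utterance_ids` is a hypothetical helper, not part of Voxseg, and it simply drops rows without valid time points before building the IDs.

```python
import numpy as np
import pandas as pd


def add_utterance_ids(targets: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper (not part of Voxseg): skip rows without time points."""
    # Keep only rows where both endpoints are present.
    valid = targets.dropna(subset=["start", "end"]).copy()
    # Same formatting as the original expression: centiseconds, zero-padded to 7.
    start = (valid["start"] * 100).astype(int).astype(str).str.zfill(7)
    end = (valid["end"] * 100).astype(int).astype(str).str.zfill(7)
    valid["utterance-id"] = valid["recording-id"].astype(str) + "_" + start + "_" + end
    return valid


# Example: one recording with speech, one without (NaN endpoints assumed).
example = pd.DataFrame({
    "recording-id": ["rec1", "rec2"],
    "start": [1.5, np.nan],
    "end": [2.25, np.nan],
})
result = add_utterance_ids(example)
```

With this guard, a recording that has no speech yields an empty frame instead of an exception, so the caller would still need to decide how to handle "no segments found".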

Many thanks!

Metadata

Labels

bug: Something isn't working
