First of all, thank you for all of your work. This package is proving to be very helpful.
I have come across what appears to be a bug. If I supply an audio file to Voxseg and no voice activity is identified, this ValueError is thrown:
------------------- Running VAD -------------------
2021-06-08 18:15:50.236547: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-06-08 18:15:50.236961: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2299965000 Hz
Traceback (most recent call last):
File "voxseg/main.py", line 58, in <module>
endpoints = run_cnnlstm.decode(targets, speech_thresh, speech_w_music_thresh, filt)
File "../voxseg/env/lib/python3.8/site-packages/voxseg/run_cnnlstm.py", line 57, in decode
((targets['start'] * 100).astype(int)).astype(str).str.zfill(7) + '_' + \
File "../voxseg/env/lib/python3.8/site-packages/pandas/core/generic.py", line 5874, in astype
new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
File "../voxseg/env/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 631, in astype
return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
File "../voxseg/env/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 427, in apply
applied = getattr(b, f)(**kwargs)
File "../voxseg/env/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 673, in astype
values = astype_nansafe(vals1d, dtype, copy=True)
File "../voxseg/env/lib/python3.8/site-packages/pandas/core/dtypes/cast.py", line 1074, in astype_nansafe
return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
File "pandas/_libs/lib.pyx", line 619, in pandas._libs.lib.astype_intsafe
ValueError: cannot convert float NaN to integer
I suspect that because no voice activity has been identified, no time points exist or they are NaN values (i.e. targets['start'] and targets['end']), causing the following code to fail:
From voxseg.run_cnnlstm.decode
targets['utterance-id'] = targets['recording-id'].astype(str) + '_' + \
((targets['start'] * 100).astype(int)).astype(str).str.zfill(7) + '_' + \
((targets['end'] * 100).astype(int)).astype(str).str.zfill(7)
I have put together a workaround but figured others will likely come across this bug at some point. I also would like to know if this bug is due to some other cause than the lack of voice activity.
Many thanks!
First of all, thank you for all of your work. This package is proving to be very helpful.
I have come across what appears to be a bug. If I supply an audio file to Voxseg and no voice activity is identified, this ValueError is thrown:
I suspect that because no voice activity has been identified, no time points exist or they are NaN values (i.e. targets['start'] and targets['end']), causing the following code to fail:
From voxseg.run_cnnlstm.decode
I have put together a workaround but figured others will likely come across this bug at some point. I also would like to know if this bug is due to some other cause than the lack of voice activity.
Many thanks!