Preprocessing with CVSS-C dataset #19

@Sabo-99

First of all, thanks for sharing the code for your great work!

However, I encountered several problems while following the preprocess_scripts process with the CVSS-C dataset.

First Question)

While running "prep_cvss_c_multilingual_data.py" in "2. prep_cvss_c_multilingual_data.sh", an error is reported.
It is related to "gcmvn_feature_list", which is empty.

Below are the printed value of gcmvn_feature_list (via print(f"gcmvn_feature_list: {gcmvn_feature_list}")) and the reported error message.

gcmvn_feature_list: []

Traceback (most recent call last):
  File "./simulst/StreamSpeech/preprocess_scripts/prep_cvss_c_multilingual_data.py", line 634, in <module>
    main()
  File "./simulst/StreamSpeech/preprocess_scripts/prep_cvss_c_multilingual_data.py", line 631, in main
    process(args)
  File "./simulst/StreamSpeech/preprocess_scripts/prep_cvss_c_multilingual_data.py", line 355, in process
    stats = cal_gcmvn_stats(gcmvn_feature_list)
  File "./simulst/StreamSpeech/fairseq/examples/speech_to_text/data_utils.py", line 268, in cal_gcmvn_stats
    features = np.concatenate(features_list)
ValueError: need at least one array to concatenate
finish 2.prep_cvss_c_multilingual_data.sh
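
For reference, the ValueError itself can be reproduced in isolation. It is what np.concatenate raises for an empty list, so the real question is why no features were collected. The sketch below is only an illustration: safe_concatenate and reproduce_error are hypothetical helper names, not functions from StreamSpeech or fairseq, and the feature shapes in the usage note are made up.

```python
import numpy as np


def reproduce_error():
    """Return the message numpy raises when asked to concatenate an empty list."""
    try:
        np.concatenate([])
    except ValueError as err:
        return str(err)
    return None


def safe_concatenate(features_list):
    """Hypothetical guard: fail early with a more descriptive message.

    This is NOT the fairseq implementation, only a sketch of a check that
    could be placed before the np.concatenate call.
    """
    if len(features_list) == 0:
        raise ValueError(
            "gcmvn_feature_list is empty: no features were collected, "
            "so the input paths or split names may be wrong"
        )
    return np.concatenate(features_list)


if __name__ == "__main__":
    print(reproduce_error())
```

With a non-empty list such as two arrays of shape (2, 80) and (3, 80), safe_concatenate returns a single array stacked along the first axis.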

Second Question)

Also, the other stages (4~9) report errors related to fbank2unit, such as the following:

FileNotFoundError: [Errno 2] No such file or directory: './simulst/dataset/cvss/cvss-c/es-en/fbank2unit/dev.tsv'
FileNotFoundError: [Errno 2] No such file or directory: './simulst/dataset/cvss/cvss-c/es-en/fbank2unit/train.tsv'
FileNotFoundError: [Errno 2] No such file or directory: './simulst/dataset/cvss/cvss-c/es-en/fbank2unit/test.tsv'

Should I configure the fbank2unit directory myself, or is it created automatically when running preprocess.sh?
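
To narrow this down, I used a small check along these lines to see which of the expected manifests are missing before running the later stages. It is only a sketch: missing_manifests is a hypothetical helper, and the split names are taken from the error messages above.

```python
from pathlib import Path


def missing_manifests(root, splits=("train", "dev", "test")):
    """Return the paths of the <split>.tsv files that do not exist under root."""
    root = Path(root)
    return [
        str(root / f"{split}.tsv")
        for split in splits
        if not (root / f"{split}.tsv").is_file()
    ]


if __name__ == "__main__":
    # Directory taken from the FileNotFoundError messages above.
    for path in missing_manifests("./simulst/dataset/cvss/cvss-c/es-en/fbank2unit"):
        print("missing:", path)
```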
