Dataset not found #11

@remylouisew

The code provided to prepare the wmt_t2t_translate dataset fails with this error:

File "/opt/conda/lib/python3.7/site-packages/tensorflow_datasets/translate/wmt.py", line 1000, in _parse_parallel_sentences
assert f1_files and f2_files, "No matching files found: %s, %s." % (f1, f2)
AssertionError: No matching files found: gs://rw-tpu/datasets/downloads/extracted/TAR_GZ.statmt.org_wmt13_traini-parall-europa-v7AiTHxxDIoGPf2JOwzAgwIC1h9MdcF-uOMYNhA9J9luc.tgz/training/europarl-v7.de-en.de, gs://rw-tpu/datasets/downloads/extracted/TAR_GZ.statmt.org_wmt13_traini-parall-europa-v7AiTHxxDIoGPf2JOwzAgwIC1h9MdcF-uOMYNhA9J9luc.tgz/training/europarl-v7.de-en.en.

The files named in the error do exist at that location, and both I and the service account used by the Vertex Workbench can access them. Perhaps there is a problem with the config?
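One way to narrow this down is to run the same kind of lookup outside the dataset builder. The sketch below is a minimal diagnostic, not part of the tutorial code. It assumes the assertion in `wmt.py` is fed by `tf.io.gfile.glob` results for the two paths, and that TensorFlow is installed in the environment that runs the preparation step. `find_matches` and `check_pair` are hypothetical helper names introduced here for illustration.

```python
import glob


def find_matches(pattern, glob_fn=None):
    """Return the sorted list of paths that glob_fn finds for pattern."""
    if glob_fn is None:
        # Assumption: TensorFlow is available; its gfile glob understands gs:// paths.
        import tensorflow as tf
        glob_fn = tf.io.gfile.glob
    return sorted(glob_fn(pattern))


def check_pair(f1, f2, glob_fn=None):
    """Mirror the failing check: both sides must match at least one file."""
    f1_files = find_matches(f1, glob_fn)
    f2_files = find_matches(f2, glob_fn)
    return {"f1": f1_files, "f2": f2_files, "ok": bool(f1_files and f2_files)}
```

Calling `check_pair` with the two `gs://` paths from the assertion message, from the same notebook and service account, would show whether the lookup itself returns nothing or whether the problem sits elsewhere, such as in the config. Passing `glob_fn=glob.glob` runs the same check against local paths.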

In addition, the code to download the xsum dataset that is linked in this tutorial fails due to a syntax error in its creation script. Not exactly your problem, but I wanted to let you know. I was able to build the squad and cnn_dailymail datasets successfully.
