Dataset not found #11

The code provided to prepare the wmt_t2t_translate dataset fails with this error:

  File "/opt/conda/lib/python3.7/site-packages/tensorflow_datasets/translate/wmt.py", line 1000, in _parse_parallel_sentences
    assert f1_files and f2_files, "No matching files found: %s, %s." % (f1, f2)
AssertionError: No matching files found: gs://rw-tpu/datasets/downloads/extracted/TAR_GZ.statmt.org_wmt13_traini-parall-europa-v7AiTHxxDIoGPf2JOwzAgwIC1h9MdcF-uOMYNhA9J9luc.tgz/training/europarl-v7.de-en.de, gs://rw-tpu/datasets/downloads/extracted/TAR_GZ.statmt.org_wmt13_traini-parall-europa-v7AiTHxxDIoGPf2JOwzAgwIC1h9MdcF-uOMYNhA9J9luc.tgz/training/europarl-v7.de-en.en.

The files named in the error do exist at that location, and both I and the service account used by the Vertex Workbench can access them. Perhaps there is a problem with the config?
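To narrow this down, the check can be reproduced outside the dataset builder. The wording of the assertion ("No matching files found") suggests that the two paths are treated as glob patterns, but that is an assumption and has not been confirmed against the TFDS source. The helper names below (`check_parallel_files`, `unmatched_patterns`) are hypothetical and only for illustration. This is a minimal sketch, not the library's own logic:

```python
import glob
from typing import Callable, Dict, List


def check_parallel_files(
    patterns: List[str],
    glob_fn: Callable[[str], List[str]] = glob.glob,
) -> Dict[str, List[str]]:
    """Map each path pattern to the sorted list of files glob_fn matches."""
    return {pattern: sorted(glob_fn(pattern)) for pattern in patterns}


def unmatched_patterns(matches: Dict[str, List[str]]) -> List[str]:
    """Return the patterns that matched no files at all."""
    return [pattern for pattern, files in matches.items() if not files]


# For gs:// paths the standard-library glob does not apply. With TensorFlow
# installed, pass its file-system-aware glob instead (assumption: this is
# the same kind of lookup the failing code performs):
#
#   import tensorflow as tf
#   matches = check_parallel_files([f1, f2], glob_fn=tf.io.gfile.glob)
#   print(unmatched_patterns(matches))
#
# where f1 and f2 are the two paths quoted in the AssertionError.
```

If both patterns match when run this way under the same service account, the paths and permissions are probably fine, and the problem more likely lies in how the builder is configured or invoked.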

In addition, the code linked in this tutorial for downloading the xsum dataset fails because of a syntax error in their creation script. That is not exactly your problem, but I wanted to let you know. I was able to build the squad and cnn_dailymail datasets successfully.


