Description
The code provided to prepare the wmt_t2t_translate dataset fails with this error:
File "/opt/conda/lib/python3.7/site-packages/tensorflow_datasets/translate/wmt.py", line 1000, in _parse_parallel_sentences
assert f1_files and f2_files, "No matching files found: %s, %s." % (f1, f2)
AssertionError: No matching files found: gs://rw-tpu/datasets/downloads/extracted/TAR_GZ.statmt.org_wmt13_traini-parall-europa-v7AiTHxxDIoGPf2JOwzAgwIC1h9MdcF-uOMYNhA9J9luc.tgz/training/europarl-v7.de-en.de, gs://rw-tpu/datasets/downloads/extracted/TAR_GZ.statmt.org_wmt13_traini-parall-europa-v7AiTHxxDIoGPf2JOwzAgwIC1h9MdcF-uOMYNhA9J9luc.tgz/training/europarl-v7.de-en.en.
The files mentioned do exist at that location, and both I and the service account used by the Vertex Workbench can access them. Perhaps there is a problem with the config?
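One way to narrow this down is to compare a plain existence check with a glob match for the same two paths. This is only a sketch, and it assumes the assertion in wmt.py is fed by a glob call such as `tf.io.gfile.glob`, which has not been confirmed for the installed tensorflow_datasets version. The helper name `check_paths` is hypothetical. It defaults to the standard-library local-filesystem functions so it can be tried without TensorFlow.

```python
import glob
import os


def check_paths(paths, exists_fn=os.path.exists, glob_fn=glob.glob):
    """Report, for each path, whether it exists and what a glob on it matches.

    A path that exists but has no glob matches would suggest the glob step
    (not permissions or a missing file) is what trips the assertion.
    """
    report = {}
    for path in paths:
        report[path] = {
            "exists": bool(exists_fn(path)),
            "glob_matches": list(glob_fn(path)),
        }
    return report
```

To run it against the bucket, pass `tf.io.gfile.exists` and `tf.io.gfile.glob` as `exists_fn` and `glob_fn`, with the two `gs://` paths from the error message as `paths`. If `exists` is true while `glob_matches` is empty, the problem is more likely in pattern matching on GCS than in the config.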
In addition, the code linked in this tutorial for downloading the xsum dataset fails because of a syntax error in that dataset's creation script. Not exactly your problem, but I wanted to let you know. I was able to build the squad and cnn_dailymail datasets successfully.