Description
Short description
Some of the language tags in the default TAGS.txt cause a UnicodeDecodeError.
Environment information
- Operating System: Windows 11
- Python version: 3.10.13
- tensorflow-datasets/tfds-nightly version: tfds-nightly 4.9.4.dev202405100044
- tensorflow/tf-nightly version: tensorflow 2.10.0
- Does the issue still exist with the last tfds-nightly package (pip install --upgrade tfds-nightly)? Yes
Reproduction instructions
Make a toy dataset with tfds new test. Then try to instantiate the builder.
from test.test_dataset_builder import *
b = Builder()
Link to logs
Stack trace here
Expected behavior
The builder should instantiate without error.
Additional context
Deleting lines 73, 79, 126, 128, 156, and 173 in TAGS.txt fixes the problem. These are all language tags.
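One way to confirm which lines are involved is to check each line of the file against a candidate encoding. The sketch below assumes, without confirmation from the traceback, that the file is being read with the Windows locale default (cp1252) instead of UTF-8. The helper name `undecodable_lines` is hypothetical and is not part of tensorflow-datasets.

```python
from pathlib import Path


def undecodable_lines(path, encoding="cp1252"):
    """Return the 1-based numbers of lines whose bytes do not decode with `encoding`.

    Assumption: cp1252 is the encoding Python picks by default on this
    Windows machine. Pass a different `encoding` to test another guess.
    """
    bad = []
    # Read raw bytes so that nothing is decoded implicitly.
    data = Path(path).read_bytes()
    for lineno, raw in enumerate(data.split(b"\n"), start=1):
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            bad.append(lineno)
    return bad
```

If calling `undecodable_lines("TAGS.txt")` on a local copy of the file returns the same line numbers as listed above, while `undecodable_lines("TAGS.txt", "utf-8")` returns an empty list, that would suggest the file itself is valid UTF-8 and only the encoding used to read it is wrong.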