Closed
Description
Describe the bug
I can't retrieve a cached dataset with offline mode enabled
Steps to reproduce the bug
To reproduce my issue, first, you'll need to run a script that will cache the dataset
import os
os.environ["HF_DATASETS_OFFLINE"] = "0"
import datasets
datasets.logging.set_verbosity_info()
ds_name = "SaulLu/toy_struc_dataset"
ds = datasets.load_dataset(ds_name)
print(ds)
then, you can try to reload it in offline mode:
import os
os.environ["HF_DATASETS_OFFLINE"] = "1"
import datasets
datasets.logging.set_verbosity_info()
ds_name = "SaulLu/toy_struc_dataset"
ds = datasets.load_dataset(ds_name)
print(ds)
Expected results
I would have expected the 2nd snippet not to return any errors
Actual results
The 2nd snippet returns:
Traceback (most recent call last):
File "/home/lucile_huggingface_co/sandbox/evaluate/test_cache_datasets.py", line 8, in <module>
ds = datasets.load_dataset(ds_name)
File "/home/lucile_huggingface_co/anaconda3/envs/evaluate-dev/lib/python3.8/site-packages/datasets/load.py", line 1723, in load_dataset
builder_instance = load_dataset_builder(
File "/home/lucile_huggingface_co/anaconda3/envs/evaluate-dev/lib/python3.8/site-packages/datasets/load.py", line 1500, in load_dataset_builder
dataset_module = dataset_module_factory(
File "/home/lucile_huggingface_co/anaconda3/envs/evaluate-dev/lib/python3.8/site-packages/datasets/load.py", line 1241, in dataset_module_factory
raise ConnectionError(f"Couln't reach the Hugging Face Hub for dataset '{path}': {e1}") from None
ConnectionError: Couln't reach the Hugging Face Hub for dataset 'SaulLu/toy_struc_dataset': Offline mode is enabled.
Environment info
datasets
version: 2.4.0- Platform: Linux-4.19.0-21-cloud-amd64-x86_64-with-glibc2.17
- Python version: 3.8.13
- PyArrow version: 8.0.0
- Pandas version: 1.4.3
Maybe I'm misunderstanding something in the use of the offline mode (see doc), is that the case?