Open

Labels
- bug: Something isn't working correctly
- dataset: Issues regarding data inputs, processing, or datasets
- python: Pull requests that update python code
Description
System Info
- lerobot version: 0.4.0
- Platform: Linux-6.8.0-85-generic-x86_64-with-glibc2.35
- Python version: 3.10.18
- Huggingface Hub version: 0.35.3
- Datasets version: 4.1.1
- Numpy version: 2.2.6
- PyTorch version: 2.7.1+cu126
- Is PyTorch built with CUDA support?: True
- Cuda version: 12.6
- GPU model: NVIDIA RTX A4500

Information
- One of the scripts in the examples/ folder of LeRobot
- My own task or dataset (give details below)
 
Reproduction
Steps to reproduce the behaviour:
- uv init --python 3.10 --package hello-lerobot
- cd hello-lerobot && uv sync
- source .venv/bin/activate
- uv add lerobot
- Open a Python file/notebook and type in the contents from the StreamingLeRobotDataset example given here:
from lerobot.datasets.streaming_dataset import StreamingLeRobotDataset
repo_id = "yaak-ai/L2D-v3"
dataset = StreamingLeRobotDataset(repo_id)  # streams directly from the Hub
- Run the file/cell
Expected behavior
The dataset should stream from the Hub without error. Instead, the dataset object creation line raises an exception, as shown below:
---------------------------------------------------------------------------
DataFilesNotFoundError                    Traceback (most recent call last)
Cell In[7], line 4
      1 from lerobot.datasets.streaming_dataset import StreamingLeRobotDataset
      3 repo_id = "yaak-ai/L2D-v3"
----> 4 dataset = StreamingLeRobotDataset(repo_id)  # streams directly from the Hub
File /mnt/toshiba_hdd/simulations/hello-lerobot/.venv/lib/python3.10/site-packages/lerobot/datasets/streaming_dataset.py:152, in StreamingLeRobotDataset.__init__(self, repo_id, root, episodes, image_transforms, delta_timestamps, tolerance_s, revision, force_cache_sync, streaming, buffer_size, max_num_shards, seed, rng, shuffle)
    149     self.delta_timestamps = delta_timestamps
    150     self.delta_indices = get_delta_indices(self.delta_timestamps, self.fps)
--> 152 self.hf_dataset: datasets.IterableDataset = load_dataset(
    153     self.repo_id if not self.streaming_from_local else str(self.root),
    154     split="train",
    155     streaming=self.streaming,
    156     data_files="data/*/*.parquet",
    157     revision=self.revision,
    158 )
    160 self.num_shards = min(self.hf_dataset.num_shards, max_num_shards)
File /mnt/toshiba_hdd/simulations/hello-lerobot/.venv/lib/python3.10/site-packages/datasets/load.py:1392, in load_dataset(path, name, data_dir, data_files, split, cache_dir, features, download_config, download_mode, verification_mode, keep_in_memory, save_infos, revision, token, streaming, num_proc, storage_options, **config_kwargs)
   1387 verification_mode = VerificationMode(
   1388     (verification_mode or VerificationMode.BASIC_CHECKS) if not save_infos else VerificationMode.ALL_CHECKS
   1389 )
   1391 # Create a dataset builder
-> 1392 builder_instance = load_dataset_builder(
   1393     path=path,
   1394     name=name,
   1395     data_dir=data_dir,
   1396     data_files=data_files,
   1397     cache_dir=cache_dir,
   1398     features=features,
   1399     download_config=download_config,
   1400     download_mode=download_mode,
   1401     revision=revision,
   1402     token=token,
   1403     storage_options=storage_options,
   1404     **config_kwargs,
   1405 )
   1407 # Return iterable dataset in case of streaming
   1408 if streaming:
File /mnt/toshiba_hdd/simulations/hello-lerobot/.venv/lib/python3.10/site-packages/datasets/load.py:1132, in load_dataset_builder(path, name, data_dir, data_files, cache_dir, features, download_config, download_mode, revision, token, storage_options, **config_kwargs)
   1130 if features is not None:
   1131     features = _fix_for_backward_compatible_features(features)
-> 1132 dataset_module = dataset_module_factory(
   1133     path,
   1134     revision=revision,
   1135     download_config=download_config,
   1136     download_mode=download_mode,
   1137     data_dir=data_dir,
   1138     data_files=data_files,
   1139     cache_dir=cache_dir,
   1140 )
   1141 # Get dataset builder class
   1142 builder_kwargs = dataset_module.builder_kwargs
File /mnt/toshiba_hdd/simulations/hello-lerobot/.venv/lib/python3.10/site-packages/datasets/load.py:1025, in dataset_module_factory(path, revision, download_config, download_mode, data_dir, data_files, cache_dir, **download_kwargs)
   1023     raise ConnectionError(f"Couldn't reach the Hugging Face Hub for dataset '{path}': {e1}") from None
   1024 if isinstance(e1, (DataFilesNotFoundError, DatasetNotFoundError, EmptyDatasetError)):
-> 1025     raise e1 from None
   1026 if isinstance(e1, FileNotFoundError):
   1027     raise FileNotFoundError(
   1028         f"Couldn't find any data file at {relative_to_absolute_path(path)}. "
   1029         f"Couldn't find '{path}' on the Hugging Face Hub either: {type(e1).__name__}: {e1}"
   1030     ) from None
File /mnt/toshiba_hdd/simulations/hello-lerobot/.venv/lib/python3.10/site-packages/datasets/load.py:1004, in dataset_module_factory(path, revision, download_config, download_mode, data_dir, data_files, cache_dir, **download_kwargs)
    994     else:
    995         use_exported_dataset_infos = True
    996     return HubDatasetModuleFactory(
    997         path,
    998         commit_hash=commit_hash,
    999         data_dir=data_dir,
   1000         data_files=data_files,
   1001         download_config=download_config,
   1002         download_mode=download_mode,
   1003         use_exported_dataset_infos=use_exported_dataset_infos,
-> 1004     ).get_module()
   1005 except GatedRepoError as e:
   1006     message = f"Dataset '{path}' is a gated dataset on the Hub."
File /mnt/toshiba_hdd/simulations/hello-lerobot/.venv/lib/python3.10/site-packages/datasets/load.py:638, in HubDatasetModuleFactory.get_module(self)
    631     patterns = get_data_patterns(base_path, download_config=self.download_config)
    632 data_files = DataFilesDict.from_patterns(
    633     patterns,
    634     base_path=base_path,
    635     allowed_extensions=ALL_ALLOWED_EXTENSIONS,
    636     download_config=self.download_config,
    637 )
--> 638 module_name, default_builder_kwargs = infer_module_for_data_files(
    639     data_files=data_files,
    640     path=self.name,
    641     download_config=self.download_config,
    642 )
    643 data_files = data_files.filter(
    644     extensions=_MODULE_TO_EXTENSIONS[module_name], file_names=_MODULE_TO_METADATA_FILE_NAMES[module_name]
    645 )
    646 module_path, _ = _PACKAGED_DATASETS_MODULES[module_name]
File /mnt/toshiba_hdd/simulations/hello-lerobot/.venv/lib/python3.10/site-packages/datasets/load.py:300, in infer_module_for_data_files(data_files, path, download_config)
    298     raise ValueError(f"Couldn't infer the same data file format for all splits. Got {split_modules}")
    299 if not module_name:
--> 300     raise DataFilesNotFoundError("No (supported) data files found" + (f" in {path}" if path else ""))
    301 return module_name, default_builder_kwargs
DataFilesNotFoundError: No (supported) data files found in yaak-ai/L2D-v3

- Seems like the program expects to see some files on my local system
- However, I am using the dataset streaming feature as I cannot download large datasets to my machine
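As a diagnostic (not a fix), one way to narrow this down is to check whether the repo's file layout actually matches the `data/*/*.parquet` pattern that `StreamingLeRobotDataset` passes to `load_dataset` (visible in the traceback). The sketch below uses hypothetical file listings to stay offline; I have not verified the real layout of `yaak-ai/L2D-v3`, which could be fetched with `huggingface_hub.HfApi().list_repo_files(repo_id, repo_type="dataset")`. Note that `PurePosixPath.match`, like the glob used for `data_files`, does not let `*` cross a `/`, so a repo nested one level deeper than the pattern expects would yield exactly this `DataFilesNotFoundError`.

```python
from pathlib import PurePosixPath

# The pattern StreamingLeRobotDataset hard-codes in load_dataset(...) per the traceback.
PATTERN = "data/*/*.parquet"

# Hypothetical layouts -- replace with the real listing from
# huggingface_hub.HfApi().list_repo_files("yaak-ai/L2D-v3", repo_type="dataset").
flat_layout = ["data/chunk-000/file-000.parquet", "meta/info.json"]
nested_layout = ["data/chunk-000/episodes/file-000.parquet", "meta/info.json"]

def matching_files(files: list[str], pattern: str) -> list[str]:
    # PurePosixPath.match anchors the pattern at the right and does not
    # let '*' span path separators, mirroring non-recursive glob semantics.
    return [f for f in files if PurePosixPath(f).match(pattern)]

print(matching_files(flat_layout, PATTERN))    # pattern finds the parquet file
print(matching_files(nested_layout, PATTERN))  # one level deeper: no match
```

If the real listing returns no matches for `data/*/*.parquet`, the bug would be a mismatch between the dataset's on-Hub layout and the pattern LeRobot assumes, rather than anything about local files.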
 