Following the setup, `python train.py` cannot load the dataset and raises this error:

```
--- Train Config ---
TrainConfig(lr_mp=0.00512, lr_vision_backbone=5e-05, lr_language_backbone=5e-05, val_size=50000, batch_size=2, gradient_accumulation_steps=8, max_grad_norm=1.0, eval_in_epochs=True, eval_interval=500, stats_log_interval=100, max_training_steps=40000, max_images_per_example=4, max_images_per_knapsack=18, max_sample_length=4096, compile=False, resume_from_vlm_checkpoint=False, train_dataset_path='HuggingFaceM4/FineVision_concat_shuffled_2', train_dataset_name=('default',), stream_dataset=True, relevance_min_rating=1, image_correspondence_min_rating=1, visual_dependency_min_rating=1, formatting_min_rating=1, wandb_entity='qsh-team', log_wandb=True, use_lmms_eval=True, lmms_eval_tasks='mmstar,mmmu_val,ocrbench,textvqa_val,docvqa_val,scienceqa,mme,infovqa_val,chartqa', lmms_eval_limit=None, lmms_eval_batch_size=64)
Getting dataloaders from HuggingFaceM4/FineVision_concat_shuffled_2
Resize to max side len: True
Loading dataset: default
Warning: Failed to load dataset config 'default' from 'HuggingFaceM4/FineVision_concat_shuffled_2'. Error: (ReadTimeoutError("HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=10)"), '(Request ID: 38f977ad-6afb-4cda-84c2-a842cb1f9ca9)')
Traceback (most recent call last):
  File "/mnt/scratch/plays/nanoVLM/train.py", line 702, in <module>
    main()
  File "/mnt/scratch/plays/nanoVLM/train.py", line 696, in main
    train(train_cfg, vlm_cfg)
  File "/mnt/scratch/plays/nanoVLM/train.py", line 265, in train
    train_loader, val_loader, iter_train_loader, iter_val_loader = get_dataloaders(train_cfg, vlm_cfg)
                                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/scratch/plays/nanoVLM/train.py", line 155, in get_dataloaders
    raise ValueError("No valid datasets were loaded. Please check your dataset path and configurations.")
ValueError: No valid datasets were loaded. Please check your dataset path and configurations.
```
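The 10s read timeout in the warning above matches the default huggingface_hub HTTP timeouts, so this first failure may just be a slow or flaky connection to the Hub rather than a wrong dataset path. A minimal standalone check, as a sketch (`HF_HUB_ETAG_TIMEOUT` / `HF_HUB_DOWNLOAD_TIMEOUT` are huggingface_hub settings rather than nanoVLM options, and the `load_dataset` call is only my guess at what `get_dataloaders` does):

```python
import os

# Raise the Hub HTTP timeouts (default is 10s, which matches the
# ReadTimeoutError above). These env vars are read at import time,
# so set them before importing datasets/huggingface_hub.
os.environ["HF_HUB_ETAG_TIMEOUT"] = "60"
os.environ["HF_HUB_DOWNLOAD_TIMEOUT"] = "60"

from datasets import load_dataset

# Rough standalone repro of the streaming load train.py attempts; the exact
# arguments in get_dataloaders may differ, and split="train" is an assumption.
ds = load_dataset(
    "HuggingFaceM4/FineVision_concat_shuffled_2",
    "default",
    split="train",
    streaming=True,
)
print(next(iter(ds)).keys())
```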
When I try another config instead:

```
train_dataset_path: str = 'HuggingFaceM4/FineVision'
train_dataset_name: tuple[str, ...] = ("LLaVA_Instruct_150K", )
```

it complains:

```
Loading dataset: LLaVA_Instruct_150K
Resolving data files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 52/52 [00:02<00:00, 23.01it/s]
Warning: Failed to load dataset config 'LLaVA_Instruct_150K' from 'HuggingFaceM4/FineVision'. Error: BuilderConfig ParquetConfig(name='LLaVA_Instruct_150K', version=0.0.0, data_dir=None, data_files={'train': ['LLaVA_Instruct_150K/train-*']}, description=None, batch_size=None, columns=None, features=None, filters=None) doesn't have a 'on_bad_files' key.
Traceback (most recent call last):
  File "/mnt/scratch/plays/nanoVLM/train.py", line 702, in <module>
    main()
  File "/mnt/scratch/plays/nanoVLM/train.py", line 696, in main
    train(train_cfg, vlm_cfg)
  File "/mnt/scratch/plays/nanoVLM/train.py", line 265, in train
    train_loader, val_loader, iter_train_loader, iter_val_loader = get_dataloaders(train_cfg, vlm_cfg)
                                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/scratch/plays/nanoVLM/train.py", line 155, in get_dataloaders
    raise ValueError("No valid datasets were loaded. Please check your dataset path and configurations.")
ValueError: No valid datasets were loaded. Please check your dataset path and configurations.
```
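This second failure looks like a kwarg/version mismatch rather than a network problem: the `ParquetConfig` shown in the warning has no `on_bad_files` field, so the loader is apparently passing a keyword my installed `datasets` release does not accept. A small diagnostic sketch to list which keys the builder config for this dataset actually takes (this uses the public `load_dataset_builder` API and does hit the Hub to resolve the config):

```python
import dataclasses

import datasets

print("datasets version:", datasets.__version__)

# Build the same parquet-backed builder as in the warning, but without any
# extra keyword arguments, and list the fields its config accepts.
builder = datasets.load_dataset_builder("HuggingFaceM4/FineVision", "LLaVA_Instruct_150K")
print(sorted(f.name for f in dataclasses.fields(builder.config)))
```

If `on_bad_files` is missing from that list, matching the `datasets` version nanoVLM expects would presumably fix this second error.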
For reference, the default config points at this dataset (Line 75 in 4e0c096):

```python
train_dataset_path: str = 'HuggingFaceM4/FineVision_concat_shuffled_2'
```