Dataset image path incorrectly loaded (multimodal dataset image path error) #7043
Unanswered
SovietLongbow
asked this question in
Q&A
My dataset contains records like this:
{
  "messages": [
    {
      "role": "user",
      "content": "解释这张图片<image>"
    },
    {
      "role": "assistant",
      "content": "一个坐在推车上的女孩旁边有两个双手插着口袋的小孩走在室外的道路上"
    }
  ],
  "images": [
    "/root/autodl-tmp/image_set/3f040261c9543402c4804c520fbd29bcba5137cf.jpg"
  ]
},
The image path is under the autodl-tmp folder.
However, when I start evaluation with this command:
llamafactory-cli train \
    --stage sft \
    --model_name_or_path /root/autodl-tmp/swift/llava-1.5-7b-hf \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --quantization_method bitsandbytes \
    --template llava \
    --flash_attn auto \
    --dataset_dir /root/LLaMA-Factory/data \
    --eval_dataset testSet00 \
    --cutoff_len 1024 \
    --max_samples 100000 \
    --per_device_eval_batch_size 2 \
    --predict_with_generate True \
    --max_new_tokens 512 \
    --top_p 0.7 \
    --temperature 0.95 \
    --output_dir saves/LLaVA-1.5-7B-Chat/lora/eval_2025-02-24-09-11-19 \
    --do_predict True
it raises this exception:
FileNotFoundError: [Errno 2] No such file or directory: '/root/autodl_tmp/image_set/3f040261c9543402c4804c520fbd29bcba5137cf.jpg'
The program turns the '-' in autodl-tmp into a '_' (autodl_tmp) for a reason I do not understand, which causes the error.
How can I solve this problem?
P.S. dataset_info.json:
{
  "trainSet00": {
    "file_name": "/root/LLaMA-Factory/data/202502/trainSet00.json",
    "formatting": "sharegpt",
    "columns": {
      "messages": "messages",
      "images": "images"
    },
    "tags": {
      "role_tag": "role",
      "content_tag": "content",
      "user_tag": "user",
      "assistant_tag": "assistant"
    }
  },
  "testSet00": {
    "file_name": "/root/LLaMA-Factory/data/202502/testSet00.json",
    "formatting": "sharegpt",
    "columns": {
      "messages": "messages",
      "images": "images"
    },
    "tags": {
      "role_tag": "role",
      "content_tag": "content",
      "user_tag": "user",
      "assistant_tag": "assistant"
    }
  }
}
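One way to narrow this down is to check whether the underscore is already present in the dataset file itself or only appears later during loading. The following is a minimal diagnostic sketch, not a fix: it assumes the dataset file is a JSON list of records shaped like the one above, and the helper names (`find_missing_images`, `hyphen_underscore_variant`, `report`) are hypothetical and not part of LLaMA-Factory.

```python
import json
import os


def find_missing_images(records):
    """Return (record_index, path) pairs for image paths that do not exist on disk."""
    missing = []
    for idx, record in enumerate(records):
        for path in record.get("images", []):
            if not os.path.exists(path):
                missing.append((idx, path))
    return missing


def hyphen_underscore_variant(path):
    """Return an existing path that differs from `path` only by '-'/'_' swaps, else None.

    Only two candidates are tried: every '_' replaced by '-', and every '-'
    replaced by '_'. Mixed cases are not covered by this sketch.
    """
    for candidate in (path.replace("_", "-"), path.replace("-", "_")):
        if candidate != path and os.path.exists(candidate):
            return candidate
    return None


def report(dataset_file):
    """Print each missing image path and, if found, a '-'/'_' variant that exists."""
    # Assumption: the file holds a JSON list of records with an "images" key.
    with open(dataset_file, encoding="utf-8") as f:
        records = json.load(f)
    for idx, path in find_missing_images(records):
        print(f"record {idx}: missing {path} (variant: {hyphen_underscore_variant(path)})")
```

Calling `report("/root/LLaMA-Factory/data/202502/testSet00.json")` would list any record whose image path does not exist. If nothing is reported for the file as written, the data file is probably fine and the path is being altered somewhere else, for example in a stale preprocessing cache or in a different copy of the dataset.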