Description
Reminder
- I have read the above rules and searched the existing issues.
System Info
Key Issues Identified
Repo ID Format: The error message states that a repository ID must have the form repo_name or namespace/repo_name. The path you passed (saves/Qwen2.5-14B/full/train_2025-01-15-16-40) contains three slashes, so it cannot be a valid repo ID. This check usually runs only after the loader has failed to find the files at that path locally, so the underlying problem is more likely the path itself than its format.
Missing Configuration File: The error Can't find 'adapter_config.json' at 'saves/Qwen2.5-14B/full/train_2025-01-15-16-40' means the adapter configuration file was not found in that directory. Note that the path is under full/, so it may be a full fine-tuning output rather than a LoRA adapter, in which case no adapter_config.json would be present.
Command Line vs. GUI: You report that fine-tuning works from the command line but fails from the GUI (LLaMA Board). The two may resolve relative paths from different working directories, or the GUI may fill in some arguments differently.
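The first two points can be checked directly from the shell. The following is a minimal sketch, assuming the adapter is expected to be a local directory; check_adapter_dir is a hypothetical helper name, not part of LLaMA-Factory or PEFT.

```shell
#!/bin/sh
# Sketch: classify an adapter path according to the two errors above.
# check_adapter_dir is a hypothetical helper, not a LLaMA-Factory command.
check_adapter_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        # Not a directory from the current working directory, so a loader
        # may fall back to interpreting the string as a Hub repo ID.
        echo "not-a-directory"
        return 1
    fi
    if [ ! -f "$dir/adapter_config.json" ]; then
        # The directory exists but holds no adapter config (for example,
        # a full fine-tuning output rather than a LoRA adapter).
        echo "no-adapter-config"
        return 2
    fi
    echo "ok"
    return 0
}
```

Running check_adapter_dir saves/Qwen2.5-14B/full/train_2025-01-15-16-40 from the same directory the GUI was launched from should show which of the two cases applies.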
Suggested Steps to Resolve the Issues
Check the Path: Verify that the value of adapter_name_or_path points to an existing directory that contains the adapter files, including adapter_config.json.
Correct Repo ID: If you are loading a model or adapter from the Hugging Face Hub, use a repository ID of the form namespace/repo_name. For a local checkpoint, pass an absolute path or a path that exists relative to the directory you launch from.
File Existence: If adapter_config.json is missing from the directory, confirm that the training run that produced the directory used LoRA and finished saving. A full fine-tuning output does not normally contain this file.
Run in Command Line: Since the command line works, keep using it until the GUI problem is resolved. Comparing the arguments used in the two environments may reveal the discrepancy.
Update Dependencies: Make sure your dependencies are up to date, since version mismatches can cause errors like these. You can update a package with:
pip install --upgrade <package_name>
Consult Documentation: Check the documentation for LLaMA-Factory and any related libraries (like PEFT and Hugging Face) for any specific requirements or changes in how models and adapters should be specified.
Debugging: If the issue persists, consider adding logging or print statements where the adapter path is resolved and loaded, so you can see the exact path and working directory in use when the failure occurs.
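One way to compare the command-line and GUI configurations mentioned in the steps above is to save both commands to text files and diff their arguments. This is a rough sketch: list_flags and diff_commands are hypothetical helper names, and it assumes every flag is followed by exactly one value, as in the command in this issue. How you obtain the GUI's command (for example, from a command preview, if your version of LLaMA Board offers one) is also an assumption.

```shell
#!/bin/sh
# Sketch: print the "--flag value" pairs of a saved command, one per line, sorted.
# Assumes each flag takes exactly one value. Helper names are hypothetical.
list_flags() {
    tr -d '\\' < "$1" \
        | tr '\n\t' '  ' \
        | tr -s ' ' \
        | grep -oE -e '--[A-Za-z0-9_]+ [^ ]+' \
        | sort
}

# Diff the flag lists of two saved commands; exits non-zero if they differ.
diff_commands() {
    a="$(mktemp)"
    b="$(mktemp)"
    list_flags "$1" > "$a"
    list_flags "$2" > "$b"
    diff "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return "$status"
}
```

For example, diff_commands cli.txt gui.txt prints only the arguments that differ between the two saved commands, which makes a mismatched adapter_name_or_path or output_dir easy to spot.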
Example Command Line Usage
To run from the command line, first change to the directory from which the relative paths in the command (data, saves, and cache/ds_z2_config.json) resolve, then run:
llamafactory-cli train \
    --stage sft \
    --do_train True \
    --model_name_or_path /media/data/llm/Qwen2.5-14B-Instruct \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --template default \
    --flash_attn auto \
    --dataset_dir data \
    --dataset identity \
    --cutoff_len 2048 \
    --learning_rate 5e-05 \
    --num_train_epochs 3.0 \
    --max_samples 100000 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 0 \
    --packing False \
    --report_to none \
    --output_dir saves/Qwen2.5-14B/lora/train_2025-01-15-16-40-16 \
    --bf16 True \
    --plot_loss True \
    --trust_remote_code True \
    --ddp_timeout 180000000 \
    --optim adamw_torch \
    --adapter_name_or_path saves/Qwen2.5-14B/full/train_2025-01-15-16-40 \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0 \
    --pissa_init True \
    --pissa_convert True \
    --lora_target all \
    --deepspeed cache/ds_z2_config.json
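Because the command above relies on several relative paths, it may help to confirm they exist from the current working directory before launching. A small sketch follows; report_missing is a hypothetical helper name, and the paths to pass are the ones from the command.

```shell
#!/bin/sh
# Sketch: report which of the given paths do not exist relative to the
# current working directory. report_missing is a hypothetical helper name.
report_missing() {
    missing=0
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "not found from $(pwd): $p"
            missing=1
        fi
    done
    # Returns 0 if every path exists, 1 otherwise.
    return "$missing"
}
```

For this command, that would be report_missing data cache/ds_z2_config.json saves/Qwen2.5-14B/full/train_2025-01-15-16-40. Any path it reports is one the trainer will also fail to find when started from the same directory.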
Reproduction
Put your message here.
Others
No response