Description
Required prerequisites
- I have read the documentation https://align-anything.readthedocs.io.
- I have searched the Issue Tracker and Discussions that this hasn't already been reported. (+1 or comment there if it has.)
- Consider asking first in a Discussion.
What version of align-anything are you using?
0.0.1.dev0
System information
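Installed following the official installation procedure. Training runs on a single node with 8× A800 GPUs and 128 training samples. I first used the pre-tokenizer to convert the data into a .pt file, then put the .pt file path into the SFT training script and launched training: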
MODEL_NAME_OR_PATH="/mnt/data_vlm/models/Janus-Pro-1B"
TRAIN_DATASETS="/mnt/data_vlm/liang.hu/janus_train/sft_imagegen_data/"
TRAIN_DATA_FILE="train.pt"
OUTPUT_DIR="/mnt/data_vlm/liang.hu/janus_train/output/"
JANUS_REPO_PATH="/mnt/data_vlm/liang.hu/Janus"
export PYTHONPATH=$PYTHONPATH:$JANUS_REPO_PATH
export WANDB_API_KEY="xxx"
export WANDB_PROJECT="Janus"
export WANDB_NAME="xxx"
export WANDB_MODE=online
# Source the setup script
source ./setup.sh
# Execute deepspeed command
deepspeed \
  --master_port ${MASTER_PORT} \
  --module align_anything.trainers.janus.sft \
  --model_name_or_path ${MODEL_NAME_OR_PATH} \
  --train_datasets ${TRAIN_DATASETS} \
  --train_data_files ${TRAIN_DATA_FILE} \
  --train_split train \
  --learning_rate 2e-5 \
  --epochs 1 \
  --weight_decay 0.1 \
  --adam_beta1 0.9 \
  --adam_beta2 0.95 \
  --lr_scheduler_type constant \
  --output_dir ${OUTPUT_DIR}
Problem description
Training fails with the following error:
[rank1]: Traceback (most recent call last):
[rank1]: File "<frozen runpy>", line 198, in _run_module_as_main
[rank1]: File "<frozen runpy>", line 88, in _run_code
[rank1]: File "/mnt/data_vlm/liang.hu/align-anything/align_anything/trainers/janus/sft.py", line 118, in <module>
[rank1]: sys.exit(main())
[rank1]: ^^^^^^
[rank1]: File "/mnt/data_vlm/liang.hu/align-anything/align_anything/trainers/janus/sft.py", line 113, in main
[rank1]: trainer.train()
[rank1]: File "/mnt/data_vlm/liang.hu/align-anything/align_anything/trainers/text_to_text/sft.py", line 143, in train
[rank1]: info = self.train_step(batch)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/mnt/data_vlm/liang.hu/align-anything/align_anything/trainers/text_to_text/sft.py", line 102, in train_step
[rank1]: loss = self.loss(sft_batch)['loss']
[rank1]: ^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/mnt/data_vlm/liang.hu/align-anything/align_anything/trainers/janus/sft.py", line 82, in loss
[rank1]: outputs = self.model.forward(**sft_batch, task=sft_batch['task'])
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: TypeError: deepspeed.utils.nvtx.instrument_w_nvtx.<locals>.wrapped_fn() got multiple values for keyword argument 'task'
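For reference, this TypeError is ordinary Python behavior: it is raised whenever the same keyword reaches a call both through `**` unpacking and as an explicit keyword. Given the call in `janus/sft.py`, `self.model.forward(**sft_batch, task=sft_batch['task'])`, this suggests that `sft_batch` already contains a `task` key. The minimal sketch below reproduces the error without align-anything or DeepSpeed. `forward`, `call_like_traceback`, `call_without_duplicate`, and the batch contents are hypothetical stand-ins. The "pop then pass once" workaround is only an assumption about one possible fix, not the project's confirmed patch.

```python
# Minimal reproduction of the TypeError from the traceback, in plain Python.
# `forward` and the contents of `sft_batch` are hypothetical stand-ins that only
# mimic the call shape `self.model.forward(**sft_batch, task=sft_batch['task'])`.


def forward(input_ids=None, labels=None, task=None):
    """Stand-in for a model forward() that accepts a `task` keyword."""
    return {"task": task, "n_tokens": len(input_ids or [])}


# Assumption: the collated batch already carries a `task` entry.
sft_batch = {"input_ids": [1, 2, 3], "labels": [1, 2, 3], "task": "generation"}


def call_like_traceback(batch):
    # `**batch` already supplies task=..., so passing task= again duplicates it
    # and Python raises TypeError before forward() even runs.
    return forward(**batch, task=batch["task"])


def call_without_duplicate(batch):
    # Hypothetical workaround: copy the batch, pop `task` out of the copy,
    # and pass it exactly once. The caller's dict is left unchanged.
    batch = dict(batch)
    task = batch.pop("task")
    return forward(**batch, task=task)


if __name__ == "__main__":
    try:
        call_like_traceback(sft_batch)
    except TypeError as exc:
        print("reproduced:", exc)
    print(call_without_duplicate(sft_batch))
```

If the batch really does contain `task`, calling `forward(**sft_batch)` without the extra keyword would pass it once as well. Popping it explicitly just makes the single source of the value obvious.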
Reproducible example code
The Python snippets:
Command lines:
Extra dependencies:
Steps to reproduce:
Traceback
Expected behavior
No response
Additional context
No response