
Commit eb8887e

Enhance qwen2-vl finetune and add python file/readme in xtune (#1862)
* fix transformers==4.51.0 to avoid qwen2-vl finetune issue
  Signed-off-by: jilongwa <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* enable optuna in qwen2-vl finetune
  Signed-off-by: jilongwa <[email protected]>
* delete extra file
  Signed-off-by: jilongwa <[email protected]>
* update readme
  Signed-off-by: jilongwa <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* fix type-o
  Signed-off-by: jilongwa <[email protected]>

Signed-off-by: jilongwa <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 64b7b04 commit eb8887e

File tree

8 files changed: +1408 −5 lines changed

comps/finetuning/src/integrations/xtune/README.md

Lines changed: 75 additions & 0 deletions
@@ -161,6 +161,81 @@ cd src/llamafactory/adaclip_finetune
# Please see README.md in src/llamafactory/adaclip_finetune for detail
```

### Qwen2-VL Training and Hyperparameter Optimization

```bash
# Please see Qwen2-VL_README.md in doc for detail; below is simple usage
```

#### Step 1: Finetune qwen2-vl with logging eval loss

If you want to finetune with plotting of eval loss, please set eval_strategy to steps and set eval_steps and eval_dataset:

```
# Finetune qwen2-vl with logging eval loss
export DATA='where you can find dataset_info.json'
export dataset=activitynet_qa_2000_limit_20s  # which dataset llamafactory will use
export eval_dataset=activitynet_qa_val_500_limit_20s
llamafactory-cli train \
    --stage sft \
    --do_train True \
    --model_name_or_path $models/Qwen2-VL-7B-Instruct-GPTQ-Int8 \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --template qwen2_vl \
    --flash_attn auto \
    --dataset_dir $DATA \
    --dataset $dataset \
    --cutoff_len 2048 \
    --learning_rate 5e-05 \
    --num_train_epochs 20.0 \
    --max_samples 100000 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 10 \
    --save_steps 100 \
    --warmup_steps 100 \
    --packing False \
    --report_to none \
    --output_dir saves/Qwen2-VL-7B-Instruct-GPTQ-Int8/lora/finetune_test_valmetrics_evalstep8 \
    --bf16 True \
    --plot_loss True \
    --ddp_timeout 180000000 \
    --optim adamw_torch \
    --video_fps 0.1 \
    --per_device_eval_batch_size 1 \
    --eval_strategy steps \
    --eval_steps 100 \
    --eval_dataset ${eval_dataset} \
    --predict_with_generate true \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0 \
    --lora_target all
```

#### Step 2: Evaluation metrics calculation and plotting

If you want to plot eval metrics, change `MODEL_NAME`, `EXPERIENT_NAME`, and `EVAL_DATASET` as you need and run the evaluation metrics calculation script:

```
export MODEL_DIR="where you can find the eval model"
export MODEL_NAME="Qwen2-VL-2B-Instruct"
export EXPERIENT_NAME="finetune_onlyplot_evalloss_5e-6"
export EVAL_DATASET=activitynet_qa_val_500_limit_20s
chmod a+x ./doc/run_eval.sh
./doc/run_eval.sh
```

Then change `model_name` and `experiment_name` and run:

```
python plot_metrics.py --model_name your_model_name --experiment_name your_experiment_name
```

### DeepSeek-R1 Distillation (not main function)

Please see [doc](./doc/DeepSeek-R1_distillation_best_practice-v1.3.pdf) for details
Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-25.05-dev
+25.07-dev

comps/finetuning/src/integrations/xtune/doc/Prepare_dataset.md

Lines changed: 151 additions & 0 deletions
@@ -133,3 +133,154 @@ wget https://cs.stanford.edu/people/ranjaykrishna/densevid/captions.zip
```

- DiDeMo annotations have two components: annotations from the [original author](https://github.com/LisaAnne/LocalizingMoments/tree/master/data) and the split used by [Collaborative Experts](https://github.com/albanie/collaborative-experts/tree/master/misc/datasets/didemo).

## Dataset for Qwen2-VL Finetune

### ActivityNet-QA

Please follow https://github.com/MILVLG/activitynet-qa/tree/master to download and separate the train/val dataset.

Then use the generate_llama_json_limit_frames.py file below to generate our train and test datasets:

python generate_llama_json_limit_frames.py -name val_q -type val -n 500 -seconds 20

generate_llama_json_limit_frames.py:
```python
import json
import os
import argparse
import ffmpeg

# Define the path to the directory where the video files are stored
video_directory = "where to find dataset"


def get_video_duration(video_path):
    try:
        probe = ffmpeg.probe(video_path)
        video_stream = next(stream for stream in probe["streams"] if stream["codec_type"] == "video")
        return float(video_stream["duration"])
    except Exception as e:
        print(f"Error getting duration for video {video_path}: {e}")
        return 0


if __name__ == "__main__":
    # Parse command line arguments
    parser = argparse.ArgumentParser(description="Generate LLaMA JSON")
    parser.add_argument("-name", type=str, default="train_q_3000", help="name of the question JSON file (without .json)")
    parser.add_argument("-type", type=str, default="train", help="dataset split (train/val)")
    parser.add_argument("-fps", type=float, default=0.2, help="video sampling fps")
    parser.add_argument("-n", type=int, default=250, help="maximum number of QA pairs to keep")
    parser.add_argument("-seconds", type=int, default=20, help="minimum video duration in seconds")
    args = parser.parse_args()
    fps = args.fps
    basic_seconds = args.seconds
    question_json = "../activitynet-qa/dataset/{}.json".format(args.name)
    answer_json = "../activitynet-qa/dataset/{}_a.json".format(args.type)
    combine_json = "../data/activitynet_qa_{}_{}_limit_{}s.json".format(args.type, args.n, basic_seconds)
    print("combine_json:", combine_json)

    # Supported video file extensions
    video_extensions = (".mp4", ".mkv", ".webm")

    # Load the questions and answers JSON files
    with open(question_json, "r") as question_file:
        questions = json.load(question_file)

    with open(answer_json, "r") as answer_file:
        answers = json.load(answer_file)

    # Create a dictionary to map question_id to answer for quick lookup
    answer_lookup = {answer["question_id"]: answer for answer in answers}

    combined_data = []
    len_pairs = len(questions)
    # Process each question and look for a corresponding answer
    for question in questions:
        question_id = question["question_id"]
        if question_id in answer_lookup:
            answer = answer_lookup[question_id]

            # Extract the video name, i.e. everything before the last underscore of the question_id
            video_name_without_path = ("_").join(question_id.split("_")[:-1])
            # Search for the video file that matches the extracted name
            video_path = None
            find_flag = False
            # Walk through the directory to find matching video files
            for root, dirs, files in os.walk(video_directory):
                for file in files:
                    if file.startswith(video_name_without_path) and file.endswith(video_extensions):
                        video_path = os.path.join(root, file)
                        find_flag = True
                        break
                if video_path:
                    break
            if not find_flag:
                print("!!not found:", video_name_without_path)
            if video_path:
                video_duration = get_video_duration(video_path)
                if video_duration > basic_seconds:
                    combined_entry = {
                        "messages": [
                            {"content": f"<video>{question['question']}?", "role": "user"},
                            {"content": answer["answer"], "role": "assistant"},
                        ],
                        "videos": [video_path],
                    }
                    combined_data.append(combined_entry)
                    if len(combined_data) % 100 == 0:
                        print(f"Processed {len(combined_data)} entries")
                    if len(combined_data) >= args.n:
                        break
                else:
                    print("video_duration < basic_seconds", video_duration, video_path)
    # Write the combined data to the output JSON file
    with open(combine_json, "w") as combine_file:
        json.dump(combined_data, combine_file, indent=4)
```
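
A similar invocation can be used to build the training split. This is only a sketch: the `-name` value below is an assumption (it depends on which question file you prepared from the ActivityNet-QA repo), and the output filename follows the script's `activitynet_qa_{type}_{n}_limit_{seconds}s` pattern, so you may need to rename the result to match the dataset name registered in dataset_info.json:

```
python generate_llama_json_limit_frames.py -name train_q -type train -n 2000 -seconds 20
```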

## Update dataset_info.json

### dataset_info.json

```json
{
  "caltech101": {
    "file_name": "caltech101.json"
  },
  "ActivityNet": {
    "file_name": "ActivityNet.json"
  },
  "activitynet_qa_2000_limit_20s": {
    "file_name": "activitynet_qa_2000_limit_20s.json",
    "formatting": "sharegpt",
    "columns": {
      "messages": "messages",
      "videos": "videos"
    },
    "tags": {
      "role_tag": "role",
      "content_tag": "content",
      "user_tag": "user",
      "assistant_tag": "assistant"
    }
  }
}
```

### caltech101.json

```json
[]
```

### ActivityNet.json

```json
[]
```

### activitynet_qa_2000_limit_20s.json

Generated by generate_llama_json_limit_frames.py.
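
For reference, each entry in the generated file follows the `combined_entry` structure built by the script above (the `messages`/`videos` columns referenced in dataset_info.json). The video path and QA text below are placeholders, not real dataset values:

```json
[
  {
    "messages": [
      { "content": "<video>what is the person doing?", "role": "user" },
      { "content": "cooking", "role": "assistant" }
    ],
    "videos": ["/path/to/activitynet/videos/v_example.mp4"]
  }
]
```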
