lmms_eval/tasks/countbench
@@ -47,6 +47,7 @@ python -m lmms_eval --tasks list_with_num
   - COCO 2017 Caption MiniVal (coco2017_cap_val)
   - COCO 2017 Caption MiniTest (coco2017_cap_test)
 - [ConBench](https://github.com/foundation-multimodal-models/ConBench) (conbench)
+- [CountBench](https://huggingface.co/datasets/vikhyatk/CountBenchQA) (countbench)
 - [CV-Bench](https://github.com/nyu-visionx/CV-Bench) (cv_bench)
 - [DetailCaps-4870](https://github.com/foundation-multimodal-models/CAPTURE) (detailcaps)
 - [Flickr30K](https://github.com/BryanPlummer/flickr30k_entities) (flickr30k)

+dataset_path: vikhyatk/CountBenchQA
+task: countbench
+test_split: test
+output_type: generate_until
+doc_to_visual: !function utils.countbench_doc_to_visual
+doc_to_text: !function utils.countbench_doc_to_text
+doc_to_target: !function utils.countbench_doc_to_target
+generation_kwargs:
+  max_new_tokens: 16
+  temperature: 0
+  do_sample: false
+process_results: !function utils.countbench_process_results
+metric_list:
+  - metric: acc
+    aggregation: mean
+    higher_is_better: true
+lmms_eval_specific_kwargs:
+  default:
+    pre_prompt: "Look at the image carefully and count the objects. Answer with just a number, without any additional text. "
+    post_prompt: ""
+metadata:
+  - version: 0.0
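
The config above reports `acc` with `aggregation: mean`. As a rough illustration of what that implies, each document contributes 0.0 or 1.0 and the reported score is their average. The snippet below is a standalone sketch, not lmms_eval's own aggregation code; `mean_accuracy` and the sample scores are hypothetical.

```python
from statistics import fmean


def mean_accuracy(per_doc_scores):
    """Average per-document 0.0/1.0 scores (hypothetical helper)."""
    if not per_doc_scores:
        raise ValueError("no scores to aggregate")
    return fmean(per_doc_scores)


# Made-up per-document results: three exact matches, one miss.
scores = [1.0, 0.0, 1.0, 1.0]
print(mean_accuracy(scores))
```

Because `higher_is_better` is true, a larger mean corresponds to more exact-match counts.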

+NUMBER_WORD_TO_NUMERAL = {
+    "none": "0",
+    "zero": "0",
+    "one": "1",
+    "two": "2",
+    "three": "3",
+    "four": "4",
+    "five": "5",
+    "six": "6",
+    "seven": "7",
+    "eight": "8",
+    "nine": "9",
+    "ten": "10",
+    "eleven": "11",
+    "twelve": "12",
+    "thirteen": "13",
+    "fourteen": "14",
+    "fifteen": "15",
+    "sixteen": "16",
+    "seventeen": "17",
+    "eighteen": "18",
+    "nineteen": "19",
+    "twenty": "20",
+}
+
+
+def _normalize_count_answer(answer) -> str:
+    normalized = str(answer).strip().lower()
+    return NUMBER_WORD_TO_NUMERAL.get(normalized, normalized)
+
+
+def countbench_doc_to_visual(doc):
+    return [doc["image"].convert("RGB")]
+
+
+def countbench_doc_to_text(doc, lmms_eval_specific_kwargs=None):
+    kwargs = lmms_eval_specific_kwargs or {}
+    pre_prompt = kwargs.get("pre_prompt", "")
+    post_prompt = kwargs.get("post_prompt", "")
+    question = doc["question"].strip()
+    return f"{pre_prompt}{question}{post_prompt}"
+
+
+def countbench_doc_to_target(doc):
+    return _normalize_count_answer(doc["number"])
+
+
+def countbench_process_results(doc, results):
+    prediction = _normalize_count_answer(results[0])
+    target = countbench_doc_to_target(doc)
+    return {"acc": float(prediction == target)}
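
To make the scoring behaviour of the helpers above concrete, here is a small standalone sketch that re-declares a condensed copy of them and runs a few model outputs through `countbench_process_results`. The sample document, its integer `number` value, and the raw outputs are made up for illustration; only the field names `question` and `number` come from the diff.

```python
# Condensed copy of the helpers from the diff, trimmed to a few number
# words so the example stays short. Sample data below is hypothetical.
NUMBER_WORD_TO_NUMERAL = {"none": "0", "zero": "0", "three": "3", "ten": "10"}


def _normalize_count_answer(answer) -> str:
    # Trim whitespace, lowercase, then map number words to digit strings.
    normalized = str(answer).strip().lower()
    return NUMBER_WORD_TO_NUMERAL.get(normalized, normalized)


def countbench_process_results(doc, results):
    # Exact string match between normalized prediction and target.
    prediction = _normalize_count_answer(results[0])
    target = _normalize_count_answer(doc["number"])
    return {"acc": float(prediction == target)}


# Hypothetical sample; assumes the dataset stores the count as an integer.
doc = {"question": "How many dogs are in the image?", "number": 3}

for raw in ["3", " Three ", "3.", "There are 3 dogs"]:
    print(repr(raw), countbench_process_results(doc, [raw]))
```

In this sketch only the bare numeral and the bare number word score 1.0. Trailing punctuation or a full sentence falls through the lookup unchanged and fails the exact match, which is presumably why the prompt asks for just a number and `max_new_tokens` is kept small.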