@@ -285,7 +285,7 @@ <h2 class="title is-4">Performance of different MLLMs on open-ended VQA of OmniB
 <h2 class="title is-4">Diverse Modality Evaluation.</h2>
 <div class="content has-text-centered">
 <img src="static/images/OmniBrainBench_analysis_DiffModality.png" alt="algebraic reasoning" width="75%" class="center">
-<p>Table 3: Diverse Modality Evaluation.</p>
+<p>Figure 1: Diverse Modality Evaluation.</p>
 </div>
 </div>
 </div>
@@ -296,7 +296,7 @@ <h2 class="title is-4">Diverse Modality Evaluation.</h2>
 <h2 class="title is-4">Performance of models on different numbers of images.</h2>
 <div class="content has-text-centered">
 <img src="static/images/OmniBrainBench_analysis_DiffImages.png" alt="algebraic reasoning" width="80%" class="center">
-<p>Figure 1: Performance of models on different numbers of images.</p>
+<p>Figure 2: Performance of models on different numbers of images.</p>
 </div>
 </div>
 </div>
@@ -420,92 +420,86 @@ <h2 class="title is-3">Case Study</h2>
 These case studies emphasize the need for improved medical knowledge integration and enhanced perceptual capabilities to bridge the gap between current MLLM performance and clinical requirements.
 </p> -->
 </div>
-<!-- <div class="columns is-multiline">
-<div class="column is-half">
-<figure class="image">
-<img src="static/images/Case Study_01.jpg" alt="Case Study 01" class="center">
-<figcaption>Figure 7: Correct sample from Gemini-2.5-Pro.</figcaption>
-</figure>
-</div>
-<div class="column is-half">
-<figure class="image">
-<img src="static/images/Case Study_02.jpg" alt="Case Study 02" class="center">
-<figcaption>Figure 8: Correct sample from Gemini-2.5-Pro.</figcaption>
-</figure>
-</div>
-<div class="column is-half">
-<figure class="image">
-<img src="static/images/Case Study_03.jpg" alt="Case Study 03" class="center">
-<figcaption>Figure 9: Correct sample from Gemini-2.5-Pro.</figcaption>
-</figure>
-</div>
-<div class="column is-half">
-<figure class="image">
-<img src="static/images/Case Study_04.jpg" alt="Case Study 04" class="center">
-<figcaption>Figure 10: Correct sample from GPT-4o.</figcaption>
-</figure>
-</div>
-<div class="column is-half">
-<figure class="image">
-<img src="static/images/Case Study_05.jpg" alt="Case Study 05" class="center">
-<figcaption>Figure 11: Correct sample from GPT-4o.</figcaption>
-</figure>
-</div>
-<div class="column is-half">
-<figure class="image">
-<img src="static/images/Case Study_06.jpg" alt="Case Study 06" class="center">
-<figcaption>Figure 12: Correct sample from GPT-4o.</figcaption>
-</figure>
-</div>
-<div class="column is-half">
-<figure class="image">
-<img src="static/images/Case Study_07.jpg" alt="Case Study 07" class="center">
-<figcaption>Figure 13: Error sample demonstrating Perceptual Errors (QvQ-72B).</figcaption>
-</figure>
-</div>
-<div class="column is-half">
-<figure class="image">
-<img src="static/images/Case Study_08.jpg" alt="Case Study 08" class="center">
-<figcaption>Figure 14: Error sample demonstrating Perceptual Errors (HuatuoGPT-Vision-34B).</figcaption>
-</figure>
-</div>
-<div class="column is-half">
-<figure class="image">
-<img src="static/images/Case Study_09.jpg" alt="Case Study 09" class="center">
-<figcaption>Figure 15: Error sample demonstrating Lack of Knowledge (QvQ-72B).</figcaption>
-</figure>
-</div>
-<div class="column is-half">
-<figure class="image">
-<img src="static/images/Case Study_10.jpg" alt="Case Study 10" class="center">
-<figcaption>Figure 16: Error sample demonstrating Lack of Knowledge (HuatuoGPT-Vision-34B).</figcaption>
-</figure>
-</div>
-<div class="column is-half">
-<figure class="image">
-<img src="static/images/Case Study_11.jpg" alt="Case Study 11" class="center">
-<figcaption>Figure 17: Error sample demonstrating Irrelevant Response (LLaVA-Med).</figcaption>
-</figure>
-</div>
-<div class="column is-half">
-<figure class="image">
-<img src="static/images/Case Study_12.jpg" alt="Case Study 12" class="center">
-<figcaption>Figure 18: Error sample demonstrating Irrelevant Response (ColonGPT).</figcaption>
-</figure>
-</div>
-<div class="column is-half">
-<figure class="image">
-<img src="static/images/Case Study_13.jpg" alt="Case Study 13" class="center">
-<figcaption>Figure 19: Error sample demonstrating Refusal to Answer (GPT-4o).</figcaption>
-</figure>
-</div>
-<div class="column is-half">
-<figure class="image">
-<img src="static/images/Case Study_14.jpg" alt="Case Study 14" class="center">
-<figcaption>Figure 20: Error sample demonstrating Refusal to Answer (Grok-3).</figcaption>
-</figure>
-</div>
-</div> -->
+<div class="columns is-multiline">
+<!-- GPT-5 -->
+<div class="column is-half">
+<figure class="image">
+<img src="static/images/GPT-5 closed-ended VQA.png" alt="GPT-5 Closed-ended VQA Samples" class="center">
+<figcaption>Figure 3: Correct/Error samples in GPT-5 closed-ended VQA.</figcaption>
+</figure>
+</div>
+<div class="column is-half">
+<figure class="image">
+<img src="static/images/GPT-5 open-ended VQA.png" alt="GPT-5 Open-ended VQA Samples" class="center">
+<figcaption>Figure 4: Correct/Error samples in GPT-5 open-ended VQA.</figcaption>
+</figure>
+</div>
+<!-- Claude-4.5-Sonnet -->
+<div class="column is-half">
+<figure class="image">
+<img src="static/images/Claude-4.5-Sonnet closed-ended VQA.png" alt="Claude-4.5-Sonnet Closed-ended VQA Samples" class="center">
+<figcaption>Figure 5: Correct/Error samples in Claude-4.5-Sonnet closed-ended VQA.</figcaption>
+</figure>
+</div>
+<div class="column is-half">
+<figure class="image">
+<img src="static/images/Claude-4.5-Sonnet open-ended VQA.png" alt="Claude-4.5-Sonnet Open-ended VQA Samples" class="center">
+<figcaption>Figure 6: Correct/Error samples in Claude-4.5-Sonnet open-ended VQA.</figcaption>
+</figure>
+</div>
+<!-- Gemini-2.5-Pro -->
+<div class="column is-half">
+<figure class="image">
+<img src="static/images/Gemini-2.5-Pro closed-ended VQA.png" alt="Gemini-2.5-Pro Closed-ended VQA Samples" class="center">
+<figcaption>Figure 7: Correct/Error samples in Gemini-2.5-Pro closed-ended VQA.</figcaption>
+</figure>
+</div>
+<div class="column is-half">
+<figure class="image">
+<img src="static/images/Gemini-2.5-Pro open-ended VQA.png" alt="Gemini-2.5-Pro Open-ended VQA Samples" class="center">
+<figcaption>Figure 8: Correct/Error samples in Gemini-2.5-Pro open-ended VQA.</figcaption>
+</figure>
+</div>
+<!-- Deepseek-V3.1 -->
+<div class="column is-half">
+<figure class="image">
+<img src="static/images/Deepseek-V3.1 closed-ended VQA.png" alt="Deepseek-V3.1 Closed-ended VQA Samples" class="center">
+<figcaption>Figure 9: Correct/Error samples in Deepseek-V3.1 closed-ended VQA.</figcaption>
+</figure>
+</div>
+<div class="column is-half">
+<figure class="image">
+<img src="static/images/Deepseek-V3.1 open-ended VQA.png" alt="Deepseek-V3.1 Open-ended VQA Samples" class="center">
+<figcaption>Figure 10: Correct/Error samples in Deepseek-V3.1 open-ended VQA.</figcaption>
+</figure>
+</div>
+<!-- Qwen3-VL-30B -->
+<div class="column is-half">
+<figure class="image">
+<img src="static/images/Qwen3-VL-30B closed-ended VQA.png" alt="Qwen3-VL-30B Closed-ended VQA Samples" class="center">
+<figcaption>Figure 11: Correct/Error samples in Qwen3-VL-30B closed-ended VQA.</figcaption>
+</figure>
+</div>
+<div class="column is-half">
+<figure class="image">
+<img src="static/images/Qwen3-VL-30B open-ended VQA.png" alt="Qwen3-VL-30B Open-ended VQA Samples" class="center">
+<figcaption>Figure 12: Correct/Error samples in Qwen3-VL-30B open-ended VQA.</figcaption>
+</figure>
+</div>
+<!-- Lingshu-32B -->
+<div class="column is-half">
+<figure class="image">
+<img src="static/images/Lingshu-32B closed-ended VQA.png" alt="Lingshu-32B Closed-ended VQA Samples" class="center">
+<figcaption>Figure 13: Correct/Error samples in Lingshu-32B closed-ended VQA.</figcaption>
+</figure>
+</div>
+<div class="column is-half">
+<figure class="image">
+<img src="static/images/Lingshu-32B open-ended VQA.png" alt="Lingshu-32B Open-ended VQA Samples" class="center">
+<figcaption>Figure 14: Correct/Error samples in Lingshu-32B open-ended VQA.</figcaption>
+</figure>
+</div>
+</div>
 </div>
 </div>
 </div>