Commit fcb7b29

Update index.html
1 parent 0f9e483 commit fcb7b29

1 file changed

Lines changed: 32 additions & 32 deletions

index.html

Original file line number | Diff line number | Diff line change
@@ -56,29 +56,29 @@ <h1 class="title is-1 publication-title">
5656
<span class="mmmu" style="vertical-align: middle">OmniBrainBench</span>
5757
</h1>
5858
<h2 class="subtitle is-3 publication-subtitle">
59-
A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis
59+
OmniBrainBench: A Comprehensive Multimodal Benchmark for Brain Imaging Analysis Across Multi-stage Clinical Tasks
6060
<!-- <br> -->
6161
</h2>
6262
</h1>
6363
<div class="is-size-5 publication-authors">
64+
<span class="author-block">Zhihao Peng*<sup style="color:#ffac33;">1</sup>,</span><br>
65+
<span class="author-block">Cheng Wang*<sup style="color:#ffac33;">1</sup>,</span>
6466
<span class="author-block">Shengyuan Liu*<sup style="color:#ffac33;">1</sup>,</span>
65-
<span class="author-block">Boyun Zheng*<sup style="color:#ffac33;">,1</sup>,</span>
66-
<span class="author-block">Wenting Chen*<sup style="color:#6fbf73;">2</sup>,</span>
67-
<span class="author-block">Zhihao Peng<sup style="color:#ffac33;">1</sup>,</span><br>
68-
<span class="author-block">Zhenfei Yin<sup style="color:#ff00f2;">3</sup>,</span>
69-
<span class="author-block">Jing Shao<sup style="color:#9b51e0;">4</sup>,</span>
70-
<span class="author-block">Jiancong Hu<sup style="color:#ed4b82;">5</sup>,</span>
67+
<span class="author-block">Zhiying Liang<sup style="color:#6fbf73;">2</sup>,</span>
68+
<span class="author-block">Zanting Ye<sup style="color:#ff00f2;">3</sup>,</span>
69+
<span class="author-block">Min Jie Ju<sup style="color:#9b51e0;">4</sup>,</span>
70+
<span class="author-block">Peter YM Woo<sup style="color:#ed4b82;">5</sup>,</span>
7171
<span class="author-block">Yixuan Yuan<sup style="color:#ffac33;">†,1</sup></span>
7272
</div>
7373

7474
<br>
7575

7676
<div class="is-size-5 publication-authors">
77-
<span class="author-block"><sup style="color:#ffac33;">1</sup>The Chinese University of Hong Kong,</span>
78-
<span class="author-block"><sup style="color:#6fbf73;">2</sup>City University of Hong Kong</span>
79-
<span class="author-block"><sup style="color:#ff00f2;">3</sup>University of Oxford</span>
80-
<span class="author-block"><sup style="color:#9b51e0;">4</sup>Shanghai AI Laboratory,</span>
81-
<span class="author-block"><sup style="color:#ed4b82;">5</sup>The Sixth Affiliated Hospital, Sun Yat-sen University</span></br>
77+
<span class="author-block"><sup style="color:#ffac33;">1</sup>Department of Electronic Engineering, The Chinese University of Hong Kong</span>
78+
<span class="author-block"><sup style="color:#6fbf73;">2</sup>Sun Yat-sen Memorial Hospital, Sun Yat-sen University</span>
79+
<span class="author-block"><sup style="color:#ff00f2;">3</sup>School of Biomedical Engineering, Southern Medical University</span>
80+
<span class="author-block"><sup style="color:#9b51e0;">4</sup>Zhongshan Hospital, Fudan University</span>
81+
<span class="author-block"><sup style="color:#ed4b82;">5</sup>Department of Neurosurgery, Prince of Wales Hospital</span><br>
8282
</div>
8383

8484
<br>
@@ -113,7 +113,7 @@ <h2 class="subtitle is-3 publication-subtitle">
113113

114114
<!-- HF abstract Link -->
115115
<span class="link-block">
116-
<a href="https://huggingface.co/datasets/Saint-lsy/EndoBench" target="_blank"
116+
<a href="https://huggingface.co/datasets/FrankPN/OmniBrainBench" target="_blank"
117117
class="external-link button is-normal is-rounded is-dark">
118118
<span>🤗Hugging Face</span>
119119
</a>
@@ -130,7 +130,7 @@ <h2 class="subtitle is-3 publication-subtitle">
130130
</section>
131131

132132
<!-- News Section -->
133-
<section class="section">
133+
<!-- <section class="section">
134134
<div class="container">
135135
<div class="columns is-centered has-text-centered">
136136
<div class="column is-four-fifths">
@@ -141,7 +141,7 @@ <h2 class="title is-3">News</h2>
141141
</div>
142142
</div>
143143
</div>
144-
</section>
144+
</section> -->
145145

146146
<section class="section">
147147
<div class="container" style="margin-bottom: 2vh;">
@@ -151,7 +151,7 @@ <h2 class="title is-3">News</h2>
151151
<h2 class="title is-3">Highlight</h2>
152152
<div class="content has-text-justified">
153153
<p>
154-
1. We introduce OmniBrainBench, the first comprehensive benchmark specifically designed to evaluate MLLMs across the complete spectrum of endoscopy, covering 4 endoscopic scenarios, 12 specialized tasks with 12 secondary subtasks, and 5 levels of visual prompting granularities.
154+
1. We introduce OmniBrainBench, the first comprehensive multimodal benchmark specifically designed to evaluate MLLMs across the complete spectrum of brain imaging analysis with closed- and open-ended evaluations, covering <strong>9,527</strong> clinically verified VQA pairs, <strong>31,706</strong> images, and <strong>15</strong> modalities.
155155
</p>
156156
<p>
157157
2. We develop a multi-dimensional evaluation framework that mirrors the clinical workflow progression from basic anatomical recognition to advanced surgical intervention, assessing MLLMs' capabilities across the full spectrum of endoscopic analysis skills.
@@ -174,7 +174,7 @@ <h2 class="title is-3">Highlight</h2>
174174
<div class="column is-four-fifths">
175175
<h2 class="title is-3">Abstract</h2>
176176
<div class="content has-text-justified">
177-
<img src="static/images/dataset.jpg" alt="OmniBrainBench dataset" class="center">
177+
<img src="static/images/OmniBrainBench.png" alt="OmniBrainBench dataset" class="center">
178178
<p>
179179
Endoscopic procedures are essential for diagnosing and treating internal diseases, and multi-modal large language models (MLLMs) are increasingly applied to assist in endoscopy analysis. However, current benchmarks are limited, as they typically cover specific endoscopic scenarios and a small set of clinical tasks, failing to capture the real-world diversity of endoscopic modalities and the full range of skills needed in clinical workflows. To address these issues, we introduce OmniBrainBench, the first comprehensive benchmark specifically designed to assess MLLMs across the full spectrum of endoscopic practice with multi-dimensional capacities. OmniBrainBench encompasses 4 distinct endoscopic modalities, 12 specialized clinical tasks with 12 secondary subtasks, and 5 levels of visual prompting granularities, resulting in 6,832 rigorously validated VQA pairs from 21 diverse datasets. Our multi-dimensional evaluation framework mirrors the clinical workflow (spanning anatomical recognition, lesion analysis, spatial localization, and surgical operations) to holistically gauge the perceptual and diagnostic abilities of MLLMs in realistic scenarios. We benchmark 23 state-of-the-art models, including general-purpose, medical-specialized, and proprietary MLLMs, and establish human clinician performance as a reference standard. Our extensive experiments reveal: (1) proprietary MLLMs outperform open-source and medical-specialized models overall, but still trail human experts; (2) medical-domain supervised fine-tuning substantially boosts task-specific accuracy; and (3) model performance remains sensitive to prompt format and clinical task complexity. OmniBrainBench establishes a new standard for evaluating and advancing MLLMs in endoscopy, highlighting both progress and persistent gaps between current models and expert clinical reasoning.
180180
</p>
@@ -235,7 +235,7 @@ <h2 class="title is-3">Construction Process</h2>
235235
this lexical tree, thereby achieving targeted improvements in model performance.
236236
</p> -->
237237
<div class="content has-text-centered">
238-
<img src="static/images/constrcut pipeline.png" alt="algebraic reasoning" width="100%"/ class="center">
238+
<img src="static/images/OmniBrainBench_construction.png" alt="Data construction process of OmniBrainBench" width="100%" class="center">
239239
<p> Data construction process of OmniBrainBench, consisting of (a) data collection, (b) QA standardization, and (c) data filtering. Finally, we implement (d) model evaluation on OmniBrainBench. </p>
240240
</div>
241241
</div>
@@ -262,8 +262,8 @@ <h1 class="title is-3 mmmu">Experiment Results</h1>
262262
<div class="column is-four-fifths">
263263
<h2 class="title is-4">Results of different MLLMs on 12 clinical tasks.</h2>
264264
<div class="content has-text-centered">
265-
<img src="static/images/table1.png" alt="algebraic reasoning" width="80%" class="center">
266-
<p>Table 1: Results of different MLLMs on 12 clinical tasks in OmniBrainBench. The best-performing model in each category is in-bold, and the second best is underlined.</p>
265+
<img src="static/images/OmniBrainBench_closedVQA.png" alt="Performance of different MLLMs on closed-ended VQA of OmniBrainBench" width="80%" class="center">
266+
<p>Table 1: Performance of different MLLMs on five specialized clinical phases with 15 secondary subtasks on closed-ended VQA of OmniBrainBench. The best-performing model in each category is highlighted in bold, and the second best is underlined.</p>
267267
</div>
268268
</div>
269269
</div>
@@ -273,8 +273,8 @@ <h2 class="title is-4">Results of different MLLMs on 12 clinical tasks.</h2>
273273
<div class="column is-four-fifths">
274274
<h2 class="title is-4">Results of different MLLMs on 4 different endoscopy scenarios and 4 different visual prompts.</h2>
275275
<div class="content has-text-centered">
276-
<img src="static/images/table2.png" alt="algebraic reasoning" width="75%" class="center">
277-
<p>Table 2: Results of different MLLMs on 4 different endoscopy scenarios and 4 different visual prompts in OmniBrainBench. The best-performing model in each category is in-bold, and the second best is underlined.</p>
276+
<img src="static/images/OmniBrainBench_openVQA.png" alt="Performance of different MLLMs on open-ended VQA of OmniBrainBench" width="75%" class="center">
277+
<p>Table 2: Performance of different MLLMs on open-ended VQA of OmniBrainBench. Higher values indicate better performance in generation quality, semantic similarity, and fluency.</p>
278278
</div>
279279
</div>
280280
</div>
@@ -284,8 +284,8 @@ <h2 class="title is-4">Results of different MLLMs on 4 different endoscopy scena
284284
<div class="column is-four-fifths">
285285
<h2 class="title is-4">Results of different MLLMs on 12 subtasks in OmniBrainBench.</h2>
286286
<div class="content has-text-centered">
287-
<img src="static/images/table3.png" alt="algebraic reasoning" width="75%" class="center">
288-
<p>Table 3: Results of different MLLMs on 12 subtasks in OmniBrainBench.</p>
287+
<img src="static/images/OmniBrainBench_analysis_DiffModality.png" alt="Diverse modality evaluation" width="75%" class="center">
288+
<p>Table 3: Diverse Modality Evaluation.</p>
289289
</div>
290290
</div>
291291
</div>
@@ -295,13 +295,13 @@ <h2 class="title is-4">Results of different MLLMs on 12 subtasks in OmniBrainBen
295295
<div class="column is-four-fifths">
296296
<h2 class="title is-4">Performance comparison of several leading MLLMs and Clinicians.</h2>
297297
<div class="content has-text-centered">
298-
<img src="static/images/figure2.jpg" alt="algebraic reasoning" width="80%" class="center">
299-
<p>Figure 1: Performance comparison of several leading MLLMs and Clinicians.</p>
298+
<img src="static/images/OmniBrainBench_analysis_DiffImages.png" alt="Performance of models on different numbers of images" width="80%" class="center">
299+
<p>Figure 1: Performance of models on different numbers of images.</p>
300300
</div>
301301
</div>
302302
</div>
303303

304-
<!-- Fourth figure -->
304+
<!-- Fourth figure
305305
<div class="columns is-centered has-text-centered">
306306
<div class="column is-four-fifths">
307307
<h2 class="title is-4">Performance comparison across four major categories.</h2>
@@ -310,9 +310,9 @@ <h2 class="title is-4">Performance comparison across four major categories.</h2>
310310
<p>Figure 2: Performance comparison across 4 major categories in OmniBrainBench among existing MLLMs.</p>
311311
</div>
312312
</div>
313-
</div>
313+
</div> -->
314314

315-
<!-- Fifth figure -->
315+
<!-- Fifth figure
316316
<div class="columns is-centered has-text-centered">
317317
<div class="column is-four-fifths">
318318
<h2 class="title is-4">Performance comparison across four endoscopic scenarios.</h2>
@@ -321,9 +321,9 @@ <h2 class="title is-4">Performance comparison across four endoscopic scenarios.<
321321
<p>Figure 3: Performance comparison across 4 endoscopic scenarios in OmniBrainBench among existing MLLMs.</p>
322322
</div>
323323
</div>
324-
</div>
324+
</div> -->
325325

326-
<!-- Sixth figure -->
326+
<!-- Sixth figure
327327
<div class="columns is-centered has-text-centered">
328328
<div class="column is-four-fifths">
329329
<h2 class="title is-4">Performance comparison across five different visual prompts.</h2>
@@ -332,7 +332,7 @@ <h2 class="title is-4">Performance comparison across five different visual promp
332332
<p>Figure 4: Performance comparison across 5 different visual prompts in OmniBrainBench among existing MLLMs.</p>
333333
</div>
334334
</div>
335-
</div>
335+
</div> -->
336336

337337
</div>
338338
</section>
