2 changes: 2 additions & 0 deletions README.md
@@ -145,6 +145,7 @@ This is the first work to correct hallucination in multimodal large language mod
| ![Star](https://img.shields.io/github/stars/inst-it/inst-it.svg?style=social&label=Star) <br> [**Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning**](https://arxiv.org/pdf/2412.03565) <br> | arXiv | 2024-12-04 | [Github](https://github.com/inst-it/inst-it) | - |
| ![Star](https://img.shields.io/github/stars/TimeMarker-LLM/TimeMarker.svg?style=social&label=Star) <br> [**TimeMarker: A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability**](https://arxiv.org/pdf/2411.18211) <br> | arXiv | 2024-11-27 | [Github](https://github.com/TimeMarker-LLM/TimeMarker/) | - |
| ![Star](https://img.shields.io/github/stars/IDEA-Research/ChatRex.svg?style=social&label=Star) <br> [**ChatRex: Taming Multimodal LLM for Joint Perception and Understanding**](https://arxiv.org/pdf/2411.18363) <br> | arXiv | 2024-11-27 | [Github](https://github.com/IDEA-Research/ChatRex) | Local Demo |
| ![Star](https://img.shields.io/github/stars/ai4colonoscopy/IntelliScope.svg?style=social&label=Star) <br> [**[ColonGPT] Frontiers in Intelligent Colonoscopy**](https://arxiv.org/abs/2410.17241) <br> | arXiv | 2024-10-22 | [Github](https://github.com/ai4colonoscopy/IntelliScope) | Local Demo |
| ![Star](https://img.shields.io/github/stars/Vision-CAIR/LongVU.svg?style=social&label=Star) <br> [**LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding**](https://arxiv.org/pdf/2410.17434) <br> | arXiv | 2024-10-22 | [Github](https://github.com/Vision-CAIR/LongVU) | [Demo](https://huggingface.co/spaces/Vision-CAIR/LongVU) |
| ![Star](https://img.shields.io/github/stars/shikiw/Modality-Integration-Rate.svg?style=social&label=Star) <br> [**Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate**](https://arxiv.org/pdf/2410.07167) <br> | arXiv | 2024-10-09 | [Github](https://github.com/shikiw/Modality-Integration-Rate) | - |
| ![Star](https://img.shields.io/github/stars/rese1f/aurora.svg?style=social&label=Star) <br> [**AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark**](https://arxiv.org/pdf/2410.03051) <br> | arXiv | 2024-10-04 | [Github](https://github.com/rese1f/aurora) | Local Demo |
@@ -595,6 +596,7 @@ This is the first work to correct hallucination in multimodal large language mod
## Datasets of Multimodal Instruction Tuning
| Name | Paper | Link | Notes |
|:-----|:-----:|:----:|:-----:|
| **ColonINST** | [Frontiers in Intelligent Colonoscopy](https://arxiv.org/abs/2410.17241) | [Link](https://github.com/ai4colonoscopy/IntelliScope) | A medical multimodal instruction-tuning dataset (62 categories, 300K+ colonoscopy images, 450K+ tuning pairs) |
| **Inst-IT Dataset** | [Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning](https://arxiv.org/pdf/2412.03565) | [Link](https://github.com/inst-it/inst-it) | An instruction-tuning dataset with fine-grained, multi-level annotations for 21k videos and 51k images |
| **E.T. Instruct 164K** | [E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding](https://arxiv.org/pdf/2409.18111) | [Link](https://github.com/PolyU-ChenLab/ETBench) | An instruction-tuning dataset for time-sensitive video understanding |
| **MSQA** | [Multi-modal Situated Reasoning in 3D Scenes](https://arxiv.org/pdf/2409.02389) | [Link](https://msr3d.github.io/) | A large-scale dataset for multi-modal situated reasoning in 3D scenes |