diff --git a/README.md b/README.md
index f82bb53..0b4ca14 100644
--- a/README.md
+++ b/README.md
@@ -145,6 +145,7 @@ This is the first work to correct hallucination in multimodal large language mod
| [**Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning**](https://arxiv.org/pdf/2412.03565) | arXiv | 2024-12-04 | [Github](https://github.com/inst-it/inst-it) | - |
| [**TimeMarker: A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability**](https://arxiv.org/pdf/2411.18211) | arXiv | 2024-11-27 | [Github](https://github.com/TimeMarker-LLM/TimeMarker/) | - |
| [**ChatRex: Taming Multimodal LLM for Joint Perception and Understanding**](https://arxiv.org/pdf/2411.18363) | arXiv | 2024-11-27 | [Github](https://github.com/IDEA-Research/ChatRex) | Local Demo |
+| [**[ColonGPT] Frontiers in Intelligent Colonoscopy**](https://arxiv.org/abs/2410.17241) | arXiv | 2024-10-22 | [Github](https://github.com/ai4colonoscopy/IntelliScope) | Local Demo |
| [**LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding**](https://arxiv.org/pdf/2410.17434) | arXiv | 2024-10-22 | [Github](https://github.com/Vision-CAIR/LongVU) | [Demo](https://huggingface.co/spaces/Vision-CAIR/LongVU) |
| [**Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate**](https://arxiv.org/pdf/2410.07167) | arXiv | 2024-10-09 | [Github](https://github.com/shikiw/Modality-Integration-Rate) | - |
| [**AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark**](https://arxiv.org/pdf/2410.03051) | arXiv | 2024-10-04 | [Github](https://github.com/rese1f/aurora) | Local Demo |
@@ -595,6 +596,7 @@ This is the first work to correct hallucination in multimodal large language mod
## Datasets of Multimodal Instruction Tuning
| Name | Paper | Link | Notes |
|:-----|:-----:|:----:|:-----:|
+| **ColonINST** | [Frontiers in Intelligent Colonoscopy](https://arxiv.org/abs/2410.17241) | [Link](https://github.com/ai4colonoscopy/IntelliScope) | A multimodal instruction-tuning dataset for colonoscopy, covering 62 categories with 300K+ colonoscopy images and 450K+ instruction-tuning pairs |
| **Inst-IT Dataset** | [Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning](https://arxiv.org/pdf/2412.03565) | [Link](https://github.com/inst-it/inst-it) | An instruction-tuning dataset with fine-grained, multi-level annotations for 21k videos and 51k images |
| **E.T. Instruct 164K** | [E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding](https://arxiv.org/pdf/2409.18111) | [Link](https://github.com/PolyU-ChenLab/ETBench) | An instruction-tuning dataset for time-sensitive video understanding |
| **MSQA** | [Multi-modal Situated Reasoning in 3D Scenes](https://arxiv.org/pdf/2409.02389) | [Link](https://msr3d.github.io/) | A large-scale dataset for multi-modal situated reasoning in 3D scenes |