OpenGVLab repositories

GenExam

Public

GenExam: A Multidisciplinary Text-to-Image Exam

benchmark image-generation text-to-image-generation

Python

•

MIT License

•3•46•0•0•Updated

Nov 24, 2025

VideoChat-Flash

Public

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling

Python

•

MIT License

•14•485•10•0•Updated

Nov 18, 2025

SDLM

Public

Sequential Diffusion Language Model (SDLM) enhances pre-trained autoregressive language models by adaptively determining generation length and maintaining KV-cache compatibility, achieving high efficiency and throughput.

gpt language-model diffusion-models llm

Python

•

MIT License

•1•74•0•0•Updated

Nov 17, 2025

Vlaser

Public

Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning

Python

•

MIT License

•0•32•2•0•Updated

Nov 7, 2025

MetaCaptioner

Public

Python

•3•39•1•0•Updated

Oct 31, 2025

SID-VLN

Public

Official implementation of: Learning Goal-Oriented Language-Guided Navigation with Self-Improving Demonstrations at Scale

Python

•

MIT License

•2•9•0•0•Updated

Oct 29, 2025

ExpVid

Public

0•6•0•0•Updated

Oct 28, 2025

VideoChat-R1

Public

[NIPS2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning

Python

•9•229•20•0•Updated

Oct 18, 2025

NaViL

Public

Python

•

MIT License

•7•85•0•0•Updated

Oct 10, 2025

ScaleCUA

Public

ScaleCUA provides open-source computer use agents that operate across platforms (Windows, macOS, Ubuntu, Android).

data models gui-agents computer-use-agents scalecua online-evaluation-suite

Python

•

Apache License 2.0

•51•900•7•0•Updated

Oct 3, 2025

PonderV2

Public

[T-PAMI 2025] PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm

3d-vision pretraining foundation-models

Python

•

MIT License

•9•363•0•0•Updated

Sep 30, 2025

InternVL

Public

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.

image-classification gpt multi-modal semantic-segmentation video-classification image-text-retrieval llm vision-language-model gpt-4v vit-6b

Python

•

MIT License

•737•9.5k•279•5•Updated

Sep 22, 2025

EgoExoLearn

Public

[CVPR 2024] Data and benchmark code for the EgoExoLearn dataset

Python

•

MIT License

•2•73•3•0•Updated

Aug 26, 2025

VRBench

Public

[ICCV 2025] A Benchmark for Multi-Step Reasoning in Long Narrative Videos

benchmark dataset video-understanding vlm evaluation-kit multi-step-reasoning video-reasoning llm

Python

•

Apache License 2.0

•0•21•0•0•Updated

Aug 8, 2025

InternVideo

Public

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

benchmark action-recognition video-understanding video-data self-supervised multimodal video-dataset open-set-recognition video-retrieval video-question-answering

Python

•

Apache License 2.0

•132•2.1k•132•3•Updated

Aug 7, 2025

PIIP

Public

[NeurIPS 2024 Spotlight ⭐️ & TPAMI 2025] Parameter-Inverted Image Pyramid Networks (PIIP)

computer-vision image-classification object-detection semantic-segmentation instance-segmentation vision-transformer multimodal-large-language-models vision-language-models

Python

•

MIT License

•5•105•2•0•Updated

Aug 5, 2025

GUI-Odyssey

Public

[ICCV 2025] GUIOdyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUIOdyssey consists of 8,834 episodes from 6 mobile devices, spanning 6 types of cross-app tasks, 212 apps, and 1.4K app combos.

Python

•8•134•9•0•Updated

Aug 4, 2025

LORIS

Public

[ICML2023] Long-Term Rhythmic Video Soundtracker

music-generation pytorch-implementation multi-modality diffusion-models aigc

Python

•

MIT License

•1•61•1•0•Updated

Jul 28, 2025

TPO

Public

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

Jupyter Notebook

•2•62•1•0•Updated

Jul 22, 2025

Docopilot

Public

[CVPR 2025] Docopilot: Improving Multimodal Models for Document-Level Understanding

Python

•

MIT License

•1•35•2•0•Updated

Jul 22, 2025

Mono-InternVL

Public

[CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training

Python

•

MIT License

•0•94•6•0•Updated

Jul 18, 2025

ZeroGUI

Public

ZeroGUI: Automating Online GUI Learning at Zero Human Cost

Python

•

Apache License 2.0

•7•101•0•0•Updated

Jul 17, 2025

MUTR

Public

[AAAI 2024] Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation

Python

•

MIT License

•7•82•3•0•Updated

Jun 13, 2025

PVC

Public

[CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models

Python

•

MIT License

•1•50•4•0•Updated

Jun 12, 2025

FluxViT

Public

Make Your Training Flexible: Towards Deployment-Efficient Video Models

Python

•

MIT License

•0•34•1•0•Updated

Jun 11, 2025

VeBrain

Public

Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces

MIT License

•7•86•4•0•Updated

Jun 6, 2025

EfficientQAT

Public

[ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models

Python

•23•313•10•0•Updated

May 22, 2025

OmniQuant

Public

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

quantization large-language-models llm

Python

•

MIT License

•72•875•26•1•Updated

May 22, 2025

EgoVideo

Public

[CVPR 2024 Champions][ICLR 2025] Solutions for EgoVis Challenges in CVPR 2024

Jupyter Notebook

•4•132•9•0•Updated

May 11, 2025

OmniCorpus

Public

[ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Python

•7•404•0•0•Updated

May 5, 2025

89 repositories