CVPR 2025 Papers and Open-Source Projects Collection (Papers with Code)

CVPR 2025 decisions are now available on OpenReview! Acceptance rate: 22.1% (2878 / 13008).

Note 1: Everyone is welcome to open an issue to share CVPR 2025 papers and open-source projects!

Note 2: For papers from previous top CV conferences, along with other high-quality CV papers and roundups, see: https://github.com/amusi/daily-paper-computer-vision

ICCV 2025

ECCV 2024

CVPR 2024

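Scan the QR code to join the [CVer Academic Exchange Group] and get the latest work, including CVPR 2025! This is the largest computer vision and AI community on Knowledge Planet (Zhishi Xingqiu). It is updated daily with the newest learning materials on computer vision, AIGC, diffusion models, multimodal learning, deep learning, autonomous driving, medical imaging, remote sensing, and more. Join us!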

[CVPR 2025 Papers with Code: Table of Contents]

3DGS (Gaussian Splatting)
Agent
Avatars
Backbone
CLIP
Mamba
Embodied AI
GAN
GNN
Multimodal Large Language Models (MLLM)
Large Language Models (LLM)
NAS
OCR
NeRF
DETR
Diffusion Models
ReID (Re-Identification)
Long-Tail Distribution
Vision Transformer
Vision-Language
Self-supervised Learning
Data Augmentation
Object Detection
Anomaly Detection
Visual Tracking
Semantic Segmentation
Instance Segmentation
Panoptic Segmentation
Medical Image
Medical Image Segmentation
Video Object Segmentation
Video Instance Segmentation
Referring Image Segmentation
Image Matting
Image Editing
Low-level Vision
Super-Resolution
Denoising
Deblur
Autonomous Driving
3D Point Cloud
3D Object Detection
3D Semantic Segmentation
3D Object Tracking
3D Semantic Scene Completion
3D Registration
3D Human Pose Estimation
3D Human Mesh Estimation
3D Visual Grounding
Image Generation
Video Generation
3D Generation
Video Understanding
Action Detection
Embodied AI
Text Detection
Knowledge Distillation
Model Pruning
Image Compression
3D Reconstruction
Depth Estimation
Trajectory Prediction
Lane Detection
Image Captioning
Visual Question Answering
Sign Language Recognition
Video Prediction
Novel View Synthesis
Zero-Shot Learning
Stereo Matching
Feature Matching
Low-light Image Enhancement
Scene Graph Generation
Style Transfer
Implicit Neural Representations
Image Quality Assessment
Video Quality Assessment
Compressive Sensing
Datasets
New Tasks
Others

3DGS (Gaussian Splatting)

Agent

SpiritSight Agent: Advanced GUI Agent with One Look

Paper: https://arxiv.org/abs/2503.03196
Code: https://hzhiyuan.github.io/SpiritSight-Agent

Avatars

Backbone

Building Vision Models upon Heat Conduction

Paper: https://arxiv.org/abs/2405.16555
Code: https://github.com/MzeroMiko/vHeat

LSNet: See Large, Focus Small

Paper: https://arxiv.org/abs/2503.23135
Code: https://github.com/jameslahm/lsnet

CLIP

Mamba

MambaVision: A Hybrid Mamba-Transformer Vision Backbone

Paper: https://arxiv.org/abs/2407.08083
Code: https://github.com/NVlabs/MambaVision

MobileMamba: Lightweight Multi-Receptive Visual Mamba Network

Paper: https://arxiv.org/abs/2411.15941
Code: https://github.com/lewandofskee/MobileMamba

MambaIC: State Space Models for High-Performance Learned Image Compression

Paper: https://arxiv.org/abs/2503.12461
Code: https://arxiv.org/abs/2503.12461

Embodied AI

CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos

Project: https://ai4ce.github.io/CityWalker/
Paper: https://arxiv.org/abs/2411.17820
Code: https://github.com/ai4ce/CityWalker

GAN

OCR

NeRF

DETR

Mr. DETR: Instructive Multi-Route Training for Detection Transformers

Paper: https://arxiv.org/abs/2412.10028
Code: https://github.com/Visual-AI/Mr.DETR

Prompt

Multimodal Large Language Models (MLLM)

LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences

Paper: https://arxiv.org/abs/2412.01292
Code: https://github.com/Hoyyyaard/LSceneLLM

DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution

Paper: https://arxiv.org/abs/2405.16071
Code: https://github.com/callsys/DynRefer

Retrieval-Augmented Personalization for Multimodal Large Language Models

Project Page: https://hoar012.github.io/RAP-Project/
Paper: https://arxiv.org/abs/2410.13360
Code: https://github.com/Hoar012/RAP-MLLM

BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models

Paper: https://arxiv.org/abs/2411.15232
Code: https://github.com/HealthX-Lab/BiomedCoOp

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression

Paper: https://arxiv.org/abs/2412.04317
Code: https://github.com/codefanw/FlashSloth

MMRL: Multi-Modal Representation Learning for Vision-Language Models

Paper: https://arxiv.org/abs/2503.08497
Code: https://github.com/yunncheng/MMRL

PAVE: Patching and Adapting Video Large Language Models

Paper: https://arxiv.org/abs/2503.19794
Code: https://github.com/dragonlzm/PAVE

AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization

Paper: https://arxiv.org/abs/2503.23733
Code: https://github.com/THUNLP-MT/AdaMMS

Large Language Models (LLM)

NAS

ReID (Re-Identification)

From Poses to Identity: Training-Free Person Re-Identification via Feature Centralization

Paper: https://arxiv.org/abs/2503.00938
Code: https://github.com/yuanc3/Pose2ID

AirRoom: Objects Matter in Room Reidentification

Project: https://sairlab.org/airroom/
Paper: https://arxiv.org/abs/2503.01130

IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification

Paper: https://arxiv.org/abs/2503.10324
Code: https://github.com/924973292/IDEA

Diffusion Models

TinyFusion: Diffusion Transformers Learned Shallow

Paper: https://arxiv.org/abs/2412.01199
Code: https://github.com/VainF/TinyFusion

DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture

Paper: https://arxiv.org/abs/2409.03550
Code: https://github.com/qianlong0502/DKDM

Tiled Diffusion

Homepage: https://madaror.github.io/tiled-diffusion.github.io/
Paper: https://arxiv.org/abs/2412.15185
Code: https://github.com/madaror/tiled-diffusion

Vision Transformer

Vision-Language

NLPrompt: Noise-Label Prompt Learning for Vision-Language Models

Paper: https://arxiv.org/abs/2412.01256
Code: https://github.com/qunovo/NLPrompt

PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability

Paper: https://arxiv.org/abs/2503.08481
Code: https://github.com/unira-zwj/PhysVLM

MMRL: Multi-Modal Representation Learning for Vision-Language Models

Paper: https://arxiv.org/abs/2503.08497
Code: https://github.com/yunncheng/MMRL

Object Detection

LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models

Paper: https://arxiv.org/abs/2501.18954
Code: https://github.com/iSEE-Laboratory/LLMDet

Mr. DETR: Instructive Multi-Route Training for Detection Transformers

Paper: https://arxiv.org/abs/2412.10028
Code: https://github.com/Visual-AI/Mr.DETR

Anomaly Detection

Object Tracking

Multiple Object Tracking as ID Prediction

Paper: https://arxiv.org/abs/2403.16848
Code: https://github.com/MCG-NJU/MOTIP

Omnidirectional Multi-Object Tracking

Paper: https://arxiv.org/abs/2503.04565
Code: https://github.com/xifen523/OmniTrack

Medical Image

BrainMVP: Multi-modal Vision Pre-training for Medical Image Analysis

Paper: https://arxiv.org/abs/2410.10604
Code: https://github.com/shaohao011/BrainMVP

Medical Image Segmentation

Test-Time Domain Generalization via Universe Learning: A Multi-Graph Matching Approach for Medical Image Segmentation

Paper: https://arxiv.org/abs/2503.13012
Code: https://github.com/Yore0/TTDG-MGM

Autonomous Driving

LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes

Project: https://ldkong.com/LiMoE
Paper: https://arxiv.org/abs/2501.04004
Code: https://github.com/Xiangxu-0103/LiMoE

3D Point Cloud

Unlocking Generalization Power in LiDAR Point Cloud Registration

Paper: https://arxiv.org/abs/2503.10149
Code: https://github.com/peakpang/UGP

3D Object Detection

3D Semantic Segmentation

Low-level Vision

Super-Resolution

AESOP: Auto-Encoded Supervision for Perceptual Image Super-Resolution

Denoising

Image Denoising

3D Human Pose Estimation

Reconstructing Humans with a Biomechanically Accurate Skeleton

Homepage: https://isshikihugh.github.io/HSMR/
Code: https://github.com/IsshikiHugh/HSMR

3D Visual Grounding

ProxyTransformation: Preshaping Point Cloud Manifold With Proxy Attention For 3D Visual Grounding

Homepage: https://pqh22.github.io/projects/ProxyTransformation/index.html
Code: https://github.com/pqh22/ProxyTransformation
Paper: https://arxiv.org/abs/2502.19247

Image Generation

Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models

Paper: https://arxiv.org/abs/2501.01423
Code: https://github.com/hustvl/LightningDiT

SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models

Paper: https://arxiv.org/abs/2412.04852
Code: https://github.com/taco-group/SleeperMark

TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation

Homepage: https://byteflow-ai.github.io/TokenFlow/
Code: https://github.com/ByteFlow-AI/TokenFlow
Paper: https://arxiv.org/abs/2412.03069

PAR: Parallelized Autoregressive Visual Generation

Project: https://epiphqny.github.io/PAR-project/
Paper: https://arxiv.org/abs/2412.15119
Code: https://github.com/Epiphqny/PAR

Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis

Project: https://generative-photography.github.io/project/
Paper: https://arxiv.org/abs/2412.02168
Code: https://github.com/pandayuanyu/generative-photography

OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation

Project Page: https://opening-benchmark.github.io/
Paper: https://arxiv.org/abs/2411.18499
Code: https://github.com/LanceZPF/OpenING

Video Generation

Identity-Preserving Text-to-Video Generation by Frequency Decomposition

Paper: https://arxiv.org/abs/2411.17440
Code: https://github.com/PKU-YuanGroup/ConsisID

Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models

Paper: https://arxiv.org/abs/2407.15642
Code: https://github.com/maxin-cn/Cinemo

X-Dyna: Expressive Dynamic Human Image Animation

Paper: https://arxiv.org/abs/2501.10021
Code: https://github.com/bytedance/X-Dyna

PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation

Paper: https://arxiv.org/pdf/2412.00596
Code: https://github.com/pittisl/PhyT2V

Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model

Project: https://liewfeng.github.io/TeaCache/
Paper: https://arxiv.org/abs/2411.19108
Code: https://github.com/ali-vilab/TeaCache

AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion

Project: https://iva-mzsun.github.io/AR-Diffusion
Paper: https://arxiv.org/abs/2503.07418
Code: https://github.com/iva-mzsun/AR-Diffusion

Image Editing

Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing

Paper: https://arxiv.org/abs/2411.16832
Code: https://github.com/taco-group/FaceLock

h-Edit: Effective and Flexible Diffusion-Based Editing via Doob’s h-Transform

Paper: https://arxiv.org/abs/2503.02187
Code: https://github.com/nktoan/h-edit

Video Editing

3D Generation

Generative Gaussian Splatting for Unbounded 3D City Generation

Project: https://haozhexie.com/project/gaussian-city
Paper: https://arxiv.org/abs/2406.06526
Code: https://github.com/hzxie/GaussianCity

StdGEN: Semantic-Decomposed 3D Character Generation from Single Images

3D Reconstruction

Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass

Project: https://fast3r-3d.github.io/
Paper: https://arxiv.org/abs/2501.13928

Human Motion Generation

SemGeoMo: Dynamic Contextual Human Motion Generation with Semantic and Geometric Guidance

Project: https://4dvlab.github.io/project_page/semgeomo/
Paper: https://arxiv.org/abs/2503.01291
Code: https://github.com/4DVLab/SemGeoMo

Video Understanding

Temporal Grounding Videos like Flipping Manga

Paper: https://arxiv.org/abs/2411.10332
Code: https://github.com/yongliang-wu/NumPro

Embodied AI

Universal Actions for Enhanced Embodied Foundation Models

Project: https://2toinf.github.io/UniAct/
Paper: https://arxiv.org/abs/2501.10105
Code: https://github.com/2toinf/UniAct

PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability

Paper: https://arxiv.org/abs/2503.08481
Code: https://github.com/unira-zwj/PhysVLM

Knowledge Distillation

Depth Estimation

DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos

Project: https://depthcrafter.github.io
Paper: https://arxiv.org/abs/2409.02095
Code: https://github.com/Tencent/DepthCrafter

MonSter: Marry Monodepth to Stereo Unleashes Power

Paper: https://arxiv.org/abs/2501.08643
Code: https://github.com/Junda24/MonSter

DEFOM-Stereo: Depth Foundation Model Based Stereo Matching

Project: https://insta360-research-team.github.io/DEFOM-Stereo/
Paper: https://arxiv.org/abs/2501.09466
Code: https://github.com/Insta360-Research-Team/DEFOM-Stereo

Stereo Matching

MonSter: Marry Monodepth to Stereo Unleashes Power

Paper: https://arxiv.org/abs/2501.08643
Code: https://github.com/Junda24/MonSter

Low-light Image Enhancement

HVI: A New color space for Low-light Image Enhancement

ReDDiT: Efficient Diffusion as Low Light Enhancer

Paper: https://arxiv.org/abs/2410.12346
Code: https://github.com/lgz-0713/ReDDiT

Image Compression

MambaIC: State Space Models for High-Performance Learned Image Compression

Paper: https://arxiv.org/abs/2503.12461
Code: https://arxiv.org/abs/2503.12461

Scene Graph Generation

Style Transfer

StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements

Project: https://stylestudio-official.github.io/
Paper: https://arxiv.org/abs/2412.08503
Code: https://github.com/Westlake-AGI-Lab/StyleStudio

Image Quality Assessment

Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language

Homepage: https://yichengchen24.github.io/projects/autocherrypicker
Paper: https://arxiv.org/pdf/2406.20085
Code: https://github.com/yichengchen24/ACP

Video Quality Assessment

Compressive Sensing

Using Powerful Prior Knowledge of Diffusion Model in Deep Unfolding Networks for Image Compressive Sensing

Paper: https://arxiv.org/abs/2503.08429
Code: https://github.com/FengodChen/DMP-DUN-CVPR2025

Datasets

Objaverse++: Curated 3D Object Dataset with Quality Annotations

Paper: https://arxiv.org/abs/2504.07334
Code: https://github.com/TCXX/ObjaversePlusPlus

Others

DTGBrepGen: A Novel B-rep Generative Model through Decoupling Topology and Geometry

Paper: https://arxiv.org/abs/2503.13110
Code: https://github.com/jinli99/DTGBrepGen

Analyzing the Synthetic-to-Real Domain Gap in 3D Hand Pose Estimation

Paper: https://arxiv.org/abs/2503.19307
Code: https://github.com/delaprada/HandSynthesis.git

EVOS: Efficient Implicit Neural Training via EVOlutionary Selector

Homepage: https://weixiang-zhang.github.io/proj-evos/
Paper: https://arxiv.org/abs/2412.10153
Code: https://github.com/zwx-open/EVOS-INR

amusi/CVPR2025-Papers-with-Code
