# ECCV-2024-Oral

## 2D Scene Understanding

- Diffusion Models for Zero-Shot Open-Vocabulary Segmentation
- Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention
- Relation DETR: Exploring Explicit Position Relation Prior for Object Detection
- WPS-SAM: Towards Weakly-Supervised Part Segmentation with Foundation Models
- Learning Modality-agnostic Representation for Semantic Segmentation from Any Modalities
- OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model
- Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation
- ESA: Annotation-Efficient Active Learning for Semantic Segmentation
- Towards Scene Graph Anticipation
- Dataset Enhancement with Instance-Level Augmentations
- An Adaptive Correspondence Scoring Framework for Unsupervised Image Registration of Medical Images
- HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution
- Towards Open-ended Visual Quality Comparison
- CAT-SAM: Conditional Tuning for Few-Shot Adaptation of Segment Anything Model
- A Fair Ranking and New Model for Panoptic Scene Graph Generation
- Parrot Captions Teach CLIP to Spot Text
- On Calibration of Object Detectors: Pitfalls, Evaluation and Baselines
- From Fake to Real: Pretraining on Balanced Synthetic Images to Prevent Spurious Correlations in Image Recognition
- SINDER: Repairing the Singular Defects of DINOv2
- Emergent Visual-Semantic Hierarchies in Image-Text Representations
- AlignDiff: Aligning Diffusion Models for General Few-Shot Segmentation

## 3D Scene Understanding

- OmniNOCS: A unified NOCS dataset and model for 3D lifting of 2D objects
- PointLLM: Empowering Large Language Models to Understand Point Clouds
- Bi-directional Contextual Attention for 3D Dense Captioning
- Watch Your Steps: Local Image and Scene Editing by Text Instructions
- Scene Coordinate Reconstruction
- HGL: Hierarchical Geometry Learning for Test-time Adaptation in 3D Point Cloud Segmentation
- RISurConv: Rotation Invariant Surface Attention-Augmented Convolutions for 3D Point Cloud Classification and Segmentation
- RAPiD-Seg: Range-Aware Pointwise Distance Distribution Networks for 3D LiDAR Segmentation
- Grounding Image Matching in 3D with MASt3R
- Equi-GSPR: Equivariant SE(3) Graph Network Model for Sparse Point Cloud Registration
- SPVLoc: Semantic Panoramic Viewport Matching for 6D Camera Localization in Unseen Environments

## NeRF / Gaussian

- Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering
- MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
- Omni-Recon: Harnessing Image-based Rendering for General-Purpose Neural Radiance Fields
- RaFE: Generative Radiance Fields Restoration
- Analytic-Splatting: Anti-Aliased 3D Gaussian Splatting via Analytic Integration
- FisherRF: Active View Selection and Uncertainty Quantification for Radiance Fields using Fisher Information

## 2D Generation

- Adversarial Diffusion Distillation
- Adversarial Robustification via Text-to-Image Diffusion Models
- TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering
- DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
- Accelerating Image Generation with Sub-path Linear Approximation Model
- LLMGA: Multimodal Large Language Model based Generation Assistant
- LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning
- ZoLA: Zero-Shot Creative Long Animation Generation with Short Video Model
- Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
- Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation
- R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model
- SemGrasp: Semantic Grasp Generation via Language Aligned Discretization

## 3D Generation

- LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
- FlashTex: Fast Relightable Mesh Texturing with LightControlNet
- Pyramid Diffusion for Fine 3D Large Scene Generation
- COHO: Context-Sensitive City-Scale Hierarchical Urban Layout Generation
- A Riemannian Approach for Spatiotemporal Analysis and Generation of 4D Tree-shaped Structures

## Human

- TexDreamer: Towards Zero-Shot High-Fidelity 3D Human Texture Generation
- Controllable Human-Object Interaction Synthesis
- Beat-It: Beat-Synchronized Multi-Condition 3D Dance Generation
- Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models
- A Simple Baseline for Spoken Language to Sign Language Translation with 3D Avatars
- ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer
- Sapiens: Foundation for Human Vision Models
- Arc2Face: A Foundation Model for ID-Consistent Human Faces

## Video

- PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation
- Audio-Synchronized Visual Animation
- LongVLM: Efficient Long Video Understanding via Large Language Models
- ControlNet-XS: Rethinking the Control of Text-to-Image Diffusion Models as Feedback-Control Systems
- Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos
- E3M: Zero-Shot Spatio-Temporal Video Grounding
- Classification Matters: Improving Video Action Detection with Class-Specific Attention
- Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets
- ActionVOS: Actions as Prompts for Video Object Segmentation
- DEVIAS: Learning Disentangled Video Representations of Action and Scene
- MotionDirector: Motion Customization of Text-to-Video Diffusion Models
- Made to Order: Discovering monotonic temporal changes via self-supervised video ordering
- SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion
- Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation
- Video Editing via Factorized Diffusion Distillation
- Towards Neuro-Symbolic Video Understanding

## LLM / MLLM / VLM

- MMBench: Is Your Multi-modal Model an All-around Player?
- BRAVE: Broadening the visual encoding of vision-language models
- Omniview-Tuning: Boosting Viewpoint Invariance of Vision-Language Pre-training Models
- An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
- Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models
- Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization
- Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Models
- Towards Goal-oriented Large Language Model Prompting: A Survey

## Transformer

- Denoising Vision Transformers

## Diffusion

- Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models