    Repositories list

    • MoDA

      Public
      A hardware-aware, efficient implementation of "Mixture-of-Depths Attention".
      28610 Updated Mar 17, 2026
    • Senna

      Public
      Bridging Large Vision-Language Models and End-to-End Autonomous Driving
      Python
      Apache License 2.0
      41538280 Updated Mar 15, 2026
    • MobileI2V

      Public
      [ArXiv 2025] MobileI2V: Fast and High-Resolution Image-to-Video on Mobile Devices
      Python
      Apache License 2.0
      37210 Updated Mar 12, 2026
    • Spa3R

      Public
      Spa3R: Predictive Spatial Field Modeling for 3D Visual Reasoning
      Python
      MIT License
      14400 Updated Mar 6, 2026
    • InfiniteVL

      Public
      InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models
      Python
      Apache License 2.0
      49520 Updated Feb 2, 2026
    • VAD

      Public
      [ICCV 2023 & ICLR 2026] VAD: Vectorized Scene Representation for Efficient Autonomous Driving
      Python
      Apache License 2.0
      1461.3k761 Updated Jan 31, 2026
    • VGT

      Public
      Visual Generation Tuning
      Python
      MIT License
      09920 Updated Jan 27, 2026
    • GaussTR

      Public
      [CVPR 2025] GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding
      Python
      MIT License
      1220810 Updated Jan 5, 2026
    • DiffusionDriveV2

      Public
      DiffusionDriveV2: Reinforcement Learning-Constrained Truncated Diffusion Modeling in End-to-End Autonomous Driving
      Python
      MIT License
      22264102 Updated Dec 29, 2025
    • SuperCLIP

      Public
      Python
      Apache License 2.0
      712630 Updated Dec 26, 2025
    • DiffusionVL

      Public
      [ArXiv 2025] DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models
      Python
      Apache License 2.0
      813430 Updated Dec 25, 2025
    • TBCM

      Public
      Image-Free Timestep Distillation via Continuous-Time Consistency with Trajectory-Sampled Pairs
      Python
      02110 Updated Dec 16, 2025
    • LightningDiT

      Public
      [CVPR 2025 Oral] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
      Python
      MIT License
      541.4k191 Updated Dec 16, 2025
    • 4DLangVGGT

      Public
      Official implementation of “4D LangVGGT: 4D Language-Visual Geometry Grounded Transformer”
      Python
      MIT License
      28230 Updated Dec 10, 2025
    • DiffusionDrive

      Public
      [CVPR 2025 Highlight] Truncated Diffusion Model for Real-Time End-to-End Autonomous Driving
      Python
      MIT License
      1231.3k271 Updated Dec 8, 2025
    • EVA-X

      Public
      [Nature Portfolio, npj DigitalMed] EVA-X: A foundation model for general chest X-ray analysis with self-supervised learning
      Python
      149460 Updated Dec 6, 2025
    • MolSight

      Public
      [AAAI 2026] MolSight: Optical Chemical Structure Recognition with SMILES Pretraining, Multi-Granularity Learning and Reinforcement Learning
      Python
      Apache License 2.0
      21700 Updated Dec 5, 2025
    • LENS

      Public
      [AAAI 2026 Oral] LENS: Learning to Segment Anything with Unified Reinforced Reasoning
      Python
      Apache License 2.0
      6109150 Updated Dec 3, 2025
    • Turbo-VAED

      Public
      [AAAI 2026] Turbo-VAED: Fast and Stable Transfer of Video-VAEs to Mobile Devices
      Python
      195120 Updated Nov 30, 2025
    • RAD

      Public
      [NeurIPS 2025] RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning
      Python
      MIT License
      319260 Updated Nov 7, 2025
    • MaskAdapter

      Public
      [CVPR 2025] Official repository of the paper "Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation"
      Python
      Apache License 2.0
      412520 Updated Oct 23, 2025
    • hustvl.github.io

      Public
      HTML
      BSD 3-Clause "New" or "Revised" License
      2100 Updated Oct 11, 2025
    • TOGS

      Public
      [IEEE JBHI] The official code of "TOGS: Gaussian Splatting with Temporal Opacity Offset for Real-Time 4D DSA Rendering"
      Python
      13310 Updated Sep 10, 2025
    • simpleseg

      Public
      Python
      0830 Updated Sep 9, 2025
    • Snap-Snap

      Public
      The repository of "Snap-Snap: Taking Two Images to Reconstruct 3D Human Gaussians in Milliseconds"
      Python
      23950 Updated Sep 1, 2025
    • recogdrive

      Public
      ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving
      Python
      Apache License 2.0
      54700 Updated Aug 21, 2025
    • ViTMatte

      Public
      [Information Fusion (Vol.103, Mar. '24)] Boosting Image Matting with Pretrained Plain Vision Transformers
      Python
      MIT License
      47511193 Updated Aug 13, 2025
    • Dynamic-2DGS

      Public
      [ACM MM 2025] Dynamic 2D Gaussians: Geometrically Accurate Radiance Fields for Dynamic Objects
      Python
      Apache License 2.0
      617530 Updated Aug 6, 2025
    • .github

      Public
      0000 Updated Jul 4, 2025
    • [ICCV 2025] GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding
      Python
      17520 Updated Jun 26, 2025