    Repositories list

    • Adv-GRPO

      Public
      [CVPR 2026] An official implementation of Adv-GRPO. The Image as Its Own Reward: Reinforcement Learning with Adversarial Reward for Image Generation.
      Python
      MIT License
      17210 Updated Feb 26, 2026
    • GitHub repository for World-VLA-Loop.
      JavaScript
      1300 Updated Feb 25, 2026
    • A curated list of recent diffusion models for video generation, editing, and various other applications.
      3405.5k00 Updated Feb 23, 2026
    • videogui

      Public
      [NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos
      JavaScript
      35120 Updated Feb 22, 2026
    • [CVPR 2026] Official Implementation of Edit2Perceive
      Python
      03010 Updated Feb 21, 2026
    • Olaf-World

      Public
      Orienting Latent Actions for Video World Modeling
      MIT License
      07810 Updated Feb 11, 2026
    • FocusUI

      Public
      [CVPR 2026] FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection
      Python
      12500 Updated Feb 10, 2026
    • World-VLA-Loop-Pages

      Public
      World-VLA-Loop Project GitHub Pages
      JavaScript
      0300 Updated Feb 9, 2026
    • showui-pi

      Public
      ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands
      Python
      Apache License 2.0
      139430 Updated Feb 6, 2026
    • Q2A

      Public
      [ECCV 2022] AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant
      Python
      52310 Updated Jan 30, 2026
    • D-AR

      Public
      The official repo for "D-AR: Diffusion via Autoregressive Models"
      Python
      MIT License
      213520 Updated Jan 29, 2026
    • Python
      Other
      12500 Updated Jan 28, 2026
    • DIM

      Public
      [ICLR 2026] Draw-In-Mind: Rebalancing Designer-Painter Roles in Unified Multimodal Models Benefits Image Editing
      Python
      Other
      02500 Updated Jan 27, 2026
    • T2F-Bench

      Public
      A comprehensive benchmark for evaluating text-to-film generation performance.
      0500 Updated Jan 22, 2026
    • Find out who said what in the video.
      Jupyter Notebook
      1613510 Updated Jan 22, 2026
    • ShowUI

      Public
      [CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
      Python
      Apache License 2.0
      1311.7k150 Updated Jan 20, 2026
    • Human-taught Computer-use Agent Designed for Real Windows and MacOS Desktops.
      Python
      Apache License 2.0
      3124040 Updated Jan 20, 2026
    • Mitty

      Public
      Official code implementation of "Mitty: Diffusion-based Human-to-Robot Video Generation"
      Python
      21620 Updated Jan 14, 2026
    • The website for aloha introduction
      HTML
      0000 Updated Jan 13, 2026
    • Show-o

      Public
      [ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
      Python
      Apache License 2.0
      881.9k644 Updated Jan 8, 2026
    • SAM-I2VPP

      Public
      [TPAMI 2026] SAM-I2V++
      Jupyter Notebook
      Apache License 2.0
      0300 Updated Jan 7, 2026
    • SAM-I2V

      Public
      [CVPR 2025] SAM-I2V
      Jupyter Notebook
      Apache License 2.0
      13500 Updated Jan 2, 2026
    • Other
      24120 Updated Dec 20, 2025
    • RobotSeg

      Public
      Apache License 2.0
      03720 Updated Dec 18, 2025
    • [ACM MM 2025] Can I Trust You? Advancing GUI Task Automation with Action Trust Score
      Python
      MIT License
      01200 Updated Dec 17, 2025
    • EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models
      HTML
      17310 Updated Dec 17, 2025
    • OmniPSD

      Public
      Official code implementation of "OmniPSD: Layered PSD Generation with Diffusion Transformer"
      Other
      58730 Updated Dec 13, 2025
    • A V2V framework that translates human interaction videos into robot manipulation videos.
      12210 Updated Dec 12, 2025
    • Official PyTorch Code of the Paper "WorldWander: Bridging Egocentric and Exocentric Worlds in Video Generation"
      Python
      22231 Updated Dec 6, 2025
    • AUI

      Public
      Computer-Use Agents as Judges for Generative UI
      Python
      MIT License
      54310 Updated Nov 27, 2025