
63 repositories

VLM2Vec
Public
This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025]
benchmark representation-learning image-retrieval embedding vlm multimodal rag video-retrieval contrastive-learning mmeb
Python • Apache License 2.0 • Forks: 60 • Stars: 656 • Issues: 29 • Pull requests: 0 • Updated Jun 17, 2026
ClawBench
Public
Open-source benchmark for browser AI agents on daily tasks.
chrome-extension benchmark evaluation dataset browser-automation ai-agents web-agent web-agents everyday-tasks browser-agent
Python • Apache License 2.0 • Forks: 23 • Stars: 406 • Issues: 40 • Pull requests: 1 • Updated Jun 14, 2026
StructEval
Public
Evaluating LLMs' abilities to generate structured output [TMLR2025]
benchmark structure evaluation
Python • Apache License 2.0 • Forks: 5 • Stars: 23 • Issues: 1 • Pull requests: 1 • Updated Jun 12, 2026
OpenResearcher
Public
OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis
retrieval llm deep-research
Python • Forks: 79 • Stars: 786 • Issues: 1 • Pull requests: 0 • Updated Jun 10, 2026
verl-tool
Public
A version of verl to support diverse tool use [TMLR 2026]
agent learning reinforcement llm
Python • MIT License • Forks: 83 • Stars: 1k • Issues: 8 • Pull requests: 0 • Updated Jun 8, 2026
RationalRewards
Public
RationalRewards: a reasoning reward model for diffusion RL and test-time prompt tuning
Python • Forks: 4 • Stars: 62 • Issues: 1 • Pull requests: 0 • Updated Jun 4, 2026
Pixel-Reasoner
Public
Pixel-Level Reasoning Model trained with RL [NeurIPS25]
reasoning multimodal llm
Python • MIT License • Forks: 13 • Stars: 299 • Issues: 4 • Pull requests: 0 • Updated Jun 4, 2026
VideoEval-Pro
Public
VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation [TMLR26]
video evaluation understanding multimodal
Python • Forks: 0 • Stars: 17 • Issues: 0 • Pull requests: 0 • Updated Jun 1, 2026
SWE-Next
Public
SWE-Next: Scalable Real-World Software Engineering Tasks for Agents
agent qa swe llm
Python • Apache License 2.0 • Forks: 2 • Stars: 3 • Issues: 0 • Pull requests: 0 • Updated May 28, 2026
VisPhyWorld
Public
VisPhyWorld: Probing Physical Reasoning via Code-Driven Video Reconstruction
physics visual worldmodel llm
Python • MIT License • Forks: 0 • Stars: 7 • Issues: 0 • Pull requests: 0 • Updated May 28, 2026
SWE-QA-Pro
Public
SWE-QA-Pro: A Representative Benchmark and Scalable Training Recipe for Repository-Level Code Understanding [ACL 2026]
qa benchmark llm
Python • Forks: 1 • Stars: 6 • Issues: 0 • Pull requests: 0 • Updated May 19, 2026
RewardHarness
Public
Self-evolving agentic reward framework for image-editing evaluation: 47.4% on EditReward-Bench from only 100 preference demos, with no reward-model training. arXiv …
image-editing gemini vlm preference-learning rlhf reward-model agentic qwen-vl self-evolving
Python • Apache License 2.0 • Forks: 1 • Stars: 13 • Issues: 1 • Pull requests: 0 • Updated May 18, 2026
EditReward
Public
EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing [ICLR 2026]
evaluation editing diffusion
Python • MIT License • Forks: 6 • Stars: 152 • Issues: 4 • Pull requests: 0 • Updated Apr 11, 2026
Critique-Coder
Public
Training Coder Models with Critique Reinforcement Learning [ICLR26]
coder rl llm
Python • MIT License • Forks: 0 • Stars: 14 • Issues: 0 • Pull requests: 0 • Updated Apr 11, 2026
Hierarchical-Reasoner
Public
Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning [ICLR26]
rl reasoning llm
Python • MIT License • Forks: 3 • Stars: 65 • Issues: 0 • Pull requests: 0 • Updated Apr 11, 2026
ImagenWorld
Public
Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks [ICLR 2026]
image generation genai
Python • MIT License • Forks: 1 • Stars: 33 • Issues: 0 • Pull requests: 0 • Updated Apr 2, 2026
MMLU-Pro
Public
The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark" [NeurIPS 2024]
evaluation llm
Python • Apache License 2.0 • Forks: 54 • Stars: 393 • Issues: 9 • Pull requests: 0 • Updated Mar 18, 2026
EvolveCoder
Public
EvolveCoder: Evolving Test Cases via Adversarial Verification for Code Reinforcement Learning
coder agents llm
Python • MIT License • Forks: 0 • Stars: 1 • Issues: 0 • Pull requests: 0 • Updated Mar 16, 2026
MMMU
Public
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Python • Apache License 2.0 • Forks: 54 • Stars: 0 • Issues: 0 • Pull requests: 0 • Updated Feb 12, 2026
Context-Forcing
Public
Consistent Autoregressive Video Generation with Long Context
Forks: 2 • Stars: 88 • Issues: 2 • Pull requests: 0 • Updated Feb 6, 2026
VisualWebInstruct
Public
The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search" [EMNLP25]
vlm llm
Python • MIT License • Forks: 2 • Stars: 40 • Issues: 0 • Pull requests: 0 • Updated Feb 1, 2026
VisCoder2
Public
The official code of "VisCoder2: Building Multi-Language Visualization Coding Agents" [ICLR26]
visualization coder llm
Python • Forks: 0 • Stars: 9 • Issues: 0 • Pull requests: 0 • Updated Jan 28, 2026
BrowserAgent
Public
BrowserAgent: Building Web Agents with Human-Inspired Web Browsing Actions [TMLR2025]
agent browser llm
Python • MIT License • Forks: 4 • Stars: 34 • Issues: 0 • Pull requests: 0 • Updated Jan 13, 2026
Mantis
Public
Official code for the paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024 Best Paper]
language video vision mantis vlm multimodal lmm fuyu mllm llava-llama3
Python • Apache License 2.0 • Forks: 23 • Stars: 240 • Issues: 4 • Pull requests: 1 • Updated Jan 3, 2026
VideoScore2
Public
Automatic Metric for Evaluating Generated Videos
videos vlm llm
Python • MIT License • Forks: 4 • Stars: 49 • Issues: 6 • Pull requests: 0 • Updated Dec 8, 2025
VideoScore
Public
Official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP2024]
language machine-learning
Python • MIT License • Forks: 5 • Stars: 121 • Issues: 3 • Pull requests: 0 • Updated Dec 4, 2025
ImagenHub
Public
A one-stop library to standardize the inference and evaluation of all conditional image generation models. [ICLR 2024]
deep-learning pytorch image-editing generative-art image-generation diffusion-models stable-diffusion generative-ai
Python • MIT License • Forks: 18 • Stars: 180 • Issues: 2 • Pull requests: 0 • Updated Dec 2, 2025
General-Reasoner
Public
General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS25]
rl reasoning llm
Python • Forks: 15 • Stars: 228 • Issues: 2 • Pull requests: 0 • Updated Nov 27, 2025
QuickCodec
Public
A More Efficient Video Codec [TMLR2025]
video codec
Cython • BSD 3-Clause "New" or "Revised" License • Forks: 0 • Stars: 8 • Issues: 0 • Pull requests: 0 • Updated Nov 2, 2025
QuickVideo
Public
Quick Long Video Understanding [TMLR2025]
video multimodal-learning multimodal llm
Python • MIT License • Forks: 6 • Stars: 78 • Issues: 2 • Pull requests: 0 • Updated Oct 27, 2025
TIGER Lab