63 repositories
SWE-Next (Public)
VisPhyWorld (Public)
VLM2Vec (Public)
    This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025]
ClawBench (Public)
    Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.
SWE-QA-Pro (Public)
RewardHarness (Public)
    Self-evolving agentic reward framework for image-editing evaluation: 47.4% on EditReward-Bench from only 100 preference demos, no reward-model training. arXiv …
OpenResearcher (Public)
    OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis
RationalRewards (Public)
EditReward (Public)
    EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing [ICLR 2026]
Critique-Coder (Public)
Hierarchical-Reasoner (Public)
ImagenWorld (Public)
    Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks [ICLR 2026]
MMLU-Pro (Public)
    The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark" [NeurIPS 2024]
EvolveCoder (Public)
verl-tool (Public)
MMMU (Public)
Context-Forcing (Public)
VisualWebInstruct (Public)
VisCoder2 (Public)
    The official code of "VisCoder2: Building Multi-Language Visualization Coding Agents" [ICLR 2026]
BrowserAgent (Public)
StructEval (Public)
    Evaluating LLMs' abilities to generate structural output [TMLR 2025]
Mantis (Public)
VideoScore2 (Public)
VideoScore (Public)
    Official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP 2024]
ImagenHub (Public)
    A one-stop library to standardize the inference and evaluation of all the conditional image generation models. [ICLR 2024]
General-Reasoner (Public)
Pixel-Reasoner (Public)
    Pixel-Level Reasoning Model trained with RL [NeurIPS 2025]
QuickCodec (Public)
QuickVideo (Public)
    Quick Long Video Understanding [TMLR 2025]
VideoEval-Pro (Public)
    More reliable Video Understanding Evaluation