Repositories list
46 repositories
SearchAgentService (Public)
SWE-bench-server (Public)
opencompass (Public): OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets…
VLMEvalKit (Public): Open-source evaluation toolkit of large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
TextEdit (Public)
GenEditEvalKit (Public)
GTA (Public): [NeurIPS 2024 D&B Track] GTA: A Benchmark for General Tool Agents
MiroFlow (Public)
RePro (Public): [ICLR 2026] Rectifying LLM Thought From Lens of Optimization
SAGA (Public)
ATLAS (Public)
OASIS (Public)
InteractScience (Public)
CognitiveKernel-Pro (Public)
GAOKAO-Eval (Public)
.github (Public)
MMBench-GUI (Public): Official repo of "MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents". It can be used to evaluate a GUI agent with a hierarchical mann…
ReasonZoo (Public)
CompassVerifier (Public)
GPassK (Public): [ACL 2025] Are Your LLMs Capable of Stable Reasoning?
Creation-MMBench (Public)
CompassJudger (Public)
RaML (Public)
BotChat (Public)
Ada-LEval (Public): The official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks"
MathBench (Public)
MMBench (Public)
ProSA (Public)
ANAH (Public)
oc_doc_website (Public)