    Repositories list

    • gift-eval

      Public
      Jupyter Notebook
      5717071 Updated Dec 29, 2025
    • Python
      0000 Updated Dec 26, 2025
    • MCP-Universe is a comprehensive framework designed for developing, testing, and benchmarking AI agents
      Python
      6653069 Updated Dec 23, 2025
    • FOFPred

      Public
      Python
      0000 Updated Dec 18, 2025
    • CoAct-1

      Public
      CoAct-1: Computer-using Agents with Coding as Actions
      Python
      21210 Updated Dec 11, 2025
    • SCUBA

      Public
      SCUBA: Salesforce Computer Use Benchmark
      Python
      0611 Updated Dec 9, 2025
    • SalesSim

      Public
      Python
      0100 Updated Dec 3, 2025
    • NuRL

      Public
      Python
      0000 Updated Nov 26, 2025
    • LoCoBench-Agent: An Interactive Benchmark for LLM Agents in Long-Context Software Engineering
      Python
      11310 Updated Nov 19, 2025
    • LoCoBench

      Public
      Python
      43240 Updated Nov 19, 2025
    • Salesforce Enterprise Deep Research
      Python
      1641k10 Updated Nov 19, 2025
    • ConvoMem

      Public
      Scala
      0500 Updated Nov 18, 2025
    • LiveResearchBench

      Public
      A live benchmark and evaluation framework for open-ended deep research in the wild.
      Python
      1010200 Updated Nov 13, 2025
    • Python
      0000 Updated Nov 13, 2025
    • swecomm

      Public
      Shell
      62800 Updated Nov 10, 2025
    • perfcodegen

      Public
      Python
      24310 Updated Nov 10, 2025
    • Python
      0210 Updated Nov 10, 2025
    • FoFo

      Public
      Python
      32820 Updated Nov 10, 2025
    • AgentLite

      Public
      Jupyter Notebook
      83641122 Updated Nov 10, 2025
    • FaithEval

      Public
      65610 Updated Nov 10, 2025
    • GemFilter

      Public
      Python
      98510 Updated Nov 10, 2025
    • Code for "Diffusion Model Alignment Using Direct Preference Optimization"
      Python
      45631181 Updated Nov 10, 2025
    • visual-unit-testing

      Public
      Jupyter Notebook
      01020 Updated Nov 10, 2025
    • C++
      12310 Updated Nov 10, 2025
    • INDICT: Code Generation with Internal Dialogues of Critiques for Both Security and Helpfulness
      Python
      01310 Updated Nov 10, 2025
    • bootpig

      Public
      Jupyter Notebook
      01120 Updated Nov 10, 2025
    • uni2ts

      Public
      Unified Training of Universal Time Series Forecasting Transformers
      Jupyter Notebook
      1791.4k401 Updated Nov 10, 2025
    • SFR-RAG

      Public
      Python
      78130 Updated Nov 10, 2025
    • CodeChain

      Public
      Official code for the paper "CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules"
      Python
      44840 Updated Nov 10, 2025
    • CRMArena

      Public
      Official Repo for CRMArena and CRMArena-Pro
      Python
      2512752 Updated Nov 9, 2025