    Repositories list

    • Terminal-Bench-Science: Evaluating AI Agents on Complex Real-World Scientific Workflows in the Terminal
      Python
      Apache License 2.0
      87159864 Updated Jun 27, 2026
    • harbor

      Public
      Framework for evaluating and improving agents
      Python
      Apache License 2.0
      1.2k2.8k137296 Updated Jun 27, 2026
    • Python
      Apache License 2.0
      13700 Updated Jun 27, 2026
    • Measuring agents' ability to get work done on a computer
      Python
      3092561095 Updated Jun 27, 2026
    • benchmark-template

      Public template
      Harbor Benchmark Template
      Python
      101387 Updated Jun 25, 2026
    • TypeScript
      15921 Updated Jun 22, 2026
    • Shell
      31702 Updated Jun 18, 2026
    • docs

      Public
      MDX
      MIT License
      0000 Updated Jun 3, 2026
    • A curated list of awesome Harbor ecosystem projects
      24301 Updated May 29, 2026
    • 11635622 Updated May 16, 2026
    • skills

      Public
      Public agent skills catalog for Harbor
      Apache License 2.0
      110111 Updated May 12, 2026
    • Terminal-Bench 2.1
      Shell
      Apache License 2.0
      62917 Updated May 5, 2026
    • Shell
      Apache License 2.0
      893051721 Updated Apr 30, 2026
    • Realistic examples of building evals and optimizing agents with Harbor
      Python
      Apache License 2.0
      1011501 Updated Apr 23, 2026
    • MDX
      11304 Updated Mar 31, 2026
    • A benchmark for LLMs on complicated tasks in the terminal
      Python
      Apache License 2.0
      5482.4k113196 Updated Jan 22, 2026