IBM/ITBench-Leaderboard

ITBench-Leaderboard

🌟 Explore the Leaderboards

Domain | Leaderboard
🔐 CISO | 👉 View CISO Leaderboard
⚙️ SRE | 👉 View SRE Leaderboard

What Is ITBench?

Measure the performance of your AI agent(s) across a wide variety of complex, real-life IT automation tasks targeting three key use cases:

  • Site Reliability Engineering (SRE): focusing on availability and resiliency
  • Financial Operations (FinOps): focusing on enforcing cost efficiencies and optimizing return on investment
  • Compliance and Security Operations (CISO): focusing on ensuring compliance and security of IT implementations

This is a public leaderboard. ITBench handles the deployment of the environments and scenarios, and it evaluates the submissions made by the agent.

Key Terminologies

  • Scenario: ITBench incorporates a collection of problems that we call "scenarios." For example, one of the SRE scenarios in ITBench is to resolve a “High error rate on service checkout” in a Kubernetes environment. Another scenario, relevant to the CISO use case, involves assessing the compliance posture for a “new control rule detected for RHEL 9.”
  • Environment: Each ITBench scenario is deployed in an operational, sandboxed Kubernetes environment.
  • Benchmark: A collection of scenarios that are executed in parallel or in sequence but independently of each other. An agent makes a submission to address, diagnose, or remediate the scenario at hand.

Getting Started

Prerequisites

  • A private GitHub repository
    • A file facilitating the agent and leaderboard handshake is pushed to this private repository.
    • The file(s) may be created or deleted automatically during the benchmark lifecycle.
  • A Kubernetes sandbox cluster (KinD recommended); only needed for CISO
    • Do not use a production cluster, because the benchmark process will create and delete resources dynamically.
    • Please refer to prepare-kubeconfig-kind.md
  • An agent to benchmark
    • A base agent is available from IBM for immediate use. The base agent for the CISO use case can be found here, and the one for the SRE and FinOps use cases can be found here. This lets you apply your own methodologies and make improvements without having to worry about the interactions between the agent and the leaderboard service.
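
Because the benchmark process creates and deletes resources dynamically, it can help to confirm that kubectl points at a sandbox before you start. The following is a minimal sketch, not part of ITBench: it assumes KinD's default naming, where kubeconfig contexts are prefixed with `kind-`, and the helper names `is_kind_context` and `current_context` are hypothetical.

```python
import subprocess


def is_kind_context(context_name: str) -> bool:
    """Return True if the name follows KinD's default `kind-<cluster>` pattern."""
    prefix = "kind-"
    return context_name.startswith(prefix) and len(context_name) > len(prefix)


def current_context() -> str:
    """Ask kubectl for the active context (requires kubectl on PATH)."""
    result = subprocess.run(
        ["kubectl", "config", "current-context"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

A typical use would be to call `is_kind_context(current_context())` and stop if it returns `False`. A cluster that was renamed, or one that was not created with KinD, will fail this check even if it is a legitimate sandbox, so treat the result as a guard rail rather than proof.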

Setup

Step 1. Install the ITBench GitHub App

Install the ibm-itbench GitHub app into the private GitHub repository (see Prerequisites).

  1. Go to the installation page here.
  2. Select your GitHub organization.
  3. Select your agent configuration repo.

Step 2. Register your agent

In this step, you will register your agent information with ITBench.

  1. Create a new registration issue.
  2. Fill in the issue template with the following information:
    • Agent Name: Your agent name
    • Agent Level: "Beginner"
    • Agent Scenarios: "Kubernetes in Kyverno"
    • Config Repo: URL for your agent configuration repo

    You may adjust these settings depending on the scenarios or agent level.
  3. Submit the issue.
    • Click "Create" to submit your registration request.
    • Once your request is approved:
      • An approved label will be attached to your issue.
      • A comment will be added with a link to the generated agent configuration file stored in the specified configuration repository. Download the linked configuration file to proceed.
    • If you subscribe to the issue, you will also receive email notifications.

If there are any problems with your submission, we will respond directly on the issue. If you do not receive any response within a couple of days, please reach out to the maintainers.
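
If you would rather check for approval programmatically than wait for an email, the issue's labels can be read through the GitHub REST API. The sketch below is an illustration under assumptions: the label is taken to be named exactly `approved`, and the function names, `owner`, `repo`, `number`, and `token` parameters are placeholders that you supply yourself.

```python
import json
import urllib.request


def has_label(issue: dict, label: str = "approved") -> bool:
    """Return True if the issue payload carries the given label."""
    names = set()
    for entry in issue.get("labels", []):
        # GitHub returns label objects; tolerate plain strings as well.
        names.add(entry["name"] if isinstance(entry, dict) else str(entry))
    return label in names


def fetch_issue(owner: str, repo: str, number: int, token: str) -> dict:
    """Fetch one issue via GET /repos/{owner}/{repo}/issues/{number}."""
    request = urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/issues/{number}",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `has_label(fetch_issue(owner, repo, number, token))` then tells you whether the label has been attached. The link to the generated configuration file is posted as a comment, so it still has to be read from the issue itself.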

Step 3. Create a benchmark request

In this step, you will register your benchmark entry.

  1. Create a new benchmark issue.
  2. Fill in the issue template.
    • The name for the Config Repo must match the repository you used during agent registration.
  3. Submit the issue.
    • Click "Create" to submit your benchmark request.
    • Once your request is approved:
      • An approved label will be attached to your issue.
      • The issue comment will be updated with your Benchmark ID.
    • If you subscribe to the issue, you will also receive email notifications.

If there are any problems with your submission, we will respond directly on the issue. If you do not receive any response within a couple of days, please reach out to the maintainers.
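
Since the Config Repo in the benchmark request must match the one used during agent registration, a quick local comparison can catch mismatches caused by a trailing slash, a `.git` suffix, or different letter case. This is a hedged sketch: how ITBench itself compares the two values is not documented here, and `normalize_repo` and `same_repo` are hypothetical helpers.

```python
from urllib.parse import urlparse


def normalize_repo(value: str) -> str:
    """Reduce a GitHub repo reference to lowercase 'owner/name'."""
    value = value.strip()
    if "://" in value:
        # Keep only the path part of a full URL.
        value = urlparse(value).path
    value = value.strip("/")
    if value.endswith(".git"):
        value = value[: -len(".git")]
    return value.lower()


def same_repo(registered: str, requested: str) -> bool:
    """Compare two repo references after normalization."""
    return normalize_repo(registered) == normalize_repo(requested)
```

For example, `same_repo("https://github.com/my-org/agent-config", "my-org/agent-config")` evaluates to `True`. If the two values differ after normalization, correct the benchmark issue before submitting it.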

Running your agent or our base agent against the benchmark

You can run either your own custom agent or one of our built-in agents against the ITBench benchmark.

The following guides and videos demonstrate how to run the benchmark using our built-in agents. These may also serve as helpful references when setting up your own agent:

ITBench Ecosystem and Related Repositories

  • ITBench: Central repository providing an overview of the ITBench ecosystem, related announcements, and publications.
  • CISO-CAA Agent: CISO (Chief Information Security Officer) agents that automate compliance assessments by generating policies from natural language, collecting evidence, integrating with GitOps workflows, and deploying policies for assessment.
  • SRE Agent: SRE (Site Reliability Engineering) agents designed to diagnose and remediate problems in Kubernetes-based environments. They leverage logs, metrics, traces, and Kubernetes states/events from the IT environment.
  • ITBench Scenarios: Environment setup and mechanism to trigger scenarios.
  • ITBench Utilities: Collection of supporting tools and utilities for participants in the ITBench ecosystem and leaderboard challenges.
  • ITBench Tutorials: Repository containing the latest tutorials, workshops, and educational content for getting started with ITBench.

Maintainers