docs: draft of readme #2

Open: wants to merge 2 commits into `main`.

`README.md`: 77 changes (75 additions, 2 deletions), hunk `@@ -1,2 +1,75 @@`.

The two deleted lines are the original README:

- `# itbench-leaderboard`
- `Code repository for leaderboard as part of ITBench`

The added lines form the new draft README:

# ITBench leaderboard

This repository powers the community leaderboard for [ITBench](https://github.com/IBM/ITBench), an open benchmark for evaluating AI agents on real-world IT automation tasks.

The leaderboard showcases how different agents perform across officially defined scenarios spanning Site Reliability Engineering (SRE), Financial Operations (FinOps), and Compliance/Security (CISO) domains.

## 🏆 About the leaderboard

The leaderboard is automatically updated from submitted results and displays performance metrics for each agent, such as:

- Percentage of scenarios successfully completed
- Scenario-specific scores
- Runtime or efficiency metrics (if applicable)
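As an illustration of the first metric, the completion percentage could be computed per agent roughly as follows. This is a minimal sketch under assumptions: the `ScenarioResult` record and the `completion_rate` helper are hypothetical and not part of ITBench or this repository. Returning `None` when a domain has no runs mirrors the `N/A` cells in the sample table.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ScenarioResult:
    # Hypothetical record for one scenario run; field names are illustrative.
    scenario_id: str
    domain: str  # e.g. "SRE", "FinOps", or "CISO"
    passed: bool


def completion_rate(results: list[ScenarioResult], domain: str | None = None) -> float | None:
    """Return the percentage of scenarios passed, optionally for a single domain.

    Returns None when no scenarios were run (shown as N/A on the leaderboard).
    """
    scoped = [r for r in results if domain is None or r.domain == domain]
    if not scoped:
        return None
    passed = sum(1 for r in scoped if r.passed)
    return 100.0 * passed / len(scoped)


results = [
    ScenarioResult("sre-1", "SRE", True),
    ScenarioResult("sre-2", "SRE", False),
    ScenarioResult("ciso-1", "CISO", True),
    ScenarioResult("ciso-2", "CISO", True),
]
print(completion_rate(results))            # 75.0
print(completion_rate(results, "SRE"))     # 50.0
print(completion_rate(results, "FinOps"))  # None
```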

For details on the benchmarks, environment setup, and submitting your agent to the leaderboard, refer to the [ITBench main repo](https://github.com/IBM/ITBench).

## 📊 Sample leaderboard (placeholder)

| Rank | Agent Name | Overall Score | SRE | FinOps | CISO | Notes |
|------|------------------------|---------------|-----|--------|------|---------------------------|
| 1 | Baseline SRE Agent | 85% | ✅✅✅✅✅✅ | N/A | N/A | Solved all SRE scenarios |
| 2 | Baseline CISO Agent | 80% | N/A | N/A | ✅✅✅ | High compliance coverage |
| 3 | Your Agent Here | TBD | TBD | TBD | TBD | Submit to find out |

> This is placeholder data. Actual results will be posted after public submissions open.
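Since the leaderboard is updated from submitted results, generating the table above might look roughly like the sketch below. The `render_leaderboard` function and its entry format (`agent`, `overall`, `domains`, optional `notes`) are hypothetical; the actual update pipeline is not described in this README. Rows are ordered by overall score, unscored agents go last, and ties keep their input order because Python's `sorted` is stable.

```python
from __future__ import annotations

DOMAINS = ["SRE", "FinOps", "CISO"]


def fmt(value: float | None) -> str:
    # Missing scores are shown as N/A, matching the sample table.
    return "N/A" if value is None else f"{value:.0f}%"


def render_leaderboard(entries: list[dict]) -> str:
    """Render entries as a Markdown table, highest overall score first.

    Each entry is a hypothetical dict with keys "agent", "overall"
    (a percentage or None), "domains" (domain name -> percentage), and
    an optional "notes" string.
    """
    # Unscored entries (overall is None) sort last; ties keep input order.
    ranked = sorted(entries, key=lambda e: (e["overall"] is None, -(e["overall"] or 0)))
    headers = ["Rank", "Agent Name", "Overall Score", *DOMAINS, "Notes"]
    lines = [
        "| " + " | ".join(headers) + " |",
        "|" + "|".join("---" for _ in headers) + "|",
    ]
    for rank, entry in enumerate(ranked, start=1):
        cells = [str(rank), entry["agent"], fmt(entry["overall"])]
        cells += [fmt(entry["domains"].get(d)) for d in DOMAINS]
        cells.append(entry.get("notes", ""))
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


print(render_leaderboard([
    {"agent": "Agent B", "overall": 80.0, "domains": {"CISO": 80.0}},
    {"agent": "Agent A", "overall": 85.0, "domains": {"SRE": 85.0}, "notes": "Example"},
]))
```

With this input, `Agent A` is printed in rank 1 even though it was listed second.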

---

<!-- ## 📤 Submitting your results

Ready to submit your agent? Follow these steps:

1. **Run the official ITBench scenarios**
Use [ITBench-Scenarios](https://github.com/IBM/ITBench-Scenarios) and run your agent (e.g., [SRE agent](https://github.com/IBM/itbench-sre-agent) or [CISO agent](https://github.com/IBM/itbench-ciso-caa-agent)) on each task.

2. **Collect your results**
Capture outcome data (e.g., scenario success/failure, logs, scores). You can use the utilities provided in [ITBench-Utilities](https://github.com/IBM/ITBench-Utilities) to format and summarize results.

3. **Fork this repository and submit a pull request**

[Review comment, @rohanarora (Collaborator), Mar 26, 2025] Not sure why they would need to submit a pull request, Connor (@connor-leech). cc: @yuji-watanabe-jp

Include:
- Your agent name and affiliation (if applicable)
- A link to your agent’s source repo

[Review comment (Collaborator)] Is this needed for CISO? For SRE this is optional. @yuji-watanabe-jp @yana1205

- A structured summary of your results (in `data/` or `leaderboard.md`)

[Review comment (Collaborator)] Is this needed for CISO? For SRE this is optional. @yuji-watanabe-jp @yana1205

- Any logs or metadata that aid verification

[Review comment (Collaborator)] Is this needed for CISO? For SRE this is optional. @yuji-watanabe-jp @yana1205

4. **Verification**
The ITBench team may re-run your agent on selected scenarios. Verified submissions will be added to the public leaderboard.
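
The structured summary requested in steps 2 and 3 could be written along these lines. This is a sketch under assumptions: the `write_summary` helper, the JSON field names, and the `data/<agent>.json` naming scheme are hypothetical, not a format defined by ITBench or ITBench-Utilities.

```python
from __future__ import annotations

import json
from pathlib import Path


def write_summary(agent_name: str, affiliation: str | None, source_repo: str,
                  outcomes: dict[str, bool], out_dir: Path = Path("data")) -> Path:
    """Write a hypothetical per-agent summary file and return its path.

    `outcomes` maps a scenario ID to whether the agent passed it.
    """
    summary = {
        "agent": agent_name,
        "affiliation": affiliation,
        "source_repo": source_repo,
        # Sort by scenario ID so the file is stable across runs.
        "scenarios": [
            {"id": scenario_id, "passed": passed}
            for scenario_id, passed in sorted(outcomes.items())
        ],
        "passed": sum(outcomes.values()),
        "total": len(outcomes),
    }
    out_dir.mkdir(parents=True, exist_ok=True)
    # Derive a filesystem-friendly file name from the agent name.
    path = out_dir / f"{agent_name.lower().replace(' ', '-')}.json"
    path.write_text(json.dumps(summary, indent=2) + "\n", encoding="utf-8")
    return path
```

For example, `write_summary("My Agent", None, "https://example.com/my-agent", {"scenario-1": True, "scenario-2": False})` would create `data/my-agent.json` with `"passed": 1` and `"total": 2`.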

--- -->

<!--
## 📁 Repository structure

```
.
├── data/ # Submitted results and metadata
├── leaderboard.md # Markdown version of the public leaderboard
├── scripts/ # (Optional) Helper scripts for formatting or validation
└── README.md
```
-->

## 📄 License

This project is licensed under the [Apache License 2.0](LICENSE).

## 🙋 Contributing

To improve the leaderboard infrastructure or submission process:

- Open an issue to propose enhancements
- Submit a pull request with code changes, formatting improvements, or updates to the submission guidelines
- Follow the [contribution guidelines in the main ITBench repo](https://github.com/IBM/ITBench)

By contributing, you agree to license your work under Apache 2.0 and follow the ITBench project’s code of conduct.