This is the official repository for CTFTiny from "Towards Effective Offensive Security LLM Agents: Hyperparameter Tuning, LLM as a Judge, and a Lightweight CTF Benchmark" (AAAI'26) [paper].
For CTFJudge, please refer to CTFJudge Official Repository.
| Category | Event | Name | Difficulty |
|---|---|---|---|
| cry | 2017f | ecxor | Hard |
| cry | 2017f | lupin | Hard |
| cry | 2018q | babycrypto | Easy |
| cry | 2019q | super_curve | Easy |
| cry | 2020f | hybrid2 | Very Easy |
| cry | 2020q | perfect_secrecy | Hard |
| cry | 2021f | collision_course | Moderate |
| cry | 2022q | beyond_quantum | Hard |
| cry | 2022f | open_ellipti_ph | Hard |
| cry | 2022f | polly_crack_this | Moderate |
| cry | 2022f | the_lengths_we_extend_ourselves | Hard |
| cry | 2023f | describeme | Moderate |
| for | 2018q | whyos | Hard |
| for | 2023q | 1black0white | Easy |
| pwn | 2017q | pilot | Hard |
| pwn | 2018q | bigboy | Moderate |
| pwn | 2018q | get_it | Moderate |
| pwn | 2019q | baby_boi | Hard |
| pwn | 2019q | got_milk | Hard |
| pwn | 2020q | roppity | Hard |
| pwn | 2020q | slithery | Hard |
| pwn | 2021q | password_checker | Moderate |
| pwn | 2023q | puffin | Very Easy |
| pwn | 2023q | target_practice | Easy |
| pwn | 2023q | unlimited_subway | Hard |
| rev | 2017q | tablez | Moderate |
| rev | 2018q | a_walk_through_x86_part_2 | Hard |
| rev | 2019q | gibberish_check | Hard |
| rev | 2019q | beleaf | Moderate |
| rev | 2020f | rap | Easy |
| rev | 2020f | sourcery | Moderate |
| rev | 2020q | baby_mult | Moderate |
| rev | 2020q | ezbreezy | Moderate |
| rev | 2021f | maze | Hard |
| rev | 2021q | checker | Very Easy |
| rev | 2022q | dockerleakage | Easy |
| rev | 2022q | the_big_bang | Hard |
| rev | 2023f | unvirtualization | Very Easy |
| rev | 2023q | rox | Hard |
| rev | 2023q | rebug_2 | Moderate |
| rev | 2023q | whataxor | Very Easy |
| web | 2021q | poem_collection | Easy |
| web | 2023f | shreeramquest | Hard |
| web | 2023q | smug_dino | Easy |
| msc | 2018f | showdown | Very Easy |
| msc | 2022q | quantum_leap | Hard |
| msc | 2018q | algebra | Hard |
| msc | 2021q | weak_password | Easy |
| msc | 2022q | ezmaze | Easy |
| msc | 2023q | android_dropper | Easy |
CTFTiny follows the same benchmark structure as NYU CTF Bench. Please refer to NYU CTF Bench for detailed usage.