We present EnIGMA+, an enhanced version of EnIGMA, a framework built on top of SWE-agent for cybersecurity agents that solve CTF (Capture The Flag) challenges. It serves as the agent scaffolding for Cyber-Zero and significantly accelerates the evaluation of cybersecurity agents.
EnIGMA+ complements Cyber-Zero's runtime-free trajectory synthesis by providing:
- Evaluation Framework: Assesses the quality and effectiveness of trajectories generated by Cyber-Zero
- Benchmark Testing: Validates trained models against real CTF challenges
- Performance Metrics: Provides comprehensive evaluation metrics for cybersecurity agent capabilities
- Model Comparison: Enables fair comparison between different LLM architectures
This integration allows researchers and practitioners to develop, train, and evaluate cybersecurity agents using the complete Cyber-Zero pipeline: from runtime-free trajectory synthesis to comprehensive model evaluation.
# Install dependencies
pip install -r requirements.txt

Create a keys.cfg file in the project root with your API keys:
# OpenAI
OPENAI_API_KEY=your_openai_key_here
# Anthropic
ANTHROPIC_API_KEY=your_anthropic_key_here
# Groq
GROQ_API_KEY=your_groq_key_here
# Together AI
TOGETHER_API_KEY=your_together_key_here
# DeepSeek
DEEPSEEK_API_KEY=your_deepseek_key_here
DEEPSEEK_API_BASE_URL=https://api.deepseek.com

Models are configured in sweagent/models_config.yaml. See Model Configuration Guide for details.
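Before a long run it can help to check that keys.cfg is well formed. The sketch below assumes the file contains only KEY=VALUE lines and # comments, as in the example above; the helper names parse_keys_cfg and missing_keys are hypothetical and are not part of EnIGMA+, which may load the file differently.

```python
from pathlib import Path


def parse_keys_cfg(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    keys: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got: {raw_line!r}")
        keys[name.strip()] = value.strip()
    return keys


def missing_keys(keys: dict[str, str], required: list[str]) -> list[str]:
    """Return the required names that are absent or empty."""
    return [name for name in required if not keys.get(name)]


if __name__ == "__main__":
    cfg_path = Path("keys.cfg")
    if cfg_path.exists():
        loaded = parse_keys_cfg(cfg_path.read_text(encoding="utf-8"))
        print("missing:", missing_keys(loaded, ["OPENAI_API_KEY"]))
```

Only the providers you actually plan to use need to appear in the required list.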
python run.py \
--model_name gpt4 \
--image_name sweagent/enigma:latest \
--data_path /path/to/challenge.json \
--repo_path /path/to/challenge/files/ \
--config_file config/default_ctf.yaml \
--per_instance_step_limit 40

# Run with custom model and parameters
python run.py \
--model_name claude-3-5-sonnet-20240620 \
--temperature 0 \
--top_p 0.9 \
--per_instance_step_limit 50 \
--data_path challenges/web_challenge.json \
--repo_path /path/to/challenge/ \
--suffix experiment_1 \
--trajectory_path /custom/output/path/
# Run with local model
python run.py \
--model_name ollama:llama30.1instant \
--host_url localhost:11434 \
--per_instance_step_limit 30 \
--data_path challenges/pwn_challenge.json
# Debug mode - start container only
python run.py \
--container_only \
--data_path challenges/test_challenge.json \
--repo_path /path/to/challenge/

- --model_name: Name of the model to use (e.g., gpt4, claude-3-5-sonnet-20240620, groq/llama8)
- --temperature: Sampling temperature (0.0-1.0, default: 0)
- --top_p: Top-p sampling parameter (0.0-1.0, default: 0.95)
- --top_k: Top-k sampling parameter (default: 20)
- --per_instance_step_limit: Maximum steps per challenge (default: 40)
- --host_url: Host URL for Ollama models (default: localhost:11434)

- --data_path: Path to challenge JSON file or directory (required)
- --image_name: Docker image to use (default: sweagent/enigma:latest)
- --repo_path: Path to challenge files/repository
- --container_name: Use persistent container with this name
- --install_environment: Install environment before running (default: True)
- --verbose: Enable verbose logging (default: True)
- --enable_dynamic_ports: Enable dynamic port allocation for parallel execution (default: true)
- --enable_network_restrictions: Enable strict network restrictions (default: false)

- --config_file: Agent configuration file (default: config/default_ctf.yaml; use config/writeup_ctf.yaml for CTF-Dojo)
- --suffix: Suffix for run name
- --trajectory_path: Custom trajectory output path
- --container_only: Start container only without running agents
- --writeup: Writeup content to append as hint (see CTF-Dojo)
- --skip_existing: Skip instances with existing trajectories (default: true)
- --bypass_step_limit_history: Bypass step limit history
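When scripting many runs, assembling the command line programmatically avoids quoting and line-continuation mistakes. The following is a minimal sketch that uses only flags listed above; build_run_command is a hypothetical helper, not part of EnIGMA+, and it covers only a handful of the available options.

```python
import shlex
from typing import Optional


def build_run_command(
    model_name: str,
    data_path: str,
    repo_path: Optional[str] = None,
    step_limit: int = 40,
    suffix: Optional[str] = None,
    temperature: Optional[float] = None,
) -> list[str]:
    """Assemble an argv list for run.py from flags documented in this README."""
    argv = [
        "python", "run.py",
        "--model_name", model_name,
        "--data_path", data_path,
        "--per_instance_step_limit", str(step_limit),
    ]
    # Optional flags are appended only when a value was supplied.
    if repo_path is not None:
        argv += ["--repo_path", repo_path]
    if temperature is not None:
        argv += ["--temperature", str(temperature)]
    if suffix is not None:
        argv += ["--suffix", suffix]
    return argv


if __name__ == "__main__":
    command = build_run_command(
        "gpt4",
        "challenges/web_challenge.json",
        repo_path="/path/to/challenge/",
        suffix="experiment_1",
    )
    # shlex.join produces a shell-safe string for logging or copy-paste.
    print(shlex.join(command))
```

The returned list can be passed directly to subprocess.run without going through a shell.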
Edit sweagent/models_config.yaml to add your model:
# For OpenAI-compatible models
openai_models:
  your-model-name:
    max_context: 32768
    # No cost specified - defaults to 0 for turn-based evaluation

# Add shortcut for easier reference
openai_shortcuts:
  your-model: your-model-name

# Local models (no pricing)
openai_models:
  "/path/to/your/local/model":
    max_context: 32768
    # No cost specified - defaults to 0

openai_shortcuts:
  my-local: "/path/to/your/local/model"

If adding a new provider (e.g., new API service):
- Add provider section to models_config.yaml:

new_provider_models:
  model-name:
    max_context: 32768
    cost: 0 # No cost specified - defaults to 0

new_provider_shortcuts:
  shortcut: model-name

- Add model detection in sweagent/agent/models.py:

def get_model(args: ModelArguments, commands: list[Command] | None = None):
    # ... existing provider branches ...
    # Add detection logic for your provider
    elif args.model_name.startswith("new_provider:"):
        return NewProviderModel(args, commands)
    elif args.model_name in configs.get("new_provider_shortcuts", {}):
        return NewProviderModel(args, commands)

- Use turn limits: Set --per_instance_step_limit for fair comparison
- Disable pricing: Set cost parameters to 0 for turn-based evaluation
- Consistent parameters: Use same temperature, top_p, top_k across models
- Multiple runs: Run each model multiple times for statistical significance
- Logging: Use --suffix to distinguish different runs
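The "multiple runs" recommendation above implies reporting a mean and a spread rather than a single number. A minimal sketch, assuming you have already collected one success rate per repeated run; RunSummary and summarize_runs are hypothetical names, and EnIGMA+ itself may not provide this aggregation.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class RunSummary:
    runs: int
    mean_rate: float
    stdev_rate: float


def summarize_runs(success_rates: list[float]) -> RunSummary:
    """Aggregate per-run success rates into a mean and sample standard deviation."""
    if not success_rates:
        raise ValueError("at least one run is required")
    # The sample standard deviation is undefined for a single run; report 0.0.
    spread = stdev(success_rates) if len(success_rates) > 1 else 0.0
    return RunSummary(len(success_rates), mean(success_rates), spread)


if __name__ == "__main__":
    # Hypothetical success rates from three runs of the same model.
    print(summarize_runs([0.42, 0.45, 0.40]))
```

Comparing two models is only meaningful when both summaries come from runs with the same step limit and sampling parameters.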
bash scripts/run_openai_parallel.sh

Results are saved in trajectories/{username}/{run_name}/:
- {instance_id}.traj: Full trajectory for each challenge
- all_preds.jsonl: All predictions in JSONL format
- args.yaml: Configuration used for the run
- patches/: Generated patches (if applicable)

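A quick way to sanity-check a finished run is to list what landed in the output directory. This sketch relies only on the file names given above and makes no assumption about the fields inside each JSONL record; summarize_run_dir is a hypothetical helper, not an EnIGMA+ function.

```python
import json
import tempfile
from pathlib import Path


def summarize_run_dir(run_dir: Path) -> dict:
    """Report trajectory files, prediction count, and whether args.yaml exists."""
    traj_files = sorted(path.name for path in run_dir.glob("*.traj"))
    records = []
    preds_path = run_dir / "all_preds.jsonl"
    if preds_path.exists():
        with preds_path.open(encoding="utf-8") as handle:
            # JSONL: one JSON document per non-empty line.
            records = [json.loads(line) for line in handle if line.strip()]
    return {
        "trajectories": traj_files,
        "prediction_count": len(records),
        "has_args": (run_dir / "args.yaml").exists(),
    }


if __name__ == "__main__":
    # Demo against a throwaway directory with made-up file contents.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp)
        (demo / "example.traj").write_text("{}", encoding="utf-8")
        (demo / "all_preds.jsonl").write_text('{"id": 1}\n', encoding="utf-8")
        print(summarize_run_dir(demo))
```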
- Success rate: Percentage of challenges solved
- Step efficiency: Average steps to solution
- Flag capture rate: Percentage of flags captured
- Time to solution: Average time per challenge
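The first two metrics above can be computed from per-challenge outcomes as sketched here. The ChallengeResult record is a hypothetical structure for illustration, not the trajectory format, and averaging steps over solved challenges only is an assumption about how "steps to solution" is defined.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChallengeResult:
    instance_id: str
    solved: bool
    steps: int


def success_rate(results: list[ChallengeResult]) -> float:
    """Percentage of challenges solved."""
    if not results:
        raise ValueError("no results to score")
    solved = sum(1 for result in results if result.solved)
    return 100.0 * solved / len(results)


def average_steps_to_solution(results: list[ChallengeResult]) -> Optional[float]:
    """Mean step count over solved challenges, or None if nothing was solved."""
    solved_steps = [result.steps for result in results if result.solved]
    if not solved_steps:
        return None
    return sum(solved_steps) / len(solved_steps)


if __name__ == "__main__":
    # Made-up outcomes for three challenges.
    demo = [
        ChallengeResult("web_1", True, 12),
        ChallengeResult("pwn_1", False, 40),
        ChallengeResult("rev_1", True, 25),
    ]
    print(success_rate(demo), average_steps_to_solution(demo))
```

Unsolved challenges are left out of the step average because they typically stop at the step limit, which would otherwise dominate the figure.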
See CONTRIBUTING.md for guidelines on contributing to EnIGMA+.
If you use this benchmark suite in your research, please cite:
@inproceedings{abramovich2025enigma,
title={En{IGMA}: Interactive Tools Substantially Assist {LM} Agents in Finding Security Vulnerabilities},
author={Talor Abramovich and Meet Udeshi and Minghao Shao and Kilian Lieret and Haoran Xi and Kimberly Milner and Sofija Jancheska and John Yang and Carlos E Jimenez and Farshad Khorrami and Prashanth Krishnamurthy and Brendan Dolan-Gavitt and Muhammad Shafique and Karthik R Narasimhan and Ramesh Karri and Ofir Press},
booktitle={Forty-second International Conference on Machine Learning},
year={2025},
url={https://openreview.net/forum?id=Of3wZhVv1R}
}
@article{zhuo2025cyber,
title={Cyber-Zero: Training Cybersecurity Agents without Runtime},
author={Zhuo, Terry Yue and Wang, Dingmin and Ding, Hantian and Kumar, Varun and Wang, Zijian},
journal={arXiv preprint arXiv:2508.00910},
year={2025},
}
@article{zhuo2025training,
title={Training Language Model Agents to Find Vulnerabilities with CTF-Dojo},
author={Zhuo, Terry Yue and Wang, Dingmin and Ding, Hantian and Kumar, Varun and Wang, Zijian},
journal={arXiv preprint arXiv:2508.18370},
year={2025}
}

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC-BY-NC-4.0) - see the LICENSE file for details.