EnIGMA+

We present EnIGMA+, an enhanced version of EnIGMA for cybersecurity agents solving CTF (Capture The Flag) challenges, built on top of SWE-agent. It serves as the agent scaffolding for Cyber-Zero and significantly accelerates the evaluation of cybersecurity agents.

Role in Cyber-Zero Ecosystem

EnIGMA+ complements Cyber-Zero's runtime-free trajectory synthesis by providing:

Evaluation Framework: Assesses the quality and effectiveness of trajectories generated by Cyber-Zero
Benchmark Testing: Validates trained models against real CTF challenges
Performance Metrics: Provides comprehensive evaluation metrics for cybersecurity agent capabilities
Model Comparison: Enables fair comparison between different LLM architectures

This integration allows researchers and practitioners to develop, train, and evaluate cybersecurity agents using the complete Cyber-Zero pipeline: from runtime-free trajectory synthesis to comprehensive model evaluation.

Installation

# Install dependencies
pip install -r requirements.txt

Configuration

API Keys

Create a keys.cfg file in the project root with your API keys:

# OpenAI
OPENAI_API_KEY=your_openai_key_here

# Anthropic
ANTHROPIC_API_KEY=your_anthropic_key_here

# Groq
GROQ_API_KEY=your_groq_key_here

# Together AI
TOGETHER_API_KEY=your_together_key_here

# DeepSeek
DEEPSEEK_API_KEY=your_deepseek_key_here
DEEPSEEK_API_BASE_URL=https://api.deepseek.com
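
Before a long run it can help to confirm that keys.cfg contains real values rather than the placeholders shown above. The following is a minimal sketch, assuming the file uses plain KEY=VALUE lines with # comments as in the example; `parse_keys_cfg` and `missing_keys` are hypothetical helpers and not part of EnIGMA+, whose own loader may behave differently.

```python
from pathlib import Path


def parse_keys_cfg(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    keys: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values such as URLs stay intact.
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"Expected KEY=VALUE, got: {raw_line!r}")
        keys[name.strip()] = value.strip()
    return keys


def missing_keys(keys: dict[str, str], required: list[str]) -> list[str]:
    """Return required names that are absent, empty, or still a placeholder."""
    return [
        name
        for name in required
        if not keys.get(name) or keys[name].startswith("your_")
    ]


if __name__ == "__main__":
    cfg_path = Path("keys.cfg")  # project root, as described above
    if cfg_path.exists():
        parsed = parse_keys_cfg(cfg_path.read_text(encoding="utf-8"))
        print("Missing or placeholder keys:", missing_keys(parsed, ["OPENAI_API_KEY"]))
```

Only list the providers you actually plan to use in `required`; unused providers can stay unset.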

Model Configuration

Models are configured in sweagent/models_config.yaml. See Model Configuration Guide for details.

Usage

Basic CTF Challenge

python run.py \
  --model_name gpt4 \
  --image_name sweagent/enigma:latest \
  --data_path /path/to/challenge.json \
  --repo_path /path/to/challenge/files/ \
  --config_file config/default_ctf.yaml \
  --per_instance_step_limit 40

Advanced Usage Examples

# Run with custom model and parameters
python run.py \
  --model_name claude-3-5-sonnet-20240620 \
  --temperature 0 \
  --top_p 0.9 \
  --per_instance_step_limit 50 \
  --data_path challenges/web_challenge.json \
  --repo_path /path/to/challenge/ \
  --suffix experiment_1 \
  --trajectory_path /custom/output/path/

# Run with local model
python run.py \
  --model_name ollama:llama30.1instant \
  --host_url localhost:11434 \
  --per_instance_step_limit 30 \
  --data_path challenges/pwn_challenge.json

# Debug mode - start container only
python run.py \
  --container_only \
  --data_path challenges/test_challenge.json \
  --repo_path /path/to/challenge/
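
When the same challenge is run against several models, building the command line programmatically keeps the shared parameters identical. The sketch below is an illustration only: `build_command` is a hypothetical helper, it uses just the flags documented in this README, and it assumes that `--container_only` is a bare switch as shown above. Flags that take an explicit boolean value would need to be passed as strings.

```python
import shlex
import sys


def build_command(options: dict[str, object], script: str = "run.py") -> list[str]:
    """Turn {'model_name': 'gpt4', 'container_only': True} into an argv list."""
    argv = [sys.executable, script]
    for name, value in options.items():
        flag = f"--{name}"
        if value is True:
            argv.append(flag)  # bare switch, e.g. --container_only
        elif value is False or value is None:
            continue  # leave the flag out entirely
        else:
            argv.extend([flag, str(value)])
    return argv


if __name__ == "__main__":
    # Shared settings, taken from the advanced example above.
    shared = {
        "temperature": 0,
        "top_p": 0.9,
        "per_instance_step_limit": 50,
        "data_path": "challenges/web_challenge.json",
        "repo_path": "/path/to/challenge/",
    }
    for model in ("gpt4", "claude-3-5-sonnet-20240620"):
        cmd = build_command({"model_name": model, **shared, "suffix": f"experiment_{model}"})
        # Print instead of executing; pass cmd to subprocess.run(cmd, check=True) to launch.
        print(shlex.join(cmd))
```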

Command Line Arguments

Model Arguments

--model_name: Name of the model to use (e.g., gpt4, claude-3-5-sonnet-20240620, groq/llama8)
--temperature: Sampling temperature (0.0-1.0, default: 0)
--top_p: Top-p sampling parameter (0.0-1.0, default: 0.95)
--top_k: Top-k sampling parameter (default: 20)
--per_instance_step_limit: Maximum steps per challenge (default: 40)
--host_url: Host URL for Ollama models (default: localhost:11434)

Environment Arguments

--data_path: Path to challenge JSON file or directory (required)
--image_name: Docker image to use (default: sweagent/enigma:latest)
--repo_path: Path to challenge files/repository
--container_name: Use persistent container with this name
--install_environment: Install environment before running (default: True)
--verbose: Enable verbose logging (default: True)
--enable_dynamic_ports: Enable dynamic port allocation for parallel execution (default: true)
--enable_network_restrictions: Enable strict network restrictions (default: false)

Script Control Arguments

--config_file: Agent configuration file (default: config/default_ctf.yaml; use config/writeup_ctf.yaml for CTF-Dojo)
--suffix: Suffix for run name
--trajectory_path: Custom trajectory output path
--container_only: Start container only without running agents
--writeup: Writeup content to append as hint (see CTF-Dojo)
--skip_existing: Skip instances with existing trajectories (default: true)
--bypass_step_limit_history: Bypass step limit history

Adding Models for Evaluations

1. Add Model to Configuration

Edit sweagent/models_config.yaml to add your model:

# For OpenAI-compatible models
openai_models:
  your-model-name:
    max_context: 32768  # No cost specified - defaults to 0 for turn-based evaluation

# Add shortcut for easier reference
openai_shortcuts:
  your-model: your-model-name
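
To make the relationship between the two sections concrete, here is a rough sketch of how a shortcut could be resolved to a full model entry. It assumes the YAML has already been loaded into a dictionary with the same keys as above; `resolve_model_name` is a hypothetical function for illustration and not the lookup code used in sweagent/agent/models.py.

```python
def resolve_model_name(name: str, config: dict[str, dict]) -> tuple[str, dict]:
    """Return (full model name, settings) for a model name or one of its shortcuts."""
    shortcuts = config.get("openai_shortcuts", {})
    models = config.get("openai_models", {})
    full_name = shortcuts.get(name, name)  # fall back to the name itself
    if full_name not in models:
        raise KeyError(f"Unknown model or shortcut: {name!r}")
    settings = dict(models[full_name] or {})
    # Assumption: a missing cost is treated as 0, as the config comment states.
    settings.setdefault("cost", 0)
    return full_name, settings


if __name__ == "__main__":
    # Dictionary form of the YAML example above.
    config = {
        "openai_models": {"your-model-name": {"max_context": 32768}},
        "openai_shortcuts": {"your-model": "your-model-name"},
    }
    print(resolve_model_name("your-model", config))
```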

2. For Local Models

# Local models (no pricing)
openai_models:
  "/path/to/your/local/model":
    max_context: 32768  # No cost specified - defaults to 0

openai_shortcuts:
  my-local: "/path/to/your/local/model"

3. For New Providers

If adding a new provider (e.g., new API service):

Add provider section to models_config.yaml:

new_provider_models:
  model-name:
    max_context: 32768
    cost: 0  # Explicitly 0; also the default when no cost is specified

new_provider_shortcuts:
  shortcut: model-name

Add model detection in sweagent/agent/models.py:

def get_model(args: ModelArguments, commands: list[Command] | None = None):
    # ... existing provider checks (if/elif chain) stay above this point ...
    # Add detection logic for your provider
    elif args.model_name.startswith('new_provider:'):
        return NewProviderModel(args, commands)
    elif args.model_name in configs.get('new_provider_shortcuts', {}):
        return NewProviderModel(args, commands)

Best Practices

Use turn limits: Set --per_instance_step_limit for fair comparison
Disable pricing: Set cost parameters to 0 for turn-based evaluation
Consistent parameters: Use same temperature, top_p, top_k across models
Multiple runs: Run each model multiple times for statistical significance
Logging: Use --suffix to distinguish different runs
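
For the multiple-runs recommendation, a simple way to report variability is the mean success rate together with its standard error across repeated runs. The sketch below assumes you have already collected one success rate per run; `summarize_runs` is a hypothetical helper and the numbers in the example are made up for illustration.

```python
import math
import statistics


def summarize_runs(success_rates: list[float]) -> dict[str, float]:
    """Mean, sample standard deviation, and standard error over repeated runs."""
    if len(success_rates) < 2:
        raise ValueError("Need at least two runs to estimate variability")
    mean = statistics.mean(success_rates)
    stdev = statistics.stdev(success_rates)  # sample standard deviation (n - 1)
    stderr = stdev / math.sqrt(len(success_rates))
    return {"mean": mean, "stdev": stdev, "stderr": stderr}


if __name__ == "__main__":
    # Illustrative values only, not real results: one success rate per run.
    rates = [0.20, 0.25, 0.15]
    summary = summarize_runs(rates)
    print(f"success rate: {summary['mean']:.3f} +/- {summary['stderr']:.3f} (n={len(rates)})")
```

With only a handful of runs the standard error is itself a rough estimate, so differences between models that fall within it should be read with caution.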

6. Example Evaluation Script

bash scripts/run_openai_parallel.sh

Output and Results

Trajectory Files

Results are saved in trajectories/{username}/{run_name}/:

{instance_id}.traj: Full trajectory for each challenge
all_preds.jsonl: All predictions in JSONL format
args.yaml: Configuration used for the run
patches/: Generated patches (if applicable)
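
The files listed above can be inspected with a few lines of standard-library Python. This is a minimal sketch that relies only on the layout described here (one JSON object per line in all_preds.jsonl, one .traj file per instance); the function names are hypothetical, and the fields inside each record are not assumed.

```python
import json
from pathlib import Path


def load_predictions(run_dir: Path) -> list[dict]:
    """Read all_preds.jsonl: one JSON object per non-empty line."""
    preds_file = run_dir / "all_preds.jsonl"
    with preds_file.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def finished_instances(run_dir: Path) -> set[str]:
    """Instance IDs that already have an {instance_id}.traj file."""
    return {path.stem for path in run_dir.glob("*.traj")}


if __name__ == "__main__":
    # Placeholder path; substitute your own username and run name.
    run_dir = Path("trajectories") / "username" / "run_name"
    if run_dir.is_dir():
        print("trajectories found:", len(finished_instances(run_dir)))
        if (run_dir / "all_preds.jsonl").exists():
            print("predictions found:", len(load_predictions(run_dir)))
```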

Evaluation Metrics

Success rate: Percentage of challenges solved
Step efficiency: Average steps to solution
Flag capture rate: Percentage of flags captured
Time to solution: Average time per challenge

Contributing

See CONTRIBUTING.md for guidelines on contributing to EnIGMA+.

Citation

If you use this benchmark suite in your research, please cite:

@inproceedings{abramovich2025enigma,
  title={En{IGMA}: Interactive Tools Substantially Assist {LM} Agents in Finding Security Vulnerabilities},
  author={Talor Abramovich and Meet Udeshi and Minghao Shao and Kilian Lieret and Haoran Xi and Kimberly Milner and Sofija Jancheska and John Yang and Carlos E Jimenez and Farshad Khorrami and Prashanth Krishnamurthy and Brendan Dolan-Gavitt and Muhammad Shafique and Karthik R Narasimhan and Ramesh Karri and Ofir Press},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025},
  url={https://openreview.net/forum?id=Of3wZhVv1R}
}

@article{zhuo2025cyber,
  title={Cyber-Zero: Training Cybersecurity Agents without Runtime},
  author={Zhuo, Terry Yue and Wang, Dingmin and Ding, Hantian and Kumar, Varun and Wang, Zijian},
  journal={arXiv preprint arXiv:2508.00910},
  year={2025},
}

@article{zhuo2025training,
  title={Training Language Model Agents to Find Vulnerabilities with CTF-Dojo},
  author={Zhuo, Terry Yue and Wang, Dingmin and Ding, Hantian and Kumar, Varun and Wang, Zijian},
  journal={arXiv preprint arXiv:2508.18370},
  year={2025}
}

License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC-BY-NC-4.0) - see the LICENSE file for details.

Acknowledgments

EnIGMA Project
