Open Trajectory Gym is a pipeline for post-training security LLMs on CTF challenge trajectories. It combines TRL for supervised fine-tuning, SkyRL for online reinforcement learning with live tool execution, and GEPA for prompt evolution -- producing locally deployable security agents from open-weight models.
Before we can merge your first pull request, you need to sign the Contributor License Agreement. The CLA bot will comment on your PR with instructions — you sign by posting a single comment, once, and the bot remembers you for every future PR.
If you are contributing on behalf of an employer, please confirm with them that you have authorization to sign before doing so. For corporate-wide agreements, contact the maintainers.
git clone https://github.com/westonbrown/open-trajectory-gym.git
cd open-trajectory-gym
pip install -e ".[dev]"For training stages, install the relevant extras:
pip install -e ".[sft]" # Stage 1: TRL SFT
pip install -e ".[online_rl]" # Stage 2: SkyRL ONLINE_RL
pip install -e ".[gepa]" # Stage 3: GEPA prompt evolutionpytest tests/All tests should pass without a GPU. Tests that require a GPU or running challenge containers are skipped automatically.
We use ruff for linting and formatting:
ruff check .
ruff format .The project has three training stages, each backed by a dedicated framework:
| Stage | Framework | Purpose |
|---|---|---|
| SFT | TRL | Supervised fine-tuning on expert CTF traces |
| ONLINE_RL | SkyRL | Online reinforcement learning with live tool execution |
| GEPA | DSPy + GEPA | Prompt evolution (no weight updates) |
The ToolExecutor provides 13 tools (shell, Python, file ops, flag submission) via direct subprocess execution. During online ONLINE_RL, the model generates tool calls that execute against live Docker containers — no HTTP server required.
- Create a training config at
examples/<model>/training.yaml. - Configure the TRL SFT parameters and SkyRL ONLINE_RL parameters.
- If the model uses a non-standard chat template, add a formatter in
src/trajgym/formatters/. - Test with the validation pipeline:
trajgym-validate.
See existing configs (e.g., examples/qwen35-27b/training.yaml) for reference.
- Add challenge entries to a YAML registry in
configs/challenges/(docker or static type, with ID, flag, and difficulty). - Create ONLINE_RL training data with
ground_truth_flagfields pointing to your challenges. - Pass the registry via
--challenge-registry configs/challenges/<name>.yaml.
No changes to the reward function, tool definitions, or training loop are needed.
- Fork the repository and create a feature branch from
main. - Make your changes with clear, descriptive commits.
- Run
pytest tests/andruff check .to verify nothing is broken. - Open a PR against
mainwith a description of what changed and why.
The Project is distributed under the MIT License. Contributors grant rights to the Project through the Contributor License Agreement, which authorizes the Project to distribute Contributions under MIT.