Skip to content

Latest commit

 

History

History
89 lines (61 loc) · 3.55 KB

File metadata and controls

89 lines (61 loc) · 3.55 KB

Contributing to Open Trajectory Gym

Open Trajectory Gym is a pipeline for post-training security LLMs on CTF challenge trajectories. It combines TRL for supervised fine-tuning, SkyRL for online reinforcement learning with live tool execution, and GEPA for prompt evolution -- producing locally deployable security agents from open-weight models.

Contributor License Agreement

Before we can merge your first pull request, you need to sign the Contributor License Agreement. The CLA bot will comment on your PR with instructions — you sign by posting a single comment, once, and the bot remembers you for every future PR.

If you are contributing on behalf of an employer, please confirm with them that you have authorization to sign before doing so. For corporate-wide agreements, contact the maintainers.

Development Setup

git clone https://github.com/westonbrown/open-trajectory-gym.git
cd open-trajectory-gym
pip install -e ".[dev]"

For training stages, install the relevant extras:

pip install -e ".[sft]"    # Stage 1: TRL SFT
pip install -e ".[online_rl]"   # Stage 2: SkyRL ONLINE_RL
pip install -e ".[gepa]"   # Stage 3: GEPA prompt evolution

Running Tests

pytest tests/

All tests should pass without a GPU. Tests that require a GPU or running challenge containers are skipped automatically.

Code Style

We use ruff for linting and formatting:

ruff check .
ruff format .

Architecture Overview

The project has three training stages, each backed by a dedicated framework:

Stage Framework Purpose
SFT TRL Supervised fine-tuning on expert CTF traces
ONLINE_RL SkyRL Online reinforcement learning with live tool execution
GEPA DSPy + GEPA Prompt evolution (no weight updates)

The ToolExecutor provides 13 tools (shell, Python, file ops, flag submission) via direct subprocess execution. During online ONLINE_RL, the model generates tool calls that execute against live Docker containers — no HTTP server required.

Adding a New Model

  1. Create a training config at examples/<model>/training.yaml.
  2. Configure the TRL SFT parameters and SkyRL ONLINE_RL parameters.
  3. If the model uses a non-standard chat template, add a formatter in src/trajgym/formatters/.
  4. Test with the validation pipeline: trajgym-validate.

See existing configs (e.g., examples/qwen35-27b/training.yaml) for reference.

Adding a New Benchmark

  1. Add challenge entries to a YAML registry in configs/challenges/ (docker or static type, with ID, flag, and difficulty).
  2. Create ONLINE_RL training data with ground_truth_flag fields pointing to your challenges.
  3. Pass the registry via --challenge-registry configs/challenges/<name>.yaml.

No changes to the reward function, tool definitions, or training loop are needed.

Pull Request Process

  1. Fork the repository and create a feature branch from main.
  2. Make your changes with clear, descriptive commits.
  3. Run pytest tests/ and ruff check . to verify nothing is broken.
  4. Open a PR against main with a description of what changed and why.

License

The Project is distributed under the MIT License. Contributors grant rights to the Project through the Contributor License Agreement, which authorizes the Project to distribute Contributions under MIT.