You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This guide covers the breaking changes introduced in TRL v1 and how to update your code. Most structural changes (trainers moved to experimental, removed model classes, etc.) already shipped in v0.29 — if you're already on v0.29, this migration is minimal.
Changed defaults
Config
Parameter
v0 default
v1 default
Action needed
GRPOConfig
vllm_mode
"server"
"colocate"
If you use use_vllm=True without specifying vllm_mode, vLLM will now run in the same process instead of connecting to a separate server. Set vllm_mode="server" explicitly if you rely on server mode.
RLOOConfig
vllm_mode
"server"
"colocate"
Same as above.
Already changed in v0.29
The following changes were introduced in v0.29 and are not new in v1. They are listed here for completeness if you are migrating from an earlier version.
Trainers moved to experimental
Several trainers were moved from the stable API to trl.experimental. They are no longer importable from trl directly (except KTO, which still has a compatibility shim with a deprecation warning).
Trainer
New import
PPO
from trl.experimental.ppo import PPOTrainer, PPOConfig
CPO
from trl.experimental.cpo import CPOTrainer, CPOConfig
BCO
from trl.experimental.bco import BCOTrainer, BCOConfig
ORPO
from trl.experimental.orpo import ORPOTrainer, ORPOConfig
XPO
from trl.experimental.xpo import XPOTrainer, XPOConfig
Online DPO
from trl.experimental.online_dpo import OnlineDPOTrainer, OnlineDPOConfig
GKD
from trl.experimental.gkd import GKDTrainer, GKDConfig
Nash-MD
from trl.experimental.nash_md import NashMDTrainer, NashMDConfig
PRM
from trl.experimental.prm import PRMTrainer, PRMConfig
KTO
from trl.experimental.kto import KTOTrainer, KTOConfig
Removed model classes
Class
New location
AutoModelForCausalLMWithValueHead
trl.experimental.ppo
AutoModelForSeq2SeqLMWithValueHead
trl.experimental.ppo
PreTrainedModelWrapper
trl.experimental.ppo
Removed callbacks and utilities
What
New location
WinRateCallback
trl.experimental.winrate_callback
Judges
trl.experimental.judges
peft_module_casting_to_bf16
trl.experimental.utils
FDivergenceType enum
Removed. Use string values ("reverse_kl", "js_divergence", "alpha_divergence") directly.