Commit aea9b10

bump trl to 0.29
1 parent fe928a9 commit aea9b10

13 files changed: +322 -36 lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -145,7 +145,7 @@ Running Environment:
 | modelscope | >=1.23 | | |
 | peft | >=0.11,<0.19 | | |
 | flash_attn | | 2.8.3/3.0.0b1 | |
-| trl | >=0.15,<0.29 | 0.28.0 | RLHF |
+| trl | >=0.15,<0.30 | 0.28.0 | RLHF |
 | deepspeed | >=0.14 | 0.18.8 | Training |
 | vllm | >=0.5.1 | 0.11.0/0.17.1 | Inference/Deployment |
 | sglang | >=0.4.6 | | Inference/Deployment |

README_CN.md

Lines changed: 1 addition & 1 deletion
@@ -141,7 +141,7 @@ uv pip install -e . --torch-backend=auto
 | modelscope | >=1.23 | | |
 | peft | >=0.11,<0.19 | | |
 | flash_attn | | 2.8.3/3.0.0b1 | |
-| trl | >=0.15,<0.29 | 0.28.0 | RLHF |
+| trl | >=0.15,<0.30 | 0.28.0 | RLHF |
 | deepspeed | >=0.14 | 0.18.8 | Training |
 | vllm | >=0.5.1 | 0.11.0/0.17.1 | Inference/Deployment |
 | sglang | >=0.4.6 | | Inference/Deployment |

docs/source/GetStarted/SWIFT-installation.md

Lines changed: 1 addition & 1 deletion
@@ -144,7 +144,7 @@ modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu2
 | modelscope | >=1.23 | | |
 | peft | >=0.11,<0.19 | | |
 | flash_attn | | 2.8.3/3.0.0b1 | |
-| trl | >=0.15,<0.29 | 0.28.0 | RLHF |
+| trl | >=0.15,<0.30 | 0.28.0 | RLHF |
 | deepspeed | >=0.14 | 0.18.8 | Training |
 | vllm | >=0.5.1 | 0.11.0/0.17.1 | Inference/Deployment |
 | sglang | >=0.4.6 | | Inference/Deployment |

docs/source/Megatron-SWIFT/Quick-start.md

Lines changed: 1 addition & 1 deletion
@@ -73,7 +73,7 @@ modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu2
 | transformers | >=4.33 | 4.57.6/5.2.0 | |
 | modelscope | >=1.23 | | |
 | peft | >=0.11,<0.19 | | LoRA |
-| trl | >=0.15,<0.29 | | RLHF |
+| trl | >=0.15,<0.30 | | RLHF |
 
 
 ## Quick Start Example

docs/source_en/GetStarted/SWIFT-installation.md

Lines changed: 1 addition & 1 deletion
@@ -143,7 +143,7 @@ More images can be found [here](https://modelscope.cn/docs/intro/environment-set
 | modelscope | >=1.23 | | |
 | peft | >=0.11,<0.19 | | |
 | flash_attn | | 2.8.3/3.0.0b1 | |
-| trl | >=0.15,<0.29 | 0.28.0 | RLHF |
+| trl | >=0.15,<0.30 | 0.28.0 | RLHF |
 | deepspeed | >=0.14 | 0.18.8 | Training |
 | vllm | >=0.5.1 | 0.11.0/0.17.1 | Inference/Deployment |
 | sglang | >=0.4.6 | | Inference/Deployment |

docs/source_en/Megatron-SWIFT/Quick-start.md

Lines changed: 1 addition & 1 deletion
@@ -73,7 +73,7 @@ Recommended Operating Environment:
 | transformers | >=4.33 | 4.57.6/5.2.0 | |
 | modelscope | >=1.23 | | |
 | peft | >=0.11,<0.19 | | LoRA |
-| trl | >=0.15,<0.29 | | RLHF |
+| trl | >=0.15,<0.30 | | RLHF |
 
 
 ## Quick Start Example

requirements/framework.txt

Lines changed: 1 addition & 1 deletion
@@ -35,6 +35,6 @@ tiktoken
 tqdm
 transformers>=4.33,<5.4.0
 transformers_stream_generator
-trl>=0.15,<0.29
+trl>=0.15,<0.30
 uvicorn
 zstandard
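
Reviewer note: the new bound `trl>=0.15,<0.30` admits every 0.29.x release and rejects 0.30 and later. A minimal sketch of that range check follows, using only the standard library. The helper names `release_tuple` and `trl_supported` are hypothetical and not part of this repository, and the parsing is a simplification of PEP 440; `packaging.specifiers.SpecifierSet` would be the robust choice in real code.

```python
def release_tuple(version):
    """Return the leading numeric components, e.g. '0.29.0rc1' -> (0, 29)."""
    parts = []
    for piece in version.split('.'):
        if not piece.isdigit():
            break  # stop at the first non-numeric component (rc, dev, ...)
        parts.append(int(piece))
    return tuple(parts)


def trl_supported(version, lower=(0, 15), upper=(0, 30)):
    """Approximate the requirement `trl>=0.15,<0.30` with tuple comparison."""
    return lower <= release_tuple(version) < upper


# The versions on either side of the old and new upper bounds.
for candidate in ('0.14.9', '0.15', '0.28.0', '0.29.1', '0.30.0'):
    print(candidate, trl_supported(candidate))
```

Under the previous bound (`upper=(0, 29)`), the same check would reject `0.29.1`, which is the behavior this commit relaxes.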

swift/megatron/trainers/dpo_trainer.py

Lines changed: 2 additions & 3 deletions
@@ -13,12 +13,11 @@
 
 
 class DummyDPOTrainer(DPOTrainer):
-    # For reusing the dpo_loss function in TRL.
+    # For reusing the dpo_loss function implemented in Swift's DPOTrainer.
     def __init__(self, args):
-        from trl.trainer import FDivergenceConstants
         self.accelerator = namedtuple('Accelerator', ['device'])(device=get_current_device())
         self.f_alpha_divergence_coef = 1.
-        self.f_divergence_params = {FDivergenceConstants.ALPHA_DIVERGENCE_COEF_KEY: self.f_alpha_divergence_coef}
+        self.f_divergence_params = {'alpha_divergence_coef': self.f_alpha_divergence_coef}
         self.reference_free = args.reference_free
         self.label_smoothing = args.label_smoothing
         self.f_divergence_type = args.f_divergence_type
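
Reviewer note: the pattern in this hunk is a subclass that skips the parent constructor and sets only the attributes the reused loss method reads, now with the dictionary key written as a literal instead of imported from `trl.trainer`. The sketch below illustrates that pattern in isolation. `HeavyTrainer`, `scaled_loss`, and `DummyTrainer` are hypothetical stand-ins, not Swift or TRL classes, and the real `dpo_loss` reads more attributes than shown here.

```python
from collections import namedtuple


class HeavyTrainer:
    """Hypothetical parent whose constructor is too expensive to call."""

    def __init__(self, model, dataset):
        raise RuntimeError('expensive setup we want to avoid')

    def scaled_loss(self, value):
        # The reused method only reads a few attributes from self.
        return value * self.f_divergence_params['alpha_divergence_coef']


class DummyTrainer(HeavyTrainer):
    """Stub that reuses scaled_loss without running HeavyTrainer.__init__."""

    def __init__(self, device='cpu'):
        # Deliberately no super().__init__(): set only what scaled_loss needs.
        self.accelerator = namedtuple('Accelerator', ['device'])(device=device)
        self.f_alpha_divergence_coef = 1.
        # Literal key, so no constant has to be imported from the parent library.
        self.f_divergence_params = {'alpha_divergence_coef': self.f_alpha_divergence_coef}


trainer = DummyTrainer()
print(trainer.accelerator.device, trainer.scaled_loss(2.0))
```

The trade-off of the literal key is that a future rename upstream would no longer surface as an import error, so the key has to stay in sync with whichever loss implementation consumes it.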

swift/rlhf_trainers/arguments.py

Lines changed: 4 additions & 0 deletions
@@ -28,6 +28,10 @@
 @dataclass
 class DPOConfig(TrainArgumentsMixin, HfDPOConfig):
     ld_alpha: Optional[float] = None  # compat trl==0.15
+    # Fields removed in trl 0.29, kept here for backward compatibility
+    rpo_alpha: Optional[float] = None
+    ref_adapter_name: Optional[str] = None
+    reference_free: Optional[bool] = None
 
     def __post_init__(self):
         TrainArgumentsMixin.__post_init__(self)
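
Reviewer note: re-declaring the dropped fields on the subclass keeps existing configs and CLI arguments constructible regardless of which upstream version is installed. A minimal sketch of the mechanism, assuming a hypothetical `UpstreamConfig` that stands in for a newer upstream dataclass without these fields (it is not the real `HfDPOConfig`, and `beta` is only an illustrative field):

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class UpstreamConfig:
    """Hypothetical newer upstream config that no longer defines the old fields."""
    beta: float = 0.1


@dataclass
class CompatConfig(UpstreamConfig):
    # Re-declared with None defaults so callers that still pass them keep working.
    rpo_alpha: Optional[float] = None
    ref_adapter_name: Optional[str] = None
    reference_free: Optional[bool] = None


cfg = CompatConfig(beta=0.2, reference_free=True)
print([f.name for f in fields(cfg)])
print(cfg.reference_free, cfg.rpo_alpha)
```

If the installed upstream class still defines one of these fields, the subclass declaration overrides its default with `None`, so downstream code should treat `None` as "not set" rather than assume the upstream default.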
