Pull requests: huggingface/trl

Pull requests list

Reduce inconsistency across trainer test files
#5678 opened Apr 29, 2026 by qgallouedec Member
Fix discarded assertion message in trainer parameter checks
#5677 opened Apr 29, 2026 by qgallouedec Member
8 tasks
Enable chunked NLL loss with PEFT in SFT
#5676 opened Apr 28, 2026 by qgallouedec Member
Add {% generation %} markers for Cohere2 chat template
#5675 opened Apr 28, 2026 by qgallouedec Member
Simplify peft_config handling in experimental trainers
#5674 opened Apr 28, 2026 by albertvillanova Member
Simplify peft_config handling in core trainers
#5673 opened Apr 28, 2026 by albertvillanova Member
feat(vllm-serve): add --reasoning-parser and --reasoning-config flags
#5672 opened Apr 28, 2026 by kfirah-create
2 of 8 tasks
Fix token_type_ids requirement for gemma-3 models in GRPOTrainer
#5644 opened Apr 26, 2026 by robrui
3 of 6 tasks
DeepSeek v4
#5641 opened Apr 25, 2026 by qgallouedec Member Draft
8 tasks
Fix spurious KL gradients for zero-std reward groups in GRPOTrainer
#5640 opened Apr 24, 2026 by robrui
3 of 6 tasks
Align tiny-Glm4MoeForCausalLM with GLM-4.5 reference config
#5638 opened Apr 24, 2026 by qgallouedec Member
8 tasks
Refactor tiny-model generation scripts
#5637 opened Apr 24, 2026 by qgallouedec Member
Upload testing suite for DistillationTrainer
#5615 opened Apr 21, 2026 by cmpatino Collaborator
3 of 8 tasks
Add LoRA support for AsyncGRPO
#5610 opened Apr 21, 2026 by jonahsamost
2 of 8 tasks
experimental: Self-Distillation Zero
#5609 opened Apr 20, 2026 by LeonEricsson Collaborator
1 of 8 tasks
support prefetch/prefetch_depth for async GRPO for ~5% speedups
#5602 opened Apr 20, 2026 by winglian Contributor
1 of 8 tasks
Fix nested vocab_size for DistillationTrainer and GOLDTrainer
#5592 opened Apr 19, 2026 by Beichen-Ma
2 of 8 tasks
feat: add TargetPO trainer
#5591 opened Apr 18, 2026 by JeanKaddour Draft
4 of 8 tasks
Add training chat template for Qwen3-2507
#5574 opened Apr 16, 2026 by SwayamInSync Contributor
refactor: self distillation trainers (sdpo/sdft/...)
#5573 opened Apr 16, 2026 by LeonEricsson Collaborator
2 of 8 tasks
Improve BrowserGym examples for latest OpenEnv version
#5568 opened Apr 16, 2026 by sergiopaniego Member
8 tasks
Revert VLM support in parse_response
#5561 opened Apr 15, 2026 by qgallouedec Member Draft