Horizon RL

7 repositories

strands-env
Public
Standardizing environment infrastructure with Strands Agents — step, observe, reward.
Python • Apache License 2.0 • 3 • 8 • 0 • 0 • Updated Feb 11, 2026
strands-sglang
Public
SGLang model provider for Strands Agents for on-policy agentic RL training.
Python • Apache License 2.0 • 2 • 26 • 0 • 0 • Updated Feb 11, 2026
OpenKimi
Public
Reproduces the Kimi K1.5/K2 RL algorithm and rollout system.
Topics: pmd, rl, rollout, kimi, rlhf
Python • Apache License 2.0 • 1 • 12 • 0 • 0 • Updated Feb 6, 2026
HeaPA
Public
Difficulty-Aware Heap Sampling and On-Policy Query Augmentation for LLM Reinforcement Learning
Apache License 2.0 • 0 • 2 • 0 • 0 • Updated Jan 27, 2026
DeepPlanner
Public
Code and dataset for the paper "DeepPlanner: Scaling Planning Capability for Deep Research Agents via Advantage Shaping"
Python • Apache License 2.0 • 2 • 0 • 0 • 0 • Updated Dec 9, 2025
Think-RM
Public
[NeurIPS 2025] Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models
Python • 1 • 16 • 0 • 0 • Updated Nov 2, 2025
uncertainty-router
Public
[NeurIPS 2025] Ask a Strong LLM Judge when Your Reward Model is Uncertain
Python • 0 • 6 • 0 • 0 • Updated Oct 23, 2025