    Repositories list

    • Standardizing environment infrastructure with Strands Agents — step, observe, reward.
      Python
      3800 Updated Feb 11, 2026
    • SGLang model provider for Strands Agents for on-policy agentic RL training.
      Python
      22600 Updated Feb 11, 2026
    • OpenKimi (Public)
      Reproduce Kimi K1.5/K2 RL algorithm and rollout system
      Python
      11200 Updated Feb 6, 2026
    • HeaPA (Public)
      Difficulty-Aware Heap Sampling and On-Policy Query Augmentation for LLM Reinforcement Learning
      0200 Updated Jan 27, 2026
    • Code and dataset for paper: DeepPlanner: Scaling Planning Capability for Deep Research Agents via Advantage Shaping
      Python
      2000 Updated Dec 9, 2025
    • Think-RM (Public)
      [NeurIPS 2025] Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models
      Python
      11600 Updated Nov 2, 2025
    • [NeurIPS 2025] Ask a Strong LLM Judge when Your Reward Model is Uncertain
      Python
      0600 Updated Oct 23, 2025