Practical Guidance and Tutorial on Building Reasoning Capabilities using Distillation and Reinforcement Learning
Presented by NVIDIA Deep Learning Solution Architects
Official KDD 2025 Tutorial
With reasoning models like DeepSeek-R1 and OpenAI's o1 demonstrating breakthrough capabilities in complex problem-solving, the AI community is increasingly interested in how to unlock similar capabilities in other large language models (LLMs).
This hands-on tutorial dives into practical methods for building reasoning capabilities in LLMs through two primary approaches:
- Knowledge Distillation: Transferring capabilities from advanced reasoning models
- Reinforcement Learning: Further enhancing capabilities through post-training techniques
Participants will learn how to transfer reasoning capabilities from cutting-edge models like DeepSeek-R1 into smaller LLMs such as Qwen and Llama, and then explore how reinforcement learning can take these capabilities even further.
Target audience: Data scientists and engineers who are interested in enhancing reasoning capabilities in LLMs for downstream tasks.
Skill Level: The tutorial is designed to accommodate participants with varying levels of expertise, from beginners to moderately skilled users, and is paced so that beginners can follow comfortably.
Prerequisites:
- Basic knowledge of deep learning and LLM concepts
- Familiarity with Python programming
This part covers the core methods for eliciting reasoning capabilities in LLMs. First, we will explain how knowledge distillation transfers reasoning abilities from large models to smaller ones, covering concepts such as long Chain-of-Thought (CoT) data and teacher-student frameworks. Then, we will introduce how reinforcement learning (RL) can be applied to post-train LLMs for enhanced reasoning, including reward design and various RL algorithms.
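The source does not say which RL algorithms are covered. As one illustration, the sketch below shows the group-relative advantage used in GRPO, the algorithm reported for training DeepSeek-R1: several completions are sampled for the same prompt, and each completion's reward is normalized against the group. The function name and the `eps` parameter are assumptions made for this example, and real implementations differ in details such as which standard deviation estimator they use.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for a single prompt.

    Assumption: population standard deviation is used here; some
    implementations use the sample standard deviation instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every completion gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt: two correct (1.0), two wrong (0.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# A group where every completion earns the same reward carries no signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Completions that beat the group average receive a positive advantage and the rest a negative one, so no separate value model is needed to form a baseline.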
This part introduces the lab environment and the libraries, frameworks, and datasets used in the exercises. We'll guide participants through a quick verification step to ensure that everyone's environment is configured correctly.
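A verification step of this kind could look like the following minimal sketch. It is not the tutorial's actual check script: the function name `check_environment` and the default module names (`torch`, `nemo`) are assumptions, and the presence of `nvidia-smi` on the PATH is only a rough proxy for a usable GPU.

```python
import importlib.util
import shutil
import sys


def check_environment(min_python=(3, 8), modules=("torch", "nemo")):
    """Return a dict of simple pass/fail checks for the lab environment.

    Assumption: `torch` and `nemo` are the import names of the required
    packages; adjust the tuple to match the actual lab setup.
    """
    report = {
        "python_ok": sys.version_info >= min_python,
        # nvidia-smi on PATH suggests (but does not prove) a working GPU driver.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
        # Docker is optional in this tutorial, so this is informational only.
        "docker_found": shutil.which("docker") is not None,
    }
    for name in modules:
        # find_spec locates a top-level module without importing it.
        report[f"module:{name}"] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for check, ok in check_environment().items():
        print(f"{check:24s} {'OK' if ok else 'MISSING'}")
```

Locating modules with `find_spec` instead of importing them keeps the check fast and avoids triggering GPU initialization just to confirm that a package is installed.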
In this lab, participants will learn how to automatically generate long Chain-of-Thought data that captures the reasoning processes of advanced reasoning models, such as DeepSeek-R1. We'll demonstrate how to filter out low-quality data and prepare the remainder for distillation into smaller models using NeMo's data processing tools.
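The filtering step can be sketched in plain Python as below. This does not use NeMo's data processing tools, whose API is not shown here. It assumes each teacher response wraps its reasoning in `<think>...</think>` tags followed by a final answer, and that each sample carries a reference answer; the field names (`response`, `reference`) and the word-count thresholds are hypothetical.

```python
import re

# Reasoning inside <think>...</think>, followed by the final answer.
THINK_RE = re.compile(r"^<think>(.*?)</think>\s*(.+)$", re.DOTALL)


def split_trace(text):
    """Return (reasoning, answer), or None if the trace is malformed."""
    match = THINK_RE.match(text.strip())
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()


def keep_trace(sample, min_words=5, max_words=4000):
    """Keep well-formed traces of plausible length whose answer is correct."""
    parsed = split_trace(sample["response"])
    if parsed is None:
        return False
    reasoning, answer = parsed
    n_words = len(reasoning.split())
    if not (min_words <= n_words <= max_words):
        return False
    return answer == sample["reference"].strip()


samples = [
    {"response": "<think>2 plus 2 equals 4, so the answer is 4.</think>\n4", "reference": "4"},
    {"response": "<think>2 plus 2 is probably 5, I am fairly sure.</think>\n5", "reference": "4"},
    {"response": "The answer is 4", "reference": "4"},
    {"response": "<think>Easy.</think>\n4", "reference": "4"},
]
filtered = [s for s in samples if keep_trace(s)]
```

Only the first sample survives: the others fail on a wrong answer, missing tags, and too-short reasoning, respectively. Exact string matching is a deliberately strict choice; math datasets usually need answer normalization before comparison.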
This lab will walk participants through implementing knowledge distillation in open-source models like Qwen and Llama using NVIDIA's NeMo framework. We'll cover the technical details of setting up a distillation training experiment in NeMo, monitoring training progress effectively, and evaluating the results.
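Assuming distillation here means supervised fine-tuning of the student on teacher-generated traces (the source does not spell out the objective), the quantity being minimized can be sketched as a masked negative log-likelihood: the loss is averaged only over the teacher's response tokens, not the prompt. The helper names are hypothetical and the per-token log-probabilities are made-up stand-ins for what a student model would produce; this is not NeMo's API.

```python
import math


def build_loss_mask(prompt_tokens, response_tokens):
    """Mask out prompt positions (0) and train on response positions (1)."""
    return [0] * len(prompt_tokens) + [1] * len(response_tokens)


def masked_nll(token_logprobs, loss_mask):
    """Mean negative log-likelihood over positions where the mask is 1."""
    if len(token_logprobs) != len(loss_mask):
        raise ValueError("token_logprobs and loss_mask must have equal length")
    n_active = sum(loss_mask)
    if n_active == 0:
        raise ValueError("loss_mask selects no tokens")
    total = sum(-lp for lp, m in zip(token_logprobs, loss_mask) if m)
    return total / n_active


prompt = ["What", "is", "2+2?"]
response = ["<think>", "2+2=4", "</think>", "4"]
mask = build_loss_mask(prompt, response)

# Stand-in values: the student assigns probability 0.5 to every token.
student_logprobs = [math.log(0.5)] * len(mask)
loss = masked_nll(student_logprobs, mask)
```

Tracking this loss on a held-out set of teacher traces is one simple way to monitor whether the student is actually fitting the reasoning data.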
This lab will teach participants how to set up a reinforcement learning environment for post-training LLMs to further enhance reasoning capabilities. We'll cover how to use RL frameworks and how to design reward functions, and we'll guide participants through the full RL fine-tuning process.
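As a minimal sketch of reward design, the example below combines a rule-based accuracy reward with a format reward, in the spirit of the rule-based rewards described for DeepSeek-R1. The output conventions (`<think>` tags, a final `\boxed{...}` answer), the function names, and the `format_weight` value are assumptions for illustration, not the tutorial's actual reward functions.

```python
import re

# The completion should open with a non-empty <think>...</think> block.
THINK_BLOCK = re.compile(r"^<think>.+?</think>", re.DOTALL)
# The final answer is expected inside \boxed{...} (no nested braces).
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def format_reward(completion):
    """1.0 if the completion starts with a reasoning block, else 0.0."""
    return 1.0 if THINK_BLOCK.match(completion.strip()) else 0.0


def accuracy_reward(completion, reference):
    """1.0 if the last boxed answer equals the reference, else 0.0."""
    answers = BOXED.findall(completion)
    if not answers:
        return 0.0
    return 1.0 if answers[-1].strip() == reference.strip() else 0.0


def total_reward(completion, reference, format_weight=0.2):
    """Accuracy dominates; format adds a small bonus."""
    return accuracy_reward(completion, reference) + format_weight * format_reward(completion)


good = "<think>3 * 4 = 12</think> The answer is \\boxed{12}."
wrong = "<think>3 * 4 = 13</think> The answer is \\boxed{13}."
unformatted = "\\boxed{12}"
rewards = [total_reward(c, "12") for c in (good, wrong, unformatted)]
```

Keeping the format weight well below the accuracy reward is a deliberate choice: a well-formatted but wrong answer should never outscore a correct one.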
To conclude, we will revisit the two post-training approaches, distillation and reinforcement learning, and summarize where each is best applied, its limitations, and the challenges that remain open. Participants will also receive a comprehensive set of resources, including online Jupyter Notebook tutorials, curated "awesome lists," and best practices for distillation and RL training.
- Jupyter notebooks
- APIs for accessing various reasoning models (free tiers)
- NeMo framework (open-source)
- Sample datasets for distillation and RL training
- Open-source models (Qwen, Llama)
- Python 3.8+
- CUDA-compatible GPU (recommended)
- Docker (optional)
- 🌐 Tutorial Website: zpqiu.github.io/reasoning-model-tutorial-kdd2025
- 📚 KDD 2025 Official: kdd2025.kdd.org
- 🛠️ NeMo Framework: nvidia.github.io/NeMo
- 📖 Key Papers: DeepSeek R1, Knowledge Distillation in LLMs, RL for LLM Training
By the end of this session, participants will be equipped with:
✅ Theoretical Understanding: Core concepts of knowledge distillation and reinforcement learning for reasoning
✅ Practical Skills: Hands-on experience in data preparation and processing techniques
✅ Technical Implementation: Ability to use the NeMo framework for model distillation experiments
✅ Advanced Techniques: Knowledge of applying RL for post-training reasoning enhancement
✅ Real-world Application: Practical experience that can be applied to their own projects
🤝 This tutorial is carefully prepared by NVIDIA Deep Learning Solution Architects
🎓 Presented as an official KDD 2025 Tutorial
💡 Dedicated to advancing AI reasoning technology and applications
