🧠 Incentivizing Reasoning in LLMs

Practical Guidance and Tutorial on Building Reasoning Capabilities using Distillation and Reinforcement Learning

📅 August 3-7, 2025

📍 Toronto, Canada | KDD 2025 Conference

Presented by NVIDIA Deep Learning Solution Architects
Official KDD 2025 Tutorial

🎯 Tutorial Abstract

With reasoning models like DeepSeek-R1 and OpenAI's o1 demonstrating breakthrough capabilities in complex problem-solving, the AI community is increasingly interested in how to unlock similar capabilities in other large language models (LLMs).

This hands-on tutorial dives into practical methods for building reasoning capabilities in LLMs through two primary approaches:

Knowledge Distillation: Transferring capabilities from advanced reasoning models
Reinforcement Learning: Further enhancing capabilities through post-training techniques

Participants will learn how to transfer reasoning capabilities from cutting-edge models like DeepSeek-R1 into smaller LLMs such as Qwen and Llama, and then explore how reinforcement learning can take these capabilities even further.

🎯 Target Audience and Prerequisites

Target audience: Data scientists and engineers who are interested in enhancing reasoning capabilities in LLMs for downstream tasks.

Skill Level: The tutorial accommodates participants with varying levels of expertise, from beginners to moderately skilled users, and is paced so that beginners can follow comfortably.

Prerequisites:

Basic knowledge of deep learning and LLM concepts
Familiarity with Python programming

📚 Tutorial Outline

1. 🌟 Introduction and Lab Overview

⏱️ 20 Minutes

This part covers the core methodologies for incentivizing reasoning capabilities in large language models (LLMs). First, we explain how knowledge distillation transfers reasoning abilities from large models to smaller ones, covering concepts such as long Chain-of-Thought data and teacher-student frameworks. Then, we introduce how reinforcement learning (RL) can be applied to post-train LLMs for enhanced reasoning, including reward design and various RL algorithms.
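As one illustration of the RL algorithms mentioned above, the sketch below computes group-relative advantages in the style of GRPO, the algorithm reported for DeepSeek-R1. This outline does not name the algorithms used in the lab, so treat the choice of GRPO, the function name, and the `eps` parameter as assumptions made for illustration. The idea is that several responses are sampled for the same prompt, and each response's reward is normalized against the mean and standard deviation of its group.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Hypothetical helper: the name and the eps default are illustrative only.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps avoids division by zero when every response gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt: two correct (1.0), two incorrect (0.0).
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print(advantages)
```

Responses that beat the group average get a positive advantage and the rest get a negative one, so no separate value model is needed. The exact normalization differs across implementations.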

2. 🔧 Hands-on: Setting Up the Environment for the Lab

⏱️ 10 Minutes

In this part, we introduce the lab environment and the libraries, frameworks, and datasets used in the exercises. We then guide participants through a quick verification step to ensure that everyone's environment is correctly configured.
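A verification step of this kind could look like the sketch below. It is not the lab's actual check: the function name is hypothetical, and the module names `torch` and `nemo` are assumptions about what the lab installs. It checks the Python version against the 3.8+ requirement, tests whether each module can be found, and reports GPU visibility when PyTorch is present.

```python
import importlib.util
import sys


def check_environment(required_modules, min_python=(3, 8)):
    """Return a dict mapping each check name to True or False."""
    report = {"python": sys.version_info >= min_python}
    for name in required_modules:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    if report.get("torch"):
        try:
            import torch

            report["cuda"] = torch.cuda.is_available()
        except Exception:
            # A broken install should show up as a failed check, not a crash.
            report["cuda"] = False
    return report


# Module names below are assumptions; adjust them to the lab's actual stack.
for check, ok in check_environment(["torch", "nemo"]).items():
    print(f"{check}: {'OK' if ok else 'MISSING'}")
```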

3. 🔍 Hands-on: Extracting Reasoning Data

⏱️ 30 Minutes

In this lab, participants will learn how to automatically generate long Chain-of-Thought data that encapsulates reasoning processes from advanced reasoning models, such as DeepSeek-R1. We'll demonstrate how to filter low-quality data and prepare it for distillation into smaller models using NeMo's data processing tools.
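The filtering step can be sketched in plain Python as follows. This is a minimal illustration, not the NeMo tooling used in the lab. It relies on the fact that DeepSeek-R1 wraps its reasoning in `<think>...</think>` tags. The sample field names (`prompt`, `completion`, `reference`), the word-count thresholds, and the exact-match answer check are assumptions chosen to keep the example short.

```python
import re

# Reasoning sits inside <think> tags; the final answer follows the closing tag.
THINK_RE = re.compile(r"<think>(.*?)</think>(.*)", re.DOTALL)


def split_reasoning(completion):
    """Split a completion into (reasoning, answer), or None if malformed."""
    match = THINK_RE.search(completion)
    if match is None:
        return None
    reasoning, answer = match.group(1).strip(), match.group(2).strip()
    return (reasoning, answer) if reasoning and answer else None


def filter_samples(samples, min_words=5, max_words=4000):
    """Keep samples with a well-formed trace, sane length, and correct answer."""
    kept = []
    for sample in samples:
        parts = split_reasoning(sample["completion"])
        if parts is None:
            continue  # missing or empty reasoning trace
        reasoning, answer = parts
        if not (min_words <= len(reasoning.split()) <= max_words):
            continue  # trace is too short or too long
        if answer != sample["reference"].strip():
            continue  # final answer disagrees with the reference
        kept.append(
            {"prompt": sample["prompt"], "reasoning": reasoning, "answer": answer}
        )
    return kept


samples = [
    {"prompt": "What is 2 + 2?", "reference": "4",
     "completion": "<think>Two plus two makes four, since 2 + 2 = 4.</think>4"},
    {"prompt": "What is 3 + 3?", "reference": "6",
     "completion": "<think>Three plus three is probably seven, I think.</think>7"},
    {"prompt": "What is 1 + 1?", "reference": "2", "completion": "2"},
]
print(len(filter_samples(samples)))
```

In practice the answer check is usually task-specific, for example a math-expression comparison or a unit-test run, rather than string equality.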

4. 🧪 Hands-on: Distilling Reasoning Capability into Smaller Models

⏱️ 60 Minutes

This lab will walk participants through implementing knowledge distillation in open-source models like Qwen and Llama using NVIDIA's NeMo framework. We'll cover the technical details of setting up a distillation training experiment in NeMo, monitoring training progress effectively, and evaluating the results.
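One common form of distillation from long Chain-of-Thought data is supervised fine-tuning on teacher-generated responses, where the loss is computed only on the teacher's tokens and the prompt tokens are masked out. The source does not state whether the lab uses this objective or a logit-matching one, so the sketch below is an assumption. It is written in plain Python rather than NeMo, and its function name is hypothetical.

```python
import math


def masked_nll(token_log_probs, loss_mask):
    """Average negative log-likelihood over tokens where loss_mask is 1."""
    if len(token_log_probs) != len(loss_mask):
        raise ValueError("token_log_probs and loss_mask must have equal length")
    total = -sum(lp for lp, m in zip(token_log_probs, loss_mask) if m)
    count = sum(loss_mask)
    return total / count if count else 0.0


# Student log-probabilities for each token of "prompt + teacher response".
log_probs = [math.log(0.9), math.log(0.8), math.log(0.5), math.log(0.25)]
# Mask out the two prompt tokens; train only on the two teacher tokens.
mask = [0, 0, 1, 1]
loss = masked_nll(log_probs, mask)
print(round(loss, 4))
```

Minimizing this quantity pushes the student to assign higher probability to the teacher's reasoning trace and final answer, given the prompt.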

5. 🚀 Hands-on: Post-training using Reinforcement Learning

⏱️ 60 Minutes

This lab will teach participants how to set up a reinforcement learning environment for post-training LLMs to further enhance reasoning capabilities. We'll cover how to use RL frameworks and how to design reward functions, and we'll guide participants through the whole RL fine-tuning process.
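A rule-based reward function for reasoning tasks might be sketched as below. It combines a format reward (the response must wrap its reasoning in `<think>` tags) with an accuracy reward (the final answer must match a reference), the two reward types described for DeepSeek-R1. The function name, the weights, and the exact-match comparison are assumptions for illustration, not the lab's reward design.

```python
import re

# Expected layout: <think>reasoning</think> followed by the final answer.
FORMAT_RE = re.compile(
    r"\A\s*<think>.+?</think>\s*(?P<answer>.+?)\s*\Z", re.DOTALL
)


def reasoning_reward(completion, reference, format_weight=0.2, accuracy_weight=1.0):
    """Score one completion; the weights are illustrative, not tuned values."""
    match = FORMAT_RE.match(completion)
    if match is None:
        return 0.0  # malformed output earns nothing
    reward = format_weight  # well-formed reasoning block
    if match.group("answer").strip() == reference.strip():
        reward += accuracy_weight  # correct final answer
    return reward


print(reasoning_reward("<think>3 * 4 = 12</think>\n12", "12"))
print(reasoning_reward("<think>just a guess</think>13", "12"))
print(reasoning_reward("12", "12"))
```

Keeping the format reward small relative to the accuracy reward is one way to discourage the policy from producing well-formatted but wrong answers.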

6. 🎓 Conclusion and Resources

⏱️ 20 Minutes

To conclude, the tutorial revisits the two post-training approaches covered in the labs, distillation and reinforcement learning (RL), and summarizes their suitable applications, limitations, and unresolved challenges. Participants will also receive a comprehensive set of resources, including online Jupyter Notebook tutorials, curated "awesome lists," and best practices for distillation and RL training.

👨‍🏫 Short Bio of Tutors

Zhaopeng Qiu

NVIDIA Deep Learning Solution Architect

Zhaopeng is a Deep Learning Solution Architect at NVIDIA. He graduated from Peking University in 2018. His research focuses on large language models (LLMs), recommender systems, and natural language processing (NLP). He has authored over 20 research papers published in leading journals and conferences, including KDD, AAAI, WWW, TKDE, NAACL, and COLING.

Jingqi Zhang

NVIDIA Solution Architect

Jingqi is a Solution Architect at NVIDIA, specializing in large language models (LLMs). His work covers training methodologies, practical applications, and reasoning models. Prior to joining NVIDIA, he obtained both his Bachelor's and Master's degrees in Computer Science and Technology from Xi'an Jiaotong University.

Shuang Yu

NVIDIA Solution Architect

Shuang is a Solution Architect at NVIDIA focusing on LLMs. She holds a Bachelor's degree in Automation and a Master's degree in Computer Science from Tsinghua University. Before joining NVIDIA, she worked as a software architect at IBM, where she led the development of an enterprise-level machine learning platform.

🛠️ Resource Requirements

Technical Resources

Jupyter notebooks
APIs for accessing various reasoning models (free tiers)
NeMo framework (open-source)
Sample datasets for distillation and RL training
Open-source models (Qwen, Llama)

Software Environment

Python 3.8+
CUDA-compatible GPU (recommended)
Docker (optional)

🔗 Key References and Links

🌐 Tutorial Website: zpqiu.github.io/reasoning-model-tutorial-kdd2025
📚 KDD 2025 Official: kdd2025.kdd.org
🛠️ NeMo Framework: nvidia.github.io/NeMo
📖 Key Papers: DeepSeek-R1, Knowledge Distillation in LLMs, RL for LLM Training

🎯 Learning Outcomes

By the end of this session, participants will be equipped with:

✅ Theoretical Understanding: Core concepts of knowledge distillation and reinforcement learning for reasoning

✅ Practical Skills: Hands-on experience in data preparation and processing techniques

✅ Technical Implementation: Ability to use NeMo framework for model distillation experiments

✅ Advanced Techniques: Knowledge of applying RL for post-training reasoning enhancement

✅ Real-world Application: Practical experience that can be applied to their own projects

🤝 This tutorial is carefully prepared by NVIDIA Deep Learning Solution Architects
🎓 Presented as an official KDD 2025 Tutorial
💡 Dedicated to advancing AI reasoning technology and applications
