Commit d2a6696

Merge pull request #27 from azure-ai-foundry/shjondhale/grpo-link
Add link to GRPO demo
2 parents 9552c1d + 10a10a2 commit d2a6696

1 file changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+## Training an Instruct model into a Reasoning model on AML using GRPO
+
+GRPO (Group Relative Policy Optimization) is a lightweight, memory-efficient reinforcement learning technique for training reasoning models. It optimises responses without a separate value (critic) model, which makes it faster and less memory-hungry than traditional PPO. Reward functions guide learning, which keeps the training of reasoning capabilities stable and scalable. This demo shows how Azure ML can be used to fine-tune a non-reasoning (instruct) model into a reasoning model.
+
+Video: [https://www.youtube.com/watch?v=YOm_IQt3YWw](https://www.youtube.com/watch?v=YOm_IQt3YWw)
+
+Code: [https://github.com/Azure/azureml-examples/tree/main/sdk/python/jobs/grpo](https://github.com/Azure/azureml-examples/tree/main/sdk/python/jobs/grpo)
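
To make the "group relative" part of GRPO concrete, here is a minimal sketch of how a group of sampled responses to one prompt can be scored against each other in place of a learned value model. It is not taken from the linked demo code: the function name `group_relative_advantages`, the `eps` parameter, the use of the population standard deviation, and the example rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalise each reward against the mean and std of its own group.

    `rewards` holds one scalar reward per sampled response to the same
    prompt. The group statistics act as the baseline, so no separate
    value (critic) model is needed. (Hypothetical helper, illustration only.)
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four responses to one prompt,
# e.g. 1.0 if the final answer is correct and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```

Responses that score above the group average receive a positive advantage and are reinforced; those below the average receive a negative one. If every response in a group gets the same reward, all advantages are zero and that group contributes no learning signal.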
