Adding GRPO training #8

Goekdeniz-Guelmez · 2025-03-14T08:36:29Z

No description provided.

Update iterate_grpo_batches to support optional 'type' information in dataset tuples, enhancing flexibility in data handling.

Goekdeniz-Guelmez added 6 commits March 14, 2025 09:30

update lora.py

aa8f0cb

update datasets.py + nits

e73edb8

adding grpo_trainer + grpo_reward_functions.py + nits

4152036
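For context, GRPO (Group Relative Policy Optimization) scores each sampled completion against the other completions generated for the same prompt instead of relying on a learned value model. The sketch below is not the grpo_trainer code from 4152036; it is a minimal, framework-free illustration of the group-normalized advantage, assuming rewards arrive as a flat list ordered by prompt and using the population standard deviation plus a small eps (some implementations use the sample standard deviation instead). The name group_relative_advantages is hypothetical.

```python
import math
from typing import List, Sequence


def group_relative_advantages(
    rewards: Sequence[float], group_size: int, eps: float = 1e-4
) -> List[float]:
    """Normalize each reward against the other completions for the same prompt.

    Assumes rewards[i*group_size:(i+1)*group_size] are the completions
    sampled for prompt i.
    """
    if group_size <= 0 or len(rewards) % group_size != 0:
        raise ValueError("len(rewards) must be a positive multiple of group_size")
    advantages: List[float] = []
    for start in range(0, len(rewards), group_size):
        group = rewards[start:start + group_size]
        mean = sum(group) / group_size
        # Population standard deviation of this prompt's group.
        std = math.sqrt(sum((r - mean) ** 2 for r in group) / group_size)
        # eps keeps the division finite when every reward in the group is equal.
        advantages.extend((r - mean) / (std + eps) for r in group)
    return advantages
```

For example, group_relative_advantages([1.0, 0.0, 1.0, 0.0], group_size=2) gives values close to [1.0, -1.0, 1.0, -1.0]. A group in which every completion gets the same reward yields all-zero advantages, so it contributes no advantage signal to the update.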

update lora_config.yaml

6c2f5df

update LORA.md

a6e07e1

update acknowledgements.md

4e71c27

awni mentioned this pull request Mar 14, 2025

Adding grpo training ml-explore/mlx-examples#1233

Closed

Goekdeniz-Guelmez and others added 23 commits March 17, 2025 09:36

Merge branch 'ml-explore:main' into adding-grpo-training

1092d24

Merge branch 'ml-explore:main' into adding-grpo-training

3c37059

Merge branch 'main' into adding-grpo-training

bb74407

Resetting metrics after training step

8d30bc2

remove custom generate_steps function and use the built-in one

ef87082

code formatting

17c6022

Merge branch 'main' into adding-grpo-training

0cdea69

making the key fields customizable

4c02c1a

prevent twice prompt masking in grpo mode

9b854c5

removed sampler function

3c92a62

fix dataset

5a87cfb

reduce temperature

52f8559

nits

2755d8a

Enhance reward processing in grpo_loss function to handle None values and NaNs. Added warnings for cases where all reward functions return None, and updated metrics calculation to account for valid rewards only.

1ebe5f7
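The None/NaN handling described in 1ebe5f7 can be illustrated with a small pure-Python sketch. It is not the PR's grpo_loss code: combine_rewards and mean_of_valid are hypothetical names, and the sketch assumes each reward function returns either None (not applicable to the whole batch) or one value per completion, where individual values may themselves be None or NaN. Invalid values are skipped in the weighted sum, a completion with no valid value is marked NaN, and a warning is raised when every function returns None.

```python
import math
import warnings
from typing import List, Optional, Sequence


def combine_rewards(
    per_func_rewards: Sequence[Optional[Sequence[Optional[float]]]],
    num_completions: int,
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Weighted sum of reward-function outputs that skips None and NaN values."""
    if weights is None:
        weights = [1.0] * len(per_func_rewards)
    # Drop reward functions that returned None for the whole batch.
    active = [(r, w) for r, w in zip(per_func_rewards, weights) if r is not None]
    if not active:
        warnings.warn("All reward functions returned None; no valid rewards.")
        return [math.nan] * num_completions
    combined: List[float] = []
    for i in range(num_completions):
        total, has_valid = 0.0, False
        for rewards, weight in active:
            value = rewards[i]
            if value is None or math.isnan(value):
                continue  # skip missing or invalid rewards for this completion
            total += weight * value
            has_valid = True
        # NaN marks "no valid reward" so metrics can exclude it later.
        combined.append(total if has_valid else math.nan)
    return combined


def mean_of_valid(values: Sequence[float]) -> float:
    """Mean over non-NaN values; NaN if nothing is valid."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else math.nan
```

Keeping "no valid reward" as NaN rather than 0.0 lets metrics such as the mean reward be computed over valid rewards only; a trainer would still need to decide how to treat NaN entries before computing advantages.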

Update reward function signatures to include 'types' parameter and adjust grpo_loss function to handle expanded types. This enhances the flexibility of reward calculations.

d8251c0
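d8251c0 extends the reward-function signatures with a types parameter. The exact signature in the PR is not shown here, so the following is a hypothetical example: a reward function that takes prompts, completions, answers and an optional per-item types list, and returns one score per completion, or None for items it does not apply to. Treating only the type "math" as applicable is an arbitrary choice made for illustration.

```python
from typing import List, Optional


def exact_match_reward(
    prompts: List[str],
    completions: List[str],
    answers: List[str],
    types: Optional[List[Optional[str]]] = None,
) -> List[Optional[float]]:
    """Hypothetical reward: 1.0 if the completion's last non-empty line equals the answer."""
    # prompts is unused here but kept so all reward functions share one signature.
    if types is None:
        types = [None] * len(completions)
    rewards: List[Optional[float]] = []
    for completion, answer, item_type in zip(completions, answers, types):
        if item_type is not None and item_type != "math":
            rewards.append(None)  # this reward does not apply to this item type
            continue
        lines = [line.strip() for line in completion.strip().splitlines() if line.strip()]
        last_line = lines[-1] if lines else ""
        rewards.append(1.0 if last_line == answer.strip() else 0.0)
    return rewards
```

Returning None per item, rather than 0.0, lets an aggregation step skip rewards that do not apply instead of treating them as failures.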

Update train_grpo function to include 'type_info' in batch processing, improving data handling capabilities.

176764e
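176764e, together with the iterate_grpo_batches change in this PR, threads optional per-example type information through batching. The sketch below is simplified and hypothetical: it assumes dataset items are (prompt, answer) or (prompt, answer, type) tuples of plain strings, whereas the real iterator presumably works on tokenized data, and iterate_batches_with_types is not a name from the PR. Items without a type get None so downstream code can treat them as untyped.

```python
import random
from typing import Iterator, List, Optional, Sequence, Tuple

Batch = Tuple[List[str], List[str], List[Optional[str]]]


def iterate_batches_with_types(
    dataset: Sequence[tuple],
    batch_size: int,
    shuffle: bool = True,
    seed: int = 0,
) -> Iterator[Batch]:
    """Yield (prompts, answers, type_info) batches from 2- or 3-tuples."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)  # seeded shuffle for reproducibility
    for start in range(0, len(indices), batch_size):
        batch = [dataset[i] for i in indices[start:start + batch_size]]
        prompts = [item[0] for item in batch]
        answers = [item[1] for item in batch]
        # The third field is optional; fall back to None when it is absent.
        type_info = [item[2] if len(item) > 2 else None for item in batch]
        yield prompts, answers, type_info
```

This sketch yields the final partial batch rather than dropping it; a trainer that needs fixed batch shapes might choose the opposite.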

fix

cba01a3

update datasets.py

1a67665

Refactor grpo_reward_functions and grpo_trainer to remove unnecessary print statements and clean up code formatting. Added 'average_generated_tokens' metric to improve tracking of generation performance.

6260d6b
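6260d6b adds an 'average_generated_tokens' metric, and 8d30bc2 resets metrics after a training step. One way to combine the two is a small accumulator that averages over the current reporting window and clears itself when it reports. GRPOMetrics and its methods are hypothetical, not the PR's trainer code.

```python
from typing import Dict, Sequence


class GRPOMetrics:
    """Hypothetical running-metrics accumulator, reset after each report."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self.steps = 0
        self.loss_sum = 0.0
        self.generated_tokens = 0
        self.completions = 0

    def update(self, loss: float, completion_lengths: Sequence[int]) -> None:
        """Record one training step's loss and the token count of each completion."""
        self.steps += 1
        self.loss_sum += loss
        self.generated_tokens += sum(completion_lengths)
        self.completions += len(completion_lengths)

    def report(self) -> Dict[str, float]:
        """Return averages for the current window, then reset for the next one."""
        result = {
            "loss": self.loss_sum / self.steps if self.steps else 0.0,
            "average_generated_tokens": (
                self.generated_tokens / self.completions if self.completions else 0.0
            ),
        }
        self.reset()
        return result
```

Averaging generated tokens per completion, rather than per step, keeps the metric comparable when the number of completions per step changes.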

Merge branch 'ml-explore:main' into adding-grpo-training

3572b69

Merge branch 'ml-explore:main' into adding-grpo-training

79e0f63

remove metal in clear cache

a698122

Goekdeniz-Guelmez and others added 9 commits March 25, 2025 09:17

nits

86d7561

update generate_step import since it's moved to generate.py

5744d1c

nits

efe29aa

removing metal in mx.get_peak_memory

92f7b9e

Merge branch 'ml-explore:main' into adding-grpo-training

1dc2c6c

Merge branch 'main' into adding-grpo-training

ea42938

nits

ead3270

Merge branch 'main' into adding-grpo-training

4c173eb

Merge branch 'ml-explore:main' into adding-grpo-training

58db849

tmc mentioned this pull request Apr 10, 2025

Add GRPO support to mlx_lm.lora #77

Closed

Goekdeniz-Guelmez and others added 3 commits April 12, 2025 13:19

nits

320b8f7

Merge branch 'ml-explore:main' into adding-grpo-training

7de557f

Merge branch 'main' into adding-grpo-training

d6ab7fd
