adding ORPO training #6


Open · wants to merge 23 commits into main

Conversation

Goekdeniz-Guelmez
Contributor

No description provided.

Goekdeniz-Guelmez changed the title from "adding DPO training" to "adding ORPO training" on Mar 14, 2025
@Goekdeniz-Guelmez
Contributor Author

@awni @ivanfioravanti would you mind testing this training yourselves? It looks like it's working correctly for me.

This is what I used for this run:

python -m mlx_lm.lora \
    --model mlx-community/OLMoE-1B-7B-0125-Instruct-4bit \
    --train \
    --test \
    --num-layers 8 \
    --data mlx-community/orpo-dpo-mix-40k-mlx \
    --iters 500 \
    --batch-size 1 \
    --val-batches 1 \
    --steps-per-report 10 \
    --adapter-path /Users/gokdenizgulmez/Library/Mobile\ Documents/com\~apple\~CloudDocs/Datastes/MLX/OLMoE-orpo \
    --max-seq-length 1024 \
    --grad-checkpoint \
    --training-mode orpo \
    --fine-tune-type lora \
    --beta 0.1 \
    --steps-per-eval 50 \
    --test-batches 1
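
For context: ORPO combines the standard cross-entropy (SFT) loss on the chosen response with an odds-ratio penalty that pushes the chosen completion above the rejected one, and --beta weights that penalty. A rough sketch of the objective as described in the ORPO paper (my reading, not necessarily the exact code in this PR):

import mlx.core as mx

def orpo_loss(chosen_logps, rejected_logps, beta=0.1):
    # chosen_logps / rejected_logps: mean per-token log-probabilities of the
    # chosen and rejected responses under the policy (hypothetical inputs).
    def log_odds(logp):
        # log(p / (1 - p)) computed from a log-probability
        return logp - mx.log1p(-mx.exp(logp))

    # Odds-ratio term: reward higher odds for the chosen response
    ratio = log_odds(chosen_logps) - log_odds(rejected_logps)
    or_loss = -mx.log(mx.sigmoid(ratio))

    # SFT term: negative log-likelihood of the chosen response
    sft_loss = -chosen_logps

    return mx.mean(sft_loss + beta * or_loss)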

@Goekdeniz-Guelmez
Contributor Author

Trainable parameters: 0.010% (0.659M/6919.162M)
Starting ORPO training..., iters: 500
Iter 1: Val loss 0.055, Val chosen reward -0.240, Val rejected reward -0.268, Val accuracy 1.000, Val margin 0.028, Val took 0.645s
Iter 10: Train loss 0.061, Chosen reward -0.206, Rejected reward -0.227, Accuracy 0.600, Margin 0.021, Learning Rate 1.000e-05, It/sec 0.103, Tokens/sec 63.892, Peak mem 5.276 GB
Iter 20: Train loss 0.058, Chosen reward -0.176, Rejected reward -0.203, Accuracy 0.800, Margin 0.027, Learning Rate 1.000e-05, It/sec 0.068, Tokens/sec 63.266, Peak mem 6.435 GB
Iter 30: Train loss 0.072, Chosen reward -0.219, Rejected reward -0.236, Accuracy 0.500, Margin 0.016, Learning Rate 1.000e-05, It/sec 0.093, Tokens/sec 62.543, Peak mem 6.435 GB
Iter 40: Train loss 0.057, Chosen reward -0.168, Rejected reward -0.195, Accuracy 0.700, Margin 0.028, Learning Rate 1.000e-05, It/sec 0.059, Tokens/sec 62.565, Peak mem 7.027 GB
Iter 50: Val loss 0.052, Val chosen reward -0.250, Val rejected reward -0.285, Val accuracy 1.000, Val margin 0.036, Val took 0.653s
Iter 50: Train loss 0.045, Chosen reward -0.169, Rejected reward -0.240, Accuracy 0.800, Margin 0.071, Learning Rate 1.000e-05, It/sec 0.678, Tokens/sec 685.863, Peak mem 7.767 GB
Iter 60: Train loss 0.054, Chosen reward -0.181, Rejected reward -0.216, Accuracy 0.800, Margin 0.035, Learning Rate 1.000e-05, It/sec 0.084, Tokens/sec 62.568, Peak mem 7.767 GB
Iter 70: Train loss 0.057, Chosen reward -0.185, Rejected reward -0.216, Accuracy 0.700, Margin 0.031, Learning Rate 1.000e-05, It/sec 0.076, Tokens/sec 65.463, Peak mem 7.767 GB
Iter 80: Train loss 0.069, Chosen reward -0.205, Rejected reward -0.212, Accuracy 0.600, Margin 0.008, Learning Rate 1.000e-05, It/sec 0.064, Tokens/sec 64.043, Peak mem 7.767 GB
Iter 90: Train loss 0.062, Chosen reward -0.189, Rejected reward -0.211, Accuracy 0.700, Margin 0.022, Learning Rate 1.000e-05, It/sec 0.067, Tokens/sec 62.760, Peak mem 7.767 GB
Iter 100: Val loss 0.048, Val chosen reward -0.283, Val rejected reward -0.328, Val accuracy 1.000, Val margin 0.045, Val took 0.746s
Iter 100: Train loss 0.063, Chosen reward -0.170, Rejected reward -0.191, Accuracy 0.500, Margin 0.021, Learning Rate 1.000e-05, It/sec 1.194, Tokens/sec 1157.902, Peak mem 7.767 GB
Iter 100: Saved adapter weights to /Users/gokdenizgulmez/Library/Mobile Documents/com~apple~CloudDocs/Datastes/MLX/OLMoE-orpo/adapters.safetensors and /Users/gokdenizgulmez/Library/Mobile Documents/com~apple~CloudDocs/Datastes/MLX/OLMoE-orpo/0000100_adapters.safetensors.
Iter 110: Train loss 0.065, Chosen reward -0.215, Rejected reward -0.230, Accuracy 0.500, Margin 0.015, Learning Rate 1.000e-05, It/sec 0.060, Tokens/sec 59.129, Peak mem 7.767 GB
Iter 120: Train loss 0.055, Chosen reward -0.176, Rejected reward -0.233, Accuracy 0.600, Margin 0.057, Learning Rate 1.000e-05, It/sec 0.054, Tokens/sec 58.178, Peak mem 7.767 GB
Iter 130: Train loss 0.045, Chosen reward -0.208, Rejected reward -0.264, Accuracy 0.800, Margin 0.056, Learning Rate 1.000e-05, It/sec 0.068, Tokens/sec 60.683, Peak mem 7.767 GB
Iter 140: Train loss 0.063, Chosen reward -0.237, Rejected reward -0.264, Accuracy 0.700, Margin 0.027, Learning Rate 1.000e-05, It/sec 0.059, Tokens/sec 62.057, Peak mem 7.767 GB
Iter 150: Val loss 0.045, Val chosen reward -0.335, Val rejected reward -0.389, Val accuracy 1.000, Val margin 0.054, Val took 0.671s
Iter 150: Train loss 0.069, Chosen reward -0.227, Rejected reward -0.245, Accuracy 0.500, Margin 0.018, Learning Rate 1.000e-05, It/sec 0.777, Tokens/sec 797.980, Peak mem 7.767 GB
Iter 160: Train loss 0.054, Chosen reward -0.190, Rejected reward -0.227, Accuracy 0.600, Margin 0.038, Learning Rate 1.000e-05, It/sec 0.071, Tokens/sec 63.211, Peak mem 7.767 GB
Iter 170: Train loss 0.053, Chosen reward -0.204, Rejected reward -0.256, Accuracy 0.600, Margin 0.052, Learning Rate 1.000e-05, It/sec 0.079, Tokens/sec 64.376, Peak mem 7.767 GB
Iter 180: Train loss 0.058, Chosen reward -0.236, Rejected reward -0.275, Accuracy 0.700, Margin 0.039, Learning Rate 1.000e-05, It/sec 0.069, Tokens/sec 61.759, Peak mem 7.767 GB
Iter 190: Train loss 0.035, Chosen reward -0.221, Rejected reward -0.326, Accuracy 0.900, Margin 0.105, Learning Rate 1.000e-05, It/sec 0.075, Tokens/sec 59.210, Peak mem 7.767 GB
Iter 200: Val loss 0.043, Val chosen reward -0.367, Val rejected reward -0.427, Val accuracy 1.000, Val margin 0.060, Val took 0.652s
Iter 200: Train loss 0.055, Chosen reward -0.219, Rejected reward -0.265, Accuracy 0.600, Margin 0.046, Learning Rate 1.000e-05, It/sec 0.432, Tokens/sec 341.032, Peak mem 7.767 GB
Iter 200: Saved adapter weights to /Users/gokdenizgulmez/Library/Mobile Documents/com~apple~CloudDocs/Datastes/MLX/OLMoE-orpo/adapters.safetensors and /Users/gokdenizgulmez/Library/Mobile Documents/com~apple~CloudDocs/Datastes/MLX/OLMoE-orpo/0000200_adapters.safetensors.
Iter 210: Train loss 0.061, Chosen reward -0.212, Rejected reward -0.246, Accuracy 0.700, Margin 0.034, Learning Rate 1.000e-05, It/sec 0.062, Tokens/sec 60.745, Peak mem 7.767 GB
Iter 220: Train loss 0.053, Chosen reward -0.189, Rejected reward -0.246, Accuracy 0.800, Margin 0.058, Learning Rate 1.000e-05, It/sec 0.061, Tokens/sec 57.557, Peak mem 7.767 GB
Iter 230: Train loss 0.051, Chosen reward -0.187, Rejected reward -0.252, Accuracy 0.800, Margin 0.065, Learning Rate 1.000e-05, It/sec 0.086, Tokens/sec 62.054, Peak mem 7.767 GB
Iter 240: Train loss 0.047, Chosen reward -0.213, Rejected reward -0.276, Accuracy 0.700, Margin 0.063, Learning Rate 1.000e-05, It/sec 0.081, Tokens/sec 59.913, Peak mem 7.767 GB
Iter 250: Val loss 0.042, Val chosen reward -0.404, Val rejected reward -0.467, Val accuracy 1.000, Val margin 0.063, Val took 0.680s
Iter 250: Train loss 0.045, Chosen reward -0.234, Rejected reward -0.292, Accuracy 1.000, Margin 0.059, Learning Rate 1.000e-05, It/sec 0.495, Tokens/sec 388.247, Peak mem 7.767 GB
Iter 260: Train loss 0.051, Chosen reward -0.210, Rejected reward -0.262, Accuracy 0.800, Margin 0.052, Learning Rate 1.000e-05, It/sec 0.057, Tokens/sec 64.152, Peak mem 7.767 GB
Iter 270: Train loss 0.043, Chosen reward -0.244, Rejected reward -0.337, Accuracy 0.800, Margin 0.093, Learning Rate 1.000e-05, It/sec 0.090, Tokens/sec 59.819, Peak mem 7.767 GB
Iter 280: Train loss 0.057, Chosen reward -0.250, Rejected reward -0.323, Accuracy 0.700, Margin 0.074, Learning Rate 1.000e-05, It/sec 0.071, Tokens/sec 60.443, Peak mem 7.767 GB
Iter 290: Train loss 0.053, Chosen reward -0.217, Rejected reward -0.264, Accuracy 0.700, Margin 0.047, Learning Rate 1.000e-05, It/sec 0.065, Tokens/sec 57.679, Peak mem 7.767 GB
Iter 300: Val loss 0.041, Val chosen reward -0.449, Val rejected reward -0.517, Val accuracy 1.000, Val margin 0.068, Val took 0.660s
Iter 300: Train loss 0.057, Chosen reward -0.204, Rejected reward -0.266, Accuracy 0.500, Margin 0.062, Learning Rate 1.000e-05, It/sec 0.619, Tokens/sec 849.234, Peak mem 8.767 GB
Iter 300: Saved adapter weights to /Users/gokdenizgulmez/Library/Mobile Documents/com~apple~CloudDocs/Datastes/MLX/OLMoE-orpo/adapters.safetensors and /Users/gokdenizgulmez/Library/Mobile Documents/com~apple~CloudDocs/Datastes/MLX/OLMoE-orpo/0000300_adapters.safetensors.
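
Reading the log: the margin is just the chosen reward minus the rejected reward (e.g. at iter 10, -0.206 - (-0.227) = 0.021), and accuracy is presumably the fraction of pairs in the interval where the chosen reward comes out higher. Roughly:

import mlx.core as mx

def preference_metrics(chosen_rewards, rejected_rewards):
    # Per-pair rewards for a batch; a sketch of how the logged numbers relate,
    # not necessarily the exact bookkeeping in this PR.
    margin = mx.mean(chosen_rewards - rejected_rewards)
    accuracy = mx.mean((chosen_rewards > rejected_rewards).astype(mx.float32))
    return margin, accuracy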

@Goekdeniz-Guelmez
Contributor Author

python -m mlx_lm.generate --model mlx-community/OLMoE-1B-7B-0125-Instruct-4bit \
    --prompt "what's up"  \
    --max-tokens 1024 \
    --adapter-path /Users/gokdenizgulmez/Library/Mobile\ Documents/com\~apple\~CloudDocs/Datastes/MLX/OLMoE-orpo
Fetching 6 files: 100%|██████████| 6/6 [00:00<00:00, 36209.82it/s]
==========
Hello! I'm here to help answer your questions, provide information, or engage in a conversation. Whether you're looking for insights on a wide range of topics, need advice, have questions about technology, science, history, or anything else, feel free to ask. What's on your mind today? Let's chat!
==========
Prompt: 16 tokens, 167.715 tokens-per-sec
Generation: 66 tokens, 115.174 tokens-per-sec
Peak memory: 4.186 GB
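
The same check should also work from the mlx_lm Python API by pointing load at the adapter directory (untested sketch; the adapter path is shortened to a placeholder here):

from mlx_lm import load, generate

# adapter_path is the directory containing the adapters.safetensors
# saved during training (placeholder path below).
model, tokenizer = load(
    "mlx-community/OLMoE-1B-7B-0125-Instruct-4bit",
    adapter_path="/path/to/OLMoE-orpo",
)
print(generate(model, tokenizer, prompt="what's up", max_tokens=1024))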
