Llama-3.3-70B-Instruct-4bit LoRA Fine-Tuning: No Change (or Instability) - Adapter Issue? #1147
-
Hi everyone. The core problem is that the LoRA adapter seems to have no usable effect on the model's output, despite training completing normally (the loss decreases as expected). It's not a matter of tuning the adapter scale; the adaptation either does nothing or breaks the model outright. Here's what I've tried:
I'm really stuck here, and any insights or suggestions would be greatly appreciated!
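For concreteness, this is the kind of side-by-side check that leads me to say the adapter has no effect (the prompt is elided here, and the adapter path is just wherever the training run saved its weights):

```shell
# Base model, no adapter
mlx_lm.generate --model mlx-community/Llama-3.3-70B-Instruct-4bit \
  --max-tokens 50 --prompt "<same prompt>"

# Same prompt, with the freshly trained LoRA adapter applied
mlx_lm.generate --model mlx-community/Llama-3.3-70B-Instruct-4bit \
  --adapter-path adapters \
  --max-tokens 50 --prompt "<same prompt>"
```

In my runs the two outputs are either effectively identical, or the adapted one is broken.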
-
I tried training this:
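(Presumably the same WikiSQL recipe that gets rerun later in this thread, roughly:)

```shell
# LoRA fine-tune of the 4-bit 70B model on the WikiSQL example dataset
# (flags reconstructed from the rerun below; the original run may have differed)
mlx_lm.lora --model mlx-community/Llama-3.3-70B-Instruct-4bit \
  --data mlx-community/wikisql \
  --train --iters 100 --batch-size 1 --num-layers 8
```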
And then evaluating it like this:
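(Again, presumably the same command that is rerun below:)

```shell
# Generate with the trained adapter applied to the quantized base model
mlx_lm.generate --model mlx-community/Llama-3.3-70B-Instruct-4bit \
  --adapter-path adapters --max-tokens 50 \
  --prompt "table: 1-10015132-16
columns: Player, No., Nationality, Position, Years in Toronto, School/Club Team
Q: What is terrence ross' nationality
A: "
```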
And it generated the following which is very reasonable:
So I'm not sure where things are going wrong for you. A few suggestions:
-
Hi @awni, sorry to bother you again, but I've run LoRA fine-tuning several more times and I'm still not getting good results. I'm on MLX version 0.23.2 and have tested different learning rates, layer counts, and dataset sizes.

The training loss is not improving as expected: in my previous runs it decreased steadily over time, but now it stays relatively high even after many iterations. The validation loss also shows no significant improvement, so it's unclear whether the model is learning effectively. I also noticed that the number of trainable parameters has dropped compared to my previous runs.

I went back to the example you gave me that worked before, but now the same setup isn't improving the model like it did, and the loss isn't getting better either. I tried this:

mlx_lm.lora --model mlx-community/Llama-3.3-70B-Instruct-4bit --data mlx-community/wikisql --iters 100 --batch-size 1 --num-layers 8 --train
Loading pretrained model
Fetching 13 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 18151.12it/s]
Loading datasets
Loading Hugging Face dataset mlx-community/wikisql.
Training
Trainable parameters: 0.002% (1.638M/70553.706M)
Starting training..., iters: 100
Iter 1: Val loss 3.105, Val took 31.339s
Iter 10: Train loss 2.810, Learning Rate 1.000e-05, It/sec 0.594, Tokens/sec 51.949, Trained Tokens 874, Peak mem 40.752 GB
Iter 20: Train loss 2.847, Learning Rate 1.000e-05, It/sec 0.627, Tokens/sec 46.363, Trained Tokens 1613, Peak mem 40.752 GB
Iter 30: Train loss 2.693, Learning Rate 1.000e-05, It/sec 0.660, Tokens/sec 47.737, Trained Tokens 2336, Peak mem 40.752 GB
Iter 40: Train loss 2.268, Learning Rate 1.000e-05, It/sec 0.485, Tokens/sec 41.760, Trained Tokens 3197, Peak mem 40.752 GB
Iter 50: Train loss 1.915, Learning Rate 1.000e-05, It/sec 0.275, Tokens/sec 24.056, Trained Tokens 4072, Peak mem 40.938 GB
Iter 60: Train loss 1.709, Learning Rate 1.000e-05, It/sec 0.124, Tokens/sec 10.004, Trained Tokens 4880, Peak mem 40.938 GB
Iter 70: Train loss 1.535, Learning Rate 1.000e-05, It/sec 0.280, Tokens/sec 20.595, Trained Tokens 5616, Peak mem 40.938 GB
Iter 80: Train loss 1.468, Learning Rate 1.000e-05, It/sec 0.319, Tokens/sec 26.164, Trained Tokens 6436, Peak mem 40.938 GB
Iter 90: Train loss 1.614, Learning Rate 1.000e-05, It/sec 0.352, Tokens/sec 29.681, Trained Tokens 7280, Peak mem 40.938 GB
Iter 100: Val loss 1.535, Val took 62.628s
Iter 100: Train loss 1.692, Learning Rate 1.000e-05, It/sec 0.359, Tokens/sec 28.107, Trained Tokens 8063, Peak mem 40.938 GB
Iter 100: Saved adapter weights to adapters/adapters.safetensors and adapters/0000100_adapters.safetensors.
Saved final weights to adapters/adapters.safetensors.

Then I ran:

mlx_lm.generate --model mlx-community/Llama-3.3-70B-Instruct-4bit --adapter-path adapters --max-tokens 50 \
--prompt "table: 1-10015132-16
columns: Player, No., Nationality, Position, Years in Toronto, School/Club Team
Q: What is terrence ross' nationality
A: "
Fetching 13 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 28207.94it/s]
==========
According to the table, Terrence Ross' nationality is American.
==========
Prompt: 79 tokens, 74.713 tokens-per-sec
Generation: 14 tokens, 12.678 tokens-per-sec
Peak memory: 39.968 GB

Has anything changed in recent MLX updates that could affect fine-tuning, or is there something I should adjust? Thanks!!
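In case it helps narrow things down, the next check I was going to try is fusing the adapter into the model and generating from the fused weights, to separate "the adapter weights are effectively empty" from "the adapter isn't being applied at generation time" (the save path below is just a placeholder, and I'm assuming mlx_lm.fuse still accepts these flags):

```shell
# Fuse the trained LoRA adapter into the base model weights
# (assumed flags; the fused-model path is a placeholder)
mlx_lm.fuse --model mlx-community/Llama-3.3-70B-Instruct-4bit \
  --adapter-path adapters \
  --save-path fused-llama-3.3-70b-wikisql

# Generate from the fused model with the same WikiSQL-style prompt
mlx_lm.generate --model fused-llama-3.3-70b-wikisql --max-tokens 50 \
  --prompt "table: 1-10015132-16
columns: Player, No., Nationality, Position, Years in Toronto, School/Club Team
Q: What is terrence ross' nationality
A: "
```

If the fused model still answers like the plain base model, that would point at the adapter weights themselves rather than how they are loaded for generation.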
-
We had a bug where …