Hello, this is an excellent piece of work. I am running SFT on Qwen1.5-0.5B with four 24GB RTX 4090 GPUs, and I noticed that the loss stays at 0.000000 throughout training. Is this expected?
{'loss': '0.000000', 'learning_rate': 1.9998178635346865e-05, 'epoch': 0.01}
{'loss': '0.000000', 'learning_rate': 1.9992715204861295e-05, 'epoch': 0.01}
{'loss': '0.000000', 'learning_rate': 1.9983611698723126e-05, 'epoch': 0.02}
{'loss': '0.000000', 'learning_rate': 1.9970871433093214e-05, 'epoch': 0.02}
{'loss': '0.000000', 'learning_rate': 1.9954499048905464e-05, 'epoch': 0.03}
{'loss': '0.000000', 'learning_rate': 1.9934500510176242e-05, 'epoch': 0.04}
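For reference, here is a quick check I plan to run on my side (a minimal sketch, assuming the standard Hugging Face convention of masking non-target tokens with -100 and that batches are PyTorch tensors; `check_batch_labels` is my own helper, not from this repo). If the data collator masks every label in a batch, the loss carries no learning signal:

```python
import torch

def check_batch_labels(batch: dict) -> None:
    """Report how many label tokens actually contribute to the loss."""
    labels = batch["labels"]
    active = int((labels != -100).sum())
    total = labels.numel()
    print(f"{active}/{total} label tokens are unmasked")
    if active == 0:
        print("Every label is masked (-100); the loss carries no learning "
              "signal and typically logs as 0 or NaN.")

# Example: a fully masked batch, one common cause of a flat loss.
check_batch_labels({"labels": torch.full((2, 16), -100)})
```

In my actual run I would call this on `next(iter(train_dataloader))` to inspect a real batch.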
Additionally, could you help clarify whether it is expected that 4x24GB of GPU memory is not enough to fine-tune a 3B model with LoRA, even with batch size = 1?
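For what it's worth, here is the kind of memory-lean setup I had in mind (a rough sketch, assuming the `transformers` and `peft` libraries; the checkpoint name and LoRA hyperparameters are illustrative, not taken from this repo). With the base weights frozen in bf16 and gradient checkpointing enabled, my understanding is that a 3B-class model should fit on a single 24GB card:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-4B",            # illustrative 3B-class checkpoint
    torch_dtype=torch.bfloat16,   # halves weight memory vs fp32
)
model.gradient_checkpointing_enable()  # trade compute for activation memory
model.enable_input_require_grads()     # needed for checkpointing with a frozen base

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapters should be trainable
```

If even a configuration like this runs out of memory, my first suspects would be long-sequence activations or the base model being loaded in fp32, but I would appreciate your take.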
Looking forward to your reply, and thank you in advance!