
SFT loss #2

@lujiarui-iie

Description


Hello, this is an excellent piece of work. I am running SFT on qwen1.5-0.5B with four 24 GB 4090 GPUs, and I noticed that the reported loss stays at 0.000000. Is this expected?

{'loss': '0.000000', 'learning_rate': 1.9998178635346865e-05, 'epoch': 0.01}                                                                                                       
{'loss': '0.000000', 'learning_rate': 1.9992715204861295e-05, 'epoch': 0.01}                                                                                                       
{'loss': '0.000000', 'learning_rate': 1.9983611698723126e-05, 'epoch': 0.02}                                                                                                       
{'loss': '0.000000', 'learning_rate': 1.9970871433093214e-05, 'epoch': 0.02}                                                                                                       
{'loss': '0.000000', 'learning_rate': 1.9954499048905464e-05, 'epoch': 0.03}                                                                                                       
{'loss': '0.000000', 'learning_rate': 1.9934500510176242e-05, 'epoch': 0.04}
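A constant 0.000000 loss often means every target token is being masked out, e.g. an over-aggressive prompt mask setting all labels to -100 (the conventional ignore index). A minimal check, using a toy batch in place of real tokenized data, is to count how many supervised tokens survive masking:

```python
# Toy batch standing in for tokenized SFT labels: 4 sequences of 512 tokens,
# all masked to -100. This is exactly what a buggy prompt mask produces,
# leaving the loss with no supervised tokens to learn from.
labels = [[-100] * 512 for _ in range(4)]

supervised = sum(tok != -100 for row in labels for tok in row)
print(supervised)  # prints 0 -> nothing contributes to the loss
```

If this count is 0 (or near 0) on real batches, the data collator or prompt-masking logic is the place to look.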

Additionally, could you help clarify whether it is expected that 4×24 GB of GPU memory is not enough to fine-tune a 3B model with LoRA, even with batch size 1?
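For context, a back-of-envelope estimate of the per-GPU memory floor. This is only a sketch under assumed settings (plain DDP with a full model replica per GPU, bf16 base weights, Adam on the adapters only; the 20M adapter-parameter count is a placeholder, not a measurement), and it excludes activations, which usually dominate at long sequence lengths:

```python
def lora_memory_gb(base_params_b: float, lora_params_m: float = 20.0) -> float:
    """Rough per-GPU memory floor for LoRA fine-tuning: frozen bf16 base
    weights replicated on each GPU, plus adapter weights, adapter grads,
    and fp32 Adam moment states. Illustrative assumptions, not measurements."""
    base = base_params_b * 1e9 * 2                  # frozen weights, 2 bytes each
    adapters = lora_params_m * 1e6 * (2 + 2 + 4 + 4)  # w + grad (bf16), m + v (fp32)
    return (base + adapters) / 1e9

print(round(lora_memory_gb(3.0), 2))  # ~6.24 GB before activations
```

So the 3B weights alone fit comfortably in 24 GB; an OOM at batch size 1 would point at activation memory (sequence length, no gradient checkpointing) rather than the parameter count itself.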

Looking forward to your reply, and thank you in advance!
