Very slow training speed with CURLoRA on Llama 3.1 8B Instruct #2

@NEWbie0709

I am currently fine-tuning the Llama 3.1 8B Instruct model using CURLoRA adapters on a single RTX 4090 GPU.

Problem:

  • It takes ~170 seconds per step (batch) during training.

  • Estimated time to complete one epoch is over 14 days.

  • A full 5-epoch run would take roughly 2+ months at the current speed.

  • The process also crashes partway through training.
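
For anyone checking the numbers, the estimates above can be cross-checked with a rough calculation. This is only a sketch: it assumes exactly 170 seconds per step and exactly 14 days per epoch, whereas the reported figures are approximate, and the function name `training_estimate` is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def training_estimate(seconds_per_step, days_per_epoch, epochs):
    """Return (implied steps per epoch, total training days).

    All inputs are the approximate figures reported in this issue,
    so the results are rough estimates, not measurements.
    """
    steps_per_epoch = days_per_epoch * SECONDS_PER_DAY / seconds_per_step
    total_days = days_per_epoch * epochs
    return steps_per_epoch, total_days


steps, days = training_estimate(seconds_per_step=170, days_per_epoch=14, epochs=5)
print(f"~{steps:.0f} steps per epoch, ~{days} days for 5 epochs")
```

Under these assumptions one epoch is on the order of 7,000 steps, and five epochs come to about 70 days, which matches the "2+ months" figure.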

Questions:

  • Is this extremely slow training expected when fine-tuning Llama 3.1 8B models with CURLoRA on a 4090?

  • Is there anything I can optimize further while still using CURLoRA? (e.g., sequence length, optimizer settings, etc.)

Additional Notes:

  • GPU utilization is high (close to 100%) during training.

  • VRAM usage is around 22.5 GB out of 24 GB (4090 almost fully loaded).
