Problem with deepspeed finetuning #17

@freQuensy23-coder

Description

I've tried to train Mistral-7b-v0.1 on multiple GPUs using DeepSpeed.
I started with the example from the README:

from xllm import Config
from xllm.datasets import GeneralDataset
from xllm.cli import cli_run_train

import deepspeed

print(deepspeed.__file__, deepspeed.__version__)

if __name__ == '__main__':
    train_data = ["Hello!"] * 100
    train_dataset = GeneralDataset.from_list(data=train_data)
    cli_run_train(config_cls=Config, train_dataset=train_dataset)

And launched it with:

deepspeed --num_gpus=4 main.py --deepspeed_stage 2 --apply_lora True

But it did not start. Full traceback:
https://gist.github.com/freQuensy23-coder/3a2341d4642b19b07fd533ac62fbb6cb
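For context on what the launcher is being asked to do: I assume xllm's `--deepspeed_stage 2` flag builds a ZeRO stage-2 DeepSpeed config internally. In plain DeepSpeed, the equivalent JSON config would be a sketch like the following (the `"auto"` values come from the Hugging Face Trainer integration convention, which xllm appears to build on; standalone DeepSpeed would need concrete numbers there):

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "zero_optimization": {
    "stage": 2
  }
}
```

If the generated config is the problem, passing an explicit config file to the `deepspeed` launcher might help isolate whether the failure is in xllm's config generation or in DeepSpeed itself.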

Environment:
CUDA 11.7
torch==2.0.1
deepspeed==0.13.1
packaging
xllm
