Problem with deepspeed finetuning #17

@freQuensy23-coder

Description

I've tried to train Mistral-7b-v0.1 on multiple GPUs using DeepSpeed.
I started with the example from the README:

from xllm import Config
from xllm.datasets import GeneralDataset
from xllm.cli import cli_run_train

import deepspeed

print(deepspeed.__file__, deepspeed.__version__)

if __name__ == '__main__':
    train_data = ["Hello!"] * 100
    train_dataset = GeneralDataset.from_list(data=train_data)
    cli_run_train(config_cls=Config, train_dataset=train_dataset)

And launched it with:

deepspeed --num_gpus=4 main.py --deepspeed_stage 2 --apply_lora True

But it did not start. Full traceback:
https://gist.github.com/freQuensy23-coder/3a2341d4642b19b07fd533ac62fbb6cb
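For context on what the launcher is being asked to do: I assume xllm's `--deepspeed_stage 2` flag builds a ZeRO stage-2 DeepSpeed config internally. In plain DeepSpeed, the equivalent JSON config would be a sketch like the following (the `"auto"` values come from the Hugging Face Trainer integration convention, which xllm appears to build on; standalone DeepSpeed would need concrete numbers there):

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "zero_optimization": {
    "stage": 2
  }
}
```

If the generated config is the problem, passing an explicit config file to the `deepspeed` launcher might help isolate whether the failure is in xllm's config generation or in DeepSpeed itself.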

Environment:
CUDA 11.7
torch==2.0.1
deepspeed==0.13.1
packaging
xllm
