
Does this fine-tuning code not work on a single A6000 GPU for LLaMA-2-7B with LoRA?  #359

@01choco

Description

Hi, I am trying to use your RLHF code to fine-tune and apply reinforcement learning to LLaMA. However, I keep getting a CUDA out-of-memory error while fine-tuning the LLaMA-2-7B model on a single A6000 GPU, even though I use the PEFT LoRA method.

I applied these changes to get rid of the CUDA OOM, but the error still occurs (see the sketch after this list):

  1. smaller batch size (1)
  2. smaller max length of token and sequence
  3. PEFT
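
For reference, this is roughly how I understand the PEFT LoRA part in isolation. It is a plain Hugging Face Transformers + PEFT sketch, not this repo's actual training code, and the checkpoint name is just illustrative:

  # Standalone sketch of the memory-saving knobs: fp16 weights, gradient
  # checkpointing, and LoRA adapters via PEFT. Not the repo's training loop.
  import torch
  from transformers import AutoModelForCausalLM
  from peft import LoraConfig, get_peft_model

  model = AutoModelForCausalLM.from_pretrained(
      "meta-llama/Llama-2-7b-hf",        # illustrative checkpoint name
      torch_dtype=torch.float16,         # fp16 weights: ~13 GB instead of ~26 GB
  )
  model.gradient_checkpointing_enable()  # trade compute for activation memory

  lora_config = LoraConfig(
      r=8, lora_alpha=32, lora_dropout=0.1, task_type="CAUSAL_LM"
  )
  model = get_peft_model(model, lora_config)
  model.print_trainable_parameters()     # only the LoRA adapters are trainable

With batch size 1 and a shorter max sequence length on top of this, I expected the 7B model to fit in 48 GB.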

Can I fine-tune LLaMA-2-7B on an A6000 GPU? Has anyone succeeded in fine-tuning LLaMA on a single GPU? I just want to know whether I'm doing something wrong or whether it's fundamentally impossible to fine-tune this model on a single A6000.
And does anyone know how to get rid of the CUDA error in this situation?
Here is my config.yaml file!

  model: "llama-7B"
  model_folder: "./llama/llama-2-7b"
  tokenizer_path: "./llama/tokenizer.model"
  train_dataset_path: "./datasets/actor_training_data.json"
  validation_dataset_path: null
  # freeze model embeddings during training
  froze_embeddings: True
  # use fairscale layers to build the model instead of vanilla pytorch
  # only for llama
  use_fairscale: True
  # max sequence length for the actor (i.e. prompt + completion); it depends on
  # the model used.
  max_sequence_length: 1024
  # max tokens generated by the actor (completion only)
  max_tokens: 1024
  # minimum number of tokens generated by the actor
  min_tokens: 100
  # additional prompt tokens to be used for the template or as a safety margin
  additonal_prompt_tokens: 20
  # temperature for the actor
  temperature: 0.1
  batch_size: 2
  # number of iterations between prints
  iteration_per_print: 1
  lr: 0.000009
  epochs: 1
  # number of backpropagation steps between checkpoint saves
  checkpoint_steps: 5000
  # number of checkpoints to keep while removing the older ones
  # (keeps the memory consumption of checkpoints reasonable)
  n_checkpoints_to_keep: 5
  # here specify the name of the actor checkpoint from which to resume
  # actor training. If null, load the last one.
  checkpoint_name: null
  # deepspeed settings
  deepspeed_enable: True
  deepspeed_config_path: "./artifacts/config/ds_config.json"
  # accelerate settings
  accelerate_enable: False
  # use_peft - the PEFT parameters can be modified in peft_config.yaml
  peft_enable: True
  peft_config_path: "./artifacts/config/peft_config.yaml"
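
Since deepspeed_enable is True, here is the kind of ds_config.json I am experimenting with. This is only a sketch of ZeRO stage 2 with optimizer offload to CPU (written out from Python for convenience); the values are my own guesses, not the settings the repo ships with:

  # Sketch of a ds_config.json: ZeRO stage 2 with the Adam optimizer states
  # offloaded to CPU memory. All values here are guesses.
  import json

  ds_config = {
      "train_micro_batch_size_per_gpu": 1,
      "gradient_accumulation_steps": 8,
      "fp16": {"enabled": True},
      "zero_optimization": {
          "stage": 2,
          "offload_optimizer": {"device": "cpu"},
      },
  }

  with open("./artifacts/config/ds_config.json", "w") as f:
      json.dump(ds_config, f, indent=2)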

And here is my peft_config.yaml file:

  inference_mode: False
  r: 8
  lora_alpha: 32
  lora_dropout: 0.1
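
And this is the back-of-the-envelope memory math I am using to reason about whether this should fit on a 48 GB A6000 (rough estimates only, ignoring activations and fragmentation):

  # Rough memory arithmetic for LLaMA-2-7B on a 48 GB A6000 (estimates only).
  params = 7e9
  GB = 1e9

  # Full fine-tuning in mixed precision: fp16 weights + fp16 grads + fp32 master
  # weights + Adam m and v states, roughly 16 bytes per parameter.
  full_ft = params * 16 / GB
  # LoRA: the 7B base weights stay frozen in fp16; adapter weights and their
  # optimizer states are tiny, so activations become the dominant extra cost.
  lora_base = params * 2 / GB

  print(f"full fine-tuning: ~{full_ft:.0f} GB")    # ~112 GB, cannot fit on 48 GB
  print(f"LoRA frozen base: ~{lora_base:.0f} GB")  # ~14 GB, should leave headroom

If this math is roughly right, the frozen base model plus LoRA should leave plenty of headroom, which is why I suspect I am doing something wrong elsewhere.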

Thank you for reading!
