
Question about GPU memory usage. #59

@mxjmtxrm

Hi, I tried to fine-tune a Llama-2-7B model with HQQ-LoRA using dual GPUs.
I found that during "Loading & Quantizing Model Shards", peak GPU memory usage reached 35 GB. What could be causing this?
The run command is:

export CUDA_VISIBLE_DEVICES=3,4
python train.py \
--world_size 2 \
--model_name /workspace/model/Llama-2-7b-chat-hf \
--gradient_accumulation_steps 2 \
--batch_size 1 \
--context_length 4096 \
--num_epochs 1 \
--sharding_strategy full_shard \
--precision bf16 \
--train_type hqq_lora \
--use_gradient_checkpointing true \
--use_cpu_offload true \
--dataset dummy \
--verbose true  
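
For context, here is a minimal sketch of how the per-GPU peak can be checked around the loading/quantization step. It only assumes the standard `torch.cuda` memory APIs; the actual shard loading done by train.py is indicated by a placeholder comment, not real code from the repo:

```python
# Minimal sketch: report per-GPU peak allocated memory around the
# "Loading & Quantizing Model Shards" step. Uses standard torch.cuda
# memory APIs; the loading/quantization itself is only a placeholder.
import torch

def report_peak_memory(stage: str) -> None:
    # Print the peak allocated memory seen so far on each visible GPU.
    for i in range(torch.cuda.device_count()):
        peak_gb = torch.cuda.max_memory_allocated(i) / 1024**3
        print(f"[{stage}] cuda:{i} peak allocated: {peak_gb:.1f} GB")

# Reset peak counters before the step being measured.
for i in range(torch.cuda.device_count()):
    torch.cuda.reset_peak_memory_stats(i)

# ... load and quantize the model shards here (train.py) ...

report_peak_memory("after loading & quantizing")
```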

Looking forward to your reply.
