Quantization for Llama-70b raises CUDA OOM  #1128

Open
@lulmer

Description

Hello,

Using the quantization config provided by torchtune, I am unable to quantize llama-3-70b: the run fails with a CUDA out-of-memory error.

tune run quantize --config configs/custom_quantization_untrained_llama.yaml 

Here custom_quantization_untrained_llama.yaml is the default quantization config, unchanged except that it points to the safetensors files of llama-3-70b.

The resolved config is:

2024-06-27:14:26:29,993 INFO     [_utils.py:33] Running QuantizationRecipe with resolved config:

checkpointer:
  _component_: torchtune.utils.FullModelHFCheckpointer
  checkpoint_dir: /data/checkpoints/llama-3-70b-instruct-hf/
  checkpoint_files:
  - model-00001-of-00030.safetensors
  - model-00002-of-00030.safetensors
  - model-00003-of-00030.safetensors
  - model-00004-of-00030.safetensors
  - model-00005-of-00030.safetensors
  - model-00006-of-00030.safetensors
  - model-00007-of-00030.safetensors
  - model-00008-of-00030.safetensors
  - model-00009-of-00030.safetensors
  - model-00010-of-00030.safetensors
  - model-00011-of-00030.safetensors
  - model-00012-of-00030.safetensors
  - model-00013-of-00030.safetensors
  - model-00014-of-00030.safetensors
  - model-00015-of-00030.safetensors
  - model-00016-of-00030.safetensors
  - model-00017-of-00030.safetensors
  - model-00018-of-00030.safetensors
  - model-00019-of-00030.safetensors
  - model-00020-of-00030.safetensors
  - model-00021-of-00030.safetensors
  - model-00022-of-00030.safetensors
  - model-00023-of-00030.safetensors
  - model-00024-of-00030.safetensors
  - model-00025-of-00030.safetensors
  - model-00026-of-00030.safetensors
  - model-00027-of-00030.safetensors
  - model-00028-of-00030.safetensors
  - model-00029-of-00030.safetensors
  - model-00030-of-00030.safetensors
  model_type: LLAMA3
  output_dir: /workspaces/Meta-Llama-3-70B-Instruct/
  recipe_checkpoint: null
device: cuda
dtype: bf16
model:
  _component_: torchtune.models.llama3.llama3_70b
quantizer:
  _component_: torchtune.utils.quantization.Int4WeightOnlyQuantizer
  groupsize: 256
seed: 1234

The error is:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.96 GiB. GPU
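
For context, here is a rough estimate of the memory needed for the weights alone under the config above (`device: cuda`, `dtype: bf16`). This is a minimal sketch, not a diagnosis: the parameter count of about 70 billion is an assumption taken from the model name, `estimate_weight_gib` is a hypothetical helper rather than a torchtune function, and the estimate ignores allocator overhead and any temporary buffers the quantizer creates.

```python
# Back-of-the-envelope weight memory estimate.
# Illustrative helper only; this is not part of torchtune.
GIB = 2**30


def estimate_weight_gib(num_params: float, bytes_per_param: float) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


# Assumption: roughly 70e9 parameters, inferred from the model name.
NUM_PARAMS = 70e9

# bf16 stores each parameter in 2 bytes.
bf16_gib = estimate_weight_gib(NUM_PARAMS, 2)
# int4 stores each parameter in 4 bits (0.5 bytes); per-group scales
# and zero points are not counted here.
int4_gib = estimate_weight_gib(NUM_PARAMS, 0.5)

print(f"bf16 weights: ~{bf16_gib:.0f} GiB")
print(f"int4 weights: ~{int4_gib:.0f} GiB")
```

If the assumption holds, the unquantized bf16 weights would not fit on any single GPU with less memory than the first figure, which would be consistent with the OOM occurring while the model is loaded on `cuda` before quantization.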
