Hello,
Using the quantization config provided by torchtune, I am unable to quantize llama-3-70b. I run:
tune run quantize --config configs/custom_quantization_untrained_llama.yaml
where custom_quantization_untrained_llama.yaml is the exact default quantization config, pointed at the safetensors files of llama-3-70b.
The resolved config is:
2024-06-27:14:26:29,993 INFO [_utils.py:33] Running QuantizationRecipe with resolved config:
checkpointer:
  _component_: torchtune.utils.FullModelHFCheckpointer
  checkpoint_dir: /data/checkpoints/llama-3-70b-instruct-hf/
  checkpoint_files:
  - model-00001-of-00030.safetensors
  - model-00002-of-00030.safetensors
  - model-00003-of-00030.safetensors
  - model-00004-of-00030.safetensors
  - model-00005-of-00030.safetensors
  - model-00006-of-00030.safetensors
  - model-00007-of-00030.safetensors
  - model-00008-of-00030.safetensors
  - model-00009-of-00030.safetensors
  - model-00010-of-00030.safetensors
  - model-00011-of-00030.safetensors
  - model-00012-of-00030.safetensors
  - model-00013-of-00030.safetensors
  - model-00014-of-00030.safetensors
  - model-00015-of-00030.safetensors
  - model-00016-of-00030.safetensors
  - model-00017-of-00030.safetensors
  - model-00018-of-00030.safetensors
  - model-00019-of-00030.safetensors
  - model-00020-of-00030.safetensors
  - model-00021-of-00030.safetensors
  - model-00022-of-00030.safetensors
  - model-00023-of-00030.safetensors
  - model-00024-of-00030.safetensors
  - model-00025-of-00030.safetensors
  - model-00026-of-00030.safetensors
  - model-00027-of-00030.safetensors
  - model-00028-of-00030.safetensors
  - model-00029-of-00030.safetensors
  - model-00030-of-00030.safetensors
  model_type: LLAMA3
  output_dir: /workspaces/Meta-Llama-3-70B-Instruct/
  recipe_checkpoint: null
device: cuda
dtype: bf16
model:
  _component_: torchtune.models.llama3.llama3_70b
quantizer:
  _component_: torchtune.utils.quantization.Int4WeightOnlyQuantizer
  groupsize: 256
seed: 1234
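As a side note, the 30 entries under checkpoint_files follow a fixed naming pattern, so the list can be generated rather than typed by hand. The sketch below assumes the shard names match the pattern shown in the config above; the helper name shard_names is hypothetical and not part of torchtune.

```python
def shard_names(total_shards: int) -> list[str]:
    """Build shard file names of the form model-00001-of-00030.safetensors.

    Hypothetical helper: it only assumes the naming pattern seen in the
    config above (1-based index and shard count, both zero-padded to 5 digits).
    """
    return [
        f"model-{index:05d}-of-{total_shards:05d}.safetensors"
        for index in range(1, total_shards + 1)
    ]


if __name__ == "__main__":
    # Print the entries in YAML list form, ready to paste under checkpoint_files.
    for name in shard_names(30):
        print(f"- {name}")
```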
The error is:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.96 GiB. GPU
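For scale, here is a rough back-of-the-envelope estimate of the weight memory involved. It assumes roughly 70 billion parameters (a round figure, not read from the checkpoint), 2 bytes per parameter for bf16 and 0.5 bytes per parameter for int4, and it ignores group-wise scales, activations, and allocator overhead. The helper name weight_memory_gib is hypothetical.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


# Assumption: about 70e9 parameters for llama-3-70b (round number).
NUM_PARAMS = 70e9

bf16_gib = weight_memory_gib(NUM_PARAMS, 2.0)   # bf16: 2 bytes per parameter
int4_gib = weight_memory_gib(NUM_PARAMS, 0.5)   # int4: 4 bits per parameter

if __name__ == "__main__":
    print(f"bf16 weights: about {bf16_gib:.1f} GiB")  # roughly 130 GiB
    print(f"int4 weights: about {int4_gib:.1f} GiB")  # roughly 33 GiB
```

With device: cuda and dtype: bf16, the unquantized weights alone would need on the order of 130 GiB under these assumptions, which is more than a single GPU typically offers. That may be why the allocation fails before quantization finishes.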