Can I load QLoRA fine-tuned weights into a Hugging Face model as shown below?
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,  # was defined but never passed in
    low_cpu_mem_usage=True,
    torch_dtype=torch.bfloat16,  # match bnb_4bit_compute_dtype
    device_map="auto",
)
model = PeftModel.from_pretrained(model, "qlora_finetune_folder/")
```
I have changed the checkpointer to `FullModelHFCheckpointer`. Essentially, the result is loadable and runnable, but I am curious whether it has the same structure as the `qlora_llama3_8b` recipe. Thanks.