
Trainer.push_to_hub() with PEFT doesn't work when the base model is loaded from local disk #33922

Closed
@valayDave

Description

System Info

  • transformers version: 4.44.2
  • Platform: Linux-5.15.0-1066-aws-x86_64-with-glibc2.31
  • Python version: 3.12.0
  • Huggingface_hub version: 0.25.1
  • Safetensors version: 0.4.5
  • Accelerate version: 0.34.2
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.4.1+cu121 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: False

Who can help?

@muellerzr @SunMarc

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

# Imports implied by the snippet (bnb_config, device_map, training_arguments,
# peft_config, callbacks, etc. are defined elsewhere in my script)
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer

model_path = "path/to/my/model/on/disk"

# Load the quantized base model from local disk
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    device_map=device_map,
    use_auth_token=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# some code to create PeftConfig
...

# You can even use a regular Trainer instead
trainer = SFTTrainer(
    model=model,
    args=training_arguments,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=args.max_seq_length,
    packing=args.packing,
    callbacks=callbacks,
)

trainer.train()
trainer.push_to_hub()

I have hub_model_id set in the TrainingArguments when I run my scripts. If I set push_to_hub=True in TrainingArguments, it ends up throwing a ValueError like the one below:

 File "/home/ob-workspace/metaflow-checkpoint-examples/nim_lora/finetune_hf_peft.py", line 127, in sft
    trainer.push_to_hub(
  File "/home/ob-workspace/micromamba/envs/metaflow/linux-64/51041216a07a03b/lib/python3.12/site-packages/trl/trainer/sft_trainer.py", line 481, in push_to_hub
    return super().push_to_hub(commit_message=commit_message, blocking=blocking, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ob-workspace/micromamba/envs/metaflow/linux-64/51041216a07a03b/lib/python3.12/site-packages/transformers/trainer.py", line 4353, in push_to_hub
    return upload_folder(
           ^^^^^^^^^^^^^^
  File "/home/ob-workspace/micromamba/envs/metaflow/linux-64/51041216a07a03b/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/ob-workspace/micromamba/envs/metaflow/linux-64/51041216a07a03b/lib/python3.12/site-packages/huggingface_hub/hf_api.py", line 1485, in _inner
    return fn(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ob-workspace/micromamba/envs/metaflow/linux-64/51041216a07a03b/lib/python3.12/site-packages/huggingface_hub/hf_api.py", line 4972, in upload_folder
    add_operations = self._prepare_upload_folder_additions(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ob-workspace/micromamba/envs/metaflow/linux-64/51041216a07a03b/lib/python3.12/site-packages/huggingface_hub/hf_api.py", line 9478, in _prepare_upload_folder_additions
    self._validate_yaml(
  File "/home/ob-workspace/micromamba/envs/metaflow/linux-64/51041216a07a03b/lib/python3.12/site-packages/huggingface_hub/hf_api.py", line 9542, in _validate_yaml
    raise ValueError(f"Invalid metadata in README.md.\n{message}") from e
ValueError: Invalid metadata in README.md.
- "base_model" with value "/tmp/metaflow_models_model_reference_c5stocp_" is not valid. Use a model id from https://hf.co/models.

While reading the code I also noticed that if we pass a finetuned_from argument to push_to_hub, the trainer passes it down to create_model_card, but that function then lets PEFT replace the generated card with a new one. The problem is that PeftModel's create_or_update_model_card does not account for the base_model value already set in the card, so the card ends up with invalid metadata and the Hub refuses the push.

I have a fix for this in the PEFT library: huggingface/peft#2124
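Until that fix lands, the workaround I'd expect to work is to skip Trainer.push_to_hub() entirely, repair the card's base_model metadata by hand, and upload the output folder with HfApi.upload_folder. A minimal sketch, assuming hub_model_id is set in training_arguments as described above; the base model id is a placeholder:

import os

from huggingface_hub import HfApi, ModelCard

output_dir = training_arguments.output_dir

# Save the adapter and tokenizer locally instead of pushing directly.
trainer.save_model(output_dir)

# Generate the model card; as noted above, PEFT's create_or_update_model_card
# ends up writing the local path into the card's base_model field.
trainer.create_model_card()

# Overwrite base_model with a real Hub model id so the YAML metadata validates.
readme_path = os.path.join(output_dir, "README.md")
card = ModelCard.load(readme_path)
card.data.base_model = "org/base-model-id"  # placeholder: the actual base model id
card.save(readme_path)

# Upload the folder ourselves, bypassing the push_to_hub() code path.
HfApi().upload_folder(
    repo_id=training_arguments.hub_model_id,
    folder_path=output_dir,
    commit_message="End of training",
)

This is only a sketch of the manual route; the real fix still belongs in create_or_update_model_card.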

Expected behavior

Overall, when the model is loaded from disk and then pushed to the hub, the behavior I expect is:

If a model is loaded from local disk and then trained with PEFT (or any other HF extension/trainer), push to hub should still work. To accommodate this, the library should:

  1. Distinguish between model names and local paths when setting information such as base_model (see the sketch after this list).
  2. Allow passing information like base_model in the TrainingArguments so that push_to_hub=True can work with the trainer. Currently push_to_hub=True won't work in the PEFT scenario because the README.md created by PeftModel's create_or_update_model_card overrides the value with something that can be either a name or a path, and if it is a path (as in this issue), the push just crashes.
  3. Ensure that finetuned_from can be passed down explicitly to extensions of the library like PEFT, or do something like [bug fix] ensure base_model is correctly set in model card peft#2124 (don't write a base_model value in the card if one is already set).
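For point 1, a minimal sketch of the kind of check I mean (illustrative only, not the actual PEFT/transformers code; the function name is made up):

import os

def resolve_base_model(name_or_path: str) -> str | None:
    # A local directory is not a valid Hub model id, so don't write it into
    # the card's base_model metadata; return None and let the caller skip it.
    if os.path.isdir(name_or_path):
        return None
    # Otherwise assume it is a Hub model id such as "org/model".
    return name_or_path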
