Trainer.push_to_hub() with PEFT doesn't work when the base model is loaded from local disk #33922

valayDave opened this issue Oct 3, 2024
System Info

  • transformers version: 4.44.2
  • Platform: Linux-5.15.0-1066-aws-x86_64-with-glibc2.31
  • Python version: 3.12.0
  • Huggingface_hub version: 0.25.1
  • Safetensors version: 0.4.5
  • Accelerate version: 0.34.2
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.4.1+cu121 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: False

Who can help?

@muellerzr @SunMarc

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer

model_path = "path/to/my/model/on/disk"
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    device_map=device_map,
    use_auth_token=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
# some code to create PeftConfig
...

# You can even use a regular Trainer instead
trainer = SFTTrainer(
    model=model,
    args=training_arguments,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=args.max_seq_length,
    packing=args.packing,
    callbacks=callbacks,
)

trainer.train()
trainer.push_to_hub()

I have hub_model_id set in the TrainingArguments when I run my scripts. If I set push_to_hub=True in the TrainingArguments, it ends up throwing a ValueError like:

 File "/home/ob-workspace/metaflow-checkpoint-examples/nim_lora/finetune_hf_peft.py", line 127, in sft
    trainer.push_to_hub(
  File "/home/ob-workspace/micromamba/envs/metaflow/linux-64/51041216a07a03b/lib/python3.12/site-packages/trl/trainer/sft_trainer.py", line 481, in push_to_hub
    return super().push_to_hub(commit_message=commit_message, blocking=blocking, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ob-workspace/micromamba/envs/metaflow/linux-64/51041216a07a03b/lib/python3.12/site-packages/transformers/trainer.py", line 4353, in push_to_hub
    return upload_folder(
           ^^^^^^^^^^^^^^
  File "/home/ob-workspace/micromamba/envs/metaflow/linux-64/51041216a07a03b/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/ob-workspace/micromamba/envs/metaflow/linux-64/51041216a07a03b/lib/python3.12/site-packages/huggingface_hub/hf_api.py", line 1485, in _inner
    return fn(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ob-workspace/micromamba/envs/metaflow/linux-64/51041216a07a03b/lib/python3.12/site-packages/huggingface_hub/hf_api.py", line 4972, in upload_folder
    add_operations = self._prepare_upload_folder_additions(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ob-workspace/micromamba/envs/metaflow/linux-64/51041216a07a03b/lib/python3.12/site-packages/huggingface_hub/hf_api.py", line 9478, in _prepare_upload_folder_additions
    self._validate_yaml(
  File "/home/ob-workspace/micromamba/envs/metaflow/linux-64/51041216a07a03b/lib/python3.12/site-packages/huggingface_hub/hf_api.py", line 9542, in _validate_yaml
    raise ValueError(f"Invalid metadata in README.md.\n{message}") from e
ValueError: Invalid metadata in README.md.
- "base_model" with value "/tmp/metaflow_models_model_reference_c5stocp_" is not valid. Use a model id from https://hf.co/models.

While reading the code, I also noticed that even if we pass a finetuned_from argument to push_to_hub, the Trainer hands it down to create_model_card, but that function then lets PEFT replace the generated card with a new one. The problem is that PeftModel's create_or_update_model_card does not take into account the base_model value already set in the card, so the resulting card is invalid and the Hub refuses the push.

I have a fix for this in the PEFT library: huggingface/peft#2124
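
The gist of the intended behavior (this is a simplified illustration, not the actual diff from that PR) is to keep a base_model that is already set and to skip the field entirely when the base model only exists as a local directory:

import os

def resolve_base_model(existing_value, base_model_name_or_path):
    # Keep whatever the Trainer already wrote into the card (e.g. via finetuned_from).
    if existing_value:
        return existing_value
    # A local directory is not a valid hub model id, so don't write base_model at all.
    if os.path.isdir(base_model_name_or_path):
        return None
    return base_model_name_or_path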

Expected behavior

Overall, the behavior I expect when the model is loaded from disk and then pushed to the hub is:

If a model is loaded from local disk and then trained with PEFT (or any other HF extension/trainer), push to hub should work. To accommodate this, the library should:

  1. Distinguish between names and paths when setting information such as base_model.
  2. Allow passing information like base_model in the TrainingArguments so that push_to_hub=True can work with the trainer. Currently push_to_hub=True won't work in the PEFT scenarios because PeftModel's create_or_update_model_card overrides the value in the generated README.md with something that can be either a name or a path, and if it is a path (as in this issue), the push just crashes. (A manual workaround is sketched after this list.)
  3. Ensure that finetuned_from can be passed down explicitly to extensions of the library like PEFT, or do something like [bug fix] ensure base_model is correctly set in model card peft#2124 (don't write a base_model value in the card if one is already set).
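
In the meantime, one possible manual workaround is to write the model card, patch base_model to a real hub id, and upload the output folder directly with huggingface_hub instead of calling Trainer.push_to_hub(). A hedged sketch (the hub id and the exact call order here are assumptions, not a verified recipe):

from huggingface_hub import HfApi, ModelCard

output_dir = training_arguments.output_dir

# Save the adapter and let the Trainer (and PEFT) write README.md into output_dir.
trainer.save_model(output_dir)
trainer.create_model_card()

# Patch the card: replace the local path with the hub id of the base model.
card = ModelCard.load(f"{output_dir}/README.md")
card.data.base_model = "meta-llama/Llama-2-7b-hf"  # assumed hub id of the on-disk base model
card.save(f"{output_dir}/README.md")

# Upload the folder directly so nothing regenerates the card afterwards.
api = HfApi()
api.create_repo(training_arguments.hub_model_id, exist_ok=True)
api.upload_folder(
    repo_id=training_arguments.hub_model_id,
    folder_path=output_dir,
    repo_type="model",
)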
valayDave added a commit to outerbounds/metaflow-checkpoint-examples that referenced this issue Oct 3, 2024
* [hf example] support push to hub
* Got push to hub working with Peft.
* Had to work-around this way because of huggingface/transformers#33922