### Description
### System Info

- `transformers` version: 4.44.2
- Platform: Linux-5.15.0-1066-aws-x86_64-with-glibc2.31
- Python version: 3.12.0
- Huggingface_hub version: 0.25.1
- Safetensors version: 0.4.5
- Accelerate version: 0.34.2
- Accelerate config: not found
- PyTorch version (GPU?): 2.4.1+cu121 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: False
### Who can help?

### Information
- The official example scripts
- My own modified scripts
### Tasks

- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
### Reproduction
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer

# bnb_config, device_map, training_arguments, train_dataset, args, and
# callbacks are defined earlier in the script
model_path = "path/to/my/model/on/disk"
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    device_map=device_map,
    use_auth_token=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# some code to create PeftConfig
...

# You can even use a regular trainer instead
trainer = SFTTrainer(
    model=model,
    args=training_arguments,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=args.max_seq_length,
    packing=args.packing,
    callbacks=callbacks,
)
trainer.train()
trainer.push_to_hub()
```
I have a `hub_model_id` present in the `TrainingArguments` when I run my scripts. If I set `push_to_hub=True` in `TrainingArguments`, it ends up throwing a `ValueError` like:
```
  File "/home/ob-workspace/metaflow-checkpoint-examples/nim_lora/finetune_hf_peft.py", line 127, in sft
    trainer.push_to_hub(
  File "/home/ob-workspace/micromamba/envs/metaflow/linux-64/51041216a07a03b/lib/python3.12/site-packages/trl/trainer/sft_trainer.py", line 481, in push_to_hub
    return super().push_to_hub(commit_message=commit_message, blocking=blocking, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ob-workspace/micromamba/envs/metaflow/linux-64/51041216a07a03b/lib/python3.12/site-packages/transformers/trainer.py", line 4353, in push_to_hub
    return upload_folder(
           ^^^^^^^^^^^^^^
  File "/home/ob-workspace/micromamba/envs/metaflow/linux-64/51041216a07a03b/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/ob-workspace/micromamba/envs/metaflow/linux-64/51041216a07a03b/lib/python3.12/site-packages/huggingface_hub/hf_api.py", line 1485, in _inner
    return fn(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ob-workspace/micromamba/envs/metaflow/linux-64/51041216a07a03b/lib/python3.12/site-packages/huggingface_hub/hf_api.py", line 4972, in upload_folder
    add_operations = self._prepare_upload_folder_additions(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ob-workspace/micromamba/envs/metaflow/linux-64/51041216a07a03b/lib/python3.12/site-packages/huggingface_hub/hf_api.py", line 9478, in _prepare_upload_folder_additions
    self._validate_yaml(
  File "/home/ob-workspace/micromamba/envs/metaflow/linux-64/51041216a07a03b/lib/python3.12/site-packages/huggingface_hub/hf_api.py", line 9542, in _validate_yaml
    raise ValueError(f"Invalid metadata in README.md.\n{message}") from e
ValueError: Invalid metadata in README.md.
- "base_model" with value "/tmp/metaflow_models_model_reference_c5stocp_" is not valid. Use a model id from https://hf.co/models.
```
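The failure boils down to the Hub's repo-id validation rejecting a local filesystem path in the card's `base_model` field. A minimal self-contained sketch of that kind of check (the regex and helper name here are hypothetical, not the actual `huggingface_hub` implementation):

```python
import re

# Hypothetical approximation of the Hub's model-id validation: an id is
# "name" or "namespace/name" drawn from a restricted character set.
# A local filesystem path such as "/tmp/..." does not match.
_MODEL_ID_RE = re.compile(r"^[A-Za-z0-9][\w.\-]*(/[A-Za-z0-9][\w.\-]*)?$")

def looks_like_hub_model_id(value: str) -> bool:
    """Return True if `value` could be a Hub model id rather than a path."""
    return bool(_MODEL_ID_RE.fullmatch(value))

print(looks_like_hub_model_id("mistralai/Mistral-7B-v0.1"))                # True
print(looks_like_hub_model_id("/tmp/metaflow_models_model_reference_c5stocp_"))  # False
```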
While reading the code I also noticed that if we pass a `finetuned_from` argument to the `push_to_hub` function, the trainer passes it down to the `create_model_card` function, but that function then lets PEFT replace the generated card with a new one. The problem is that `PeftModel`'s `create_or_update_model_card` does not account for the value of `base_model` already set in the card, so the card ends up invalid and the Hub refuses the push.

I have a fix for this in the PEFT library: huggingface/peft#2124
### Expected behavior
Overall, the behavior I expect when a model is loaded from disk and pushed to the hub is: if a model is loaded from local disk and then trained with PEFT (or any other HF extension/trainer), push to hub should work. To accommodate this, the library should:

- Distinguish between names and paths when setting information such as `base_model`.
- Allow passing information like `base_model` in the `TrainingArguments` so that `push_to_hub=True` can work with the trainer. Currently `push_to_hub=True` won't work in PEFT scenarios because the README.md created by `PeftModel`'s `create_or_update_model_card` overrides the value with something that can be either a name or a path, and if it is a path (as in this issue), it just crashes.
- Ensure that `finetuned_from` can be passed down explicitly to extensions of the library like PEFT, or do something like huggingface/peft#2124 (don't write a `base_model` value in the card if there is already a `base_model` set in it).
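The first suggestion above could be sketched as a small guard that only emits `base_model` when the value is plausibly a hub id; the helper name is hypothetical, not an existing transformers or PEFT API:

```python
import os
from typing import Optional

def base_model_for_card(model_name_or_path: str) -> Optional[str]:
    """Hypothetical guard: return a value safe to write into the model
    card's `base_model` field, or None when the model came from disk."""
    # An existing directory or an obviously path-like string is not a
    # valid hub id, so omit base_model from the card metadata entirely.
    if os.path.isdir(model_name_or_path) or model_name_or_path.startswith(("/", ".", "~")):
        return None
    return model_name_or_path

print(base_model_for_card("meta-llama/Llama-2-7b-hf"))                     # meta-llama/Llama-2-7b-hf
print(base_model_for_card("/tmp/metaflow_models_model_reference_c5stocp_"))  # None
```

With a check like this in `create_model_card` / `create_or_update_model_card`, a model loaded from a local path would simply produce a card without a `base_model` entry instead of an invalid one.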