
Fix: Add metadata to bf16 safetensors for compatibility with transformers #749

Open · wants to merge 2 commits into main
Conversation


@tflsxyy commented Mar 6, 2025

Issue:
When using HuggingFace Transformers versions earlier than v4.47.1, loading the DeepSeek-V3 bf16 weights generated by fp8_cast_bf16.py raises the following error:

Loading checkpoint shards:   0%|          | 0/163 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/root/dataDisk/DeepSeek-V3-bf16/deepseek-v3-load.py", line 3, in <module>
    model = AutoModelForCausalLM.from_pretrained(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/lib/python3.12/site-packages/transformers/models/auto/auto_factory.py", line 559, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/lib/python3.12/site-packages/transformers/modeling_utils.py", line 4225, in from_pretrained
    ) = cls._load_pretrained_model(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/lib/python3.12/site-packages/transformers/modeling_utils.py", line 4706, in _load_pretrained_model
    state_dict = load_state_dict(
                 ^^^^^^^^^^^^^^^^
  File "/root/.local/lib/python3.12/site-packages/transformers/modeling_utils.py", line 557, in load_state_dict
    if metadata.get("format") not in ["pt", "tf", "flax", "mlx"]:
       ^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'get'

How to reproduce:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "/root/dataDisk/DeepSeek-V3-bf16",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="cpu",
)

Environment:
torch == 2.5.1
triton == 3.1.0
transformers == 4.46.3
safetensors == 0.4.5

This issue occurs because these transformers versions expect each safetensors file to carry the metadata entry {"format": "pt"}: load_state_dict reads the file's header metadata, and since the bf16 shards written by fp8_cast_bf16.py contain no metadata at all, metadata is None and the metadata.get("format") check fails with the AttributeError above.
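
A quick way to confirm this is to inspect a shard's header metadata directly with safetensors; a minimal sketch (the shard filename below is illustrative, use any *.safetensors file from the bf16 output directory):

from safetensors import safe_open

# Illustrative path to one converted shard.
shard = "/root/dataDisk/DeepSeek-V3-bf16/model-00001-of-000163.safetensors"

with safe_open(shard, framework="pt") as f:
    # Returns None for shards saved without metadata, which is what trips up
    # metadata.get("format") in older transformers versions; after the fix it
    # returns {"format": "pt"}.
    print(f.metadata())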

Proposed solution:
Similar to what transformers itself does when saving checkpoints (https://github.com/huggingface/transformers/blob/6966fa190172b48b2fb46fe4552a13b943e692cf/src/transformers/modeling_utils.py#L3132), I changed line 88 of fp8_cast_bf16.py to:

        save_file(new_state_dict, new_safetensor_file, metadata={"format": "pt"})

Tested locally: the model loads successfully after applying the fix. Example converted weights: https://huggingface.co/tflsxyy/DeepSeek-V3-bf16
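
For checkpoints that were already converted without the metadata, the shards can also be rewritten in place instead of re-running the whole FP8-to-BF16 conversion; a minimal sketch, assuming the converted shards live in the directory below:

import glob
from safetensors.torch import load_file, save_file

# Illustrative checkpoint directory; point this at the converted bf16 shards.
ckpt_dir = "/root/dataDisk/DeepSeek-V3-bf16"

for shard_file in sorted(glob.glob(f"{ckpt_dir}/*.safetensors")):
    state_dict = load_file(shard_file)  # one shard at a time, on CPU
    # Re-save the same tensors, this time with the metadata transformers expects.
    save_file(state_dict, shard_file, metadata={"format": "pt"})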

Even though not many people will load the bf16 weights of DeepSeek-V3/R1, and transformers fixed this on its side as of v4.48.0 (https://github.com/huggingface/transformers/blob/6bc0fbcfa7acb6ac4937e7456a76c2f7975fefec/src/transformers/modeling_utils.py#L506), some people still test the bf16 weights with transformers versions earlier than v4.47.1, so I think this code should be changed.
