
Fix: Add metadata to bf16 safetensors for compatibility with transformers #749

Open · wants to merge 2 commits into main
Conversation


@tflsxyy commented Mar 6, 2025

Issue:
When using HuggingFace Transformers versions earlier than v4.47.1, loading the DeepSeek-V3 bf16 weights generated by fp8_cast_bf16.py raises the following error:

Loading checkpoint shards:   0%|          | 0/163 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/root/dataDisk/DeepSeek-V3-bf16/deepseek-v3-load.py", line 3, in <module>
    model = AutoModelForCausalLM.from_pretrained(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/lib/python3.12/site-packages/transformers/models/auto/auto_factory.py", line 559, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/lib/python3.12/site-packages/transformers/modeling_utils.py", line 4225, in from_pretrained
    ) = cls._load_pretrained_model(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/lib/python3.12/site-packages/transformers/modeling_utils.py", line 4706, in _load_pretrained_model
    state_dict = load_state_dict(
                 ^^^^^^^^^^^^^^^^
  File "/root/.local/lib/python3.12/site-packages/transformers/modeling_utils.py", line 557, in load_state_dict
    if metadata.get("format") not in ["pt", "tf", "flax", "mlx"]:
       ^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'get'

How to reproduce:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "/root/dataDisk/DeepSeek-V3-bf16",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="cpu",
)

Environment:
torch == 2.5.1
triton == 3.1.0
transformers == 4.46.3
safetensors == 0.4.5

This issue occurs because these transformers versions expect each safetensors file to carry the metadata entry {"format": "pt"}: load_state_dict reads the file's header metadata, and since the bf16 shards written by fp8_cast_bf16.py contain no metadata at all, metadata is None and the metadata.get("format") check fails with the AttributeError above.
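
A quick way to confirm this is to inspect a shard's header metadata directly with safetensors; a minimal sketch (the shard filename below is illustrative, use any *.safetensors file from the bf16 output directory):

from safetensors import safe_open

# Illustrative path to one converted shard.
shard = "/root/dataDisk/DeepSeek-V3-bf16/model-00001-of-000163.safetensors"

with safe_open(shard, framework="pt") as f:
    # Returns None for shards saved without metadata, which is what trips up
    # metadata.get("format") in older transformers versions; after the fix it
    # returns {"format": "pt"}.
    print(f.metadata())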

Proposed solution:
Similar to what transformers itself does when saving checkpoints (https://github.com/huggingface/transformers/blob/6966fa190172b48b2fb46fe4552a13b943e692cf/src/transformers/modeling_utils.py#L3132), I changed line 88 of fp8_cast_bf16.py to:

        save_file(new_state_dict, new_safetensor_file, metadata={"format": "pt"})

Tested locally: the model loads successfully after applying the fix. Example converted weights: https://huggingface.co/tflsxyy/DeepSeek-V3-bf16
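
For checkpoints that were already converted without the metadata, the shards can also be rewritten in place instead of re-running the whole FP8-to-BF16 conversion; a minimal sketch, assuming the converted shards live in the directory below:

import glob
from safetensors.torch import load_file, save_file

# Illustrative checkpoint directory; point this at the converted bf16 shards.
ckpt_dir = "/root/dataDisk/DeepSeek-V3-bf16"

for shard_file in sorted(glob.glob(f"{ckpt_dir}/*.safetensors")):
    state_dict = load_file(shard_file)  # one shard at a time, on CPU
    # Re-save the same tensors, this time with the metadata transformers expects.
    save_file(state_dict, shard_file, metadata={"format": "pt"})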

Even though not many people will load the bf16 weights of DeepSeek-V3/R1, and transformers fixed this on its side as of v4.48.0 (https://github.com/huggingface/transformers/blob/6bc0fbcfa7acb6ac4937e7456a76c2f7975fefec/src/transformers/modeling_utils.py#L506), some people still test the bf16 weights with transformers versions earlier than v4.47.1, so I think this code should be changed.
