Problem saving torch nn.GRU #657

@hslr4

Description

System Info

  • transformers version: 4.43.4
  • Platform: Linux-6.1.0-37-amd64-x86_64-with-glibc2.35
  • Python version: 3.10.14
  • Huggingface_hub version: 0.27.1
  • Safetensors version: 0.6.2
  • Accelerate version: not installed
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.8.0+cu128 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: no
  • Using GPU in script?: yes
  • GPU type: NVIDIA RTX 4000 Ada Generation

Information

  • The official example scripts
  • My own modified scripts

Reproduction

I found that this problem has already been reported here: huggingface/accelerate#3101 (comment)
However, the solution suggested there, safe_serialization=False, no longer seems to work, and it only partially solved the problem anyway.

Basically the following error occurs when I try to save a custom model containing a torch.nn.GRU with model.save_pretrained('test'):

Traceback (most recent call last):
  File "/path/main.py", line 156, in <module>
    model.save_pretrained('./test')
  File "/opt/miniconda3/envs/customlipsync/lib/python3.10/site-packages/huggingface_hub/hub_mixin.py", line 408, in save_pretrained
    self._save_pretrained(save_directory)
  File "/opt/miniconda3/envs/customlipsync/lib/python3.10/site-packages/huggingface_hub/hub_mixin.py", line 755, in _save_pretrained
    save_model_as_safetensor(model_to_save, str(save_directory / constants.SAFETENSORS_SINGLE_FILE))
  File "/opt/miniconda3/envs/customlipsync/lib/python3.10/site-packages/safetensors/torch.py", line 169, in save_model
    to_removes = _remove_duplicate_names(state_dict)
  File "/opt/miniconda3/envs/customlipsync/lib/python3.10/site-packages/safetensors/torch.py", line 113, in _remove_duplicate_names
    raise RuntimeError(
RuntimeError: Error while trying to find names to remove to save state dict, but found no suitable name to keep for saving amongst: {'gru.weight_ih_l0'}. None is covering the entire storage.Refusing to save/load the model since you could be storing much more memory than needed. Please refer to https://huggingface.co/docs/safetensors/torch_shared_tensors for more information. Or open an issue.

Adding a check for the case where there is only one shared tensor, which I found here: https://huggingface.co/spaces/safetensors/convert/blob/main/convert.py#L54, to the torch implementation (here, I guess) seems to solve the problem.
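
To make the idea concrete, here is a rough, torch-free sketch of the control flow I mean. It is not the actual safetensors code: `FakeTensor` and `names_to_keep` are hypothetical names made up for illustration, and the tensor is modelled only by its element count and the size of its backing storage. The assumption is that a group containing a single tensor which does not cover its whole storage can be cloned into fresh storage instead of raising.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class FakeTensor:
    """Hypothetical stand-in for a tensor: element count plus storage size."""
    numel: int
    storage_numel: int

    def is_complete(self) -> bool:
        # "Complete" here means the tensor covers its entire backing storage.
        return self.numel == self.storage_numel

    def clone(self) -> "FakeTensor":
        # A clone gets fresh storage sized exactly to the tensor.
        return FakeTensor(self.numel, self.numel)


def names_to_keep(state_dict: Dict[str, FakeTensor],
                  shared_groups: List[Set[str]]) -> List[str]:
    """Pick one name to keep per group of tensors sharing storage."""
    kept = []
    for group in shared_groups:
        complete = sorted(n for n in group if state_dict[n].is_complete())
        if not complete:
            if len(group) == 1:
                # Single tensor in the group: nothing else depends on the
                # storage, so clone it (in place in state_dict) and keep it.
                name = next(iter(group))
                state_dict[name] = state_dict[name].clone()
                complete = [name]
            else:
                raise RuntimeError(
                    f"found no suitable name to keep amongst: {group}"
                )
        kept.append(complete[0])
    return kept


# Usage: one tensor that is a view into a larger storage, as in the error above.
state = {"gru.weight_ih_l0": FakeTensor(numel=12, storage_numel=48)}
kept_names = names_to_keep(state, [{"gru.weight_ih_l0"}])
```

With the extra branch, the single-name group from the traceback is kept after cloning, while groups with several incomplete tensors still raise as before.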

Expected behavior

The model is saved without an error.
