Metadata written in random order #584

@karl3wm

System Info

  • transformers version: 4.49.0
  • Platform: Linux-6.12.12-cloud-amd64-x86_64-with-glibc2.40
  • Python version: 3.11.2
  • Huggingface_hub version: 0.29.1
  • Safetensors version: 0.5.3
  • Accelerate version: 1.4.0
  • Accelerate config: not found
  • DeepSpeed version: not installed
  • PyTorch version (GPU?): 2.6.0+cpu (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: no

Information

  • The official example scripts
  • My own modified scripts

Reproduction

import safetensors.torch, torch
filename = 'test.safetensors'
metadata = dict([[str(x),str(x)] for x in range(16)])
print('Original metadata:', metadata)

safetensors.torch.save_file({}, filename, metadata=metadata)
metadata_2 = safetensors.safe_open(filename, framework='pt').metadata()
print('Metadata after save:', metadata_2)

safetensors.torch.save_file({}, filename, metadata=metadata)
metadata_3 = safetensors.safe_open(filename, framework='pt').metadata()
print('Metadata after save:', metadata_3)

output:

Original metadata: {'0': '0', '1': '1', '2': '2', '3': '3', '4': '4', '5': '5', '6': '6', '7': '7', '8': '8', '9': '9', '10': '10', '11': '11', '12': '12', '13': '13', '14': '14', '15': '15'}
Metadata after save: {'11': '11', '13': '13', '5': '5', '2': '2', '0': '0', '6': '6', '7': '7', '8': '8', '15': '15', '1': '1', '3': '3', '10': '10', '12': '12', '4': '4', '9': '9', '14': '14'}
Metadata after save: {'6': '6', '2': '2', '3': '3', '12': '12', '7': '7', '8': '8', '0': '0', '14': '14', '15': '15', '1': '1', '10': '10', '4': '4', '13': '13', '5': '5', '11': '11', '9': '9'}

Expected behavior

It would be nice to save files with a consistent metadata ordering, so that one could deterministically produce files that are byte-for-byte identical.

Right now this must be done by generating the content manually per the spec, rather than using the library.
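For illustration, the manual workaround could look roughly like the sketch below. It uses only the standard library and writes the layout described in the safetensors format documentation: an 8-byte little-endian header length, a JSON header, then the raw tensor bytes. The function name `build_safetensors_bytes` and the `(dtype, shape, raw_bytes)` tuple layout are hypothetical, chosen for this example, and padding the header with spaces to a multiple of 8 bytes is an assumption on my part (the format permits trailing-space padding). Determinism comes from sorting the keys before serializing.

```python
import json
import struct


def build_safetensors_bytes(metadata, tensors=None):
    """Serialize a safetensors-style blob with a deterministic header.

    metadata: dict of str -> str, stored under "__metadata__".
    tensors:  hypothetical mapping name -> (dtype_str, shape, raw_bytes),
              e.g. {"w": ("F32", [2], b"\x00" * 8)}.
    """
    tensors = tensors or {}
    header = {}
    if metadata:
        header["__metadata__"] = dict(metadata)

    # Lay tensors out in sorted-name order so offsets never depend on dict order.
    offset = 0
    chunks = []
    for name in sorted(tensors):
        dtype, shape, raw = tensors[name]
        header[name] = {
            "dtype": dtype,
            "shape": list(shape),
            "data_offsets": [offset, offset + len(raw)],
        }
        offset += len(raw)
        chunks.append(raw)

    # sort_keys=True fixes the key order at every nesting level.
    header_bytes = json.dumps(header, sort_keys=True, separators=(",", ":")).encode("utf-8")
    # Assumption: pad with spaces so the data section starts on an 8-byte boundary.
    header_bytes += b" " * (-len(header_bytes) % 8)

    return struct.pack("<Q", len(header_bytes)) + header_bytes + b"".join(chunks)


# The same metadata inserted in two different orders yields identical bytes.
blob_a = build_safetensors_bytes({"b": "2", "a": "1"})
blob_b = build_safetensors_bytes({"a": "1", "b": "2"})
assert blob_a == blob_b
```

Writing the returned bytes with `open(filename, 'wb')` should give a file that `safetensors.safe_open` can read, though I have only reasoned about that against the format description rather than checking every dtype.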
