Problems using safetensors on Ascend NPU #661

@levsion

System Info

safetensors version : 0.6.0
torch version : 2.6.0
torch_npu version : 2.6.0
ascend cann version : 8.2.RC1
npu version : 24.1.rc3

Hello, I'm using the document parsing component Docling, which runs on a Huawei Ascend server with the version listed above. Docling reports an error when loading the SafeTensors model, and I can't figure it out. The same code and model file work fine on a CUDA GPU, but not on an Ascend NPU. I'd appreciate any help in resolving this issue. The error message is as follows:

Traceback (most recent call last):
File "/usr/local/python3.11.0/bin/docling", line 8, in <module>
sys.exit(app())
^^^^^
File "/usr/local/python3.11.0/lib/python3.11/site-packages/typer/main.py", line 341, in __call__
raise e
File "/usr/local/python3.11.0/lib/python3.11/site-packages/typer/main.py", line 324, in __call__
return get_command(self)(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.0/lib/python3.11/site-packages/click/core.py", line 1442, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.0/lib/python3.11/site-packages/typer/core.py", line 694, in main
return _main(
^^^^^^
File "/usr/local/python3.11.0/lib/python3.11/site-packages/typer/core.py", line 195, in _main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.0/lib/python3.11/site-packages/click/core.py", line 1226, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.0/lib/python3.11/site-packages/click/core.py", line 794, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.0/lib/python3.11/site-packages/typer/main.py", line 699, in wrapper
return callback(**use_params)
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.0/lib/python3.11/site-packages/docling/cli/main.py", line 690, in convert
export_documents(
File "/usr/local/python3.11.0/lib/python3.11/site-packages/docling/cli/main.py", line 192, in export_documents
for conv_res in conv_results:
File "/usr/local/python3.11.0/lib/python3.11/site-packages/docling/document_converter.py", line 258, in convert_all
for conv_res in conv_res_iter:
File "/usr/local/python3.11.0/lib/python3.11/site-packages/docling/document_converter.py", line 293, in _convert
for item in map(
File "/usr/local/python3.11.0/lib/python3.11/site-packages/docling/document_converter.py", line 339, in _process_document
conv_res = self._execute_pipeline(in_doc, raises_on_error=raises_on_error)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.0/lib/python3.11/site-packages/docling/document_converter.py", line 360, in _execute_pipeline
pipeline = self._get_pipeline(in_doc.format)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.0/lib/python3.11/site-packages/docling/document_converter.py", line 322, in _get_pipeline
self.initialized_pipelines[cache_key] = pipeline_class(
^^^^^^^^^^^^^^^
File "/usr/local/python3.11.0/lib/python3.11/site-packages/docling/pipeline/standard_pdf_pipeline.py", line 85, in __init__
TableStructureModel(
File "/usr/local/python3.11.0/lib/python3.11/site-packages/docling/models/table_structure_model.py", line 84, in __init__
self.tf_predictor = TFPredictor(
^^^^^^^^^^^^
File "/usr/local/python3.11.0/lib/python3.11/site-packages/docling_ibm_models/tableformer/data_management/tf_predictor.py", line 131, in __init__
self._model = self._load_model()
^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.0/lib/python3.11/site-packages/docling_ibm_models/tableformer/data_management/tf_predictor.py", line 208, in _load_model
missing, unexpected = load_model(model, model_fn, device=self._device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.0/lib/python3.11/site-packages/safetensors/torch.py", line 224, in load_model
to_removes = _remove_duplicate_names(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.0/lib/python3.11/site-packages/safetensors/torch.py", line 115, in _remove_duplicate_names
raise RuntimeError(
RuntimeError: Error while trying to find names to remove to save state dict, but found no suitable name to keep for saving amongst: {'_encoder._resnet.0.weight'}. None is covering the entire storage.Refusing to save/load the model since you could be storing much more memory than needed. Please refer to https://huggingface.co/docs/safetensors/torch_shared_tensors for more information. Or open an issue.

Information

  • The official example scripts
  • My own modified scripts

Reproduction

# Excerpt from safetensors/torch.py: the code path that raises the error above.
def _remove_duplicate_names(
    state_dict: Dict[str, torch.Tensor],
    *,
    preferred_names: Optional[List[str]] = None,
    discard_names: Optional[List[str]] = None,
) -> Dict[str, List[str]]:
    if preferred_names is None:
        preferred_names = []
    preferred_names = set(preferred_names)
    if discard_names is None:
        discard_names = []
    discard_names = set(discard_names)

    shareds = _find_shared_tensors(state_dict)
    to_remove = defaultdict(list)
    for shared in shareds:
        complete_names = set(
            [name for name in shared if _is_complete(state_dict[name])]
        )
        if not complete_names:
            raise RuntimeError(
                "Error while trying to find names to remove to save state dict, but found no suitable name to keep"
                f" for saving amongst: {shared}. None is covering the entire storage.Refusing to save/load the model"
                " since you could be storing much more memory than needed. Please refer to"
                " https://huggingface.co/docs/safetensors/torch_shared_tensors for more information. Or open an"
                " issue."
            )
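
The group reported in the error holds only one name, '_encoder._resnet.0.weight', so this does not look like a real shared-tensor problem. It looks more like that single tensor failing the "covers the entire storage" test. Below is a rough, pure-Python sketch of what I understand that test to compare. It is a simplified model, not the safetensors source: `TensorInfo` and `covers_entire_storage` are hypothetical names, and the "padded" case is only a guess at how an NPU backend could report storage differently from CUDA.

```python
from dataclasses import dataclass


@dataclass
class TensorInfo:
    """Hypothetical stand-in for the tensor metadata the check looks at."""
    data_ptr: int        # address of the tensor's first element
    storage_ptr: int     # address where the backing storage begins
    nelement: int        # number of elements in the tensor
    itemsize: int        # bytes per element (4 for float32)
    storage_nbytes: int  # storage size in bytes, as reported by the backend


def covers_entire_storage(t: TensorInfo) -> bool:
    """True when the tensor starts at the storage base and spans all of it."""
    starts_at_base = t.data_ptr == t.storage_ptr
    same_size = t.nelement * t.itemsize == t.storage_nbytes
    return starts_at_base and same_size


# A float32 tensor of 64 elements backed by exactly 256 bytes.
dense = TensorInfo(0x1000, 0x1000, nelement=64, itemsize=4, storage_nbytes=256)

# Same tensor, but the backend reports a larger storage (assumed, e.g. padding).
padded = TensorInfo(0x1000, 0x1000, nelement=64, itemsize=4, storage_nbytes=512)

# A view that starts 64 bytes into a larger storage.
view = TensorInfo(0x1040, 0x1000, nelement=16, itemsize=4, storage_nbytes=256)

results = {
    name: covers_entire_storage(t)
    for name, t in [("dense", dense), ("padded", padded), ("view", view)]
}
print(results)
```

If the reported storage size on the NPU differs from nelement * itemsize for this weight, that would be consistent with the error appearing on Ascend but not on CUDA. I have not confirmed this.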

Expected behavior

The model should load on the Ascend NPU the same way it does on a CUDA GPU. Any help is appreciated.
