Encounter Error: ValueError: Message onnx.ModelProto exceeds maximum protobuf size of 2GB #183

@illumination-k

Hi, thanks for your great projects!

I tried to build a TensorRT version of the embedding model (multilingual-e5-large) with the following command, but I encountered the error below.

convert_model -m intfloat/multilingual-e5-large --backend tensorrt --task embedding --seq-len 16 512 512 --name intfloat-multilingual-e5-large --device cuda --load-external-data --verbose

Traceback (most recent call last):
  File "/usr/local/bin/convert_model", line 8, in <module>
    sys.exit(entrypoint())
  File "/usr/local/lib/python3.8/dist-packages/transformer_deploy/convert.py", line 357, in entrypoint
    main(commands=args)
  File "/usr/local/lib/python3.8/dist-packages/transformer_deploy/convert.py", line 179, in main
    convert_to_onnx(
  File "/usr/local/lib/python3.8/dist-packages/transformer_deploy/backends/pytorch_utils.py", line 158, in convert_to_onnx
    onnx.save(onnx_model, output_path)
  File "/usr/local/lib/python3.8/dist-packages/onnx/__init__.py", line 203, in save_model
    s = _serialize(proto)
  File "/usr/local/lib/python3.8/dist-packages/onnx/__init__.py", line 71, in _serialize
    result = proto.SerializeToString()
ValueError: Message onnx.ModelProto exceeds maximum protobuf size of 2GB: 2235540927

In transformer-deploy, if the proto size exceeds 2GB, save_as_external_data should be set to true:

save_external_data: bool = to_save.ByteSize() > 2 * 1024**3
filename = Path(model_path).name
onnx.save_model(
    proto=to_save,
    f=model_path,
    save_as_external_data=save_external_data,
    all_tensors_to_one_file=True,
    location=filename + ".data",
)

According to the ONNX API docs, a model larger than 2GB should be checked by passing its file path to onnx.checker.check_model.

import onnx

onnx.checker.check_model("path/to/the/model.onnx")
# onnx.checker.check_model(loaded_onnx_model) will fail if given >2GB model

Another idea: if load_external_data is true, save_as_external_data should also be true.
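The idea of tying save_as_external_data to load_external_data could be sketched roughly as follows. The helper name external_data_kwargs and its parameters are hypothetical and not part of transformer-deploy; it only builds the keyword arguments that the onnx.save_model call quoted above already uses, so it runs without onnx installed.

```python
from pathlib import Path
from typing import Any, Dict

# Same threshold as the transformer-deploy snippet quoted above.
TWO_GIB = 2 * 1024**3


def external_data_kwargs(model_path: str, byte_size: int, load_external_data: bool) -> Dict[str, Any]:
    """Hypothetical helper: build keyword arguments for onnx.save_model.

    External data is enabled when the caller asked for it explicitly
    or when the serialized size is above the 2 GiB threshold.
    """
    save_external = load_external_data or byte_size > TWO_GIB
    kwargs: Dict[str, Any] = {"save_as_external_data": save_external}
    if save_external:
        # Keep all tensors in a single side file next to the model.
        kwargs["all_tensors_to_one_file"] = True
        kwargs["location"] = Path(model_path).name + ".data"
    return kwargs


# Size taken from the ValueError in the traceback.
kwargs = external_data_kwargs("model.onnx", byte_size=2235540927, load_external_data=True)
```

Assuming the proto and path are available, the result would be passed as onnx.save_model(proto, model_path, **kwargs).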

In the onnx code, MAXIMUM_PROTOBUF is set to 2000000000. I do not understand why this error occurred.

https://github.com/onnx/onnx/blob/238f2b9a41b28e6db0086c8a1be655d517c94dd1/onnx/checker.py#L45-L47

onnx uses sys.getsizeof rather than ByteSize. This is a difference between transformer-deploy and onnx.

https://github.com/onnx/onnx/blob/238f2b9a41b28e6db0086c8a1be655d517c94dd1/onnx/checker.py#L175-L178
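To make the numbers concrete, here is a small sketch comparing the size reported in the error with the two thresholds mentioned in this issue. The constant names are mine; the values come from the traceback and the quoted code. It also shows that, on CPython, sys.getsizeof on a bytes object counts the object header as well as the payload, so it is not the same measurement as a raw byte count.

```python
import sys

REPORTED_SIZE = 2235540927               # byte count from the ValueError
TRANSFORMER_DEPLOY_LIMIT = 2 * 1024**3   # 2147483648, used with ByteSize()
ONNX_MAXIMUM_PROTOBUF = 2000000000       # value of MAXIMUM_PROTOBUF in onnx

# The reported model size is above both thresholds.
exceeds_transformer_deploy = REPORTED_SIZE > TRANSFORMER_DEPLOY_LIMIT
exceeds_onnx = REPORTED_SIZE > ONNX_MAXIMUM_PROTOBUF

# The two thresholds differ, so a model between them would pass one
# check and fail the other.
threshold_gap = TRANSFORMER_DEPLOY_LIMIT - ONNX_MAXIMUM_PROTOBUF

# sys.getsizeof measures the Python object, not only the payload bytes.
payload = b"\x00" * 1024
getsizeof_overhead = sys.getsizeof(payload) - len(payload)

print(exceeds_transformer_deploy, exceeds_onnx, threshold_gap)
```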
