Hey @justinchuby,
First off, thanks for the amazing project! I've been looking for something like this for a while.
I had a couple of questions about the fundamental difference between the `convert` and `embed` operations in this library. I have a model trained in-house (which I unfortunately can't share) that converts successfully but fails when running `embed`.

I am also trying to understand how I could assess the memory usage of the model based on the converted safetensors file (if you know of a way to do this directly through ONNX, I would be very interested to hear it). For example, would I be able to use the hf-mem tool to accurately estimate the VRAM usage of the ONNX model after conversion?
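For context on the memory question, here is what I have been trying so far. Since a safetensors file starts with a little-endian u64 header length followed by a JSON header listing each tensor's dtype and shape, a rough lower bound on weight memory can be computed by parsing just that header. This is only a sketch of my own (`estimate_safetensors_bytes` is my helper, not part of onnx-safetensors), and it ignores runtime overhead such as activations:

```python
import json
import struct

def estimate_safetensors_bytes(path):
    """Sum raw tensor sizes from a .safetensors header: a rough lower
    bound on the memory needed to hold the weights."""
    dtype_bytes = {"F64": 8, "F32": 4, "F16": 2, "BF16": 2,
                   "I64": 8, "I32": 4, "I16": 2, "I8": 1,
                   "U8": 1, "BOOL": 1}
    with open(path, "rb") as f:
        # First 8 bytes: little-endian u64 giving the JSON header size.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    total = 0
    for name, info in header.items():
        if name == "__metadata__":  # optional metadata entry, no tensor data
            continue
        n = 1
        for dim in info["shape"]:
            n *= dim
        total += n * dtype_bytes[info["dtype"]]
    return total

# Demo: hand-build a minimal safetensors file with one 2x3 float32 tensor.
header = {"w": {"dtype": "F32", "shape": [2, 3], "data_offsets": [0, 24]}}
hjson = json.dumps(header).encode()
with open("tiny.safetensors", "wb") as f:
    f.write(struct.pack("<Q", len(hjson)))
    f.write(hjson)
    f.write(b"\x00" * 24)

print(estimate_safetensors_bytes("tiny.safetensors"))  # → 24
```

My understanding is that tools like hf-mem work from this same header information, so I would expect the weight counts to agree; whether that translates to an accurate VRAM figure for the ONNX model at inference time is exactly what I am unsure about.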
```console
> uv run onnx-safetensors convert model.onnx model.safetensors
Loading ONNX model from model.onnx...
Converting model to safetensors format...
Saving model.safetensors (onnx::MatMul_12379): 100%|██████████| 380/380 [00:00<00:00, 3608.57it/s]
Model saved to model.safetensors

> uv run onnx-safetensors embed model.onnx model.safetensors
Loading ONNX model from model.onnx...
Embedding model into safetensors file...
Traceback (most recent call last):
  File "/Users/vidamoda/dev/inferentia/.venv/bin/onnx-safetensors", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/vidamoda/dev/inferentia/.venv/lib/python3.12/site-packages/onnx_safetensors/_cli.py", line 96, in main
    embed_command(args)
  File "/Users/vidamoda/dev/inferentia/.venv/lib/python3.12/site-packages/onnx_safetensors/_cli.py", line 53, in embed_command
    onnx_safetensors.save_safetensors_model(model, output_path)
  File "/Users/vidamoda/dev/inferentia/.venv/lib/python3.12/site-packages/onnx_safetensors/_safetensors_io.py", line 640, in save_safetensors_model
    safetensors.serialize_file(tensor_dict, safetensors_model_path, metadata=metadata)
safetensors_rust.SafetensorError: Error while serializing: header too large
```
Thanks again for the project!