Description
Describe the feature request
Right now, the InferenceSession constructor is able to load models from two sources:
- string (path to model)
- byte[] (the actual model)
When using the Byte[] constructors, the preceding steps typically involve loading the model from a Stream, and in many cases that means copying it into a MemoryStream and calling .ToArray() to obtain the byte[] array of the loaded file.
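For context, here is a minimal sketch of that pattern as it looks today with the existing byte[] constructor (the "model.onnx" path is just a placeholder):

using System.IO;
using Microsoft.ML.OnnxRuntime;

// Today's pattern: copy the source stream into a MemoryStream,
// then materialize a byte[] for the existing constructor.
byte[] modelBytes;
using (var source = File.OpenRead("model.onnx")) // any Stream source
using (var buffer = new MemoryStream())
{
    source.CopyTo(buffer);
    modelBytes = buffer.ToArray(); // allocates and copies the whole model again
}
var session = new InferenceSession(modelBytes);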
The problem is that MemoryStream.ToArray() creates a copy of the loaded file, because the internal buffer of the MemoryStream is usually larger than the data it holds and so cannot be handed out directly.
Certainly it would be possible to read the model straight into a byte[] array if you know the file length beforehand, but that is not always possible or reliable with certain Streams. So the safe approach is to copy the stream into a MemoryStream and then extract the bytes from it.
MemoryStream has a way to avoid creating that copy: TryGetBuffer(out ArraySegment<Byte> buffer), which exposes the underlying buffer as an ArraySegment<Byte>. That is what I think the constructors should accept instead of Byte[].
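To illustrate the difference between the two calls (the 100,000-byte payload is just an example):

using System;
using System.IO;

var ms = new MemoryStream();
ms.Write(new byte[100_000], 0, 100_000); // pretend this is the model

// ToArray(): allocates a brand-new array of exactly ms.Length bytes and copies into it.
byte[] copy = ms.ToArray();

// TryGetBuffer(): exposes the existing internal buffer without copying.
// The segment's Count is the data length; the underlying array can be
// larger (the stream's Capacity), hence the Offset/Count wrapper.
if (ms.TryGetBuffer(out ArraySegment<byte> segment))
{
    Console.WriteLine($"Data: {segment.Count} bytes, internal buffer: {segment.Array.Length} bytes");
}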
So my request is to add additional constructors to InferenceSession:
public InferenceSession(ArraySegment<Byte> model);
public InferenceSession(Stream model);
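Until something like that exists, the best a caller can do is a helper along these lines (CreateFromStream is a hypothetical name, not part of the library), which still has to pay for the ToArray() copy:

using System.IO;
using Microsoft.ML.OnnxRuntime;

static class InferenceSessionHelpers
{
    // Hypothetical helper built on the current API: the ToArray() copy at
    // the end is exactly what the proposed Stream / ArraySegment<byte>
    // constructors would make unnecessary.
    public static InferenceSession CreateFromStream(Stream model)
    {
        using var buffer = new MemoryStream();
        model.CopyTo(buffer);
        return new InferenceSession(buffer.ToArray());
    }
}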
Describe scenario use case
To be able to load models from sources other than a file system path or a straight byte[] array, without creating extra copies in memory.
ArraySegment<byte> modelBytes;
using (var m = new MemoryStream())
{
    using (var s = await httpClient.GetStreamAsync("model url"))
    {
        await s.CopyToAsync(m);
    }
    // Get the underlying buffer without creating a copy.
    // (Always succeeds for a MemoryStream created with the default constructor.)
    m.TryGetBuffer(out modelBytes);
}
session = new InferenceSession(modelBytes); // proposed ArraySegment<byte> constructor
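And if the Stream constructor were available as well, the same scenario could shrink to something like this (again hypothetical, since that overload does not exist today):

using var modelStream = await httpClient.GetStreamAsync("model url");
session = new InferenceSession(modelStream); // proposed Stream constructor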
Indirectly, this will also help lower memory pressure when using large models on devices with limited memory.