Description
Originally posted in dotnet/TorchSharp by @jimquittenton:
The naming scheme for layers are different in the ResNet example model found in this repo and the ResNet models found in TorchVision, which prevents a model saved from Python from being loaded in TorchSharp using this example code.
Original post:
Hi,
I'm new to TorchSharp and am having trouble loading a python trained ResNet18 model. I've been following this article: https://github.com/dotnet/TorchSharp/blob/main/docfx/articles/saveload.md and have exported my python model using the 'save_state_dict' function in this script: https://github.com/dotnet/TorchSharp/blob/main/src/Python/exportsd.py .
In TorchSharp I have copied the ResNet model from https://github.com/dotnet/TorchSharpExamples/blob/main/src/CSharp/Models/ResNet.cs and then call the following:
int numClasses = 3;
ResNet myModel = ResNet.ResNet18(numClasses);
myModel.to(DeviceType.CPU);
myModel.load(mPath);
The load() line throws an exception with message Mismatched module state names: the target modules does not have a submodule or buffer named 'conv1.weight'.
If I examine the state_dict from 'myModel' prior to load(), it contains entries like:
{[layers.conv2d-first.weight, {TorchSharp.Modules.Parameter}]}
{[layers.bnrm2d-first.weight, {TorchSharp.Modules.Parameter}]}
{[layers.bnrm2d-first.bias, {TorchSharp.Modules.Parameter}]}
{[layers.bnrm2d-first.running_mean, {TorchSharp.torch.Tensor}]}
{[layers.bnrm2d-first.running_var, {TorchSharp.torch.Tensor}]}
{[layers.bnrm2d-first.num_batches_tracked, {TorchSharp.torch.Tensor}]}
{[layers.blck-64-0.layers.blck-64-0-conv2d-1.weight, {TorchSharp.Modules.Parameter}]}
{[layers.blck-64-0.layers.blck-64-0-bnrm2d-1.weight, {TorchSharp.Modules.Parameter}]}
{[layers.blck-64-0.layers.blck-64-0-bnrm2d-1.bias, {TorchSharp.Modules.Parameter}]}
whereas the corresponding entries prior to saving from python are:
conv1.weight torch.Size([64, 3, 7, 7])
bn1.weight torch.Size([64])
bn1.bias torch.Size([64])
bn1.running_mean torch.Size([64])
bn1.running_var torch.Size([64])
bn1.num_batches_tracked torch.Size([])
layer1.0.conv1.weight torch.Size([64, 64, 3, 3])
layer1.0.bn1.weight torch.Size([64])
layer1.0.bn1.bias torch.Size([64])
I tried amending the ResNet.cs code to reflect the python names, but could not get them to exactly match.
I also tried calling load() with strict=false myModel.load(mPath, false);. This seemed to get past the Mismatched names exception, but throws another exception with message Too many bytes in what should have been a 7 bit encoded Int32.
I've been struggling with this for a couple of days now so would really appreciate any help you guys could offer.
Thanks
Jim