Add Video Swin Transformer #2369
Conversation
tirthasheshpatel
left a comment
Amazing work here 🎉 Thanks @innat! I still need to test presets.
The model looks very good overall, just some nits about exporting some layers and models.
The backbone base model has more than one checkpoint:
- with kinetics-400-base (current)
- with kinetics-400-base-imagenet22k
- with kinetics-600-base-imagenet22k
- with something-something-v2
How should the presets method accommodate all of these?
```python
def presets(cls):
    """Dictionary of preset names and configurations."""
    return {
        "videoswin_base_kinetics400": copy.deepcopy(
            backbone_presets["videoswin_base_kinetics400"]
        ),
        "videoswin_base_kinetics400_imagenet22k": copy.deepcopy(
            backbone_presets["videoswin_base_kinetics400_imagenet22k"]
        ),
        ...
    }
```
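One way to cover every checkpoint without repeating the `copy.deepcopy` boilerplate is to build the dictionary from the registry itself. A minimal sketch, assuming a flat `backbone_presets` dict; the preset names and config values below are illustrative, mirroring the checkpoint list above:

```python
import copy

# Illustrative stand-in for the backbone_presets registry (names assumed
# from the checkpoint list; configs are placeholders).
backbone_presets = {
    "videoswin_base_kinetics400": {"weights": "k400"},
    "videoswin_base_kinetics400_imagenet22k": {"weights": "k400-in22k"},
    "videoswin_base_kinetics600_imagenet22k": {"weights": "k600-in22k"},
    "videoswin_base_something_something_v2": {"weights": "ssv2"},
}

def presets():
    """Dictionary of preset names and configurations."""
    # Deep-copy each entry so callers cannot mutate the shared registry.
    return {name: copy.deepcopy(cfg) for name, cfg in backbone_presets.items()}

snapshot = presets()
snapshot["videoswin_base_kinetics400"]["weights"] = "mutated"
print(backbone_presets["videoswin_base_kinetics400"]["weights"])  # still "k400"
```

This keeps the preset list in one place: adding a new checkpoint to the registry automatically exposes it through `presets()`.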
Summarizing the weight check:

- Backbones: tolerance 1e-4
- Classifier: tolerance 1e-5
- notebook-1 for kinetics-400 (tiny, small, base, base-imagenet22k)

@tirthasheshpatel @divyashreepathihalli Note: in notebook-1, the torchvision library is used to load the video-swin API and the PyTorch weights they offer, whereas in notebook-2, the raw official code and weights are loaded.
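The tolerances above can be verified with `np.testing.assert_allclose`. A minimal sketch using synthetic logits (a real check would compare the ported Keras outputs against the official PyTorch outputs):

```python
import numpy as np

rng = np.random.default_rng(0)
# Pretend these are kinetics-400 logits from the reference implementation.
reference = rng.normal(size=(2, 400)).astype("float32")
# The ported model should agree with the reference up to small float noise.
ported = reference + 1e-7 * rng.normal(size=reference.shape).astype("float32")

np.testing.assert_allclose(ported, reference, atol=1e-4)  # backbone tolerance
np.testing.assert_allclose(ported, reference, atol=1e-5)  # classifier tolerance
print("within tolerance")
```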
**ONNX**

I noticed others also tried to export this model to ONNX format but failed and reported it to the official repo (tickets). So I tried with this implementation with the torch backend, and it works as expected.

```python
model = VideoClassifier(
    backbone=backbone,
    num_classes=num_classes,
    activation=None,
    pooling='avg',
)
model.eval()

batch_size = 1
# Input to the model
x = torch.randn(batch_size, 32, 224, 224, 3, requires_grad=True)
torch_out = model(x)
```

Using the torch official guideline:

```python
torch.onnx.export(
    model,                  # model being run
    x,                      # model input (or a tuple for multiple inputs)
    "vswin.onnx",
    export_params=True,
    opset_version=10,
    do_constant_folding=True,
    input_names=['input'],    # the model's input names
    output_names=['output'],  # the model's output names
    dynamic_axes={
        'input': {0: 'batch_size'},
        'output': {0: 'batch_size'},
    },
)
```

```python
import onnx
import onnxruntime

def to_numpy(tensor):
    if tensor.requires_grad:
        tensor = tensor.detach()
    tensor = tensor.cpu()
    return tensor.numpy()

onnx_model = onnx.load("vswin.onnx")
onnx.checker.check_model(onnx_model)

ort_session = onnxruntime.InferenceSession(
    "vswin.onnx", providers=["CPUExecutionProvider"]
)

# compute ONNX Runtime output prediction
ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(x)}
ort_outs = ort_session.run(None, ort_inputs)
```

Logit checking:

```python
np.testing.assert_allclose(
    to_numpy(torch_out), ort_outs[0], rtol=1e-03, atol=1e-05
)
```
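For reference, `assert_allclose` applies both tolerances jointly: a comparison passes when `|actual - desired| <= atol + rtol * |desired|` elementwise, so the `rtol=1e-03` above grants larger logits proportionally more slack. A small illustration:

```python
import numpy as np

desired = np.array([0.0, 100.0])

# An error of 0.05 on the large logit fits within rtol=1e-3 (slack ~0.1).
actual_ok = np.array([0.0, 100.05])
np.testing.assert_allclose(actual_ok, desired, rtol=1e-3, atol=1e-5)

# The same error on the zero logit exceeds atol=1e-5, since rtol
# contributes nothing when the reference value is 0.
actual_bad = np.array([0.05, 100.0])
try:
    np.testing.assert_allclose(actual_bad, desired, rtol=1e-3, atol=1e-5)
    raised = False
except AssertionError:
    raised = True
print(raised)  # True
```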
divyashreepathihalli
left a comment
Let's move the video_swin layers into the model folder itself. Everything else LGTM!
Sorry, could you please elaborate?
Nope! All model-specific layers should be inside the model folder; only generic layers go under the layers folder.
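To make the convention concrete, the split being requested looks roughly like this (paths are illustrative, not the actual repo layout):

```
keras_cv/
  layers/                    # generic, reusable layers only
  models/
    video_swin/              # model-specific code lives with the model
      video_swin_layers.py   # e.g. window attention, patch merging
      video_swin_backbone.py
```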
I think the test is failing for a different issue.
Thank you for this awesome contribution!!!
What does this PR do?
Fixes #2262
Before submitting
Who can review?
Anyone in the community is free to review the PR once the tests have passed.