Add ViViT (Video Vision Transformer) to KerasCV #2335
aditya02shah wants to merge 14 commits into keras-team:master from
Thanks for the PR @aditya02shah. Can you please add a Colab demo to verify the results and also share the weights file with us? How does this compare to the HF implementation?
@divyashreepathihalli This implementation closely follows the one used in keras-examples.
@aditya02shah What is expected here is that the outputs of your model match the last-layer outputs of the HF implementation. Am I right, @divyashreepathihalli?
FYI, the official implementation: https://github.com/google-research/scenic/tree/aaeaa203bfbbaf3d2c6d9865fe86d1379cfe4a58/scenic/projects/vivit
Thanks, Adithya! If the outputs match the example, that is good enough. But I would like to see a Colab demo that uses the changes from your PR.
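The thread does not say how the outputs are compared. One common approach, sketched here purely as an assumption, is to run the same input through both models and compare the final-layer outputs within a numerical tolerance. The helper name `outputs_match` and the default `atol`/`rtol` values are hypothetical, not part of this PR or of any library.

```python
import numpy as np


def outputs_match(ours, reference, atol=1e-4, rtol=1e-4):
    """Hypothetical helper: compare two final-layer outputs.

    `ours` would be the KerasCV port's output and `reference` the output of the
    reference implementation on the same input; the tolerances are assumptions.
    Returns (matches, max_abs_diff).
    """
    ours = np.asarray(ours, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    # Outputs with different shapes can never match.
    if ours.shape != reference.shape:
        return False, float("inf")
    max_abs_diff = float(np.max(np.abs(ours - reference)))
    matches = bool(np.allclose(ours, reference, atol=atol, rtol=rtol))
    return matches, max_abs_diff


# Usage with synthetic data standing in for real model outputs.
reference = np.array([[0.1, 0.7, 0.2]])
print(outputs_match(reference + 5e-6, reference))
```

Reporting the maximum absolute difference alongside the boolean makes it easier to judge whether a failure is a real porting bug or just float32 round-off.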
@divyashreepathihalli I've created a Colab demo that incorporates the changes from my pull request. You can access it here.
divyashreepathihalli left a comment
Thank you for the PR @aditya02shah. I have left a few cleanup comments. Also, let's make sure the tests pass.
        self.patch_size = patch_size

    def build(self, input_shape):
        self.projection = keras.layers.Conv3D(
Define all layers in __init__ and build them here, e.g. self.layer_name.build(expected_input_shape).
@divyashreepathihalli I have made the recommended changes. You can find the Colab for the latest commit here.
Thanks @aditya02shah! PS: we will fix this overhead soon, but in the meantime this is what we need to do.
@divyashreepathihalli No worries, I have updated the build script! |
divyashreepathihalli left a comment
Mostly LGTM; just one nit regarding the build method.
@divyashreepathihalli I have made revisions to the build method. Here is the Colab for the latest changes.
Thanks for the update, @aditya02shah! There is one error that needs to be fixed.
What does this PR do?
Adds the ViViT model.
Overview:
This PR integrates the ViViT model into KerasCV, along with relevant test cases.
Who can review?
@divyashreepathihalli