Add ViViT(Video Vision Transformer) to KerasCV #2335

aditya02shah · 2024-02-07T16:15:44Z

What does this PR do?

Adds the ViViT model.

Overview:
This PR integrates the ViViT model into KerasCV and adds the relevant test cases.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Did you write any new necessary tests?
If this adds a new model, can you run a few training steps on TPU in Colab to ensure that no XLA-incompatible ops are used?

Who can review?

@divyashreepathihalli

divyashreepathihalli · 2024-02-07T19:31:18Z

Thanks for the PR, @aditya02shah. Can you please add a Colab demo to verify the results and also share the weights file with us? How does this compare to the HF implementation?

aditya02shah · 2024-02-08T16:26:23Z

@divyashreepathihalli This implementation closely aligns with the one used in keras-examples.
Here is a Colab demo. It is similar to the HF implementation, but easier to use and with simpler functionality.

pranavvp16 · 2024-02-09T15:09:04Z

@aditya02shah, what's expected here is that the outputs of your model's last layer match the outputs of the HF implementation. Am I right, @divyashreepathihalli?
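The check described above can be sketched with NumPy. This is a minimal illustration, not part of the PR: it assumes both models have already been loaded with the same (ported) weights and run on the same preprocessed video batch. The helper name outputs_match and its default tolerances are hypothetical choices, not a KerasCV requirement.

```python
import numpy as np


def outputs_match(keras_out, reference_out, atol=1e-4, rtol=1e-4):
    """Hypothetical helper: compare final-layer outputs of two implementations.

    Returns (matches, max_abs_diff). A shape mismatch is reported as
    (False, inf) because the outputs cannot be compared element-wise.
    """
    keras_out = np.asarray(keras_out, dtype=np.float64)
    reference_out = np.asarray(reference_out, dtype=np.float64)
    if keras_out.shape != reference_out.shape:
        return False, float("inf")
    # Largest element-wise deviation, useful for reporting in the PR thread.
    max_abs_diff = float(np.max(np.abs(keras_out - reference_out)))
    # allclose checks |a - b| <= atol + rtol * |b| for every element.
    matches = bool(np.allclose(keras_out, reference_out, atol=atol, rtol=rtol))
    return matches, max_abs_diff
```

Reporting the maximum absolute difference alongside the pass/fail result makes it easier to tell a genuine porting bug (large differences) from ordinary floating-point noise between frameworks.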

innat-asj · 2024-02-11T08:31:30Z

FYI, Official implementation: https://github.com/google-research/scenic/tree/aaeaa203bfbbaf3d2c6d9865fe86d1379cfe4a58/scenic/projects/vivit

divyashreepathihalli · 2024-02-12T18:09:09Z

> @divyashreepathihalli This implementation closely aligns to the one used in keras-examples. Here is a Colab Demo. It is similar to the HF implementation, but easier to use and with simpler functionality.

Thanks, Adithya! If the outputs match the example, that is good enough. But I would like to see a Colab demo that uses the changes from your PR.
You can test your changes in Colab by installing your repo like this:
!pip install -q git+https://github.com/<your-github-username>/keras-cv.git@<branch-name-which-has-the-changes>

aditya02shah · 2024-02-13T16:30:37Z

@divyashreepathihalli I've created a Colab demo that incorporates the changes from my pull request. You can access it here.

divyashreepathihalli

Thank you for the PR, @aditya02shah. I have left a few cleanup comments. Also, let's make sure the tests pass.

keras_cv/models/video_classification/vivit.py

divyashreepathihalli · 2024-02-15T21:14:27Z

keras_cv/models/video_classification/vivit_layers.py

+        self.patch_size = patch_size
+
+    def build(self, input_shape):
+        self.projection = keras.layers.Conv3D(


Define all layers in __init__ and build them here, e.g. self.layer_name.build(expected_input_shape).
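The pattern requested in this review comment can be sketched as follows. This is a hypothetical TubeletEmbedding layer modeled loosely on the Keras ViViT example, not the code in this PR; it assumes the projection is a Conv3D whose kernel_size and strides both equal patch_size, with "valid" padding, followed by a Reshape into a token sequence.

```python
import numpy as np
import keras


class TubeletEmbedding(keras.layers.Layer):
    """Hypothetical sketch: sublayers created in __init__, built in build()."""

    def __init__(self, projection_dim, patch_size, **kwargs):
        super().__init__(**kwargs)
        self.projection_dim = projection_dim
        self.patch_size = patch_size
        # Create every sublayer up front so it is tracked from construction time.
        self.projection = keras.layers.Conv3D(
            filters=projection_dim,
            kernel_size=patch_size,
            strides=patch_size,
            padding="valid",
        )
        # Flatten the (T', H', W') grid of tubelets into a sequence of tokens.
        self.flatten = keras.layers.Reshape(target_shape=(-1, projection_dim))

    def build(self, input_shape):
        # Build the weighted sublayer with the shape it will actually receive.
        self.projection.build(input_shape)
        # Reshape has no weights, so it needs no explicit build here.
        self.built = True

    def call(self, videos):
        projected = self.projection(videos)
        return self.flatten(projected)

    def get_config(self):
        config = super().get_config()
        config.update(
            {"projection_dim": self.projection_dim, "patch_size": self.patch_size}
        )
        return config


layer = TubeletEmbedding(projection_dim=128, patch_size=(8, 8, 8))
tokens = layer(np.zeros((2, 28, 28, 28, 1), dtype="float32"))
# Expected token shape: (2, 27, 128)
```

With the configuration used in this PR's tests (28x28x28x1 inputs, 8x8x8 patches), each axis yields floor(28 / 8) = 3 tubelets, so every video becomes 3 * 3 * 3 = 27 tokens. Building sublayers explicitly in build() means their weights exist before the first call, which matters when a saved model is reloaded and its weights are restored before any data has passed through it.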

keras_cv/models/video_classification/vivit_layers.py

aditya02shah · 2024-02-23T13:15:27Z

@divyashreepathihalli I have made the recommended changes. You can find the Colab for the latest commit here.

divyashreepathihalli · 2024-02-26T22:47:40Z

Thanks, @aditya02shah!
One additional chore: please add keras_cv/models/video_classification \ to lines 72 and 86 of this file:
https://github.com/keras-team/keras-cv/blob/master/.kokoro/github/ubuntu/gpu/build.sh

PS: We will remove this overhead soon, but in the meantime this is what we need to do.

aditya02shah · 2024-02-27T01:38:03Z

@divyashreepathihalli No worries, I have updated the build script!

divyashreepathihalli

Mostly LGTM; just one nit regarding the build method.

keras_cv/models/video_classification/vivit.py

aditya02shah · 2024-03-01T16:26:30Z

@divyashreepathihalli I have revised the build method. Here is the Colab for the latest changes.

divyashreepathihalli · 2024-03-05T09:54:01Z

Thanks for the update, @aditya02shah! There is one error that needs to be fixed:

_________________________ ViViT_Test.test_saved_model __________________________

self = 

    @pytest.mark.large  # Saving is slow, so mark these large.
    def test_saved_model(self):
        input_shape = (28, 28, 28, 1)
        num_classes = 11
        patch_size = (8, 8, 8)
        layer_norm_eps = 1e-6
        projection_dim = 128
        num_heads = 8
        num_layers = 8
    
>       model = ViViT(
            projection_dim=projection_dim,
            patch_size=patch_size,
            inp_shape=input_shape,
            transformer_layers=num_layers,
            num_heads=num_heads,
            layer_norm_eps=layer_norm_eps,
            num_classes=num_classes,
        )

keras_cv/models/video_classification/vivit_test.py:135: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
keras_cv/models/video_classification/vivit.py:107: in __init__
    super().__init__(**kwargs)
keras_cv/models/task.py:30: in __init__
    super().__init__(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = 
args = (), kwargs = {}, previous_value = True

    def _method_wrapper(self, *args, **kwargs):
      previous_value = getattr(self, "_self_setattr_tracking", True)
      self._self_setattr_tracking = False  # pylint: disable=protected-access
      try:
>       result = method(self, *args, **kwargs)
E       TypeError: __init__() missing 2 required positional arguments: 'inputs' and 'outputs'

/tmpfs/venv/lib/python3.9/site-packages/tensorflow/python/trackable/base.py:204: TypeError
__________________________ ViViT_Test.test_vivit_call __________________________

self = 

    def test_vivit_call(self):
        input_shape = (28, 28, 28, 1)
        num_classes = 11
        patch_size = (8, 8, 8)
        layer_norm_eps = 1e-6
        projection_dim = 128
        num_heads = 8
        num_layers = 8
    
>       model = ViViT(
            projection_dim=projection_dim,
            patch_size=patch_size,
            inp_shape=input_shape,
            transformer_layers=num_layers,
            num_heads=num_heads,
            layer_norm_eps=layer_norm_eps,
            num_classes=num_classes,
        )

keras_cv/models/video_classification/vivit_test.py:67: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
keras_cv/models/video_classification/vivit.py:107: in __init__
    super().__init__(**kwargs)
keras_cv/models/task.py:30: in __init__
    super().__init__(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = 
args = (), kwargs = {}, previous_value = True

    def _method_wrapper(self, *args, **kwargs):
      previous_value = getattr(self, "_self_setattr_tracking", True)
      self._self_setattr_tracking = False  # pylint: disable=protected-access
      try:
>       result = method(self, *args, **kwargs)
E       TypeError: __init__() missing 2 required positional arguments: 'inputs' and 'outputs'

/tmpfs/venv/lib/python3.9/site-packages/tensorflow/python/trackable/base.py:204: TypeError
______________________ ViViT_Test.test_vivit_construction ______________________

self =
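The traceback shows ViViT.__init__ calling super().__init__(**kwargs) (vivit.py line 107) without inputs or outputs, while the parent constructor expects both. One way to satisfy this is the functional-subclass pattern that KerasCV task models generally follow: build the graph in __init__, then pass inputs and outputs to the parent. The sketch below subclasses keras.Model directly so it stays self-contained; TinyVideoClassifier and its layers are hypothetical and do not reproduce ViViT.

```python
import numpy as np
import keras


class TinyVideoClassifier(keras.Model):
    """Hypothetical task model built with the functional API inside __init__."""

    def __init__(self, inp_shape, num_classes, **kwargs):
        # Build the graph first...
        inputs = keras.Input(shape=inp_shape)
        x = keras.layers.Conv3D(8, kernel_size=3, padding="same", activation="relu")(
            inputs
        )
        x = keras.layers.GlobalAveragePooling3D()(x)
        outputs = keras.layers.Dense(num_classes)(x)
        # ...then hand inputs/outputs to the parent constructor, which is
        # exactly what the "missing 2 required positional arguments" error asks for.
        super().__init__(inputs=inputs, outputs=outputs, **kwargs)
        # Plain config attributes can be set after super().__init__().
        self.inp_shape = inp_shape
        self.num_classes = num_classes


model = TinyVideoClassifier(inp_shape=(8, 16, 16, 1), num_classes=11)
logits = model(np.zeros((2, 8, 16, 16, 1), dtype="float32"))
```

Because the graph is fully defined at construction time, the model is built as soon as it exists, which also keeps saving and reloading straightforward in tests such as test_saved_model.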

aditya02shah added 9 commits January 27, 2024 13:29

Initialised video-classification/vivit

aaae396

Initialised ViViT model and add dependent layers

a1da121

Updated __init__.py

3ccd176

Added model construction and call tests

9a39aa6

Updated imports

30e1c8e

Added tests

82f06c3

Added docs and some minor adjustments

2612a04

Updated Documentation and Default Parameters

8099869

Updated comments

13f4829

divyashreepathihalli requested a review from sampathweb February 7, 2024 19:29

divyashreepathihalli requested changes Feb 15, 2024


Updating parameters and build method

0b6043f

aditya02shah and others added 2 commits February 27, 2024 06:59

Merge branch 'keras-team:master' into vivit

f6b2f7c

Updated build.sh

9536d35

divyashreepathihalli added the kokoro:force-run Runs Tests on GPU label Feb 27, 2024

kokoro-team removed the kokoro:force-run Runs Tests on GPU label Feb 27, 2024

divyashreepathihalli requested changes Feb 27, 2024


keras_cv/models/video_classification/vivit.py

aditya02shah and others added 2 commits March 1, 2024 21:29

Merge branch 'keras-team:master' into vivit

9f174d4

Updated Build Method

36541bb

divyashreepathihalli added the kokoro:force-run Runs Tests on GPU label Mar 4, 2024

kokoro-team removed the kokoro:force-run Runs Tests on GPU label Mar 4, 2024
