
Conversation

bernardohenz
Contributor

This PR allows the use of layer-norm in the model. In our experiments, it allows training for more epochs without overfitting.
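To make this concrete, here is a minimal sketch of the idea (my illustration, not the actual diff; the dense_block helper and the flag wiring are assumptions):

import tensorflow as tf

def dense_block(x, units, layer_norm=False):
    # One fully connected layer, optionally followed by layer normalization.
    x = tf.keras.layers.Dense(units)(x)
    if layer_norm:
        # Normalizing over the feature axis keeps each layer's input
        # distribution stable, which acts as a regularizer on long runs.
        x = tf.keras.layers.LayerNormalization()(x)
    return tf.nn.relu(x)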

PS: I'll be creating a PR in your tensorflow repository, as layer-norm uses some dependencies that are not included in your build rule. Nonetheless, I'm sending the new rule here (the source files can be found in tensorflow/core/kernels/BUILD). When compiling against version 0.6.1, the following rule worked just fine:

tf_kernel_library(
    name = "deepspeech_cwise_ops",
    srcs = [
        "cwise_op_less.cc",
        "cwise_op_minimum.cc",
        "cwise_op_mul_1.cc",
        "cwise_op_squared_difference.cc",
        "cwise_op_add_1.cc",
        "cwise_op_add_2.cc",
        "cwise_op_rsqrt.cc",
        "cwise_op_sub.cc",
    ],
    gpu_srcs = [
        "cwise_op_gpu_less.cu.cc",
        "cwise_op_gpu_minimum.cu.cc",
        "cwise_op_gpu_mul.cu.cc",
        "cwise_op_gpu_squared_difference.cu.cc",
        "cwise_op_gpu_add.cu.cc",
        "cwise_op_gpu_rsqrt.cu.cc",
        "cwise_op_gpu_sub.cu.cc",
    ],
    deps = [
        ":cwise_lib",
        "//tensorflow/core:framework",
        "//tensorflow/core:lib",
        "//third_party/eigen3",
    ],
}
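For context (my note, not part of the original rule): these element-wise kernels are what layer normalization's forward pass decomposes into, roughly:

import tensorflow as tf

def layer_norm_forward(x, gamma, beta, epsilon=1e-6):
    # Each op below maps onto one of the cwise kernels listed above.
    mean = tf.reduce_mean(x, axis=-1, keepdims=True)
    var = tf.reduce_mean(tf.math.squared_difference(x, mean),  # squared_difference
                         axis=-1, keepdims=True)
    inv_std = tf.math.rsqrt(var + epsilon)                     # rsqrt, add
    return gamma * (x - mean) * inv_std + beta                 # mul, sub, add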

I'll be compiling the binaries on master to check if it still works.

@community-tc-integration

No Taskcluster jobs started for this pull request
The `allowPullRequests` configuration for this repository (in `.taskcluster.yml` on the
default branch) does not allow starting tasks for this pull request.

@lissyx
Collaborator

lissyx commented Aug 18, 2020

I'll be compiling the binaries on master to check if it still works.

Master moved to r2.3

PS: I'll be creating a PR in your tensorflow repository, as layer-norm uses some dependencies that are not included in your build rule. Nonetheless, I'm sending the new rule here (the source files can be found in tensorflow/core/kernels/BUILD).

Pretty sure we already have those

@lissyx
Collaborator

lissyx commented Aug 18, 2020

I'm sending the new rule here

https://github.com/mozilla/tensorflow/blob/r2.3/tensorflow/core/kernels/BUILD#L8655-L8673

Ok, so you just have a few files to add?

Please make sure you:

  • update tensorflow submodule
  • update taskcluster/.shared.yml tensorflow references

@bernardohenz
Contributor Author

bernardohenz commented Aug 18, 2020

Ok, so you just have a few files to add?

Yes, these are the new files; I still need to check whether they work on r2.3, but I believe they will work just fine.

@lissyx
Collaborator

lissyx commented Aug 18, 2020

For our experiments, it allows for training more epochs without overfitting.

Do you mind sharing more on that?

@lissyx lissyx requested a review from reuben August 18, 2020 16:11
@bernardohenz
Contributor Author

bernardohenz commented Aug 18, 2020

Do you mind sharing more on that?

The experiments we have done are a little old by now. I'm running a new training benchmark right now; I'll let you know soon.

Contributor

@reuben reuben left a comment


The code changes look good to me. Thanks Bernardo!

@bernardohenz
Contributor Author

All is working in the binaries; I've already created a PR for mozilla tensorflow: mozilla/tensorflow#124

@DanBmh
Contributor

DanBmh commented Aug 20, 2020

For training with already-existing checkpoints, you have to initialize the new layers first. This worked for me:

# training/deepspeech_training/util/checkpoints.py

def _load_checkpoint(session, checkpoint_path, allow_drop_layers):
    [...]

    if FLAGS.layer_norm:
        # A variable missing from the checkpoint is acceptable only if it
        # belongs to the new LayerNorm layers; those are added to init_vars
        # (a set collected earlier in this function) for fresh initialization.
        for v in load_vars:
            if v.op.name not in vars_in_ckpt:
                if 'LayerNorm' in v.name:
                    init_vars.add(v)
                else:
                    msg = "Tried to train with layer normalization but there was " \
                          "a missing variable other than the LayerNorm tensors: {}"
                    log_error(msg.format(v))
                    sys.exit(1)
        # Restore only the variables that actually exist in the checkpoint.
        load_vars -= init_vars

I also ran a short test transfer-learning the English checkpoint to German with a small dataset (~32 h), but layer-norm didn't help here:

| Dataset | Additional Infos | Losses | Training epochs of best model | Total training duration |
| --- | --- | --- | --- | --- |
| Voxforge | without layer-norm | Test: 30.655203, Validation: 33.655750 | 9 | 48 min |
| Voxforge | with layer normalization | Test: 57.330410, Validation: 61.025009 | 45 | 2:37 h |

Maybe training only the reinitialized LayerNorm tensors while freezing the rest of the network (see #3247), before training the complete network, would help here.
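A rough sketch of that warm-up idea (hypothetical code, not from this PR; the loss tensor and optimizer settings are assumed):

import tensorflow.compat.v1 as tfv1

# Phase 1: optimize only the freshly initialized LayerNorm variables,
# keeping all pretrained weights frozen.
ln_vars = [v for v in tfv1.trainable_variables() if 'LayerNorm' in v.name]
warmup_op = tfv1.train.AdamOptimizer(1e-4).minimize(loss, var_list=ln_vars)

# Phase 2: continue training the complete network as usual.
full_op = tfv1.train.AdamOptimizer(1e-4).minimize(loss)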

@bernardohenz
Contributor Author

@DanBmh you cannot take the weights (checkpoint) from a model trained without layer-norm and use them to fine-tune/transfer-learn to a model that uses layer-norm. The models' architectures are simply different.

For instance, while the 2nd dense layer from the current checkpoint (without LN) was trained to process tensors in a certain range, the 2nd dense layer in an architecture with LN will process tensors with a completely different range (and not only a different range, but different mean and variance as well).
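A toy example of that shift (my numbers, purely illustrative):

import numpy as np

x = np.array([10.0, 20.0, 40.0])  # activations a non-LN checkpoint was tuned for
ln = (x - x.mean()) / np.sqrt(x.var() + 1e-6)
print(ln)  # ~[-1.07, -0.27, 1.34]: mean 0, variance 1, a very different scale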

Also, that's why I've put the layer_norm argument along with n_hidden in the geometry section. These arguments dictate the geometry/architecture of your model. If you wish to fine-tune/transfer-learn from a trained model, you should stick with the architecture it was trained with.

@DanBmh
Contributor

DanBmh commented Aug 20, 2020

For instance, while the 2nd dense layer from the current checkpoint (without LN) was trained to process tensors in a certain range, the 2nd dense layer in an architecture with LN will process tensors with a completely different range (and not only a different range, but different mean and variance as well).

@bernardohenz you're right. I was just hoping that transfer-learning performance wouldn't be that bad, so I ran a short test.

@lissyx
Collaborator

lissyx commented Aug 24, 2020

All is working in the binaries; I've already created a PR for mozilla tensorflow: mozilla/tensorflow#124

Thanks @bernardohenz !

Can you send a PR against mozilla/STT that:

  • changes .gitmodules to fetch your tensorflow repo
  • changes tensorflow sha1 checkout to your changes
  • changes taskcluster/.shared.yml tensorflow SHA1 references to your new sha1

This is required for us to be able to run your PR with all your changes and ensure nothing regresses.

@lissyx lissyx self-requested a review August 24, 2020 15:10
@lissyx
Collaborator

lissyx commented Aug 24, 2020

@bernardohenz This is not complete, you have not updated taskcluster/.shared.yml

@bernardohenz
Contributor Author

@bernardohenz This is not complete, you have not updated taskcluster/.shared.yml

Yes, I was about to post asking for some help with this. What am I supposed to do? Just replace the old sha references ('4336a5b49fa6d650e24dbdba55bcef9581535244') with the new one ('6dc2a1becfd1316eb4d77240133a548e93dbff63')?
Or should I compile anything and upload it to you guys?

@lissyx
Collaborator

lissyx commented Aug 24, 2020

@bernardohenz This is not complete, you have not updated taskcluster/.shared.yml

Yes, I was about to post asking for some help with this. What am I supposed to do? Just replace the old sha references ('4336a5b49fa6d650e24dbdba55bcef9581535244') with the new one ('6dc2a1becfd1316eb4d77240133a548e93dbff63')?
Or should I compile anything and upload it to you guys?

Just replace; that is the purpose of those references: our CI will check if the taskcluster index exists and, if not, build it.

@lissyx
Collaborator

lissyx commented Aug 24, 2020

@bernardohenz Please don't merge but rebase.

@lissyx
Collaborator

lissyx commented Aug 24, 2020

@bernardohenz Can you please clean the history? No merge, no "revert" of the previous commit. Force-push is fine, no worries.

@bernardohenz
Contributor Author

I believe this is ok now. Sorry for the mess.

@lissyx
Collaborator

lissyx commented Aug 24, 2020

I believe this is ok now. Sorry for the mess.

Thanks; one thing I forgot: can you please update the taskcluster/.build.yml tensorflow version reference with the value matching your git describe --long --tags?

@lissyx
Collaborator

lissyx commented Aug 24, 2020

You can see the progress on the Community-TC link 😊

@bernardohenz
Contributor Author

You can see the progress on the Community-TC link

Nice :D

@lissyx
Collaborator

lissyx commented Aug 25, 2020

You can see the progress on the Community-TC link

Nice :D

macOS CI was a bit of a burden (it always is), but it's green in the end. I'm going to merge your TensorFlow part, then re-run the PR with the new sha1 and take care of the rest.

@lissyx lissyx closed this Aug 25, 2020