Transformer implementation seemingly not corresponding to paper #46

@patrickgadd

Hi there,

First off - thank you for open sourcing this very interesting work(!).

I'm looking at this repo alongside the paper "Context-Aware Learning to Rank with Self-Attention", and there seems to be a bug: the code doesn't correspond exactly to what's written in the paper.

https://github.com/allegro/allRank/blob/master/allrank/models/transformer.py#L105
Here you seemingly apply LayerNorm to the inputs of the MultiHeadAttention and fully-connected layers.
However, the paper states that LayerNorm is applied to the outputs of these layers, which, from what I gather, is also general practice for Transformers.
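
To make the difference concrete, here is a minimal, framework-free sketch of the two orderings. It is not the repo's code and not the paper's exact formulation: `layer_norm` omits the learned gain and bias, and `toy_sublayer` is a hypothetical stand-in for the attention or fully-connected sublayer. The names "post-LN" and "pre-LN" are the labels commonly used for these two variants.

```python
import math


def layer_norm(x, eps=1e-6):
    """Normalize a vector to zero mean and unit variance (no learned gain/bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def post_ln_block(x, sublayer):
    """Post-LN ordering: LayerNorm(x + Sublayer(x)), i.e. norm on the output."""
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])


def pre_ln_block(x, sublayer):
    """Pre-LN ordering: x + Sublayer(LayerNorm(x)), i.e. norm on the input."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


def toy_sublayer(x):
    # Hypothetical stand-in for attention / feed-forward: a fixed affine map.
    return [2.0 * v + 1.0 for v in x]


x = [1.0, 2.0, 4.0]
post = post_ln_block(x, toy_sublayer)  # block output is normalized
pre = pre_ln_block(x, toy_sublayer)    # block output is not normalized
```

With the same input and sublayer, the two blocks give different results: the post-LN output has zero mean, while the pre-LN output keeps the un-normalized residual `x`, so the two orderings are not interchangeable.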

It seems that this repo isn't actively worked on, but nonetheless I thought I'd let you know.

Thanks & best wishes,
Patrick
