Dev by michalozeryflato · Pull Request #1 · michalozeryflato/transformers

michalozeryflato · 2023-10-04T08:20:43Z

What does this PR do?

Fixes # (issue)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Extend the computation of the T5 relative position embedding so that it also works on input position ids, not just on [0, ..., ntokens-1].

mosheraboh

Hi @michalozeryflato
Looks good. I've added a few questions inline.

mosheraboh · 2023-10-05T09:17:18Z

        values = self.relative_attention_bias(relative_position_bucket)  # shape (query_length, key_length, num_heads)
-        values = values.permute([2, 0, 1]).unsqueeze(0)  # shape (1, num_heads, query_length, key_length)
+        if position_ids is not None:
+            values = values.permute([0, 3, 1, 2])


Why? Can you explain in a comment what each dimension is?

values.shape was originally (query_length, key_length, num_heads); see the original comment.
I extended it to handle the case where position_ids are given as input,
so the shape is now (num_batches [optional], query_length, key_length, num_heads).
I added comments in the code.
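
For readers following the thread, here is a minimal, self-contained sketch of the two layouts discussed above. It is not the PR's code: `bias_from_buckets` is a hypothetical helper, the sizes are arbitrary, and it branches on the number of dimensions of the bucket tensor, whereas the diff branches on `position_ids is not None`.

```python
import torch
from torch import nn

# Arbitrary illustrative sizes (assumptions, not taken from the PR).
num_buckets, num_heads = 32, 8
relative_attention_bias = nn.Embedding(num_buckets, num_heads)


def bias_from_buckets(buckets: torch.Tensor) -> torch.Tensor:
    """Look up per-head biases and move the head axis in front of query/key."""
    # The embedding lookup appends a trailing num_heads axis.
    values = relative_attention_bias(buckets)
    if buckets.dim() == 3:
        # Batched buckets (the position_ids case):
        # (num_batches, query_length, key_length, num_heads)
        #   -> (num_batches, num_heads, query_length, key_length)
        return values.permute([0, 3, 1, 2])
    # Unbatched buckets (original T5 behaviour):
    # (query_length, key_length, num_heads)
    #   -> (1, num_heads, query_length, key_length)
    return values.permute([2, 0, 1]).unsqueeze(0)


unbatched = torch.randint(0, num_buckets, (5, 5))
batched = torch.randint(0, num_buckets, (3, 5, 5))
unbatched_bias = bias_from_buckets(unbatched)
batched_bias = bias_from_buckets(batched)
```

In both cases the result has the head axis second, so it can be added to attention scores of shape (num_batches, num_heads, query_length, key_length); the unbatched result relies on broadcasting over its leading axis of size 1.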

mosheraboh · 2023-10-05T09:19:14Z

-        memory_position = torch.arange(key_length, dtype=torch.long, device=device)[None, :]
+            device = relative_attention_bias.weight.device
+        if position_ids is not None:
+            context_position = position_ids.unsqueeze(2)


What is the shape before unsqueeze?

When position_ids are given, there is an additional batch dimension: shape = (num_batches, key_and_query_length).
I added this in a comment:

            context_position = position_ids.unsqueeze(2)  # shape (num_batches, key_and_query_length) -> (num_batches, key_and_query_length, 1)
            memory_position = position_ids.unsqueeze(1)  # shape (num_batches, key_and_query_length) -> (num_batches, 1, key_and_query_length)
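
As a rough illustration of why the two unsqueeze calls above are enough, the sketch below computes relative positions with and without position_ids. The function name `relative_position` and its signature are hypothetical, chosen only for this example; the subtraction `memory_position - context_position` follows the existing T5 bias computation.

```python
import torch


def relative_position(position_ids=None, query_length=None, key_length=None):
    """Return memory_position - context_position, batched if position_ids is given."""
    if position_ids is not None:
        # (num_batches, length) -> (num_batches, length, 1)
        context_position = position_ids.unsqueeze(2)
        # (num_batches, length) -> (num_batches, 1, length)
        memory_position = position_ids.unsqueeze(1)
    else:
        # Default positions [0, ..., n-1]: shapes (query_length, 1) and (1, key_length)
        context_position = torch.arange(query_length, dtype=torch.long)[:, None]
        memory_position = torch.arange(key_length, dtype=torch.long)[None, :]
    # Broadcasting yields (num_batches, length, length) or (query_length, key_length).
    return memory_position - context_position


# Row 0 uses the default positions; row 1 restarts the count mid-sequence.
ids = torch.tensor([[0, 1, 2, 3], [0, 1, 0, 1]])
rel = relative_position(position_ids=ids)
default_rel = relative_position(query_length=4, key_length=4)
```

When a row of position_ids equals [0, ..., ntokens-1], its slice of the batched result matches the unbatched default, so the extension leaves the original behaviour unchanged for that input.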

mosheraboh · 2023-10-05T09:21:55Z

    def __init__(self, config):
        super().__init__()
-        self.EncDecAttention = T5Attention(config, has_relative_attention_bias=False)
+        self.EncDecAttention = T5Attention(config, relative_position_embedding_definitions=None)


So you didn't play with this attention mechanism?

I extended T5Attention to support various relative position encodings: the original one, none, a new one (e.g. based on position ids), or any combination of original and new.

In this case, note that it always used has_relative_attention_bias=False, so I kept the same behavior: I did not add a relative position embedding where none existed before. We can discuss this.

mosheraboh · 2023-10-05T09:38:37Z

        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
+        encoder_position_ids_dict: Optional[Dict[str, Tuple[torch.LongTensor,str]]] = None,


What is the shape of this tensor?

I added a comment:
the shape of each LongTensor is (num_batches, n_input_tokens).

michalozeryflato added 2 commits September 22, 2023 15:01

supporting different position embeddings

476879a

support relative embedding on input position ids

8be66e6

michalozeryflato marked this pull request as ready for review October 4, 2023 08:20

mosheraboh reviewed Oct 5, 2023

michalozeryflato added 3 commits October 5, 2023 19:11

add comments + create embed params only if needed

d816f4e

support original T5Config

47e9c08

support relative binary segment embedding

25fb1f1

michalozeryflato marked this pull request as draft October 24, 2023 15:50

michalozeryflato wants to merge 5 commits into main from dev

2 participants
