Description
Hi, I hope you are doing well. While going through your implementation of the pointer-generator, I noticed a difference between the p_gen calculation in the code and the formula given in the paper.
Could you clarify why it has been implemented this way (and whether there is any advantage in doing so)?
y_t_1_embd = self.embedding(y_t_1)
x = self.x_context(torch.cat((c_t_1, y_t_1_embd), 1))
lstm_out, s_t = self.lstm(x.unsqueeze(1), s_t_1)
h_decoder, c_decoder = s_t
s_t_hat = torch.cat((h_decoder.view(-1, config.hidden_dim),
                     c_decoder.view(-1, config.hidden_dim)), 1)  # B x 2*hidden_dim
c_t, attn_dist, coverage_next = self.attention_network(s_t_hat, encoder_outputs, encoder_feature,
                                                        enc_padding_mask, coverage)
if self.training or step > 0:
    coverage = coverage_next
p_gen = None
if config.pointer_gen:
    p_gen_input = torch.cat((c_t, s_t_hat, x), 1)  # B x (2*2*hidden_dim + emb_dim)
    p_gen = self.p_gen_linear(p_gen_input)
    p_gen = F.sigmoid(p_gen)
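To make the shapes explicit, here is my reading of the tensors above (not part of the original code; B is the batch size, hidden_dim and emb_dim come from config):

# My reading of the shapes in the snippet above (derived from the comments in the code):
#   y_t_1_embd : B x emb_dim                      # embedding of the previous-step input token
#   c_t_1, c_t : B x 2*hidden_dim                 # previous / current context vectors
#   x          : B x emb_dim                      # x_context projects [c_t_1; y_t_1_embd] down to emb_dim
#   s_t_hat    : B x 2*hidden_dim                 # [hidden; cell] state of the decoder LSTM
#   p_gen_input: B x (2*2*hidden_dim + emb_dim)   # concatenation of c_t, s_t_hat, x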
From what I know, p_gen takes in the context vector c_t, the decoder state s_t_hat, and the input y_t_1 separately, but you've passed the concatenated input x instead.
I am attaching a screenshot from the original paper as a reference.
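For reference, the p_gen formula from the paper is:

$p_{\text{gen}} = \sigma\big(w_{h^*}^{\top} h_t^{*} + w_s^{\top} s_t + w_x^{\top} x_t + b_{\text{ptr}}\big)$

where $h_t^{*}$ is the context vector, $s_t$ the decoder state, and $x_t$ the decoder input at the current step.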
From what I can see here, they pass the decoder input x_t directly into the sigmoid (through its own weight matrix), rather than concatenating the context vector with it first.
In this line, however,

x = self.x_context(torch.cat((c_t_1, y_t_1_embd), 1))

the previous context vector c_t_1 is concatenated with the input embedding, and the resulting x is then fed into the sigmoid:

p_gen_input = torch.cat((c_t, s_t_hat, x), 1)  # B x (2*2*hidden_dim + emb_dim)
p_gen = self.p_gen_linear(p_gen_input)
p_gen = F.sigmoid(p_gen)
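For comparison, this is roughly what I would have expected from a literal reading of the formula, reusing the variable names from the snippet above (just a sketch, not the repo's code):

if config.pointer_gen:
    # Sketch only: feed the raw embedding y_t_1_embd (the paper's x_t) instead of x,
    # so the previous context vector c_t_1 is not mixed into the p_gen input.
    p_gen_input = torch.cat((c_t, s_t_hat, y_t_1_embd), 1)  # still B x (2*2*hidden_dim + emb_dim)
    p_gen = torch.sigmoid(self.p_gen_linear(p_gen_input))

If I am reading the shapes correctly, x and y_t_1_embd are both B x emb_dim, so p_gen_linear would not even need a different input size.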
Thank you!