Description
Hi, I hope you are doing well. While going through your implementation of the pointer-generator, I noticed a difference between the p_gen calculation in the code and the formula given in the paper.
Could you clarify why it has been implemented this way (and whether there is any advantage in doing so)?
y_t_1_embd = self.embedding(y_t_1)
x = self.x_context(torch.cat((c_t_1, y_t_1_embd), 1))
lstm_out, s_t = self.lstm(x.unsqueeze(1), s_t_1)
h_decoder, c_decoder = s_t
s_t_hat = torch.cat((h_decoder.view(-1, config.hidden_dim),
                     c_decoder.view(-1, config.hidden_dim)), 1)  # B x 2*hidden_dim
c_t, attn_dist, coverage_next = self.attention_network(s_t_hat, encoder_outputs, encoder_feature,
                                                        enc_padding_mask, coverage)
if self.training or step > 0:
    coverage = coverage_next
p_gen = None
if config.pointer_gen:
    p_gen_input = torch.cat((c_t, s_t_hat, x), 1)  # B x (2*2*hidden_dim + emb_dim)
    p_gen = self.p_gen_linear(p_gen_input)
    p_gen = F.sigmoid(p_gen)
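To make the shapes explicit, here is my reading of the tensors above (not part of the original code; B is the batch size, hidden_dim and emb_dim come from config):

# My reading of the shapes in the snippet above (derived from the comments in the code):
#   y_t_1_embd : B x emb_dim                      # embedding of the previous-step input token
#   c_t_1, c_t : B x 2*hidden_dim                 # previous / current context vectors
#   x          : B x emb_dim                      # x_context projects [c_t_1; y_t_1_embd] down to emb_dim
#   s_t_hat    : B x 2*hidden_dim                 # [hidden; cell] state of the decoder LSTM
#   p_gen_input: B x (2*2*hidden_dim + emb_dim)   # concatenation of c_t, s_t_hat, x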
From what I know, p_gen takes in the context vector c_t, the decoder state s_t_hat, and the input y_t_1 separately, but you've passed the concatenated input x instead.
I am attaching a screenshot from the original paper as a reference.
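For reference, the p_gen formula from the paper is:

$p_{\text{gen}} = \sigma\big(w_{h^*}^{\top} h_t^{*} + w_s^{\top} s_t + w_x^{\top} x_t + b_{\text{ptr}}\big)$

where $h_t^{*}$ is the context vector, $s_t$ the decoder state, and $x_t$ the decoder input at the current step.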
From what I can see here, they pass the decoder input x_t directly into the sigmoid (through its own weight matrix), rather than concatenating the context vector with it first.
In this line, however,

x = self.x_context(torch.cat((c_t_1, y_t_1_embd), 1))

the previous context vector c_t_1 is concatenated with the input embedding, and the resulting x is then fed into the sigmoid:

p_gen_input = torch.cat((c_t, s_t_hat, x), 1)  # B x (2*2*hidden_dim + emb_dim)
p_gen = self.p_gen_linear(p_gen_input)
p_gen = F.sigmoid(p_gen)
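For comparison, this is roughly what I would have expected from a literal reading of the formula, reusing the variable names from the snippet above (just a sketch, not the repo's code):

if config.pointer_gen:
    # Sketch only: feed the raw embedding y_t_1_embd (the paper's x_t) instead of x,
    # so the previous context vector c_t_1 is not mixed into the p_gen input.
    p_gen_input = torch.cat((c_t, s_t_hat, y_t_1_embd), 1)  # still B x (2*2*hidden_dim + emb_dim)
    p_gen = torch.sigmoid(self.p_gen_linear(p_gen_input))

If I am reading the shapes correctly, x and y_t_1_embd are both B x emb_dim, so p_gen_linear would not even need a different input size.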
Thank you!