# Implementation of Incremental Decoding

## Description
The implementation of incremental decoding in gluon-nlp differs from fairseq's. In fairseq, the keys/values are cached both before and after the linear projection, whereas gluon-nlp caches only the keys/values before the linear projection. This difference changes how many FC (fully connected) operators are executed per decoding step: in [fairseq](https://github.com/pytorch/fairseq/blob/0dfd6b624081fc4e1c72fc74ae0cd2de199c334c/fairseq/modules/multihead_attention.py#L263-L281), the projected keys/values are pulled directly from `prev_keys`/`prev_values`, while in [gluon-nlp](https://github.com/dmlc/gluon-nlp/blob/5d4bc9eba7226ea9f9aabbbd39e3b1e886547e48/src/gluonnlp/models/transformer.py#L721-L724) two additional linear projections are needed to recompute the projected keys/values. We may need to correct gluon-nlp's implementation of incremental decoding.
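To make the cost difference concrete, here is a minimal NumPy sketch of the two caching strategies for a single-head, bias-free self-attention layer. It is not the code of either library: the classes `RawInputCache` and `ProjectedCache`, the helper `CountingLinear`, and the `run` driver are hypothetical names chosen for illustration. `RawInputCache` stores inputs before projection and re-projects the whole history at every step (the behavior the issue attributes to gluon-nlp); `ProjectedCache` stores keys/values after projection and projects only the newest position (the behavior the issue attributes to fairseq). Both produce the same outputs; only the amount of work done by the key/value projections differs.

```python
import numpy as np


class CountingLinear:
    """Hypothetical bias-free linear layer y = x @ W that records how many rows it projects."""

    def __init__(self, d_in, d_out, rng):
        self.weight = rng.standard_normal((d_in, d_out))
        self.rows_projected = 0

    def __call__(self, x):
        x = np.atleast_2d(x)
        self.rows_projected += x.shape[0]
        return x @ self.weight


def attend(q, k, v):
    """Single-head scaled dot-product attention for one query row q of shape (1, d)."""
    scores = (q @ k.T) / np.sqrt(q.shape[-1])  # (1, t)
    weights = np.exp(scores - scores.max())    # numerically stable softmax
    weights /= weights.sum()
    return weights @ v                          # (1, d)


class RawInputCache:
    """Caches inputs *before* projection; re-projects the full history every step."""

    def __init__(self, proj_q, proj_k, proj_v):
        self.proj_q, self.proj_k, self.proj_v = proj_q, proj_k, proj_v
        self.prev_inputs = []

    def step(self, x):
        self.prev_inputs.append(x)
        history = np.stack(self.prev_inputs)  # (t, d): every cached raw input
        # The K/V projections run over all t cached positions again.
        return attend(self.proj_q(x), self.proj_k(history), self.proj_v(history))


class ProjectedCache:
    """Caches keys/values *after* projection; projects only the newest position."""

    def __init__(self, proj_q, proj_k, proj_v):
        self.proj_q, self.proj_k, self.proj_v = proj_q, proj_k, proj_v
        self.prev_keys, self.prev_values = [], []

    def step(self, x):
        # Project only the new position and reuse the cached projections.
        self.prev_keys.append(self.proj_k(x))
        self.prev_values.append(self.proj_v(x))
        keys = np.concatenate(self.prev_keys)      # (t, d)
        values = np.concatenate(self.prev_values)  # (t, d)
        return attend(self.proj_q(x), keys, values)


def run(cache_cls, inputs, seed=0):
    """Decode `inputs` step by step; return outputs and total K/V rows projected."""
    rng = np.random.default_rng(seed)  # same seed -> identical weights for both caches
    d = inputs.shape[1]
    proj_q, proj_k, proj_v = (CountingLinear(d, d, rng) for _ in range(3))
    decoder = cache_cls(proj_q, proj_k, proj_v)
    outputs = np.concatenate([decoder.step(x) for x in inputs])
    return outputs, proj_k.rows_projected + proj_v.rows_projected


if __name__ == "__main__":
    steps, d_model = 5, 8
    inputs = np.random.default_rng(1).standard_normal((steps, d_model))
    raw_out, raw_rows = run(RawInputCache, inputs)
    proj_out, proj_rows = run(ProjectedCache, inputs)
    print("outputs match:", np.allclose(raw_out, proj_out))
    print("K/V rows projected, raw-input cache:", raw_rows)
    print("K/V rows projected, projected cache:", proj_rows)
```

With 5 decoding steps, the raw-input cache pushes 1 + 2 + 3 + 4 + 5 = 15 rows through each of the key and value projections (30 in total), while the projected cache pushes 5 through each (10 in total). In general the raw-input approach grows quadratically with sequence length and the projected approach linearly, at the price of storing the projected tensors.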
Implementation of Incremental Decoding #1582