Bug in MinimalGatedUnit

https://github.com/google/flax/blob/d59132d9b17cdf2333ef72f6d7c96a21d81b71ba/flax/linen/recurrent.py#L725

This should be `1 - f`, according to the paper. The confusion comes from the direction of the "forget" gate: in the LSTM and GRU papers, information is passed through when `f` is high, but in the MGU paper it is the opposite. The variable `f` from the MGU paper is effectively `1 - f` in Flax (it is the portion that contributes to the short-term response, or `n` in Flax-speak). From the paper:

> In MGU, the forget gate f_t is first generated, and the element-wise product between 1 - f_t and h_{t−1} becomes part of the new hidden state h_t. The portion of h_{t-1} that is "forgotten" (f_t h_{t−1}) is combined with x_t to produce h_bar_t, the short-term response. A portion of h_bar_t (determined again by f_t) form the second part of h_t.

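To make the gate direction concrete, here is a minimal sketch of a single MGU step that follows the quoted description literally. It is a scalar, single-unit illustration in plain Python, not the Flax implementation. The function name `mgu_step_paper` and the parameter keys (`w_fh`, `w_fx`, `b_f`, `w_hh`, `w_hx`, `b_h`) are hypothetical names chosen for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mgu_step_paper(h_prev, x, p):
    """One scalar MGU step using the paper's convention for the forget gate.

    `p` is a dict of hypothetical scalar weights and biases (assumed names).
    """
    # Forget gate f_t: a HIGH value means more of h_prev is "forgotten".
    f = sigmoid(p["w_fh"] * h_prev + p["w_fx"] * x + p["b_f"])
    # Short-term response h_bar_t, built from the forgotten portion
    # (f_t * h_{t-1}) combined with the input x_t.
    h_bar = math.tanh(p["w_hh"] * (f * h_prev) + p["w_hx"] * x + p["b_h"])
    # New state: (1 - f_t) * h_{t-1} is carried over, f_t * h_bar_t is added.
    h_new = (1.0 - f) * h_prev + f * h_bar
    return h_new, f, h_bar
```

With the gate saturated low (for example a very negative `b_f`), `h_new` stays at `h_prev`; with it saturated high, `h_new` equals the short-term response. An implementation that uses the LSTM/GRU convention (high `f` keeps the old state) would need `1 - f` wherever the paper writes `f_t`, and vice versa, in both the candidate computation and the final mix.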

Bug in MinimalGatedUnit #4608
