
-np.inf in mask_3d causes numerical instability? #10

@chiragjn

I have found that using -np.inf in the inter-attention module (the attend part) often leads to nan loss, even with gradient clipping or very low learning rates. Replacing it with a large negative value like -1e18 fixes it in my case.

Could it be because there is some error in masking before calculating attention scores?
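A likely cause (my assumption, not confirmed against the repo's masking code): if a row of the attention score matrix is entirely masked with -np.inf, the softmax over that row computes 0/0 and produces nan, which then propagates into the loss and gradients. A minimal NumPy sketch of the difference between -np.inf and -1e18 as mask fill values:

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the row max for numerical stability
    m = x.max(axis=axis, keepdims=True)
    e = np.exp(x - m)
    return e / e.sum(axis=axis, keepdims=True)

scores = np.array([[0.5, 1.2, -0.3]])
mask = np.array([[0, 0, 0]])  # a row where every position is masked out

# With -np.inf the fully masked row is all -inf; softmax hits
# inf - inf and 0/0, so the whole row comes out as nan.
masked_inf = np.where(mask, scores, -np.inf)
print(softmax(masked_inf))  # all nan

# With a large finite negative value the row stays finite: softmax
# degenerates to a uniform distribution, but no nan reaches the loss.
masked_big = np.where(mask, scores, -1e18)
print(softmax(masked_big))  # roughly [1/3, 1/3, 1/3]
```

So -1e18 does not make the masking more correct, it just keeps fully masked rows finite; the cleaner fix is usually to guarantee at least one unmasked position per row before the softmax.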
