Skip to content

Setting of β #32

Description

@tangbohu

Hi.

In the paper, the authors said "As for parameter β in eq. 2, it usually varies about 0.1, as we
set it to 10^3 divided by number of elements in attention map and batch size for each layer. "

But I am still confused. What is 10^3 mean, and how 0.1 was got?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions