Open
Description
As discussed in the original paper, training relies on a large number of negative pairs to tighten the lower bound on the mutual information, and that bound grows with log(N), where N is the number of samples each positive is contrasted against. In this code, however, the negative pairs are constructed only within a mini-batch, so N is limited to the batch size. Why are these negative pairs enough?
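To make the question concrete, here is a minimal sketch of how the bound depends on N when negatives come only from the mini-batch. It assumes an InfoNCE-style objective, which the log(N) term suggests but which I have not checked against this repository. The names `info_nce_loss` and `mi_lower_bound` and the toy score matrices are hypothetical, not taken from the code.

```python
import math

def info_nce_loss(scores):
    """Mean InfoNCE loss for a square score matrix.

    scores[i][j] is the similarity between anchor i and candidate j.
    The diagonal holds the positive pairs; the other N - 1 entries in
    each row act as in-batch negatives.
    """
    n = len(scores)
    total = 0.0
    for i, row in enumerate(scores):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_sum_exp - row[i]  # negative log-softmax of the positive
    return total / n

def mi_lower_bound(scores):
    """InfoNCE estimate log(N) - loss, which cannot exceed log(N)."""
    return math.log(len(scores)) - info_nce_loss(scores)

# Hypothetical scores: a strong diagonal means the positives are easy to pick out.
for n in (4, 64, 256):
    scores = [[10.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    print(n, round(mi_lower_bound(scores), 3), round(math.log(n), 3))
```

Even with near-perfect scores the estimate stays below log(N), so under this assumption a batch of size N can only certify about log(N) nats of mutual information. That is the limitation I am asking about.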