Hi, could you please shed some light on the feature-wise sorting?
Though this operation is permutation-invariant, I'm still having trouble understanding it.
In the paper it says "A transformation (such as with an MLP) prior to the pooling can ensure that the features being sorted are mostly independent so that little information is lost by treating the features independently."
Why does this operation help with the significant bottleneck that arises when compressing a set of any size down to a single feature vector?
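
To make the question concrete, here is how I currently read the operation, as a minimal sketch. It assumes that "feature-wise sorting" means sorting each feature dimension independently across the elements of the set. The function name `featurewise_sort` and the toy data are my own for illustration, not taken from the paper, and the sketch leaves out whatever comes after the sort in the actual model.

```python
def featurewise_sort(elements):
    """Sort every feature column independently across the set.

    `elements` is a list of equal-length feature vectors, one per set
    element. This is an assumed reading of the operation, not the
    paper's reference implementation.
    """
    # Transpose so that each entry holds one feature across all elements.
    columns = zip(*elements)
    # Sort each feature on its own, ignoring the other features.
    sorted_columns = [sorted(column) for column in columns]
    # Transpose back to (element, feature) layout.
    return [list(row) for row in zip(*sorted_columns)]


# Toy set with three elements and two features each.
x = [[3.0, 1.0], [1.0, 2.0], [2.0, 0.0]]
# The same set with its elements listed in a different order.
x_shuffled = [x[2], x[0], x[1]]

# Both orderings give the same result, so the operation is
# permutation-invariant with respect to the set elements.
print(featurewise_sort(x))
print(featurewise_sort(x_shuffled))
```

If this reading is right, the output still has one value per element per feature, but the original pairing between features within an element is gone (for example, 3.0 and 1.0 no longer share a row). I assume this is what the quoted sentence about features being "mostly independent" refers to, but I would appreciate confirmation.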