Hi,
I noticed a difference between the algorithm described in the paper and the implementation in the codebase.
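In the paper, the regularization term (computed from the global features) is added to the local training loss, so both are optimized together in a single objective.
In the code, however, the global features are used in a separate, additional training step (the contrast term) rather than being added to the local loss.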
Could you clarify whether this is an intended modification or if I might be missing something?
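To make the question concrete, here is a toy sketch of the two update schemes as I understand them. Everything in it is hypothetical and not taken from the paper or the repo: a scalar parameter `w`, quadratic losses, and made-up names (`local_target`, `global_feat`, `lam`, `lr`). The point is only that one step on the combined loss and two sequential steps are not equivalent in general, because the second step in the sequential version sees the already-updated parameter.

```python
# Toy illustration only: scalar parameter w and quadratic losses.
# local_target, global_feat, lam and lr are hypothetical stand-ins,
# not values or functions from the paper or the codebase.

def grad_local(w, local_target):
    # Gradient of the local loss (w - local_target)^2
    return 2.0 * (w - local_target)

def grad_reg(w, global_feat):
    # Gradient of the global-feature term (w - global_feat)^2
    return 2.0 * (w - global_feat)

def paper_style_step(w, local_target, global_feat, lam, lr):
    # One gradient step on the combined objective: local loss + lam * reg term
    g = grad_local(w, local_target) + lam * grad_reg(w, global_feat)
    return w - lr * g

def code_style_step(w, local_target, global_feat, lam, lr):
    # Step 1: update on the local loss alone
    w = w - lr * grad_local(w, local_target)
    # Step 2: separate update on the global-feature (contrast) term,
    # evaluated at the already-updated w
    return w - lr * lam * grad_reg(w, global_feat)

print(paper_style_step(0.0, 1.0, -1.0, 0.5, 0.1))
print(code_style_step(0.0, 1.0, -1.0, 0.5, 0.1))
```

With these toy numbers the combined step lands at 0.1 and the two-step version at roughly 0.08, so the two schemes can lead to different updates even from the same starting point.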
Thanks!