Description
1. A self-distillation scheme in which a single student distills between differently augmented/distorted versions of the same image.
2. An MMD loss that distills the features between the differently augmented/distorted images.
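The two loss terms can be sketched roughly as follows. This is a minimal pure-Python illustration, not the original implementation: the function names, the symmetric form of the KL term, the `temperature` parameter, and the linear kernel for the MMD are all assumptions, since the description above does not specify them.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def view_kl_loss(logits_a, logits_b, temperature=1.0):
    """KL term between the student's predictions on two views of one image.

    Assumption: the term is symmetrized; the original may use one direction only.
    """
    p = softmax(logits_a, temperature)
    q = softmax(logits_b, temperature)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))

def linear_mmd(features_a, features_b):
    """Squared MMD with a linear kernel (assumed kernel choice).

    With a linear kernel this reduces to the squared distance between
    the mean feature vectors of the two batches.
    """
    dim = len(features_a[0])
    mean_a = [sum(f[d] for f in features_a) / len(features_a) for d in range(dim)]
    mean_b = [sum(f[d] for f in features_b) / len(features_b) for d in range(dim)]
    return sum((a - b) ** 2 for a, b in zip(mean_a, mean_b))
```

In practice `logits_a` and `logits_b` would be the student's outputs for two augmentations of the same image, and `features_a` and `features_b` the corresponding intermediate features for a batch.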
Modifications
Removing the MMD loss and retaining only the KL loss is probably fine,
since the KL loss alone already demonstrates competitive performance.
In my local experiments on CIFAR10/100, the method proves to be a very powerful self-distillation scheme, even without the MMD loss.
It is also strongly compatible with other distillation schemes and can serve as a component within them.
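One way to picture the KL-only variant and the use as a component is a weighted sum of scalar loss terms in which the MMD term and any additional distillation term are optional. The sketch below is an assumption for illustration: the function name, the weights, and the presence of a cross-entropy term `ce` are hypothetical and not taken from the original method.

```python
def total_loss(ce, view_kl, mmd=None, other_kd=None,
               kl_weight=1.0, mmd_weight=1.0, other_weight=1.0):
    """Combine scalar loss terms into one training objective (hypothetical form).

    ce       -- supervised loss on the labels (assumed to be cross-entropy)
    view_kl  -- KL term between predictions on two augmented views
    mmd      -- optional MMD feature term; pass None for the KL-only variant
    other_kd -- optional loss from another distillation scheme
    """
    loss = ce + kl_weight * view_kl
    if mmd is not None:
        loss += mmd_weight * mmd
    if other_kd is not None:
        loss += other_weight * other_kd
    return loss
```

Calling `total_loss(ce, view_kl)` corresponds to the KL-only variant, while passing `other_kd` adds the loss of another distillation scheme on top.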