Question for your paper MSD #5

@curryandsun

Hi,
Thanks for sharing this code; it's really helpful.

Recently I read your paper, "MSD: Multi-Self-Distillation Learning via Multi-classifiers within Deep Neural Networks". It's very interesting work, and the results are much better than those of the paper "Be Your Own Teacher", which you reimplement here.

However, after reading your paper, I could find only slight differences between the two papers:
1. Differences in the bottleneck in the model.
2. Some changes to the hyper-parameters.

Are there important details that I missed? Could you please tell me the key difference between the two papers that leads to such a significant improvement?
