Loss fusion problem in white-box distillation #34

@poryfly

total_loss = (1 - self.kd_ratio) * lm_loss + self.kd_ratio * distil_loss
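Here lm_loss and distil_loss differ by roughly two to three orders of magnitude. Does it make sense to combine them directly like this? In my actual runs, lm_loss starts out in the tens and eventually converges to around 0.1, whereas distil_loss starts at around 0.001 and converges to around 0.0001. With this weighting, distil_loss has almost no effect.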
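
To make the scale gap concrete, here is a minimal sketch in plain Python that plugs the late-training magnitudes reported above into the same weighted sum. The value kd_ratio = 0.5 is an assumption for illustration, and the helper names (fuse_losses, distil_share, rescaled_fuse) are hypothetical, not part of this repository. rescaled_fuse shows one possible workaround, rescaling distil_loss to the magnitude of lm_loss before weighting; it is not the project's method, and in a real training loop the scale factor would need to be treated as a constant (detached from the graph).

```python
def fuse_losses(lm_loss, distil_loss, kd_ratio):
    """Weighted sum with the same form as the total_loss line in the issue."""
    return (1 - kd_ratio) * lm_loss + kd_ratio * distil_loss


def distil_share(lm_loss, distil_loss, kd_ratio):
    """Fraction of the total loss value that comes from the distillation term."""
    total = fuse_losses(lm_loss, distil_loss, kd_ratio)
    return kd_ratio * distil_loss / total


def rescaled_fuse(lm_loss, distil_loss, kd_ratio, eps=1e-12):
    """Hypothetical workaround: bring distil_loss to lm_loss's magnitude first.

    With autograd, `scale` should be computed from detached values so that it
    acts as a constant multiplier rather than changing the gradient direction.
    """
    scale = lm_loss / (distil_loss + eps)
    return (1 - kd_ratio) * lm_loss + kd_ratio * scale * distil_loss


# Late-training magnitudes from the issue; kd_ratio = 0.5 is assumed.
late_share = distil_share(lm_loss=0.1, distil_loss=0.0001, kd_ratio=0.5)
print(f"distillation share of total loss: {late_share:.4%}")

balanced_total = rescaled_fuse(lm_loss=0.1, distil_loss=0.0001, kd_ratio=0.5)
print(f"total after rescaling: {balanced_total:.4f}")
```

Note that the share of the loss value is only a proxy: what drives training is the relative gradient magnitude of the two terms, which may differ from the ratio of the loss values themselves.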
