
[Not a bug] How to distill Qwen2.5 7B into Qwen2.5 3B #2416

Closed
@whk6688

Description

I want to distill Qwen2.5 7B into Qwen2.5 3B, but the two models have different vocab sizes. For now I crop the teacher tensor (teacher_logits[:,:151936]) so that it matches the student. Is there a better way to solve this?

Thanks.
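For reference, the cropping approach described above can be sketched as follows. This is a minimal sketch, not a confirmed recipe: it assumes both models share the same tokenizer (so index i refers to the same token in teacher and student, and the teacher's extra columns are unused padding entries), which is worth verifying by comparing the tokenizer length against each model's vocab size. The function name `distill_kl_shared_vocab` and the `temperature` default are hypothetical. It slices the last axis rather than axis 1, so it works whether the logits are shaped (tokens, vocab) or (batch, seq, vocab).

```python
import torch
import torch.nn.functional as F


def distill_kl_shared_vocab(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over the vocabulary slice both models share.

    Assumption: teacher and student use the same tokenizer, so the first
    `shared` columns line up token-for-token and the rest can be dropped.
    """
    # Crop along the vocabulary (last) axis, not the sequence axis.
    shared = min(student_logits.size(-1), teacher_logits.size(-1))
    s = student_logits[..., :shared].float() / temperature
    t = teacher_logits[..., :shared].float() / temperature

    # Renormalize both distributions over the shared slice only.
    s_logp = F.log_softmax(s, dim=-1).reshape(-1, shared)
    t_logp = F.log_softmax(t, dim=-1).reshape(-1, shared)

    # "batchmean" on the flattened tensors averages the KL per token.
    loss = F.kl_div(s_logp, t_logp, reduction="batchmean", log_target=True)

    # Conventional temperature**2 scaling keeps gradient magnitudes comparable.
    return loss * temperature ** 2


# Toy shapes standing in for the real vocab sizes (student 10, teacher 12).
student = torch.randn(2, 3, 10)
teacher = torch.randn(2, 3, 12)
print(distill_kl_shared_vocab(student, teacher))
```

Because the softmax is taken after cropping, any probability mass the teacher put on the dropped columns is redistributed over the shared tokens. If those columns are real tokens rather than padding, cropping is no longer safe and a token-level mapping between the two vocabularies would be needed instead.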


Labels: triaged
