[chunked loss] align teacher and student logit shape#634

Merged
shivam15s merged 2 commits into main from kd_align
Mar 28, 2025

Conversation

@yundai424
Collaborator

Summary

In rare cases the teacher and student models have different vocab sizes even though their vocabularies are actually the same (for example, Qwen models). In those cases, we pad the student's logits to match the shape of the teacher's logits.
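
As a rough illustration of the padding idea described above, here is a minimal, dependency-free sketch using plain Python lists. It is not the code from this PR: the function name `pad_student_logits`, the `pad_value` parameter, and its default of `0.0` are assumptions made for illustration, and a real implementation would operate on tensors along the vocab dimension.

```python
from typing import List


def pad_student_logits(
    student_logits: List[List[float]],
    teacher_vocab_size: int,
    pad_value: float = 0.0,  # assumed fill value; the PR description does not state one
) -> List[List[float]]:
    """Right-pad each row of student logits so it has teacher_vocab_size columns.

    Hypothetical helper for illustration only. Rows are token positions,
    columns are vocab entries.
    """
    padded = []
    for row in student_logits:
        pad_size = teacher_vocab_size - len(row)
        if pad_size < 0:
            # This sketch only handles the case where the teacher's logit
            # dimension is at least as large as the student's.
            raise ValueError("student vocab is larger than teacher vocab")
        # Copy the row so the caller's data is not mutated, then append padding.
        padded.append(list(row) + [pad_value] * pad_size)
    return padded


# Two token positions, student vocab dimension 3, teacher vocab dimension 5.
student = [[0.1, 0.2, 0.3], [1.0, -1.0, 0.5]]
aligned = pad_student_logits(student, teacher_vocab_size=5)
```

After this call, every row of `aligned` has the teacher's vocab dimension, so the two logit arrays can be compared element-wise by a distillation loss. The choice of fill value affects the student's softmax over the padded positions, so it should be checked against the actual implementation.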

Testing Done

make test

  • Hardware Type:
  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence

Collaborator

@shivam15s shivam15s left a comment

lgtm

@shivam15s shivam15s merged commit 87187b1 into main Mar 28, 2025
6 of 8 checks passed
@shivam15s shivam15s deleted the kd_align branch March 28, 2025 14:24