Conversation

@woshixiaobai2019 (Contributor) commented Nov 11, 2025

PR type

  • Bug Fix
  • [✅] New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

Conditional Distillation

Conditional distillation allows the teacher and student models to be trained with different contexts or prompts, enabling more flexible knowledge-transfer strategies. For example:

  • The teacher model receives a prompt containing additional expert guidance
  • The teacher model receives a restructured version of the task input (e.g. summarization, translation)
  • The teacher model uses longer context information
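To make the idea concrete, here is a hypothetical pair of message lists (illustrative only; the contents are made up, the field names follow the OpenAI chat format used elsewhere in this PR):

```python
# Hypothetical illustration: the same training sample as seen by each model.
# The student gets the raw prompt; the teacher gets an enriched prompt.
student_messages = [
    {'role': 'user', 'content': 'Solve: 12 * 13'},
]
teacher_messages = [
    {'role': 'system', 'content': 'You are a domain expert. Reason carefully.'},
    {'role': 'user', 'content': 'Solve: 12 * 13'},
]
# The assistant response being distilled is shared by both models.
response = {'role': 'assistant', 'content': '12 * 13 = 156'}
print(len(teacher_messages) - len(student_messages))  # 1 extra system message
```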

TeacherAdapter Plugin System

By implementing the TeacherAdapter interface, you can customize how the teacher model's context is transformed:

# swift/plugin/teacher_adapter.py
from swift.plugin import TeacherAdapter

class MyTeacherAdapter(TeacherAdapter):
    def shape_context(self, history):
        """Convert the student's messages into the teacher's messages.

        Args:
            history: the student model's message list (OpenAI format)

        Returns:
            The teacher model's message list
        """
        # Add an extra system prompt for the teacher
        teacher_history = history.copy()
        if teacher_history and teacher_history[0]['role'] == 'system':
            teacher_history[0]['content'] += '\n\nYou are a domain expert.'
        else:
            teacher_history.insert(0, {
                'role': 'system',
                'content': 'You are a domain expert.'
            })
        return teacher_history

# Register with the plugin system
from swift.plugin import teacher_adapters
teacher_adapters['my_adapter'] = MyTeacherAdapter
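To preview what shape_context returns outside a SWIFT environment, the adapter can be exercised against a stubbed base class (a self-contained sketch; this TeacherAdapter is a stand-in, not the real swift.plugin class):

```python
# Stand-in base class so the sketch runs without swift installed.
class TeacherAdapter:
    def shape_context(self, history):
        raise NotImplementedError

class MyTeacherAdapter(TeacherAdapter):
    def shape_context(self, history):
        # Copy each message dict so the student's history is not mutated.
        teacher_history = [dict(m) for m in history]
        if teacher_history and teacher_history[0]['role'] == 'system':
            teacher_history[0]['content'] += '\n\nYou are a domain expert.'
        else:
            teacher_history.insert(0, {
                'role': 'system', 'content': 'You are a domain expert.'})
        return teacher_history

history = [{'role': 'user', 'content': 'hi'}]
out = MyTeacherAdapter().shape_context(history)
print(out[0]['role'])  # system
```

Note that copying each message dict (rather than a bare `history.copy()`) sidesteps shallow-copy side effects on the original messages.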

Built-in Adapters

SWIFT provides two built-in teacher adapters:

Adapter   Description
default   Default: the teacher uses the same context as the student
example   Example: adds an extra system prompt for the teacher

Usage

swift rlhf \
    --rlhf_type gkd \
    --model Qwen/Qwen2.5-0.5B-Instruct \
    --teacher_model Qwen/Qwen2.5-7B-Instruct \
    --teacher_adapter example \
    --dataset your_dataset.jsonl \
    ...

How It Works

In conditional distillation:

  1. The student model processes the original input: [prompt_student] + [response]
  2. The teacher model processes the transformed input: [prompt_teacher] + [response]
  3. Both models compute logits over the same response tokens
  4. The distillation loss is computed from these logits

Here prompt_teacher is obtained from prompt_student via teacher_adapter.shape_context(), while the response portion stays unchanged.
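The alignment in the steps above can be sketched at the token level (hypothetical token ids, not SWIFT code):

```python
# Hypothetical token ids illustrating how the two inputs stay aligned
# on the response even though the prompts differ in length.
prompt_student = [101, 102]              # student prompt tokens
prompt_teacher = [101, 102, 103, 104]    # teacher prompt after shape_context
response = [7, 8, 9]                     # shared response tokens

student_input = prompt_student + response
teacher_input = prompt_teacher + response

# Each model yields one logit vector per position; the distillation loss
# only reads the positions that cover the response tokens.
student_span = student_input[len(prompt_student):]
teacher_span = teacher_input[len(prompt_teacher):]
print(student_span == teacher_span == response)  # True
```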

A reference training script is available here.

Experimental Results

(results screenshot attached)

@gemini-code-assist (Contributor)

Summary of Changes

Hello @woshixiaobai2019, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the GKD (Generalized Knowledge Distillation) trainer by introducing conditional distillation. This new capability allows for more flexible knowledge transfer by enabling the teacher and student models to process different input contexts or prompts while distilling knowledge. The feature is implemented through an extensible TeacherAdapter plugin system, providing a clear mechanism for users to customize how teacher prompts are generated from student prompts. This change is accompanied by updated training arguments, modifications to the data processing pipeline, and comprehensive documentation, making it easier for users to leverage this advanced distillation technique.

Highlights

  • Conditional Distillation Feature: Introduced support for conditional distillation in the GKD trainer, allowing teacher and student models to use distinct contexts or prompts during training.
  • Extensible TeacherAdapter System: Implemented a plugin system with a TeacherAdapter interface, enabling users to define custom logic for transforming student prompts into teacher prompts.
  • Built-in Adapter Examples: Provided default and math_teacher adapters to demonstrate context transformation, making it easier for users to get started.
  • GKD Trainer Enhancements: Modified the GKDTrainer to integrate the teacher_adapter and handle the distinct prompt encoding and loss calculation required for conditional distillation.
  • Comprehensive Documentation & Example: Added detailed documentation in both Chinese and English, along with a new example training script, to guide users on utilizing the new feature.

@gemini-code-assist (bot) left a comment

Code Review

This PR successfully adds conditional distillation to the GKD trainer, a very valuable new feature. The overall implementation is cleanly structured; in particular, the plugin-based TeacherAdapter design makes it very convenient to extend the teacher model's context-transformation logic, which is commendable. The core logic in gkd_trainer.py that handles the different prompts and aligns logits for the loss computation is complex, but the implementation looks correct. The documentation updates are clear and help users understand and use the new feature. Note also that the PR includes a large number of global linting fixes for f-string quote style, which helps keep the code style consistent.

Comment on lines +241 to +247
    if teacher_history and teacher_history[0]['role'] == 'system':
        teacher_history[0]['content'] += '\n\nYou are a domain expert.'
    else:
        teacher_history.insert(0, {
            'role': 'system',
            'content': 'You are a domain expert.'
        })
Severity: medium

The example code for MyTeacherAdapter in the documentation uses teacher_history[0]['content'] += ... to modify the system prompt. Since history.copy() performs a shallow copy, this unintentionally mutates the messages in the original history object. While this may not cause problems in the current GKD workflow, it is a risky practice. It is recommended to update the example to a safer implementation that replaces teacher_history[0] with a new dictionary, as done in MathTeacherAdapter in swift/plugin/teacher_adapter.py, to avoid potential side effects.

Suggested change
    if teacher_history and teacher_history[0]['role'] == 'system':
        teacher_history[0]['content'] += '\n\nYou are a domain expert.'
    else:
        teacher_history.insert(0, {
            'role': 'system',
            'content': 'You are a domain expert.'
        })
    teacher_history = history.copy()
    if teacher_history and teacher_history[0]['role'] == 'system':
        # More robust: create a new dict to avoid side effects
        teacher_history[0] = {
            'role': 'system',
            'content': teacher_history[0]['content'] + '\n\nYou are a domain expert.'
        }
    else:
        teacher_history.insert(0, {
            'role': 'system',
            'content': 'You are a domain expert.'
        })

Comment on lines +243 to +249
    if teacher_history and teacher_history[0]['role'] == 'system':
        teacher_history[0]['content'] += '\n\nYou are an expert with extensive knowledge.'
    else:
        teacher_history.insert(0, {
            'role': 'system',
            'content': 'You are an expert with extensive knowledge.'
        })
Severity: medium

The example code for MyTeacherAdapter in the documentation uses teacher_history[0]['content'] += ... to modify the system prompt. Since history.copy() performs a shallow copy, this approach will unintentionally modify the content of the original history object. While this might not cause issues in the current GKD workflow, it is a risky practice. It's recommended to update the example to a safer implementation by creating a new dictionary to replace teacher_history[0], similar to how MathTeacherAdapter is implemented in swift/plugin/teacher_adapter.py, to prevent potential side effects.

Suggested change
    if teacher_history and teacher_history[0]['role'] == 'system':
        teacher_history[0]['content'] += '\n\nYou are an expert with extensive knowledge.'
    else:
        teacher_history.insert(0, {
            'role': 'system',
            'content': 'You are an expert with extensive knowledge.'
        })
    teacher_history = history.copy()
    if teacher_history and teacher_history[0]['role'] == 'system':
        # More robust way: create a new dict to avoid side effects
        teacher_history[0] = {
            'role': 'system',
            'content': teacher_history[0]['content'] + '\n\nYou are an expert with extensive knowledge.'
        }
    else:
        teacher_history.insert(0, {
            'role': 'system',
            'content': 'You are an expert with extensive knowledge.'
        })

Comment on lines 220 to 225
    if 'teacher_prompt_attention_mask' in inputs:
        teacher_prompts_len = inputs['teacher_prompt_attention_mask'].sum(dim=1)  # [batch_size]
    else:
        teacher_prompts_len = torch.full((inputs['teacher_prompts'].shape[0],),
                                         inputs['teacher_prompts'].shape[1],
                                         device=inputs['teacher_prompts'].device)
Severity: medium

This computation of teacher_prompts_len duplicates the logic at lines 148-153 above. Consider computing teacher_prompts_len once at the start of the if 'teacher_prompts' in inputs: block and reusing the variable in the subsequent logic; this avoids the redundancy and makes the flow clearer.
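A possible shape for that refactor (a pure-Python sketch; lists stand in for the torch tensors in the quoted code, and the helper name is hypothetical):

```python
# Compute the per-sample teacher prompt length once, then reuse it.
# Lists stand in for the torch tensors used in the actual trainer.
def get_teacher_prompts_len(inputs):
    if 'teacher_prompt_attention_mask' in inputs:
        # Count non-padding prompt tokens per sample.
        return [sum(row) for row in inputs['teacher_prompt_attention_mask']]
    # No mask: every sample uses the full padded prompt length.
    prompts = inputs['teacher_prompts']
    return [len(prompts[0])] * len(prompts)

inputs = {
    'teacher_prompts': [[1, 2, 3], [4, 5, 0]],
    'teacher_prompt_attention_mask': [[1, 1, 1], [1, 1, 0]],
}
print(get_teacher_prompts_len(inputs))  # [3, 2]
```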

mouse and others added 3 commits November 11, 2025 07:02
- Change shape_context() to accept complete data dict instead of just messages
- Allow adapter to access all fields (dataset, images, etc.) for flexible usage
- Follow GRPO's reward_model_plugin design pattern