DistillKitPlus is an open-source toolkit for knowledge distillation (KD), inspired by arcee-ai/DistillKit. Its main motivation is to support offline distillation and parameter-efficient fine-tuning (PEFT) in low-compute settings.
- Logit Distillation: Supports same-architecture teacher and student models (a minimal loss sketch follows this list).
- Pre-Computed Logits: Enables memory-efficient training by generating logits in advance.
- LoRA Fine-Tuning Integration: Efficient low-rank adaptation fine-tuning support.
- Quantization Support: 4-bit model quantization for faster inference and reduced memory usage.
- Accelerate & DeepSpeed Integration: Support for distributed training with optimized memory usage.
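To make the logit-distillation idea concrete, here is a minimal sketch of a temperature-scaled distillation loss that blends a KL-divergence term with ordinary cross-entropy via an alpha weight, mirroring the temperature and alpha parameters exposed in the distillation section of the config. This function is illustrative only and is not DistillKitPlus's actual training code.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Illustrative KD loss: KL(teacher || student) on temperature-softened
    logits, blended with hard-label cross-entropy via alpha."""
    # Soften both distributions with the temperature.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)

    # KL divergence between softened distributions, scaled by T^2 so the
    # gradient magnitude stays comparable across temperatures.
    kd = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature**2

    # Ordinary cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(
        student_logits.reshape(-1, student_logits.size(-1)), labels.reshape(-1)
    )

    # alpha balances the distillation term against the supervised term.
    return alpha * kd + (1.0 - alpha) * ce
```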
```bash
git clone https://github.com/agokrani/distillkitplus.git
cd distillkitplus
pip install -r requirements.txt
pip install .
```
- Configure your distillation settings in config/default_config.json.
- Generate teacher logits (a conceptual sketch of this step follows the list):

  ```bash
  python scripts/local/generate_logits.py --config config/default_config.json
  ```

- Run distillation:

  Without Accelerate (default):

  ```bash
  python scripts/local/distill_logits.py --config config/default_config.json
  ```

  With Accelerate & DeepSpeed:

  ```bash
  # Make sure to set "use_accelerate": true in your config file
  accelerate launch --config_file config/accelerate_configs/default_config.yaml scripts/local/distill_logits.py --config config/default_config.json
  ```
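To show what the logit-generation step does conceptually, the sketch below runs a teacher model once over some text and saves its logits to disk, which is what lets student training proceed without keeping the teacher in memory. All names here (model ID, inputs, output path) are placeholders; the actual behavior is defined by scripts/local/generate_logits.py and your config file.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder names; the real script reads these from the JSON config.
TEACHER_ID = "your-teacher-model"
texts = ["example input 1", "example input 2"]

tokenizer = AutoTokenizer.from_pretrained(TEACHER_ID)
teacher = AutoModelForCausalLM.from_pretrained(TEACHER_ID, torch_dtype=torch.bfloat16)
teacher.eval()

all_logits = []
with torch.no_grad():
    for text in texts:
        inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
        # One teacher forward pass per example; only the logits are kept.
        logits = teacher(**inputs).logits
        all_logits.append(logits.squeeze(0).cpu())

# Persist the logits so student training can run without loading the teacher.
torch.save(all_logits, "teacher_logits.pt")
```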
DistillKitPlus also supports running its scripts on Modal. Use the following commands to perform knowledge distillation with Modal:
- Generate teacher logits:

  ```bash
  python scripts/modal/generate_logits.py --config config/default_config.json
  ```

- Run distillation:

  ```bash
  python scripts/modal/distill_logits.py --config config/default_config.json
  ```
When using Modal, the accelerate configuration is handled internally based on your config file settings. Just set "use_accelerate": true and specify "accelerate_config" in the "execution" section of your config file.
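As a small illustration (not part of the toolkit), the two settings can be added to the execution section of the JSON config programmatically. The accelerate config path below is the one used in the local example above; everything else is an assumption.

```python
import json

# Load the existing config, enable accelerate, and point at an accelerate config.
# Key names follow the description above; other fields are left untouched.
with open("config/default_config.json") as f:
    config = json.load(f)

config.setdefault("execution", {})
config["execution"]["use_accelerate"] = True  # serialized as `true` in JSON
config["execution"]["accelerate_config"] = "config/accelerate_configs/default_config.yaml"

with open("config/default_config.json", "w") as f:
    json.dump(config, f, indent=2)
```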
The toolkit uses a JSON configuration file with the following main sections:
- project_name: Name of your distillation project
- dataset: Dataset configuration including source and processing settings
- models: Teacher and student model specifications
- tokenizer: Tokenizer settings including max length and padding
- training: Training hyperparameters
- distillation: Distillation-specific parameters (temperature, alpha)
- lora: LoRA configuration for efficient fine-tuning
- quantization: Model quantization settings
- execution: Settings for accelerate and distributed training
See config/default_config.json for a complete example.
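For orientation, here is a hypothetical sketch of the file's overall shape, assembled from the section names above. The leaf keys and values are assumptions for illustration only; treat config/default_config.json as the authoritative reference.

```python
import json

# Hypothetical skeleton built from the section list above.
# Leaf key names (e.g. "teacher", "max_length", "r") are assumptions.
config_skeleton = {
    "project_name": "my-distillation-project",
    "dataset": {"name": "path/or/hub-id"},          # source and processing settings
    "models": {"teacher": "teacher-model-id",
               "student": "student-model-id"},
    "tokenizer": {"max_length": 1024, "padding": "max_length"},
    "training": {"num_train_epochs": 1, "per_device_train_batch_size": 1},
    "distillation": {"temperature": 2.0, "alpha": 0.5},
    "lora": {"r": 16, "alpha": 32},
    "quantization": {"load_in_4bit": True},
    "execution": {"use_accelerate": False,
                  "accelerate_config": "config/accelerate_configs/default_config.yaml"},
}

print(json.dumps(config_skeleton, indent=2))
```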
We welcome contributions from the community! If you have ideas for improvements, new features, or bug fixes, please feel free to open an issue or submit a pull request.
For any technical questions or issues, please open an issue in this repository. We appreciate your feedback and support!
