This project explores dialogue summarization using the DialogSum dataset. Two variants of the FLAN-T5 model were fine-tuned and evaluated against the original base model to understand how different tuning strategies affect performance.
- Base FLAN-T5: The original model, used without any fine-tuning as a baseline.
- Instruction-Fine-Tuned FLAN-T5: Fine-tuned using full-parameter training with task-specific instructions for improved generalization.
- PEFT Fine-Tuned FLAN-T5: Parameter-Efficient Fine-Tuning implemented via adapters (e.g., LoRA), significantly reducing the number of trainable parameters while retaining performance.
This project uses the FLAN-T5-base model as the foundation for all experiments:
- Model: google/flan-t5-base
- Parameters: ~248M
- Architecture: T5-based encoder-decoder model fine-tuned on instruction-following tasks
- Hugging Face Model Card: Comprehensive documentation of capabilities, training data, and usage guidelines
The FLAN-T5 family includes multiple sizes. For reference:
- FLAN-T5-small: google/flan-t5-small (80M parameters)
- FLAN-T5-base: google/flan-t5-base (248M parameters) ⭐ Used in this project
- FLAN-T5-large: google/flan-t5-large (780M parameters)
- FLAN-T5-xl: google/flan-t5-xl (3B parameters)
- FLAN-T5-xxl: google/flan-t5-xxl (11B parameters)
- Source: knkarthick/dialogsum
- Description: The DialogSum dataset consists of dialogue-summary pairs designed for the abstractive summarization of multi-turn conversations.
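Each DialogSum record pairs a multi-turn dialogue with a human-written summary. A minimal sketch of how an instruction-style prompt might be built from one record — the `dialogue` and `summary` field names follow the dataset card, but the instruction wording below is illustrative, not the exact prompt used in this project:

```python
# Build an instruction-style prompt from a DialogSum-shaped record.
# Field names follow the knkarthick/dialogsum dataset card; the exact
# instruction wording is an assumption for illustration.
def build_prompt(record: dict) -> str:
    return (
        "Summarize the following conversation.\n\n"
        f"{record['dialogue']}\n\n"
        "Summary:"
    )

example = {
    "dialogue": "#Person1#: Hi, how are you?\n#Person2#: Fine, thanks.",
    "summary": "#Person1# greets #Person2#.",
}

prompt = build_prompt(example)
print(prompt.splitlines()[0])  # prints the instruction line
```

At training time, `prompt` would be the encoder input and `record["summary"]` the target sequence.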
- Platform: High Performance Computing (HPC) Cluster
- GPU: NVIDIA A100
- Memory: 80GB GPU Memory
Base FLAN-T5
- Used as the baseline for comparison.
- No fine-tuning applied.
Instruction-Fine-Tuned FLAN-T5
- Fine-tuned on task-specific instructions.
- Enhanced generalization by incorporating instruction-style prompts.
PEFT Fine-Tuned FLAN-T5
- Employed Parameter-Efficient Fine-Tuning using adapters.
- Achieved competitive performance with a reduced number of trainable parameters.
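The adapter idea behind PEFT can be sketched in plain NumPy: the pretrained weight W stays frozen, and only a low-rank update B·A is trained, shrinking the trainable count from d_out·d_in to r·(d_in + d_out). This is an illustrative LoRA-style layer, not the project's implementation (which would typically use the Hugging Face `peft` library):

```python
import numpy as np

class LoRALinear:
    """Frozen dense layer plus a trainable low-rank update (LoRA-style sketch)."""

    def __init__(self, d_in: int, d_out: int, r: int, alpha: float = 1.0):
        rng = np.random.default_rng(0)
        self.W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
        self.A = rng.normal(size=(r, d_in)) * 0.01   # trainable
        self.B = np.zeros((d_out, r))                # trainable; zero-init => no-op at start
        self.scale = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # y = Wx + scale * B(Ax); only A and B would receive gradients
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

    def trainable_params(self) -> int:
        return self.A.size + self.B.size

layer = LoRALinear(d_in=768, d_out=768, r=8)
full = layer.W.size               # 589,824 params if fully fine-tuned
lora = layer.trainable_params()   # 8*768 + 768*8 = 12,288
print(f"trainable fraction: {lora / full:.3%}")
```

Because B starts at zero, the adapted layer initially reproduces the frozen model exactly, which is what makes LoRA training stable from step one.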
| Model | Trainable Parameters | Total Parameters | Training Time (min) | Epochs | Learning Rate |
|---|---|---|---|---|---|
| Instruction Fine-Tuned | 247,577,856 | 247,577,856 | 8.48 | 400 | 0.001 |
| PEFT Fine-Tuned | 3,538,944 | 251,116,800 | 4.71 | 200 | 0.001 |
- Parameter Efficiency: PEFT achieved competitive performance with 98.6% fewer trainable parameters (3.5M vs 247.6M)
- Training Efficiency: PEFT required 44% less training time (4.71 min vs 8.48 min)
- Memory Efficiency: PEFT uses only 1.4% of the parameters that full fine-tuning requires
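The efficiency figures above follow directly from the counts in the table; a quick arithmetic check:

```python
# Parameter and timing figures taken from the results table above.
full_trainable = 247_577_856   # instruction fine-tuned: all parameters trained
peft_trainable = 3_538_944     # PEFT: adapter parameters only

fraction = peft_trainable / full_trainable
print(f"PEFT trains {fraction:.1%} of the full-model parameters")   # ~1.4%
print(f"i.e. {1 - fraction:.1%} fewer trainable parameters")        # ~98.6%

full_minutes, peft_minutes = 8.48, 4.71
print(f"training time reduced by {1 - peft_minutes / full_minutes:.0%}")  # ~44%
```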
The models were evaluated on dialogue summarization quality, with outputs stored in:
- model_outputs/original_model_summaries.csv - Base FLAN-T5 outputs
- model_outputs/instruction_fine_tuned_model_summaries.csv - Full fine-tuning outputs
- model_outputs/peft_model_summaries.csv - PEFT outputs
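Summarization quality for outputs like these is commonly scored with ROUGE. A minimal unigram-overlap (ROUGE-1 F1) sketch in plain Python, for illustration only — the project's actual metric pipeline is not shown in this section, and a production setup would use a library such as `rouge-score`:

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a candidate summary."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f1(
    "person1 books a table for two at seven",
    "person1 books a table at seven",
)
print(f"ROUGE-1 F1: {score:.3f}")  # prints 0.857
```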
Model Performance by Score Groups
Parameter Efficiency Comparison
Training Time Efficiency
Full Fine-Tuning Loss Curves
PEFT Training Loss Curves
This study demonstrates that PEFT (Parameter-Efficient Fine-Tuning) offers a highly effective approach for dialogue summarization:
- Significant resource savings with minimal performance trade-offs
- Faster training convergence compared to full fine-tuning
- Practical deployment advantages due to smaller model footprint
- Cost-effective solution for organizations with limited computational resources
The results validate PEFT as a superior approach for dialogue summarization tasks when computational efficiency is a priority.
Building on the PEFT results, the next phase will implement:
- Proximal Policy Optimization (PPO) - Apply reinforcement learning from human feedback to improve summary quality
- Model Distillation - Create smaller, more efficient models while maintaining performance
These experiments will be tested on the same DialogSum dataset for direct performance comparison.