This repository accompanies a study on mitigating catastrophic forgetting in code-switched (CoS) multitask continual finetuning. The work consists of two primary experimental components:
- Empirical validation of catastrophic forgetting in a vanilla finetuning pipeline.
- Evaluation of forgetting mitigation strategies, with a focus on adapter-based methods combined with data replay and/or knowledge distillation, to improve knowledge retention across incremental CoS tasks.
The repository provides all necessary scripts to collect, preprocess, and generate CoS datasets, as well as to replicate the training and evaluation of all experimental configurations reported in the study.
This directory contains scripts for generating training datasets for CoS experiments.
Some scripts are adapted from the Lost in the Mix (LIM) project:
https://github.com/amr-mohamedd/Lost-in-the-Mix
- load_raw_train_data.py: Downloads the original datasets, including MLQA, MMLU/MMMLU, and XNLI.
- mlqa_mcq_gen.py: Generates distractor (incorrect) answers for MLQA multiple-choice questions.
- training_data_gen.py: Wrapper script for generating CoS training data from the original datasets.
- training_cos_gen.py: Implements the LIM-CoS and T-CoS generation procedures.
- async_llama_query.py: Provides concurrent querying support for large language models (a minimal concurrency sketch follows this list).
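For reference, bulk LLM querying of this kind is typically bounded with an asyncio semaphore against an OpenAI-compatible endpoint. The sketch below illustrates that pattern only; the endpoint URL, model ID, and function names are assumptions, not the actual interface of async_llama_query.py.

```python
# Minimal concurrent-querying sketch. ENDPOINT, MODEL, and the function
# names are illustrative assumptions, not async_llama_query.py's interface.
import asyncio
import aiohttp

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # e.g. a local vLLM server
MODEL = "meta-llama/Llama-3.1-8B-Instruct"              # hypothetical model ID

async def query_one(session: aiohttp.ClientSession,
                    sem: asyncio.Semaphore, prompt: str) -> str:
    """Send a single chat request, bounded by the shared semaphore."""
    payload = {"model": MODEL,
               "messages": [{"role": "user", "content": prompt}]}
    async with sem:  # cap the number of in-flight requests
        async with session.post(ENDPOINT, json=payload) as resp:
            resp.raise_for_status()
            data = await resp.json()
            return data["choices"][0]["message"]["content"]

async def query_all(prompts: list[str], max_concurrency: int = 16) -> list[str]:
    """Fan out all prompts concurrently; results keep the input order."""
    sem = asyncio.Semaphore(max_concurrency)
    async with aiohttp.ClientSession() as session:
        tasks = [query_one(session, sem, p) for p in prompts]
        return await asyncio.gather(*tasks)

# answers = asyncio.run(query_all(["Translate ...", "Paraphrase ..."]))
```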
This directory contains scripts for generating evaluation and benchmark datasets for CoS experiments.
Some scripts are adapted from the Lost in the Mix (LIM) project:
https://github.com/amr-mohamedd/Lost-in-the-Mix
- prepare_dataset.py: Downloads the original benchmark datasets, including Belebele, MMLU/MMMLU, and XNLI (see the download sketch after this list).
- eval_data_gen.py: Wrapper script for generating CoS evaluation data from the original benchmark datasets.
- training_cos_gen.py: Implements the LIM-CoS and T-CoS generation procedures for evaluation data.
- async_llama_query.py: Provides concurrent querying support for large language models.
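For orientation, all three benchmark sources are available on the Hugging Face Hub, so a minimal download sketch looks like the following. The hub IDs, configurations, and splits shown are the public defaults and may differ from those hard-coded in prepare_dataset.py.

```python
# Minimal sketch of pulling the benchmark sources with Hugging Face
# `datasets`; hub IDs/configs below are the public ones, which may not
# match the exact identifiers used in prepare_dataset.py.
from datasets import load_dataset

belebele = load_dataset("facebook/belebele", "eng_Latn", split="test")
mmlu = load_dataset("cais/mmlu", "all", split="test")
xnli = load_dataset("facebook/xnli", "en", split="test")

print(len(belebele), len(mmlu), len(xnli))
```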
This directory contains code for model training, continual learning configurations, evaluation, and Pareto dominance ranking of the resulting configurations (a minimal ranking sketch appears at the end of this section).
- qwen_base_head_training.py: Pre-trains the classifier head for the Qwen3-0.6B base model.
- adp_lora_normal_sft_3_save.py: Main training script supporting multiple training configurations, including:
  - Vanilla fine-tuning
  - Pfeiffer adapters
  - LoRA adapters (a minimal adapter-attachment sketch follows at the end of this section)
  - Raw data replay
  - Learning without Forgetting (LwF) knowledge distillation (a sketch of the LwF loss also follows below)
- model_eval.py: Model evaluation script.
- eval_utils.py: Evaluation utilities, including accuracy computation and the (deprecated) knowledge entropy metric.
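For the adapter-based configurations, the sketch below shows one way LoRA adapters can be attached with the peft library, assuming a sequence-classification head on Qwen3-0.6B (as qwen_base_head_training.py suggests). The rank, scaling factor, dropout, target modules, and label count are illustrative, not the study's hyperparameters.

```python
# Minimal LoRA-attachment sketch with `peft`; all hyperparameters and the
# number of labels are illustrative assumptions, not the study's values.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "Qwen/Qwen3-0.6B", num_labels=4  # hypothetical number of answer choices
)
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,                                  # adapter rank
    lora_alpha=16,                        # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only adapters + head remain trainable
```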
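The LwF configuration augments the current-task loss with a distillation term that penalizes divergence from the frozen previous-stage model. Below is a minimal sketch of the standard formulation (cross-entropy plus temperature-scaled KL); the weighting and temperature actually used in the training script may differ.

```python
# Minimal LwF-style loss sketch: standard formulation, not necessarily
# the exact loss implemented in adp_lora_normal_sft_3_save.py.
import torch
import torch.nn.functional as F

def lwf_loss(new_logits: torch.Tensor,
             old_logits: torch.Tensor,
             labels: torch.Tensor,
             T: float = 2.0,
             alpha: float = 0.5) -> torch.Tensor:
    """Cross-entropy on the new task plus a KL term toward the old model."""
    ce = F.cross_entropy(new_logits, labels)
    # KL(old || new) on temperature-softened distributions, scaled by T^2
    kl = F.kl_div(F.log_softmax(new_logits / T, dim=-1),
                  F.softmax(old_logits / T, dim=-1),
                  reduction="batchmean") * (T * T)
    return (1 - alpha) * ce + alpha * kl
```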
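Pareto dominance ranking orders configurations by their per-task metric vectors: a configuration is dominated if another is at least as good on every metric and strictly better on at least one. A minimal sketch, with hypothetical configuration names and accuracy vectors; the exact ranking scheme in the repository may differ.

```python
# Minimal non-dominated (Pareto) ranking sketch over per-task metric vectors.
def dominates(a: list[float], b: list[float]) -> bool:
    """a dominates b if it is >= on every metric and > on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_ranks(points: dict[str, list[float]]) -> dict[str, int]:
    """Rank 0 = non-dominated front; peel successive fronts iteratively."""
    remaining = dict(points)
    ranks: dict[str, int] = {}
    rank = 0
    while remaining:
        front = [k for k, v in remaining.items()
                 if not any(dominates(w, v)
                            for j, w in remaining.items() if j != k)]
        for k in front:
            ranks[k] = rank
            del remaining[k]
        rank += 1
    return ranks

# Hypothetical example: accuracy on (task A, task B) per configuration
# pareto_ranks({"lora": [0.81, 0.74], "replay": [0.79, 0.78], "vanilla": [0.83, 0.61]})
```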