Reasoning Transfer for an Extremely Low-resource and Endangered Language: Bridging Languages through Sample-Efficient Language Understanding
Khanh-Tung Tran, Barry O'Sullivan, Hoang D. Nguyen
Accepted to AAAI-26
This repository contains code and resources for the paper Reasoning Transfer for an Extremely Low-resource and Endangered Language: Bridging Languages through Sample-Efficient Language Understanding.
- ./data: evaluation data, including our contributed dataset LC2024.
- ./src/train: training code, including for the baseline Native-CoT Training and English-Pivoted CoT Training.
- ./src/evaluation: evaluation code, based on the SkyThought repo.
- ./appendix.pdf: technical appendix.
- Install dependencies:
    cd ./src/train
    pip install -r requirements.txt
- Run the main script:
    bash scripts/lang_adapt/run_sft.sh  # change the paths accordingly
- Install dependencies:
    cd ./src/evaluation
    pip install -r requirements.txt
- Run evaluation:
    cd ./src/evaluation/skythought/skythought_evals
    python eval.py --model ${YOUR_MODEL_HERE} --evals=aime,irish_aime,LC2024 --tp=1 --output_file=results.txt --temperatures 0.6 --n 64
For more information, refer to the original documentation in the SkyThought repo.
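To evaluate several checkpoints with the same settings, the evaluation command can be wrapped in a small loop. The following is a minimal sketch, not part of this repository: the checkpoint paths, the `run_eval` function, and the `DRY_RUN` variable are hypothetical names chosen for illustration, while the `eval.py` flags are copied unchanged from the command above. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: run the evaluation command for several checkpoints.
# Assumes it is started from ./src/evaluation/skythought/skythought_evals.
set -eu

# DRY_RUN=1 (default) prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

run_eval() {
  model="$1"
  # One results file per model, named after the last path component.
  out="results_$(basename "$model").txt"
  # Rebuild the positional parameters as the full command line.
  set -- python eval.py --model "$model" \
    --evals=aime,irish_aime,LC2024 --tp=1 \
    --output_file="$out" --temperatures 0.6 --n 64
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical checkpoint paths; replace them with your own models.
for m in checkpoints/native-cot checkpoints/english-pivoted-cot; do
  run_eval "$m"
done
```

Review the printed commands first, then rerun with `DRY_RUN=0` to launch the evaluations. Writing one output file per model keeps a later run from overwriting an earlier one.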
TBU