Improving Code Translation with Syntax-Guided and Semantic-aware Preference Optimization, IJCAI 2026
This repository provides a training pipeline for CTO, including supervised fine-tuning, semantic model training, and preference optimization. Follow the steps below to set up your environment and train the models.
- 📃 File Structure
- 🔧 Environment Setup
- 📚 Supervised Fine-tuning & Preference Dataset Construction
- 🎯 Semantic Model Training
- 📊 Preference Optimization
├── data/ # Used dataset
│ ├── humanevalx
│ ├── transcoder-test
│ ├── xlcost
│ ├── reward_data # Data for the semantic reward model
│ └── preference_dataset # Data for preference learning
├── dataset/ # Dataset reader pipeline
│ ├── base_dataset
│ ├── humanevalx.py
│ ├── transcoder
│ ├── xlcost.py
│ └── utils.py
├── env/ # Files required for Java execution
├── evaluation/ # Evaluation pipeline
│ ├── lang_executor/ # Executors for the different programming languages
│ ├── temp/ # Temporary files created during evaluation
│ └── tests/ # Tests for lang_executor
├── semantic/ # Module for semantic model training
│ ├── train_reward.sh # Script to train semantic model
│ └── configs # Training configs
├── trainer # Model trainer
├── requirements.txt # dependency file
├── cto_trainer.py # script for preference optimization
├── sft_trainer.py # script for supervised training
├── merge_peft.py # merge lora script
├── run.py # script for evaluation
└── README.md
First, install the required dependencies by running:
pip install -r requirements.txt
Download data here.
To perform supervised fine-tuning, execute the following command:
python sft_trainer.py \
--model_name_or_path codellama/CodeLlama-7b-hf \
--output_dir <YOUR_LORA_OUTPUT_DIR> \
--dataset_path <YOUR_SFT_DATASET_PATH>
Replace <YOUR_LORA_OUTPUT_DIR> and <YOUR_SFT_DATASET_PATH> with the appropriate paths; the SFT dataset is located in ./data/xlcost.
Then merge the LoRA weights into the original CodeLlama-7B model:
python merge_peft.py \
--adapter_dir <YOUR_LORA_OUTPUT_DIR>/final_checkpoint \
--output_dir <YOUR_SFT_MODEL_OUTPUT_DIR>
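For intuition about what merging does: a LoRA adapter stores a low-rank update B·A alongside each frozen weight matrix W, and merging folds it in as W + (alpha / r)·B·A so that inference no longer needs the adapter. The sketch below shows that arithmetic on tiny hand-written matrices in pure Python. It is illustrative only: the function names are hypothetical, and it is not the code in merge_peft.py, whose internals this README does not describe.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Frozen 2x2 base weight and a rank-1 adapter (B is 2x1, A is 1x2).
# All values are made up for illustration.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]
B = [[0.5], [1.0]]

merged = merge_lora(W, A, B, alpha=2, rank=1)
print(merged)
```

After merging, the result has the same shape as the base weight, which is why the merged model can be loaded like an ordinary checkpoint.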
- Navigate to the semantic directory.
We provide the training data in reward_data.zip.
Then, train the semantic model by running:
bash ./train_reward.sh
Run the following command to perform preference optimization:
python cto_trainer.py \
--model_path <YOUR_SFT_MODEL_OUTPUT_DIR>/final_checkpoint \
--src_lang java \
--tgt_lang cpp \
--output_dir <CTO_LORA_CHECKPOINT> \
--preference_dataset_file <PREFERENCE_FILE>
Replace the placeholders with the actual paths to your models and dataset.
We provide the preference dataset in data/preference_dataset.zip.
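This README does not spell out the objective that cto_trainer.py optimizes. As a general point of reference for preference optimization, the sketch below computes the standard DPO loss for one (chosen, rejected) pair from sequence log-probabilities under the policy and under a frozen reference model. It is the vanilla DPO formula, not the CTO objective, and the function name, the beta value, and the numbers are assumptions made for illustration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss: -log(sigmoid(beta * (chosen margin - rejected margin)))."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) equals softplus(-x); this form avoids overflow in exp().
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


# Policy identical to the reference: no preference signal yet, loss is log(2).
no_preference = dpo_loss(-10.0, -10.0, -10.0, -10.0)

# Policy has raised the chosen translation and lowered the rejected one.
prefers_chosen = dpo_loss(-8.0, -12.0, -10.0, -10.0)

print(no_preference, prefers_chosen)
```

The loss falls as the policy assigns relatively more probability to the chosen translation than the reference model does.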
Then merge the LoRA checkpoint:
python merge_peft.py \
--adapter_dir <CTO_LORA_CHECKPOINT> \
--output_dir <YOUR_CTO_MODEL_OUTPUT_DIR>
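The four training commands above can be chained in a small driver script. The sketch below only assembles the argument lists, using the scripts and flags as they appear in this README, and prints them. The dictionary keys and the example directory names are hypothetical placeholders, not paths shipped with the repository.

```python
import shlex


def build_pipeline(paths, src_lang="java", tgt_lang="cpp"):
    """Return the SFT, merge, CTO, and merge commands as argument lists."""
    return [
        ["python", "sft_trainer.py",
         "--model_name_or_path", "codellama/CodeLlama-7b-hf",
         "--output_dir", paths["sft_lora"],
         "--dataset_path", paths["sft_data"]],
        ["python", "merge_peft.py",
         "--adapter_dir", f"{paths['sft_lora']}/final_checkpoint",
         "--output_dir", paths["sft_model"]],
        ["python", "cto_trainer.py",
         "--model_path", f"{paths['sft_model']}/final_checkpoint",
         "--src_lang", src_lang,
         "--tgt_lang", tgt_lang,
         "--output_dir", paths["cto_lora"],
         "--preference_dataset_file", paths["preference_file"]],
        ["python", "merge_peft.py",
         "--adapter_dir", paths["cto_lora"],
         "--output_dir", paths["cto_model"]],
    ]


# Hypothetical locations; substitute your own.
example_paths = {
    "sft_lora": "out/sft_lora",
    "sft_data": "path/to/sft_dataset",
    "sft_model": "out/sft_model",
    "cto_lora": "out/cto_lora",
    "cto_model": "out/cto_model",
    "preference_file": "path/to/preference_file",
}

commands = build_pipeline(example_paths)
for cmd in commands:
    print(shlex.join(cmd))
```

Passing each list to subprocess.run(cmd, check=True) instead of printing it would execute the steps in order and stop at the first failure.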
To evaluate the model's CA@1, run:
python run.py \
--model_path <YOUR_CTO_MODEL_OUTPUT_DIR>/final_checkpoint \
--dataset_name transcoder \
--src_lang java \
--tgt_lang cpp \
--sample_k 1 \
--save_path <TRANSLATION_JSON_FILE>
dataset_name can be transcoder or humanevalx; src_lang and tgt_lang can each be java, cpp, or python.
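CA is taken here to mean computational accuracy: a translation counts as correct when it passes the test cases for its problem, and CA@1 is the fraction of problems whose single sampled translation does so. A minimal sketch of that calculation follows, assuming per-problem pass/fail outcomes have already been collected. The input format is hypothetical and is not necessarily what run.py writes to the save path.

```python
def ca_at_1(results):
    """Fraction of problems whose first sampled translation passed its tests.

    results maps a problem id to a list of pass/fail booleans, one per sample.
    """
    if not results:
        raise ValueError("no results to score")
    passed = sum(1 for outcomes in results.values() if outcomes and outcomes[0])
    return passed / len(results)


# Hypothetical outcomes for four problems with one sample each.
example = {
    "p1": [True],
    "p2": [False],
    "p3": [True],
    "p4": [True],
}

score = ca_at_1(example)
print(f"CA@1 = {score:.2%}")
```

A problem with no recorded sample is counted as a failure rather than dropped, so the denominator always equals the number of problems.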
@inproceedings{cto2026,
  title = {Improving Code Translation with Syntax-Guided and Semantic-aware Preference Optimization},
  booktitle = {IJCAI},
  year = {2026}
}