YaPO is a steering algorithm that moves instruction-tuned LLMs toward region- or domain-specific behaviors by learning sparse activation vectors on top of pretrained, frozen LLMs using Sparse AutoEncoders (SAEs). A conceptual sketch of the idea follows the table below.
This repository contains:
| Folder | Description |
|---|---|
| `src/data_prep/` | Scripts that regenerate "rejected" answers with the current model to build DPO-style preference datasets. |
| `src/modeling/yapo/` | Training, steering, and testing code for sparse/dense vectors (Slurm-friendly). |
| `src/eval/` | Evaluation code and an adaptation of lm-evaluation-harness, plus Slurm wrappers to benchmark baseline and steered models. |
| `assets/` | Figures (e.g., `method.png` above) and sample training logs for documentation. |
| `installation_*.sh` | Environment bootstrap scripts for AMD (ROCm) and NVIDIA clusters. |
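To make the core idea concrete, here is a minimal, illustrative sketch of what "learning a sparse activation vector on a frozen model" could look like. The class and argument names (`SparseSteering`, `alpha`) are assumptions chosen for illustration, not this repository's actual API:

```python
# Illustrative sketch of SAE-based sparse steering; names are assumptions.
import torch


class SparseSteering(torch.nn.Module):
    """Learnable coefficients over SAE latents, applied to one layer.

    The base LLM and the SAE stay frozen; only `latent_coeffs` is trained,
    and its sparsity would be encouraged during training (e.g., via an
    L1 penalty -- an assumption about the training recipe).
    """

    def __init__(self, sae_decoder: torch.nn.Linear, n_latents: int):
        super().__init__()
        self.decoder = sae_decoder
        for p in self.decoder.parameters():
            p.requires_grad_(False)  # keep the SAE frozen
        self.latent_coeffs = torch.nn.Parameter(torch.zeros(n_latents))

    def forward(self, hidden: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
        # Decode the sparse latent coefficients into residual-stream space
        # and add the resulting vector to the chosen layer's activations.
        steering_vector = self.decoder(self.latent_coeffs)
        return hidden + alpha * steering_vector
```

In this sketch, only `latent_coeffs` receives gradients during training (driven by the DPO-style preference pairs built in step 1 below), so neither the base model nor the SAE ever changes.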
The rest of the README walks through the complete workflow.
Pick the script that matches your hardware; each creates a Conda env with all
dependencies (Torch, accelerate, trl, etc.).
```bash
# AMD ROCm (MI210 GPUs were used in our experiments)
bash installation_amd.sh yapo

# NVIDIA CUDA
bash installation_nvidia.sh yapo

conda activate yapo  # or whichever env name you chose
```
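Before submitting jobs, an optional sanity check can confirm the env sees an accelerator. This is a generic snippet, not a script shipped with the repo:

```python
# Optional sanity check, run inside the activated env. ROCm builds of
# PyTorch also report their devices through the torch.cuda namespace.
import torch

print("torch", torch.__version__)
print("accelerator visible:", torch.cuda.is_available())
```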
1. **Generate rejected answers (data prep).**
   `src/data_prep/generate_rejected.py` streams your prompt dataset through the current policy model, treats its outputs as rejected responses, and pairs them with the trusted "chosen" answers (see the record sketch after this list). Use the shell wrapper on clusters:

   ```bash
   cd src/data_prep
   sbatch generate_rejected.sh \
       --data_path MBZUAI-Paris/Deep-Culture-Lense \
       --model-name google/gemma-2-9b-it \
       --mcq
   ```

   The script writes logs to `logs/on_policy/` and uploads the augmented dataset to your Hugging Face repo (see `src/data_prep/README.md` for details).
2. **Train steering vectors (modeling).**
   Launch `sbatch src/modeling/yapo/train.sh` with the desired mode (`--sparse` for YaPO, `--dense` for BiPO), layer, dataset, and SAE parameters:

   ```bash
   cd src/modeling/yapo
   sbatch train.sh \
       --mode sparse \
       --layer 15 \
       --hub_dataset_path MBZUAI-Paris/Deep-Culture-Lense_processed_2b_mcq_mxlen_1024 \
       --country_name morocco \
       --per_device_train_batch_size 4 \
       --learning_rate 5e-4
   ```

   Logs land in `logs/modeling/train/` and steering vectors in `vector/<behavior>_*` (a short sketch for inspecting them follows this list).
3. **Generate evaluation traces.**
   Run `./run_all_tests.sh` to sweep the trained steering vectors (or baselines) and produce the `.jsonl`/`.parquet` files consumed by the eval stage:

   ```bash
   ./run_all_tests.sh \
       --mode sparse \
       --layer 15 \
       --behavior morocco_sparse_2b_15_0.0005_4_20_1_loc-localized \
       --vector_dir ./vector/morocco_sparse_2b_15_0.0005_4_20_1_loc-localized_gemma-mcq
   ```
4. **Evaluate accuracy & general knowledge.**
   Move into `src/eval/lm-evaluation-harness` and submit either the baseline script or the steering sweep:

   ```bash
   cd src/eval/lm-evaluation-harness
   sbatch --export=ALL,BASELINE_MODEL=google/gemma-2-2b-it,TASKS=mmlu,NUM_FEWSHOT=5 \
       run_baseline.slurm
   sbatch --export=ALL,STEERING_MODE=yapo,STEERING_COUNTRY=morocco,STEERING_MODEL_SIZE=2b \
       run_mmlu_steering.slurm
   ```

   Results appear under `runs/`, logs under `logs/`. The harness README explains all env vars and task settings.
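For reference (as mentioned in step 1), a preference record produced by the data-prep step conceptually looks like the following. The field names here are assumptions; the real column schema is documented in `src/data_prep/README.md`:

```python
# Hypothetical DPO-style preference record; verify the actual column
# names against src/data_prep/README.md before relying on them.
record = {
    "prompt": "Which Moroccan city hosts the Gnaoua World Music Festival?",
    "chosen": "Essaouira.",    # trusted reference answer
    "rejected": "Marrakech.",  # regenerated output of the current policy model
}
```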
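And as referenced in step 2, a quick way to inspect a trained vector before sweeping it, assuming it is serialized as a plain PyTorch tensor (the file name and format below are assumptions; check what `train.sh` actually wrote under `vector/`):

```python
# Sketch: check the sparsity of a saved steering vector. File name and
# serialization format are assumptions; adapt them to train.sh's output.
import torch

vec = torch.load(
    "vector/morocco_sparse_2b_15_0.0005_4_20_1_loc-localized.pt",
    map_location="cpu",
)
nonzero = int((vec != 0).sum())
print(f"shape={tuple(vec.shape)}, non-zero entries={nonzero}")
```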
- Data prep README: column schema, Hugging Face upload tips, and Slurm flags.
- Modeling README: clarifies ROCm env variables, steering CLI options, and troubleshooting notes for `train.sh`, `test_steering.sh`, and `run_all_tests.sh`.
- Eval README: explains how the vendored `lm-evaluation-harness` is wired into Slurm jobs for baseline vs. steering experiments.
All three documents live in their respective folders. Refer back to them for the complete flag list and known failure modes, and do not hesitate to open an issue if anything is unclear.
- The `assets/` directory holds the method figure and sample training curves.
- All scripts look for secrets via a local `.env` file (WandB and Hugging Face tokens). Keep that file out of version control; an example layout follows this list.
- ROCm clusters require the system exports already present in `train.sh` (`HSA_OVERRIDE_GFX_VERSION`, `HIP_VISIBLE_DEVICES`, `LD_PRELOAD`). Reuse those lines when writing new job scripts.
- WandB is optional; set `WANDB_API_KEY` to enable logging or leave it empty to run offline.
- Steering configs are named `COUNTRY_MODELSIZE_LAYER_LR_BATCH_EPOCHS_MCQ_loc-STATUS`; use the same slug everywhere (training, testing, eval) to avoid mismatches.
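For reference, a minimal `.env` could look like the following. `WANDB_API_KEY` is named in the note above, while the Hugging Face variable name is an assumption to verify against the scripts you run:

```
# .env -- never commit this file (add it to .gitignore)
WANDB_API_KEY=           # leave empty to run offline
HF_TOKEN=hf_xxxxxxxx     # assumed name for the Hugging Face token; check the scripts
```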
With the environment ready and the four steps above, you can reproduce every YaPO result: regenerate the data, train the steering vectors, produce test generations, and benchmark them, all with the provided scripts.
