1 | | -# Hierarchical Adapter Pipeline |
| 1 | +# CoLA: Hierarchical Asymmetric Adapter Pipeline |
2 | 2 |
3 | | -Built by Yven & Sashreek and Joel to explore multilingual hierarchical CoLA/Hydra adapters with hierarchical (language-aware) routing |
| 3 | +This folder contains our hierarchical multilingual adapter work based on CoLA and HydraLoRA, including language-aware routing for massively multilingual training. |
| 4 | + |
| 5 | +## What This Approach Investigates |
| 6 | + |
| 7 | +- Hierarchical asymmetric adapters (shared low-rank structure + language-specific components) |
| 8 | +- CoLA and HydraLoRA routing variants |
| 9 | +- Language Prior Routing (LPR) with language-id guidance |
| 10 | +- Multilingual ablations across 200 languages |
| 11 | + |
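The adapter idea in the list above can be illustrated with a small pure-Python sketch. It assumes one shared down-projection `A_family` per language family, one up-projection per language in `B_heads`, and routing weights that either hard-select or bias toward the expert tied to the batch's language ID. All function and parameter names here (`language_prior_routing`, `adapter_delta`, `prior_strength`) are hypothetical and are not taken from the LLaMA-Factory changes; the auxiliary routing loss is left out.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def language_prior_routing(router_logits, lang_expert, mode="bias", prior_strength=2.0):
    """Return per-expert mixing weights for one batch (hypothetical helper).

    mode="hard": one-hot weights on the expert assigned to the batch language.
    mode="bias": add `prior_strength` to that expert's logit, then softmax.
    """
    n = len(router_logits)
    if not 0 <= lang_expert < n:
        raise ValueError("lang_expert out of range")
    if mode == "hard":
        return [1.0 if i == lang_expert else 0.0 for i in range(n)]
    if mode == "bias":
        biased = [
            x + (prior_strength if i == lang_expert else 0.0)
            for i, x in enumerate(router_logits)
        ]
        return softmax(biased)
    raise ValueError(f"unknown mode: {mode}")


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def adapter_delta(x, A_family, B_heads, weights):
    """Adapter update: sum_i weights[i] * B_heads[i] @ (A_family @ x).

    A_family is shared (assumed: one per language family); each entry of
    B_heads is a language-specific up-projection of shape d_out x r.
    """
    hidden = matvec(A_family, x)  # shared low-rank projection
    out = [0.0] * len(B_heads[0])
    for w, B in zip(weights, B_heads):
        out = [o + w * y for o, y in zip(out, matvec(B, hidden))]
    return out
```

With `mode="hard"` only the language's own `B` head contributes; with `mode="bias"` the learned router can still mix in other heads, which is the softer variant of the same prior.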
| 12 | +## Folder Guide |
| 13 | + |
| 14 | +- `LLaMA-Factory/`: training framework with CoLA/HydraLoRA implementation changes |
| 15 | +- `data_prep/`: data sampling, clustering, tokenizer extension, and tokenization pipeline |
| 16 | +- `scripts/`: SLURM/local launchers for training and evaluation |
| 17 | +- `configs/`: evaluation task lists and run configuration inputs |
| 18 | +- `tools/two_stage_clustering/`: language grouping JSONs |
| 19 | +- `docs/`: full documentation and generated PDF/MD builds |
| 20 | +- `result_analysis/`: evaluation exports and analysis scripts |
4 | 21 |
5 | 22 | ## Setup |
6 | | -1. **Conda env**: `cd LLaMA-Factory && conda env create -f environment.yaml && conda activate cola_llama_factory`. |
7 | | -2. **Local installs**: Afterwards uninstall peft and llamafactory again and `pip install -e .` (inside `LLaMA-Factory` and inside of `peft`). |
8 | | -3. **Models/data**: we use llama3.1B as well as llama3.2-1B / 3B. We have prepared/tokenized datasets referenced in the scripts (e.g. `/scratch/.../tokenized/...`). check on cluster for details |
9 | 23 |
10 | | -## Hierarchical Adapter TL;DR |
11 | | -CoLA/Hydra layers now share family-level A matrices, keep B/heads per language, and optionally use Language Prior Routing (bias/hard routing driven by batch-level language IDs + auxiliary loss). See `docs/storyline.md` for the full narrative and implementation details. |
| 24 | +1. Initialize submodules from the repository root: |
| 25 | +```bash |
| 26 | +git submodule update --init --recursive |
| 27 | +``` |
| 28 | + |
| 29 | +2. Create and activate the conda environment: |
| 30 | +```bash |
| 31 | +cd approaches/CoLA/LLaMA-Factory |
| 32 | +conda env create -f environment.yml |
| 33 | +conda activate merlin |
| 34 | +``` |
12 | 35 |
13 | | -## Running (Slurm) |
14 | | -All launchers live in `scripts/`. For example, to train the standard Accelerate MoE CoLA baseline on the cluster: |
| 36 | +3. Install LLaMA-Factory in editable mode so local CoLA/Hydra changes are used: |
| 37 | +```bash |
| 38 | +pip uninstall -y peft llamafactory |
| 39 | +pip install -e . |
15 | 40 | ``` |
| 41 | + |
| 42 | + Follow `approaches/CoLA/LLaMA-Factory/setup_conda_env.md` for more details.
| 43 | + |
| 44 | +4. Check model/data paths and environment variables in the SLURM scripts before launching (cluster-specific `/scratch/...` paths are referenced in several scripts). |
| 45 | + |
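One way to do the path check in step 4 is to list every `/scratch/...` reference in the launchers before submitting anything. The following is a minimal sketch, assuming the launchers are `*.sh` files under `scripts/`; the helper names `find_scratch_paths` and `report` are hypothetical and not part of the repository.

```python
import re
from pathlib import Path

# Matches a /scratch/... token up to whitespace, a quote, or common shell punctuation.
SCRATCH_RE = re.compile(r"/scratch/[^\s\"'`:;,)]+")


def find_scratch_paths(text):
    """Return the sorted, de-duplicated /scratch/... paths found in `text`."""
    return sorted(set(SCRATCH_RE.findall(text)))


def report(scripts_dir="scripts"):
    """Print each referenced /scratch path and whether it exists on this machine."""
    root = Path(scripts_dir)
    if not root.is_dir():
        print(f"{scripts_dir}: not a directory, nothing to check")
        return
    for script in sorted(root.rglob("*.sh")):
        for path in find_scratch_paths(script.read_text(errors="replace")):
            if "$" in path:
                # Shell variables cannot be resolved here; flag for manual review.
                status = "contains a shell variable, check manually"
            elif Path(path).exists():
                status = "ok"
            else:
                status = "MISSING"
            print(f"{script}: {path} [{status}]")


if __name__ == "__main__":
    report()
```

Run it from `approaches/CoLA/` on the cluster login node, since the `/scratch` paths only exist there.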
| 46 | +## Running |
| 47 | + |
| 48 | +From `approaches/CoLA/`: |
| 49 | + |
| 50 | +- Baseline CoLA training: |
| 51 | +```bash |
16 | 52 | cd scripts |
17 | 53 | sbatch accelerate_moe_cola_train.sh |
18 | 54 | ``` |
19 | | -Languag Prior routing is work in progress. |
20 | | -This should also be extended TODO |
| 55 | + |
| 56 | +- Multilingual ablation launcher: |
| 57 | +```bash |
| 58 | +cd scripts/comparison |
| 59 | +sbatch run_multilingual_ablation.sh |
| 60 | +``` |
| 61 | + |
| 62 | +- Single-variant comparison jobs: |
| 63 | + - `scripts/comparison/cola_lpr_job.sh` |
| 64 | + - `scripts/comparison/hydralora_lpr_job.sh` |
| 65 | + - `scripts/comparison/lora_job.sh` |
| 66 | + |
| 67 | +## Documentation (Start Here) |
| 68 | + |
| 69 | +- `docs/README.md` |
| 70 | +- `docs/01_project_documentation.md` |
| 71 | +- `docs/02_data_preparation.md` |
| 72 | +- `docs/03_model_training_and_implementation.md` |
| 73 | +- `docs/04_training_orchestration.md` |
| 74 | +- `docs/05_evaluation_and_analysis.md` |
| 75 | +- `docs/06_reproducibility_and_submission.md` |
| 76 | + |
| 77 | +Deep-dive references: |
| 78 | + |
| 79 | +- `docs/extra/hierarchical_adapters_multilingual_study_approaches_explanation.md` |
| 80 | +- `docs/extra/storyline.md` |