Commit beb6900

committed
Adapted ReadMe, added presentation
1 parent 22e8396 commit beb6900

3 files changed

Lines changed: 140 additions & 11 deletions

README.md

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
1+
# HTYLLM-PG: How To Train Your LLM
2+
3+
This repository is dedicated to developing and investigating language modeling techniques for massively multilingual language models covering 200 or more languages.
4+
5+
The work was carried out by a project group from the DICE group at the University of Paderborn over two semesters: SS25 and WS25/26.
6+
7+
## Team
8+
9+
- Tutors: Nikit Srivastava, Rene Speck
10+
- Supervisor: Prof. Dr. Axel Ngonga
11+
- Participants: Yven de Buhr, Jamil Mounzer, Joel Dag, Sashreek Nayak Dhaimodkar, Luke Friedrichs, Martin Schröder
12+
13+
## Project Scope
14+
15+
Over the course of the project group, we investigated several approaches to training scalable and extensible massively multilingual LLMs and assessed their feasibility.
16+
17+
### Approaches explored in SS25
18+
19+
- Mixture-of-Experts (MoE) training: sparse expert routing to scale model capacity efficiently.
20+
- Cross-lingual transfer fine-tuning: training related languages together to improve transfer between them.
21+
- Joint multilingual pretraining: training a dense model across all target languages.
22+
- Adapter techniques (LoRA, QLoRA, XLoRA): parameter-efficient adaptation for multilingual settings.
23+
24+
### Focus in WS25/26
25+
26+
- Asymmetric hierarchical LoRA adapters: hierarchical adapter-based methods with shared and language-specific components.
27+
- Dynamic MoE: adaptive expert routing variants and large-scale multilingual models.
28+
29+
## Where To Find The Approaches
30+
31+
- `approaches/CoLA`: hierarchical multilingual adapters (CoLA/HydraLoRA with language-aware routing).
32+
- `approaches/adapter`: adapter-centered experiments and setup notes.
33+
- `approaches/cross_lingual_transfer`: cross-lingual transfer fine-tuning and evaluation scripts.
34+
- `approaches/joint_multilingual_pretraining`: dense multilingual pretraining pipeline.
35+
- `approaches/moe`: MoE pretraining pipeline and scripts.
36+
- `approaches/dynamic_moe`: dynamic MoE with DeepSpeed, conversion, and evaluation tooling.
37+
38+
## How To Get Started With The Different Approaches
39+
40+
For the hierarchical asymmetric LoRA approach (in `approaches/CoLA`), `approaches/CoLA/docs/` documents the intermediate steps in detail: data sampling, preparation, training, and evaluation.
41+
42+
Additionally, you can use the presentations in `presentations/` to understand the overall approaches and project progress.
43+
44+
Most of the approaches we explored, especially those we focused on in WS25/26, have their own README files to support onboarding.
45+
46+
For CoLA:
47+
- `approaches/CoLA/README.md`
48+
- `approaches/CoLA/docs/01_project_documentation.md`
49+
50+
For dynamic MoE:
51+
- `approaches/dynamic_moe/README.md`
52+
53+
For the other approaches:
54+
- `approaches/adapter/README.md`
55+
- `approaches/cross_lingual_transfer/readme.md`
56+
- `approaches/joint_multilingual_pretraining/readme.md`
57+
- `approaches/moe/readme.md`
58+
59+
60+
### Project Management And Shared Resources
61+
62+
We organized and tracked work mainly via GitHub issues/milestones and a Kanban board:
63+
64+
- GitHub issues: https://github.com/dice-group/HTYLLM-PG/issues
65+
- Kanban board: https://kanboard.cs.uni-paderborn.de/?controller=BoardViewController&action=show&project_id=850&search=status%3Aopen
66+
67+
We also maintained a shared Sciebo folder for plots, documentation, literature review material, and approach-specific resources such as important training/evaluation datasets:
68+
69+
- Sciebo folder: https://uni-paderborn.sciebo.de/apps/files/files/850613857?dir=/HTYLLM-PG

approaches/CoLA/README.md

Lines changed: 71 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -1,20 +1,80 @@
1-
# Hierarchical Adapter Pipeline
1+
# CoLA: Hierarchical Asymmetric Adapter Pipeline
22

3-
Built by Yven & Sashreek and Joel to explore multilingual hierarchical CoLA/Hydra adapters with hierarchical (language-aware) routing
3+
This folder contains our hierarchical multilingual adapter work based on CoLA and HydraLoRA, including language-aware routing for massively multilingual training.
4+
5+
## What This Approach Investigates
6+
7+
- Hierarchical asymmetric adapters (shared low-rank structure + language-specific components)
8+
- CoLA and HydraLoRA routing variants
9+
- Language Prior Routing (LPR) with language-id guidance
10+
- Multilingual ablations across 200 languages
11+
12+
## Folder Guide
13+
14+
- `LLaMA-Factory/`: training framework with CoLA/HydraLoRA implementation changes
15+
- `data_prep/`: data sampling, clustering, tokenizer extension, and tokenization pipeline
16+
- `scripts/`: SLURM/local launchers for training and evaluation
17+
- `configs/`: evaluation task lists and run configuration inputs
18+
- `tools/two_stage_clustering/`: language grouping JSONs
19+
- `docs/`: full documentation and generated PDF/MD builds
20+
- `result_analysis/`: evaluation exports and analysis scripts
421

522
## Setup
6-
1. **Conda env**: `cd LLaMA-Factory && conda env create -f environment.yaml && conda activate cola_llama_factory`.
7-
2. **Local installs**: Afterwards uninstall peft and llamafactory again and `pip install -e .` (inside `LLaMA-Factory` and inside of `peft`).
8-
3. **Models/data**: we use llama3.1B as well as llama3.2-1B / 3B. We have prepared/tokenized datasets referenced in the scripts (e.g. `/scratch/.../tokenized/...`). check on cluster for details
923

10-
## Hierarchical Adapter TL;DR
11-
CoLA/Hydra layers now share family-level A matrices, keep B/heads per language, and optionally use Language Prior Routing (bias/hard routing driven by batch-level language IDs + auxiliary loss). See `docs/storyline.md` for the full narrative and implementation details.
24+
1. Initialize submodules from the repository root:
25+
```bash
26+
git submodule update --init --recursive
27+
```
28+
29+
2. Create and activate the conda environment:
30+
```bash
31+
cd approaches/CoLA/LLaMA-Factory
32+
conda env create -f environment.yml
33+
conda activate merlin
34+
```
1235

13-
## Running (Slurm)
14-
All launchers live in `scripts/`. For example, to train the standard Accelerate MoE CoLA baseline on the cluster:
36+
3. Remove any previously installed `peft` and `llamafactory` packages, then install LLaMA-Factory in editable mode so the local CoLA/Hydra changes are used:
37+
```bash
38+
pip uninstall -y peft llamafactory
39+
pip install -e .
1540
```
41+
42+
Follow `approaches/CoLA/LLaMA-Factory/setup_conda_env.md` for more details.
43+
44+
4. Check model/data paths and environment variables in the SLURM scripts before launching (cluster-specific `/scratch/...` paths are referenced in several scripts).
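The path check in step 4 can be partly automated. The sketch below is meant to be run from `approaches/CoLA/` and lists every line in the launch scripts that mentions a `/scratch/` path, so each one can be compared against your own cluster layout. The helper name `list_scratch_paths` is made up for this example and is not part of the repository, and the sketch assumes the launchers are `*.sh` files under `scripts/`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print every line in a
# directory of launch scripts that references a cluster-specific /scratch/ path.
list_scratch_paths() {
  local script_dir="${1:-scripts}"
  # -r: search subdirectories, -n: prefix matches with file name and line number,
  # --include: restrict the search to shell scripts.
  # "|| true" keeps the function from failing when nothing matches.
  grep -rn --include='*.sh' '/scratch/' "$script_dir" || true
}

# Only run the check when a scripts/ directory exists in the current directory.
if [ -d scripts ]; then
  list_scratch_paths scripts
fi
```

Each output line has the form `file:line:content`. An empty result means no `/scratch/` reference was found in the searched scripts; it does not confirm that the environment variables are set correctly.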
45+
46+
## Running
47+
48+
From `approaches/CoLA/`:
49+
50+
- Baseline CoLA training:
51+
```bash
1652
cd scripts
1753
sbatch accelerate_moe_cola_train.sh
1854
```
19-
Language Prior Routing is work in progress.
20-
This should also be extended TODO
55+
56+
- Multilingual ablation launcher:
57+
```bash
58+
cd scripts/comparison
59+
sbatch run_multilingual_ablation.sh
60+
```
61+
62+
- Single-variant comparison jobs:
63+
- `scripts/comparison/cola_lpr_job.sh`
64+
- `scripts/comparison/hydralora_lpr_job.sh`
65+
- `scripts/comparison/lora_job.sh`
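The three single-variant jobs can be queued in one loop. The sketch below assumes they are submitted with `sbatch` from inside `scripts/comparison`, mirroring the launchers above; this README does not state that explicitly, so check each script first. The function name `submit_comparison_jobs` and the `DRY_RUN` variable are hypothetical and introduced only for this example. By default the function only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): queue the three
# single-variant comparison jobs, or just print the commands when DRY_RUN=1.
# Assumption: the job scripts are submitted with sbatch from scripts/comparison.
submit_comparison_jobs() {
  local job
  for job in cola_lpr_job.sh hydralora_lpr_job.sh lora_job.sh; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Dry run: show the command without contacting the scheduler.
      echo "(cd scripts/comparison && sbatch $job)"
    else
      # Real submission; the subshell leaves the caller's directory unchanged.
      (cd scripts/comparison && sbatch "$job")
    fi
  done
}

# Dry run by default; call with DRY_RUN=0 from approaches/CoLA/ to submit.
submit_comparison_jobs
```

Keeping the dry run as the default makes it easy to review what would be submitted before using any cluster allocation.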
66+
67+
## Documentation (Start Here)
68+
69+
- `docs/README.md`
70+
- `docs/01_project_documentation.md`
71+
- `docs/02_data_preparation.md`
72+
- `docs/03_model_training_and_implementation.md`
73+
- `docs/04_training_orchestration.md`
74+
- `docs/05_evaluation_and_analysis.md`
75+
- `docs/06_reproducibility_and_submission.md`
76+
77+
Deep-dive references:
78+
79+
- `docs/extra/hierarchical_adapters_multilingual_study_approaches_explanation.md`
80+
- `docs/extra/storyline.md`
2.31 MB
Binary file not shown.
