FlowerTune LLM Benchmark

This directory contains code for federated instruction tuning of different pre-trained LLMs on the four challenges defined in the FlowerTune LLM Leaderboard: general NLP, finance, medical and code. The experiments in the paper "FlowerTune: A Cross-Domain Benchmark for Federated Fine-Tuning of Large Language Models" were conducted using this repository.

We use Flower Datasets to download, partition and preprocess the datasets. Flower's Simulation Engine simulates the federated LLM fine-tuning process, which allows users to run the training on a single GPU.
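To make the partitioning step concrete, here is a minimal plain-Python sketch of splitting a dataset across simulated clients. It is not the Flower Datasets API: the function name `iid_partition`, the sample count and the number of partitions are hypothetical, and the IID split itself is an assumption, since the text above does not say which partitioning scheme the projects use.

```python
import random


def iid_partition(num_samples: int, num_partitions: int, seed: int = 0) -> list[list[int]]:
    """Shuffle sample indices and deal them round-robin into near-equal partitions.

    Illustrative stand-in only; assumes an IID split (not stated in the README).
    """
    indices = list(range(num_samples))
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(indices)
    # Partition i takes every num_partitions-th index, starting at offset i.
    return [indices[i::num_partitions] for i in range(num_partitions)]


# Hypothetical sizes: 103 samples shared by 10 simulated clients.
partitions = iid_partition(num_samples=103, num_partitions=10)
print([len(p) for p in partitions])
```

Each simulated client would then fine-tune only on the samples in its own partition.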

Environment setup

Each project directory defines its dependencies in pyproject.toml. Install them in an activated Python environment with:

cd project_name  # selected from [general_nlp, finance, medical, coding]

pip install -e .
pip install flash-attn --no-build-isolation   # Install FlashAttention-2

Running federated fine-tuning

First, make sure that your Hugging Face account has access to your preferred model. You can request access directly on the Hugging Face website. Then follow the Hugging Face login instructions to log in to your account. Note that you only need to complete this step once on your development machine:

huggingface-cli login

Then log in to your W&B account if you want to use it to log experiment status. To disable W&B, set use-wandb = false in pyproject.toml.

wandb login

Run the challenge with the default config values. The configs are defined in the [tool.flwr.app.config] entry of pyproject.toml and are loaded automatically.

flwr run

To run a specific experiment, override individual config values with --run-config:

# Run on Mistral-7B-v0.3 model without wandb
flwr run --run-config "model.name='mistralai/Mistral-7B-v0.3' run-name='customised_name' use-wandb=false"

# Run with FedProx
flwr run --run-config "strategy.name='fedprox'"

# Run with plain LoRA (DoRA disabled)
flwr run --run-config "model.lora.peft-use-dora=false"

Model saving

By default, the global PEFT model checkpoints are saved on the server side every 5 rounds, after aggregation. The interval can be changed with train.save-every-round under the [tool.flwr.app.config] entry in pyproject.toml.
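As a rough illustration of the saving interval, the sketch below lists the rounds at which a checkpoint would be written. It assumes that rounds are numbered from 1 and that a checkpoint is written only when the round number is a multiple of the interval; the function name `rounds_with_checkpoint` and the total of 12 rounds are hypothetical, and whether the final round is saved as well is not stated here.

```python
def rounds_with_checkpoint(num_rounds: int, save_every_round: int = 5) -> list[int]:
    """Return the round numbers whose aggregated model would be saved.

    Assumes 1-based round numbering and saving on multiples of the interval only.
    """
    return [r for r in range(1, num_rounds + 1) if r % save_every_round == 0]


# Hypothetical run of 12 rounds with the default interval of 5.
print(rounds_with_checkpoint(12))  # [5, 10]
```

Setting train.save-every-round to a smaller value produces more frequent checkpoints at the cost of more disk space.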

Experimental model checkpoints

The model checkpoints fine-tuned in the paper "FlowerTune: A Cross-Domain Benchmark for Federated Fine-Tuning of Large Language Models" can be found here: General-NLP, Finance, Medical, Code.

All experiments were conducted on NVIDIA A100 SXM4 (80 GB) GPUs, except for the Mistral-24B models, which were trained on NVIDIA H100 NVL (94 GB) GPUs. Note that the checkpoints may be used for research purposes only.

Running the evaluation

To evaluate the fine-tuned LLMs, please follow the instructions in the FlowerTune Evaluation GitHub repository.
