19 changes: 10 additions & 9 deletions research/README.md
@@ -6,15 +6,16 @@ NVIDIA FLARE has been used in several research studies. In this directory, you c

## Research Implementations

-1. [FedNCA - Equitable Federated Learning with NCA](./fed-bpt/README.md) ([MICCAI 2025](https://arxiv.org/abs/2506.21735))
-1. [FedBPT: Efficient Federated Black-box Prompt Tuning for Large Language Models](./fed-bpt/README.md) ([ICML 2024](https://arxiv.org/abs/2310.01467))
-2. [ConDistFL: Conditional Distillation for Federated Learning from Partially Annotated Data](./condist-fl/README.md) ([DeCaF 2023](https://arxiv.org/abs/2308.04070))
-3. [Fair Federated Medical Image Segmentation via Client Contribution Estimation](./fed-ce/README.md) ([CVPR 2023](https://arxiv.org/abs/2303.16520))
-4. [Communication-Efficient Vertical Federated Learning with Limited Overlapping Samples](./one-shot-vfl/README.md) ([ICCV 2023](https://arxiv.org/abs/2303.16270))
-5. [Closing the Generalization Gap of Cross-silo Federated Medical Image Segmentation](./fed-sm/README.md) ([CVPR 2022](https://arxiv.org/abs/2203.10144))
-6. [Do Gradient Inversion Attacks Make Federated Learning Unsafe?](./quantifying-data-leakage/README.md) ([IEEE Transactions on Medical Imaging 2022](https://arxiv.org/abs/2202.06924))
-7. [Auto-FedRL: Federated Hyperparameter Optimization for Multi-institutional Medical Image Segmentation](./auto-fed-rl/README.md) ([ECCV 2022](https://arxiv.org/abs/2203.06338))
-8. [FedBN: Federated Learning on Non-IID Features via Local Batch Normalization](./fed-bn/README.md) ([ICLR 2021](https://arxiv.org/abs/2102.07623))
+10. [FedNCA - Equitable Federated Learning with NCA](./fed-bpt/README.md) ([MICCAI 2025](https://arxiv.org/abs/2506.21735))
+09. [FedBPT: Efficient Federated Black-box Prompt Tuning for Large Language Models](./fed-bpt/README.md) ([ICML 2024](https://arxiv.org/abs/2310.01467))
+08. [ConDistFL: Conditional Distillation for Federated Learning from Partially Annotated Data](./condist-fl/README.md) ([DeCaF 2023](https://arxiv.org/abs/2308.04070))
+07. [FedOBD: Opportunistic Block Dropout for Efficiently Training Large-scale Neural Networks through Federated Learning](./fedobd/README.md) ([IJCAI 2023](https://arxiv.org/abs/2208.05174))
+06. [Fair Federated Medical Image Segmentation via Client Contribution Estimation](./fed-ce/README.md) ([CVPR 2023](https://arxiv.org/abs/2303.16520))
+05. [Communication-Efficient Vertical Federated Learning with Limited Overlapping Samples](./one-shot-vfl/README.md) ([ICCV 2023](https://arxiv.org/abs/2303.16270))
+04. [Closing the Generalization Gap of Cross-silo Federated Medical Image Segmentation](./fed-sm/README.md) ([CVPR 2022](https://arxiv.org/abs/2203.10144))
+03. [Do Gradient Inversion Attacks Make Federated Learning Unsafe?](./quantifying-data-leakage/README.md) ([IEEE Transactions on Medical Imaging 2022](https://arxiv.org/abs/2202.06924))
+02. [Auto-FedRL: Federated Hyperparameter Optimization for Multi-institutional Medical Image Segmentation](./auto-fed-rl/README.md) ([ECCV 2022](https://arxiv.org/abs/2203.06338))
+01. [FedBN: Federated Learning on Non-IID Features via Local Batch Normalization](./fed-bn/README.md) ([ICLR 2021](https://arxiv.org/abs/2102.07623))

## Contributing

60 changes: 60 additions & 0 deletions research/fedobd/README.md
@@ -0,0 +1,60 @@
# FedOBD: Opportunistic Block Dropout for Efficiently Training Large-scale Neural Networks through Federated Learning

This directory introduces the NVFLARE implementation of the quantization scheme from FedOBD, an integer quantization scheme that greatly reduces the size of the messages transferred during federated training.

FedOBD was accepted at [IJCAI 2023](https://www.ijcai.org/proceedings/2023/0394.pdf); its latest version can be found at [arXiv:2208.05174](https://arxiv.org/abs/2208.05174).

## Abstract

> Large-scale neural networks possess considerable expressive power. They are well-suited for complex learning tasks in industrial applications. However, large-scale models pose significant challenges for training under the current Federated Learning (FL) paradigm. Existing approaches for efficient FL training often leverage model parameter dropout. However, manipulating individual model parameters is not only inefficient in meaningfully reducing the communication overhead when training large-scale FL models, but may also be detrimental to the scaling efforts and model performance as shown by recent research. To address these issues, we propose the Federated Opportunistic Block Dropout (FedOBD) approach. The key novelty is that it decomposes large-scale models into semantic blocks so that FL participants can opportunistically upload quantized blocks, which are deemed to be significant towards training the model, to the FL server for aggregation. Extensive experiments evaluating FedOBD against four state-of-the-art approaches based on multiple real-world datasets show that it reduces the overall communication overhead by more than 88% compared to the best performing baseline approach, while achieving the highest test accuracy. To the best of our knowledge, FedOBD is the first approach to perform dropout on FL models at the block level rather than at the individual parameter level.
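
For intuition, the block-level dropout described in the abstract can be sketched in a few lines. This is an illustrative reconstruction, not the paper's or this repository's implementation: the significance score (mean absolute parameter change per block) and the `keep_ratio` parameter are assumptions made here for the example.

```python
# Illustrative sketch of opportunistic block dropout; the scoring rule and
# keep_ratio below are assumptions for the example, not FedOBD's exact choices.
from typing import Dict

import numpy as np


def select_significant_blocks(
    prev_blocks: Dict[str, np.ndarray],
    curr_blocks: Dict[str, np.ndarray],
    keep_ratio: float = 0.5,
) -> Dict[str, np.ndarray]:
    """Keep only the semantic blocks whose parameters changed the most."""
    # Score each block by how much local training moved its parameters.
    scores = {
        name: float(np.mean(np.abs(curr_blocks[name] - prev_blocks[name])))
        for name in curr_blocks
    }
    # Opportunistically upload only the top-scoring blocks to the server.
    n_keep = max(1, int(len(scores) * keep_ratio))
    kept = sorted(scores, key=scores.get, reverse=True)[:n_keep]
    return {name: curr_blocks[name] for name in kept}
```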

## License

This project is open-sourced under the Apache v2 License.

## Implementation

A quantization scheme called "ADAQUANT", based on our [official implementation](https://github.com/cyyever/distributed_learning_simulator), has been added to NVFLARE under **nvflare/app_opt/pt/quantization**.
> **Contributor (logic):** README claims ADAQUANT quantization scheme was added to `nvflare/app_opt/pt/quantization`, but that directory only contains generic quantization code with no "adaquant" mode. The quantizer only supports FLOAT16, BLOCKWISE8, FLOAT4, and NORMFLOAT4. Either the implementation is missing or the claim is inaccurate.
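
For intuition, here is a minimal, self-contained sketch of block-wise 8-bit quantization in the spirit of the BLOCKWISE8 mode listed in the comment above. The function names, default block size, and absmax scaling are assumptions for illustration; this is not the code shipped in `nvflare/app_opt/pt/quantization`.

```python
# Illustrative block-wise int8 quantization sketch (assumed API, not NVFLARE's).
import numpy as np


def quantize_blockwise8(x: np.ndarray, block_size: int = 2048):
    """Quantize a float tensor to int8 with one absmax scale per block."""
    flat = x.astype(np.float32).ravel()
    pad = (-flat.size) % block_size
    blocks = np.pad(flat, (0, pad)).reshape(-1, block_size)
    # One scale per block keeps quantization error local to that block.
    scales = np.abs(blocks).max(axis=1, keepdims=True)
    scales[scales == 0] = 1.0  # guard against all-zero blocks
    q = np.round(blocks / scales * 127).astype(np.int8)
    return q, scales, x.shape


def dequantize_blockwise8(q: np.ndarray, scales: np.ndarray, shape) -> np.ndarray:
    """Invert quantize_blockwise8, discarding the padding."""
    blocks = q.astype(np.float32) / 127.0 * scales
    return blocks.ravel()[: int(np.prod(shape))].reshape(shape)
```

Shipping `q` (one int8 per parameter) plus a single float scale per block is roughly a 4x reduction over float32 messages, which is the kind of saving the size-reduction claim above refers to.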


## Environment Setup

```bash
# Install NVFLARE and related packages
pip install -r requirements.txt
```

## Steps to run the code

Let's follow the steps in the [quantization examples] (https://github.com/NVIDIA/NVFlare/tree/main/examples/advanced/llm_hf).
> **Contributor (syntax):** markdown link has a space before the URL, which breaks the link

Suggested change:

-Let's follow the steps in the [quantization examples] (https://github.com/NVIDIA/NVFlare/tree/main/examples/advanced/llm_hf).
+Let's follow the steps in the [quantization examples](https://github.com/NVIDIA/NVFlare/tree/main/examples/advanced/llm_hf).


### Data Preparation
cd examples/advanced/llm_hf

mkdir dataset
cd dataset
git clone https://huggingface.co/datasets/tatsu-lab/alpaca
git clone https://huggingface.co/datasets/databricks/databricks-dolly-15k
git clone https://huggingface.co/datasets/OpenAssistant/oasst1
cd ..
mkdir dataset/dolly
python ./utils/preprocess_dolly.py --training_file dataset/databricks-dolly-15k/databricks-dolly-15k.jsonl --output_dir dataset/dolly
python ./utils/preprocess_alpaca.py --training_file dataset/alpaca/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet --output_dir dataset/alpaca
python ./utils/preprocess_oasst1.py --training_file dataset/oasst1/data/train-00000-of-00001-b42a775f407cee45.parquet --validation_file dataset/oasst1/data/validation-00000-of-00001-134b8fd0c89408b6.parquet --output_dir dataset/oasst1

### Run ADAQUANT

python3 llm_hf_fl_job.py --client_ids dolly --data_path ${PWD}/dataset --workspace_dir ${PWD}/workspace/hf_sft_adaquant --job_dir ${PWD}/workspace/jobs/hf_sft_adaquant --train_mode SFT --quantize_mode adaquant
> **Contributor (style), on lines +30 to +46:** bash commands are not wrapped in a code block; use ```bash fencing for proper formatting

Suggested change:

-### Data Preparation
-cd examples/advanced/llm_hf
-mkdir dataset
-cd dataset
-git clone https://huggingface.co/datasets/tatsu-lab/alpaca
-git clone https://huggingface.co/datasets/databricks/databricks-dolly-15k
-git clone https://huggingface.co/datasets/OpenAssistant/oasst1
-cd ..
-mkdir dataset/dolly
-python ./utils/preprocess_dolly.py --training_file dataset/databricks-dolly-15k/databricks-dolly-15k.jsonl --output_dir dataset/dolly
-python ./utils/preprocess_alpaca.py --training_file dataset/alpaca/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet --output_dir dataset/alpaca
-python ./utils/preprocess_oasst1.py --training_file dataset/oasst1/data/train-00000-of-00001-b42a775f407cee45.parquet --validation_file dataset/oasst1/data/validation-00000-of-00001-134b8fd0c89408b6.parquet --output_dir dataset/oasst1
-### Run ADAQUANT
-python3 llm_hf_fl_job.py --client_ids dolly --data_path ${PWD}/dataset --workspace_dir ${PWD}/workspace/hf_sft_adaquant --job_dir ${PWD}/workspace/jobs/hf_sft_adaquant --train_mode SFT --quantize_mode adaquant
+### Data Preparation
+```bash
+cd examples/advanced/llm_hf
+mkdir dataset
+cd dataset
+git clone https://huggingface.co/datasets/tatsu-lab/alpaca
+git clone https://huggingface.co/datasets/databricks/databricks-dolly-15k
+git clone https://huggingface.co/datasets/OpenAssistant/oasst1
+cd ..
+mkdir dataset/dolly
+python ./utils/preprocess_dolly.py --training_file dataset/databricks-dolly-15k/databricks-dolly-15k.jsonl --output_dir dataset/dolly
+python ./utils/preprocess_alpaca.py --training_file dataset/alpaca/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet --output_dir dataset/alpaca
+python ./utils/preprocess_oasst1.py --training_file dataset/oasst1/data/train-00000-of-00001-b42a775f407cee45.parquet --validation_file dataset/oasst1/data/validation-00000-of-00001-134b8fd0c89408b6.parquet --output_dir dataset/oasst1
+```
+
+### Run ADAQUANT
+
+```bash
+python3 llm_hf_fl_job.py --client_ids dolly --data_path ${PWD}/dataset --workspace_dir ${PWD}/workspace/hf_sft_adaquant --job_dir ${PWD}/workspace/jobs/hf_sft_adaquant --train_mode SFT --quantize_mode adaquant
+```
> **Contributor (syntax):** incorrect script name - should be job.py, not llm_hf_fl_job.py

Suggested change:

-python3 llm_hf_fl_job.py --client_ids dolly --data_path ${PWD}/dataset --workspace_dir ${PWD}/workspace/hf_sft_adaquant --job_dir ${PWD}/workspace/jobs/hf_sft_adaquant --train_mode SFT --quantize_mode adaquant
+python3 job.py --client_ids dolly --data_path ${PWD}/dataset --workspace_dir ${PWD}/workspace/hf_sft_adaquant --job_dir ${PWD}/workspace/jobs/hf_sft_adaquant --train_mode SFT --quantize_mode adaquant


## Citation

If you use this implementation, please cite the original FedOBD paper:

```bibtex
@inproceedings{chen2022fedobd,
    title     = {{FedOBD}: Opportunistic Block Dropout for Efficiently Training Large-scale Neural Networks through Federated Learning},
    author    = {Chen, Yuanyuan and Chen, Zichen and Wu, Pengcheng and Yu, Han},
    year      = 2023,
    booktitle = {The 32nd International Joint Conference on Artificial Intelligence},
    doi       = {10.24963/ijcai.2023/394},
}
```
4 changes: 4 additions & 0 deletions research/fedobd/requirements.txt
@@ -0,0 +1,4 @@
nvflare~=2.5.0rc
> **Collaborator:** 2.5 won't work, we can remove the version number

torch
torchvision
tensorboard