NVIDIA · cyyever · Sep 7, 2025 · Sep 17, 2025 · Oct 31, 2025 · Nov 17, 2025
diff --git a/research/README.md b/research/README.md
@@ -7,13 +7,14 @@ NVIDIA FLARE has been used in several research studies. In this directory, you c
 ## Research Implementations
 
 1. [FedBPT: Efficient Federated Black-box Prompt Tuning for Large Language Models](./fed-bpt/README.md) [ICML 2024](https://arxiv.org/abs/2310.01467)
-2. [ConDistFL: Conditional Distillation for Federated Learning from Partially Annotated Data](./condist-fl/README.md) ([DeCaF 2023](https://arxiv.org/abs/2308.04070))
-3. [Fair Federated Medical Image Segmentation via Client Contribution Estimation](./fed-ce/README.md) ([CVPR 2023](https://arxiv.org/abs/2303.16520))
-4. [Communication-Efficient Vertical Federated Learning with Limited Overlapping Samples](./one-shot-vfl/README.md) [ICCV 2023](https://arxiv.org/abs/2303.16270)
-5. [Closing the Generalization Gap of Cross-silo Federated Medical Image Segmentation](./fed-sm/README.md) ([CVPR 2022](https://arxiv.org/abs/2203.10144))
-6. [Do Gradient Inversion Attacks Make Federated Learning Unsafe?](./quantifying-data-leakage/README.md) ([IEEE Transactions on Medical Imaging 2022](https://arxiv.org/abs/2202.06924))
-7. [Auto-FedRL: Federated Hyperparameter Optimization for Multi-institutional Medical Image Segmentation](./auto-fed-rl/README.md) ([ECCV 2022](https://arxiv.org/abs/2203.06338))
-8. [FedBN: Federated Learning on Non-IID Features via Local Batch Normalization](./fed-bn/README.md) [ICLR 2021](https://arxiv.org/abs/2102.07623)
+2. [FedOBD: Opportunistic Block Dropout for Efficiently Training Large-scale Neural Networks through Federated Learning](./fedobd/README.md) ([IJCAI 2023](https://arxiv.org/abs/2208.05174))
+3. [ConDistFL: Conditional Distillation for Federated Learning from Partially Annotated Data](./condist-fl/README.md) ([DeCaF 2023](https://arxiv.org/abs/2308.04070))
+4. [Fair Federated Medical Image Segmentation via Client Contribution Estimation](./fed-ce/README.md) ([CVPR 2023](https://arxiv.org/abs/2303.16520))
+5. [Communication-Efficient Vertical Federated Learning with Limited Overlapping Samples](./one-shot-vfl/README.md) [ICCV 2023](https://arxiv.org/abs/2303.16270)
+6. [Closing the Generalization Gap of Cross-silo Federated Medical Image Segmentation](./fed-sm/README.md) ([CVPR 2022](https://arxiv.org/abs/2203.10144))
+7. [Do Gradient Inversion Attacks Make Federated Learning Unsafe?](./quantifying-data-leakage/README.md) ([IEEE Transactions on Medical Imaging 2022](https://arxiv.org/abs/2202.06924))
+8. [Auto-FedRL: Federated Hyperparameter Optimization for Multi-institutional Medical Image Segmentation](./auto-fed-rl/README.md) ([ECCV 2022](https://arxiv.org/abs/2203.06338))
+9. [FedBN: Federated Learning on Non-IID Features via Local Batch Normalization](./fed-bn/README.md) [ICLR 2021](https://arxiv.org/abs/2102.07623)
 
 ## Contributing
 

diff --git a/research/fedobd/README.md b/research/fedobd/README.md
@@ -0,0 +1,60 @@
+# FedOBD: Opportunistic Block Dropout for Efficiently Training Large-scale Neural Networks through Federated Learning
+
+This directory documents the NVFLARE implementation of FedOBD's quantization scheme, an integer quantization scheme that greatly reduces the size of transferred messages.
+
+FedOBD was accepted at [IJCAI 2023](https://www.ijcai.org/proceedings/2023/0394.pdf). The latest version is available at [arXiv:2208.05174](https://arxiv.org/abs/2208.05174).
+
+## Abstract
+
+> Large-scale neural networks possess considerable expressive power. They are well-suited for complex learning tasks in industrial applications. However, large-scale models pose significant challenges for training under the current Federated Learning (FL) paradigm. Existing approaches for efficient FL training often leverage model parameter dropout. However, manipulating individual model parameters is not only inefficient in meaningfully reducing the communication overhead when training large-scale FL models, but may also be detrimental to the scaling efforts and model performance as shown by recent research. To address these issues, we propose the Federated Opportunistic Block Dropout (FedOBD) approach. The key novelty is that it decomposes large-scale models into semantic blocks so that FL participants can opportunistically upload quantized blocks, which are deemed to be significant towards training the model, to the FL server for aggregation. Extensive experiments evaluating FedOBD against four state-of-the-art approaches based on multiple real-world datasets show that it reduces the overall communication overhead by more than 88% compared to the best performing baseline approach, while achieving the highest test accuracy. To the best of our knowledge, FedOBD is the first approach to perform dropout on FL models at the block level rather than at the individual parameter level.
+
+## License
+
+This project is open-sourced under the Apache v2 License.
+
+## Implementation
+
+A quantization scheme called "ADAQUANT" has been added to NVFLARE under **nvflare/app_opt/pt/quantization**. It is based on our [official implementation](https://github.com/cyyever/distributed_learning_simulator).
+
+## Environment Setup
+
+```bash
+# Install NVFLARE and related packages
+pip install -r requirements.txt
+```
+
+## Steps to run the code
+
+Let's follow the steps in the [quantization examples](https://github.com/NVIDIA/NVFlare/tree/main/examples/advanced/llm_hf).
+
+### Data Preparation
+```bash
+cd examples/advanced/llm_hf
+
+# Download the three datasets
+mkdir dataset
+cd dataset
+git clone https://huggingface.co/datasets/tatsu-lab/alpaca
+git clone https://huggingface.co/datasets/databricks/databricks-dolly-15k
+git clone https://huggingface.co/datasets/OpenAssistant/oasst1
+cd ..
+
+# Preprocess each dataset into its own output directory
+mkdir dataset/dolly
+python ./utils/preprocess_dolly.py --training_file dataset/databricks-dolly-15k/databricks-dolly-15k.jsonl --output_dir dataset/dolly
+python ./utils/preprocess_alpaca.py --training_file dataset/alpaca/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet --output_dir dataset/alpaca
+python ./utils/preprocess_oasst1.py --training_file dataset/oasst1/data/train-00000-of-00001-b42a775f407cee45.parquet --validation_file dataset/oasst1/data/validation-00000-of-00001-134b8fd0c89408b6.parquet --output_dir dataset/oasst1
+```
+
+### Run ADAQUANT
+
+```bash
+python3 job.py --client_ids dolly --data_path ${PWD}/dataset --workspace_dir ${PWD}/workspace/hf_sft_adaquant --job_dir ${PWD}/workspace/jobs/hf_sft_adaquant --train_mode SFT --quantize_mode adaquant
+```
+
+## Citation
+
+If you use this implementation, please cite the original FedOBD paper:
+
+```bibtex
+@inproceedings{chen2022fedobd,
+    title         = {{FedOBD}: Opportunistic Block Dropout for Efficiently Training Large-scale Neural Networks through Federated Learning},
+    author        = {Chen, Yuanyuan and Chen, Zichen and Wu, Pengcheng and Yu, Han},
+    year          = 2023,
+    booktitle     = {The 32nd International Joint Conference on Artificial Intelligence},
+    doi           = {10.24963/ijcai.2023/394},
+}
+```
diff --git a/research/fedobd/requirements.txt b/research/fedobd/requirements.txt
@@ -0,0 +1,4 @@
+nvflare~=2.5.0rc
+torch
+torchvision
+tensorboard
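
The README in this change describes ADAQUANT only as an integer quantization scheme that shrinks transferred messages. As a rough illustration of that general idea, the sketch below applies plain uniform 8-bit quantization to a list of floats. It is not the ADAQUANT algorithm and does not use the NVFLARE API; the names `quantize` and `dequantize` are hypothetical and exist only for this example.

```python
from array import array


def quantize(values):
    """Map floats onto 8-bit integer codes with one shared scale and offset.

    Hypothetical helper for illustration; not part of NVFLARE.
    """
    levels = 255  # largest value that fits in one unsigned byte
    lo, hi = min(values), max(values)
    # A constant input gives hi == lo; fall back to 1.0 to avoid a zero scale.
    scale = (hi - lo) / levels or 1.0
    # Clamp so floating-point error can never push a code past 255.
    codes = bytes(min(levels, round((v - lo) / scale)) for v in values)
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Recover approximate floats from the integer codes."""
    return [lo + c * scale for c in codes]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes, scale, lo = quantize(weights)
restored = dequantize(codes, scale, lo)

# Compare the payload size of float32 values with that of the 8-bit codes.
float32_bytes = array("f", weights).itemsize * len(weights)
print(f"{float32_bytes} bytes as float32 -> {len(codes)} bytes as 8-bit codes")
print("max round-trip error:", max(abs(a - b) for a, b in zip(weights, restored)))
```

Each value is sent as one byte plus a shared scale and offset, and the round-trip error of any value stays within half a quantization step (`scale / 2`). Schemes used in practice typically choose the bit width and scale more carefully than this fixed 8-bit version does.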