In the following, we show an example of running MONAI-bundle configurations with NVFlare.
This example includes instructions on running FedAvg and homomorphic encryption for secure aggregation. It uses the provisioning and the admin API to submit jobs, similar to how one would set up experiments in real-world deployment.
In this example, we use an already prepared provisioning file (project.yml) to run experiments on a single machine. For real-world deployment, additional considerations must be taken into account. See here for more information.
For an example of getting started with the FL simulator, see here.
To execute the below commands, please open a terminal and go to the folder containing this tutorial.
We recommend following the instructions for setting up a virtual environment and using it in JupyterLab to run the notebooks of the MONAI integration examples.
Download the MONAI bundle as ./${JOB_NAME}/app/config/spleen_ct_segmentation.
JOB_NAME=job
python3 -m monai.bundle download --name "spleen_ct_segmentation" --version "0.3.7" --bundle_dir ./${JOB_NAME}/app/config
In this example, JOB_NAME can be either job or job_he, depending on the configuration you would like to run (see below).
The final folder structure under JOB_NAME will be:
.
├── app
│   └── config
│       ├── config_fed_client.json
│       ├── config_fed_server.json
│       └── spleen_ct_segmentation
│           ├── LICENSE
│           ├── configs
│           │   ├── evaluate.json
│           │   ├── inference.json
│           │   ├── logging.conf
│           │   ├── metadata.json
│           │   ├── multi_gpu_evaluate.json
│           │   ├── multi_gpu_train.json
│           │   └── train.json
│           ├── docs
│           │   ├── README.md
│           │   └── data_license.txt
│           └── models
│               ├── model.pt
│               └── model.ts
└── meta.json
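Before moving on, it can help to confirm that the job folder matches the layout above. The following is a minimal sketch of such a check, not part of the example's own scripts; the helper name `missing_files` and the choice of which files to check are assumptions made for illustration.

```python
from pathlib import Path

# A subset of the files listed in the folder structure above,
# given relative to the job folder (e.g. ./job or ./job_he).
EXPECTED_FILES = [
    "meta.json",
    "app/config/config_fed_client.json",
    "app/config/config_fed_server.json",
    "app/config/spleen_ct_segmentation/configs/train.json",
    "app/config/spleen_ct_segmentation/models/model.pt",
]


def missing_files(job_dir):
    """Return the expected relative paths that do not exist under job_dir."""
    root = Path(job_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


# Example usage (hypothetical): print(missing_files("job"))
```

An empty list means every checked file is in place; any entries returned point to a step (such as the bundle download) that may need to be repeated.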
Download the spleen CT data from the MSD challenge and update the data path.
Note: The dataset will be saved under ./data.
JOB_NAME=job
python3 download_spleen_dataset.py
sed -i "s|/workspace/data/Task09_Spleen|${PWD}/data/Task09_Spleen|g" ${JOB_NAME}/app/config/spleen_ct_segmentation/configs/train.json
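The in-place form `sed -i "..."` used above follows GNU sed; BSD/macOS sed expects a backup-suffix argument after `-i`. As a portable alternative, the same substitution can be sketched in Python. The function name `rewrite_data_path` is hypothetical; only the placeholder path `/workspace/data/Task09_Spleen` and the target file come from the command above.

```python
from pathlib import Path

# Placeholder dataset path that the sed command above replaces in train.json.
OLD_ROOT = "/workspace/data/Task09_Spleen"


def rewrite_data_path(config_file, new_root):
    """Replace every occurrence of OLD_ROOT in config_file.

    Returns the number of occurrences that were replaced (0 if none).
    """
    path = Path(config_file)
    text = path.read_text()
    count = text.count(OLD_ROOT)
    if count:
        path.write_text(text.replace(OLD_ROOT, new_root))
    return count


# Example usage (mirrors the sed command, run from the tutorial folder):
# rewrite_data_path(
#     "job/app/config/spleen_ct_segmentation/configs/train.json",
#     str(Path.cwd() / "data" / "Task09_Spleen"),
# )
```

A return value of 0 suggests the path was already updated or the bundle configuration uses a different placeholder than the one assumed here.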
The next scripts will start the FL server and 2 clients automatically to run FL experiments on localhost.
The project file for creating the secure workspace used in this example is shown at ./workspaces/secure_project.yml.
If you want to run the homomorphic encryption job, please install TenSEAL:
pip install tenseal
(this example was tested with tenseal==0.3.12)
Otherwise, please remove the HEBuilder section from workspaces/secure_project.yml.
To create the secure workspace, use the following commands to build a package and copy it to secure_workspace for later experimentation.
cd ./workspaces
nvflare provision -p ./secure_project.yml
cp -r ./workspace/secure_project/prod_00 ./secure_workspace
cd ..
For more information about secure provisioning see the documentation.
For starting the FL system with 2 clients in the secure workspace, run
./start_fl_secure.sh 2
To run FL experiments in POC mode, create your local FL workspace with the command below. In the following experiments, we will be using 2 clients. Press y and enter when prompted.
nvflare poc --prepare -n 2
By default, POC will create startup kits at /tmp/nvflare/poc.
NOTE: POC stands for "proof of concept" and is used for quick experimentation with different numbers of clients. It doesn't need any advanced configuration while provisioning the startup kits for the server and clients.
The secure workspace, on the other hand, is needed to run experiments that require encryption keys, such as the homomorphic encryption (HE) one shown below. These startup kits allow secure deployment of FL in real-world scenarios using SSL-certified communication channels.
Then, start the FL system with all provisioned clients by running
nvflare poc --start
Here, we assume jobs are submitted and run one at a time. For details about resource management and consumption, please refer to the documentation.
Note: Full FL training could take several hours for this task. To speed up your experimentation, you can reduce the num_rounds value in config_fed_server.json, e.g. to 5 rounds.
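The round count can be edited by hand, or with a small script. The sketch below makes no assumption about where `num_rounds` sits inside `config_fed_server.json`: it walks the whole JSON document and updates every key with that name. The helper names `set_key_everywhere` and `set_num_rounds` are hypothetical.

```python
import json
from pathlib import Path


def set_key_everywhere(node, key, value):
    """Set `key` to `value` wherever it appears in nested dicts/lists.

    Returns the number of entries that were updated.
    """
    updated = 0
    if isinstance(node, dict):
        for k in node:
            if k == key:
                node[k] = value
                updated += 1
            else:
                updated += set_key_everywhere(node[k], key, value)
    elif isinstance(node, list):
        for item in node:
            updated += set_key_everywhere(item, key, value)
    return updated


def set_num_rounds(config_file, num_rounds):
    """Rewrite config_file with every "num_rounds" entry set to num_rounds."""
    path = Path(config_file)
    config = json.loads(path.read_text())
    updated = set_key_everywhere(config, "num_rounds", num_rounds)
    if updated:
        path.write_text(json.dumps(config, indent=2) + "\n")
    return updated


# Example usage: set_num_rounds("job/app/config/config_fed_server.json", 5)
```

Note that rewriting the file through `json.dumps` normalizes its indentation, so review the result if the original formatting matters.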
To run FedAvg using a real-world setup, submit the job using:
./submit_job.sh job
(Optional) In POC mode, use
./submit_job.sh job --poc
NOTE: You can always use the admin console to manually abort a running job using
abort_job [JOB_ID]. For a complete list of admin commands, see here. To log into the POC workspace admin console, no username is required (use "admin" for commands requiring confirmation with a username).
For the secure workspace admin console, use the username "admin@nvidia.com".
After training, each client's best model will be used for cross-site validation. The results can be downloaded and shown with the admin console using
download_job [JOB_ID]
where [JOB_ID] is the ID assigned by the system when submitting the job.
You can use the list_jobs admin command to find the relevant JOB_ID.
The result will be downloaded to your admin workspace (the exact download path will be displayed when running the command). You should see the cross-site validation results at
[DOWNLOAD_DIR]/[JOB_ID]/workspace/cross_site_val/cross_val_results.json
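To inspect the downloaded results without opening the file by hand, a short script can print one line per metric. This is a sketch under the assumption that `cross_val_results.json` is a nested JSON object whose leaves are metric values; the exact nesting is not specified here, so the code flattens whatever structure it finds. The helper names are hypothetical.

```python
import json
from pathlib import Path


def flatten(node, prefix=()):
    """Yield (key_path, value) pairs for every non-dict leaf in a nested dict."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from flatten(child, prefix + (str(key),))
    else:
        yield prefix, node


def load_results(results_file):
    """Load the cross-site validation results from a JSON file."""
    return json.loads(Path(results_file).read_text())


def format_results(results):
    """Return one 'outer / inner / ...: value' line per leaf value."""
    return [" / ".join(path) + f": {value}" for path, value in flatten(results)]


# Example usage, with [DOWNLOAD_DIR] and [JOB_ID] filled in by hand:
# for line in format_results(load_results("cross_val_results.json")):
#     print(line)
```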
Next we run FedAvg using homomorphic encryption (HE) for secure aggregation on the server.
NOTE: For HE, we need to use the securely provisioned workspace. It will also take longer due to the additional encryption, decryption, encrypted aggregation, and increased encrypted message sizes involved.
Follow the steps above for downloading the bundle and setting the data using JOB_NAME=job_he.
Then, submit the job to run FedAvg with HE:
./submit_job.sh job_he