FLAIR-THU · xujiangyu · Jan 27, 2024 · Jan 29, 2024 · Jan 31, 2024 · Feb 7, 2024
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,8 @@
+__pycache__
+wandb/
+*.log
+*-best_model.pt
+*-last_model.pt
+coco_subset_idx_*
+*.bbl
+*.synctex.gz
diff --git a/.vscode/settings.json b/.vscode/settings.json
@@ -0,0 +1,52 @@
+{
+    "cSpell.words": [
+        "adamp",
+        "allimages",
+        "batchidx",
+        "Bstdv",
+        "cifa",
+        "cifar",
+        "CIFAR",
+        "crossfold",
+        "crossfolds",
+        "cudnn",
+        "CVPR",
+        "dset",
+        "flickr",
+        "gpuid",
+        "Graphcore",
+        "idxs",
+        "imagenet",
+        "inferencing",
+        "interintra",
+        "Jingjing",
+        "keepdim",
+        "mmdata",
+        "MMFL",
+        "MSCOCO",
+        "multimodal",
+        "Multimodal",
+        "multistep",
+        "noniid",
+        "optim",
+        "PCME",
+        "pycocotools",
+        "Qiying",
+        "rsum",
+        "subconfig",
+        "svhn",
+        "testclasses",
+        "tqdm",
+        "trainclasses",
+        "trainloader",
+        "trainloaders",
+        "trainval",
+        "trainvalclasses",
+        "ujson",
+        "unsqueeze",
+        "valclasses",
+        "vocabs",
+        "wandb",
+        "Yimu"
+    ]
+}
diff --git a/README.md b/README.md
@@ -1,3 +1,83 @@
+# Networking for CreamFL
+
+## Tasks
+
+* get code base running locally.
+  * figure out how to run through the entire code quickly.
+  * quick test run: `python src/main.py --name quick --contrast_local_inter --contrast_local_intra --interintra_weight 0.5 --max_size 64 --pub_data_num 2 --feature_dim 2 --num_img_clients 2 --num_txt_clients 2 --num_mm_clients 3 --client_num_per_round 2 --local_epochs 2 --comm_rounds 2 --not_bert`
+    * `--contrast_local_inter --contrast_local_intra --interintra_weight 0.5` Cream options.
+    * `--max_size` (added by xiegeo) number of training samples per client; use 0 or 10000 for the old behavior.
+    * `--pub_data_num` public training data size (default 50000), proportional to communication cost (or to memory use in a local simulation).
+    * `--feature_dim` number of public features (default 256), proportional to communication cost.
+    * `--num_img_clients 2 --num_txt_clients 2 --num_mm_clients 3 --client_num_per_round 2` maximum number of clients of each type, and the total number of clients per round.
+    * `--local_epochs 2 --comm_rounds 2` number of local epochs and global communication rounds.
+    * `--not_bert` use a simpler model
+
+* get code to run in a network
+  * see the "How to run the network" section.
+
+## Goals
+
+* Learn:
+  * Transformers
+    * Transformer [Attention Is All You Need 2017 v7(2023)](https://arxiv.org/abs/1706.03762)
+    * [An Introduction to Transformers 2023 v5(2024)](https://arxiv.org/abs/2304.10557)
+  * Multimodal
+    * [DeViSE: A Deep Visual-Semantic Embedding Model 2013](https://research.google.com/pubs/archive/41473.pdf)
+    * PCME [Probabilistic Embeddings for Cross-Modal Retrieval 2021 v2](https://arxiv.org/abs/2101.05068) <https://github.com/naver-ai/pcme>
+  * Federated Learning
+    * Federated Averaging [Communication-Efficient Learning of Deep Networks from Decentralized Data 2016 v4(2023)](https://arxiv.org/abs/1602.05629)
+
+* Implement networking
+  * try FedML? (too much rewriting to do it properly with FedML, otherwise too hacky.)
+  * try custom network? (do a quick demo version)
+
+## How to run the network
+
+### Configuration
+
+* flags: the same as the local version.
+* fed_config: sets up server and client options.
+
+### Run
+
+A network requires n + 2 processes, where n is the number of clients:
+the n clients, plus a command server over HTTP and a global round computation provider.
+
+#### Command server
+
+```bash
+python src/federation/server.py --name test
+```
+
+#### Global round computation provider
+
+```bash
+python src/federation/global.py --name test --contrast_local_inter --contrast_local_intra --interintra_weight 0.5 --max_size 64 --pub_data_num 2 --feature_dim 2 --not_bert
+```
+
+#### Clients
+
+Replace `txt_0` with the name of the client to start.
+
+```bash
+python src/federation/client.py --name test --client_name txt_0 --max_size 64 --pub_data_num 2 --feature_dim 2 --not_bert
+```
+
+### File sharing
+
+The network has to share the learned features. This could be done through a file server,
+a CDN, or shared network storage, for example. Directly accessing the same files is the
+easiest to implement and easily extends to shared network storage, so this is implemented
+first for ease of local testing without loss of generality.
+
+## Proof of Concept
+
+See [report/poc.pdf](report/poc.pdf)
+
+------------------------
+Begin original readme
+
 # Multimodal Federated Learning via Contrastive Representation Ensemble
 
 This repo contains a PyTorch implementation of the paper [Multimodal Federated Learning via Contrastive Representation Ensemble](https://arxiv.org/abs/2302.08888) (ICLR 2023). 
@@ -42,6 +122,24 @@ To reproduce CreamFL with BERT and ResNet101 as server models, run the following
 python src/main.py --name CreamFL --server_lr 1e-5 --agg_method con_w --contrast_local_inter --contrast_local_intra --interintra_weight 0.5
 ```
 
+## Run CreamFL retrieval in parallel
+[1] Run global server
+```shell
+bash retri_center.sh
+```
+[2] Run txt client
+```shell
+bash client_txt_retri.sh
+```
+[3] Run img client
+```shell
+bash client_img_retri.sh
+```
+[4] Run mm client
+```shell
+bash client_mm_retri.sh
+```
+
 ## Citation
 
 If you find the paper provides some insights into multimodal FL or our code useful 🤗, please consider citing:

diff --git a/client_img_retri.sh b/client_img_retri.sh
@@ -0,0 +1,4 @@
+export HF_ENDPOINT=https://hf-mirror.com
+export HF_DATASETS_CACHE="/shared/.cache/huggingface/datasets"
+
+nohup python src/retri_client_img.py --name retri_client_img --server_lr 1e-5 --seed 0 --feature_dim 256 --pub_data_num 50000 --agg_method con_w --contrast_local_inter --contrast_local_intra --interintra_weight 0.5 --local_epochs 5 --client_num_per_round 1 --num_img_clients 1 --num_txt_clients 0 --num_mm_clients 0 > retri_client_img.log 2>&1 &
diff --git a/client_mm_retri.sh b/client_mm_retri.sh
@@ -0,0 +1,4 @@
+export HF_ENDPOINT=https://hf-mirror.com
+export HF_DATASETS_CACHE="/shared/.cache/huggingface/datasets"
+
+nohup python src/retri_client_mm.py --name retri_client_mm --server_lr 1e-5 --seed 0 --feature_dim 256 --pub_data_num 50000 --agg_method con_w --contrast_local_inter --contrast_local_intra --interintra_weight 0.5 --local_epochs 5 --client_num_per_round 1 --num_img_clients 0 --num_txt_clients 0 --num_mm_clients 1 > retri_client_mm.log 2>&1 &
diff --git a/client_txt_retri.sh b/client_txt_retri.sh
@@ -0,0 +1,4 @@
+export HF_ENDPOINT=https://hf-mirror.com
+export HF_DATASETS_CACHE="/shared/.cache/huggingface/datasets"
+
+nohup python src/retri_client_txt.py --name retri_client_txt --server_lr 1e-5 --seed 0 --feature_dim 256 --pub_data_num 50000 --agg_method con_w --contrast_local_inter --contrast_local_intra --interintra_weight 0.5 --local_epochs 5 --client_num_per_round 1 --num_img_clients 0 --num_txt_clients 1 --num_mm_clients 0 > retri_client_txt.log 2>&1 &
diff --git a/coco_subset_idx_file b/coco_subset_idx_file
diff --git a/data_partition/client_AG_NEWS_noniid.pkl → ...EWS_10_nets_120000_samples_hetero_0.1.pkl b/data_partition/client_AG_NEWS_noniid.pkl → ...EWS_10_nets_120000_samples_hetero_0.1.pkl
diff --git a/data_partition/client_AG_NEWS_1_nets_120000_samples_hetero_0.1.pkl b/data_partition/client_AG_NEWS_1_nets_120000_samples_hetero_0.1.pkl
diff --git a/data_partition/client_AG_NEWS_2_nets_120000_samples_hetero_0.1.pkl b/data_partition/client_AG_NEWS_2_nets_120000_samples_hetero_0.1.pkl
diff --git a/data_partition/client_AG_NEWS_4_nets_120000_samples_hetero_0.1.pkl b/data_partition/client_AG_NEWS_4_nets_120000_samples_hetero_0.1.pkl
diff --git a/data_partition/client_cifar100_10_nets_50000_samples_hetero_0.1.pkl b/data_partition/client_cifar100_10_nets_50000_samples_hetero_0.1.pkl
diff --git a/data_partition/client_cifar100_1_nets_50000_samples_hetero_0.1.pkl b/data_partition/client_cifar100_1_nets_50000_samples_hetero_0.1.pkl
diff --git a/data_partition/client_cifar100_2_nets_50000_samples_hetero_0.1.pkl b/data_partition/client_cifar100_2_nets_50000_samples_hetero_0.1.pkl
diff --git a/data_partition/client_cifar100_noniid.pkl b/data_partition/client_cifar100_noniid.pkl
diff --git a/fed_config.yaml b/fed_config.yaml
@@ -0,0 +1,45 @@
+wandb:
+  name: "cream_api"
+
+feature_store: "/tmp/cream_api" # the path to the feature store where client and global features are shared.
+
+# server configuration
+server:
+  api_url: "http://localhost:2323/cream_api"
+  min_clients: 3 # the minimum number of clients that must report before global distillation can start.
+  max_clients: 3 # once this many clients have reported, global distillation starts immediately.
+  wait_duration: 10m # how long to wait for clients to report before starting global distillation.
+
+
+# clients configuration
+clients:
+  - name: "txt_0"
+    data_type: txt # img, txt, or mm: the type of data the client is handling.
+    local_epochs: 5
+    data_partition: "client_AG_NEWS_10_nets_120000_samples_hetero_0.1.pkl" 
+    data_partition_index: 0 # This is only for testing purposes. In a real-world scenario, the data will not be loaded from the same dataset
+  - name: "txt_1"
+    data_type: txt # img, txt, or mm: the type of data the client is handling.
+    local_epochs: 5
+    data_partition: "client_AG_NEWS_10_nets_120000_samples_hetero_0.1.pkl" 
+    data_partition_index: 1 # This is only for testing purposes. In a real-world scenario, the data will not be loaded from the same dataset
+  - name: "img_0"
+    data_type: img # img, txt, or mm: the type of data the client is handling.
+    local_epochs: 5
+    data_partition_index: 0 # This is only for testing purposes. In a real-world scenario, the data will not be loaded from the same dataset
+  - name: "img_1"
+    data_type: img # img, txt, or mm: the type of data the client is handling.
+    local_epochs: 5
+    data_partition_index: 1 # This is only for testing purposes. In a real-world scenario, the data will not be loaded from the same dataset
+  - name: "mm_0"
+    data_type: mm
+    local_epochs: 5
+    data_partition_index: 0
+  - name: "mm_1"
+    data_type: mm
+    local_epochs: 5
+    data_partition_index: 1
+  - name: "mm_2"
+    data_type: mm
+    local_epochs: 5
+    data_partition_index: 2
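
The `server` block in `fed_config.yaml` describes when a global round may begin. The sketch below is one possible reading of those three settings, not the code in this PR. It assumes that global distillation starts immediately once `max_clients` have reported, and otherwise starts after `wait_duration` if at least `min_clients` have reported. `parse_duration` and `should_start_distillation` are hypothetical helper names, and the duration format (an integer followed by `s`, `m`, or `h`) is inferred from the single `10m` example.

```python
import re
from datetime import timedelta

# Mirrors the `server` block of fed_config.yaml (values copied from the diff).
SERVER_CONFIG = {"min_clients": 3, "max_clients": 3, "wait_duration": "10m"}

# Assumed unit suffixes; only "m" appears in the config itself.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_duration(text: str) -> timedelta:
    """Parse a duration such as '10m' or '90s' (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})


def should_start_distillation(reported: int, waited: timedelta, config: dict) -> bool:
    """Decide whether the global round may start (assumed semantics)."""
    # Enough clients to start without waiting any longer.
    if reported >= config["max_clients"]:
        return True
    # Otherwise wait out the full duration, then require the minimum.
    deadline = parse_duration(config["wait_duration"])
    return waited >= deadline and reported >= config["min_clients"]
```

With the values from the diff (`min_clients` equal to `max_clients`), the wait only matters as a timeout: the round starts as soon as three clients report and never starts with fewer.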