For a more detailed discription of the documentation follow this [link](https://docs.google.com/document/d/1DaWOX27c4_4_VUT-l_UrgUV-zFa8UsIZ5zUv06pgc0s/edit?usp=sharing)
7
+
For a more detailed description of the documentation, follow this [link](https://docs.google.com/document/d/1DaWOX27c4_4_VUT-l_UrgUV-zFa8UsIZ5zUv06pgc0s/edit?usp=sharing) or check the wiki.
8
8
9
9
10
10
Repository structure:
@@ -65,8 +65,46 @@ How to run program and connect Libtorch:
65
65
66
66
Running SplitPipe in a distributed manner:
67
67
68
-
- configuring root-table
69
-
- enable mulit-task (if applicable)
70
-
- parameters for each entity.
71
-
- include a figure of the structure.
72
-
- emulated version.
68
+
*Case 0: Model profiling*
69
+
Example code is in main; you can measure either the delay for each batch or the per-layer delay.
70
+
71
+
*Case 1: Real system*
72
+
73
+
In this case you run the data owners as real devices. You can run all entities on one machine or on different devices (within the same network).
74
+
75
+
- If you cannot use multicast:
76
+
- comment out the following parts in the code:
77
+
- in data_owner.cpp: comment out the findPeers() call and the findInit() call
78
+
- in compute_node.cpp: comment out the findInit() call
79
+
- in aggregator.cpp: comment out the findInit() call
80
+
- in network_layer.cpp: comment out line 506 (as of version 1.0.0)
81
+
- update the routing table in pipeline_simulation/network_layer.h
82
+
83
+
For each data owner device, call:
84
+
85
+
$ ./data_owner -i <id> -d <number-of-data-owners> -c <number-of-compute-nodes> -s <split-rule>
86
+
87
+
If the node is not an init data owner, pass only the node's id.
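As a rough sketch of the data owner invocation above, the helper below prints one command line per data owner: the full flag set for the init data owner and only `-i` for the others. Nothing is executed. The function name `data_owner_cmds`, the assumption that ids are consecutive starting from the init data owner's id, and the example values (including the split rule `1`) are illustrative and not taken from the repository; check the ids expected by the routing table before relying on it.

```shell
#!/bin/sh
# Print the launch command for each data owner (dry run, nothing is executed).
# Assumption: ids are consecutive and the first id belongs to the init data owner.
# usage: data_owner_cmds <init-id> <number-of-data-owners> <number-of-compute-nodes> <split-rule>
data_owner_cmds() {
    init_id=$1; num_do=$2; num_cn=$3; split_rule=$4
    # The init data owner receives the full configuration.
    echo "./data_owner -i $init_id -d $num_do -c $num_cn -s $split_rule"
    # Every other data owner only needs its own id.
    i=1
    while [ "$i" -lt "$num_do" ]; do
        echo "./data_owner -i $((init_id + i))"
        i=$((i + 1))
    done
}

# Example with hypothetical values: 3 data owners, 2 compute nodes, split rule 1.
data_owner_cmds 0 3 2 1
```

Printing the commands first makes it easy to review them, or to copy each line to the device it belongs to, before anything is started.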
88
+
89
+
For each compute node device, call:
90
+
91
+
$ ./compute_node -i <id>
92
+
93
+
or use the script run_cn.sh in pipeline_simulation/profiling.
94
+
95
+
For the aggregator:
96
+
97
+
$ ./aggregator -i <id> -d <number-of-data-owners> -c <number-of-compute-nodes>
98
+
99
+
or use the script run_aggr.sh in pipeline_simulation/profiling.
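The compute node and aggregator invocations above can be sketched in the same dry-run style. The helper below prints one `./compute_node` command per id and derives the aggregator's `-c` value from the number of compute node ids given. The function name `backend_cmds` and the example ids are hypothetical, and this sketch is not a substitute for run_cn.sh or run_aggr.sh, whose contents are not shown here.

```shell
#!/bin/sh
# Print launch commands for the compute nodes and the aggregator (dry run).
# usage: backend_cmds <aggregator-id> <number-of-data-owners> <compute-node-id>...
backend_cmds() {
    aggr_id=$1; num_do=$2
    shift 2
    num_cn=$#   # the remaining arguments are the compute node ids
    for cn_id in "$@"; do
        echo "./compute_node -i $cn_id"
    done
    # The aggregator is told how many data owners and compute nodes take part.
    echo "./aggregator -i $aggr_id -d $num_do -c $num_cn"
}

# Example with hypothetical ids: aggregator 5, 3 data owners, compute nodes 3 and 4.
backend_cmds 5 3 3 4
```

Deriving `-c` from the id list keeps the aggregator's compute node count consistent with the compute nodes that are actually launched.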
100
+
101
+
NOTE: Logging and checkpointing are supported, but the feature is deactivated in this version. You can use utils/pipeline_logging.sh for this purpose.
102
+
103
+
*Case 2: Emulated environment*
104
+
105
+
In this case the data owners run in an emulated environment. Note that this version does not support multicast.
106
+
You can add the device characteristics to pipeline_simulation/profiling/rpi_stat.h and use the scripts run_data_owners_init.sh and run_data_owners_worker.sh in
107
+
pipeline_simulation/profiling.
108
+
The results are stored in the log files indicated in the script files (change them accordingly).
109
+
110
+
The code for the emulated data owner is in pipeline_simulation/profiling/data_owner_simulatede.cpp