Nvflare setup in Kubernetes in a private network #1745

rkanade · 2023-05-17T09:45:15Z

Hello, I am trying to run NVFlare in a Kubernetes environment on two instances that sit in different networks. I generated the provisioning files with the HA setting and the Helm builder so that I can deploy with the Helm chart.

To give you a brief overview of my deployment: I used Netmaker to create a private network and joined both instances to it, so the instances can communicate via the Netmaker interface IP. I created the Kubernetes cluster with kubeadm and set the node-ip to the private Netmaker IP in the kubelet arguments on both instances. I used Calico as the CNI for pod networking, and all pods are running and ready. I added the ingress-nginx controller to expose the pod ports for the FL server by updating the config map and daemon set sections of the YAML file, as described in the NVFlare Helm deployment guide - https://nvflare.readthedocs.io/en/latest/user_guide/helm_chart.html. After this I used Helm to install the NVFlare server into Kubernetes, which created 3 pods (Server1, Server2, and Overseer), all of which are running and ready.
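For reference, ingress-nginx exposes raw TCP ports through a ConfigMap whose entries map an external port to `<namespace>/<service>:<port>`. The following is only a sketch of how those entries could be generated. The helper name `tcp_services_data` is hypothetical, the `default` namespace is an assumption, and the only port taken from this thread is 8102 on `server2` (from the client log); the remaining ports have to come from your own provisioning output.

```python
def tcp_services_data(namespace: str, services: dict[str, list[int]]) -> dict[str, str]:
    """Build the `data` section of an ingress-nginx TCP services ConfigMap.

    Each key is the port exposed by the ingress controller, and each value
    has the form "<namespace>/<service>:<port>".
    """
    data: dict[str, str] = {}
    for service, ports in services.items():
        for port in ports:
            key = str(port)
            # One external port can forward to only one backend service.
            if key in data:
                raise ValueError(f"port {port} is already mapped to {data[key]}")
            data[key] = f"{namespace}/{service}:{port}"
    return data


# Assumption: the services live in the "default" namespace.
entries = tcp_services_data("default", {"server2": [8102]})
print(entries)
```

Each port listed here also has to be opened on the controller's daemon set; otherwise the mapping exists but nothing listens on the node.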

The deployment of the NVFlare server was successful and I was able to log in to the admin console, but I ran into an issue when starting the client sites (site-1 and site-2). The site logs show the following error:

Cell - INFO - site-1: created backbone external connector to grpc://server2:8102
2023-04-25 12:17:22,020 - ConnectorManager - INFO - 1227537: Try start_listener Listener resources: {'secure': False, 'host': 'localhost'}
2023-04-25 12:17:22,020 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connector [CH00002 PASSIVE tcp://0:26496] is starting
2023-04-25 12:17:22,521 - Cell - INFO - site-1: created backbone internal listener for tcp://localhost:26496
2023-04-25 12:17:22,521 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connector [CH00001 ACTIVE grpc://server2:8102] is starting
2023-04-25 12:17:22,522 - FederatedClient - INFO - Wait for engine to be created.
2023-04-25 12:17:30,328 - nvflare.fuel.f3.sfm.conn_manager - INFO - Retrying [CH00001 ACTIVE grpc://server2:8102] in 8 seconds
2023-04-25 12:17:38,535 - nvflare.fuel.f3.sfm.conn_manager - INFO - Retrying [CH00001 ACTIVE grpc://server2:8102] in 16 seconds
2023-04-25 12:17:53,051 - MPM - ERROR - main_func execute exception: Login failed.
2023-04-25 12:17:53,052 - MPM - ERROR - Traceback (most recent call last):
  File "/home/kubeflare/.local/lib/python3.10/site-packages/nvflare/fuel/f3/mpm.py", line 144, in run
    rc = main_func()
  File "/home/kubeflare/.local/lib/python3.10/site-packages/nvflare/private/fed/app/client/client_train.py", line 120, in main
    raise RuntimeError("Login failed.")
RuntimeError: Login failed.
2023-04-25 12:17:55,254 - MPM - INFO - MPM: Good Bye!

I have reviewed the GitHub discussion #1130 (reply in thread), which suggests that this error could be related to the TLS settings. I would greatly appreciate your guidance on how to resolve this issue.
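Since the log shows the connector retrying `grpc://server2:8102` before the login fails, one thing that could be checked independently of TLS is whether the client host can resolve `server2` and open a plain TCP connection to port 8102. Below is a minimal sketch using only the Python standard library. The function name `check_endpoint` is hypothetical, and a successful TCP connect says nothing about whether the gRPC or TLS handshake would succeed.

```python
import socket


def check_endpoint(host: str, port: int, timeout: float = 3.0) -> dict:
    """Report whether `host` resolves and whether `port` accepts a TCP connection."""
    result = {"host": host, "port": port, "resolved": None, "tcp_ok": False, "error": None}

    # Step 1: name resolution (for example, an /etc/hosts entry for server2).
    try:
        result["resolved"] = socket.gethostbyname(host)
    except OSError as exc:
        result["error"] = f"DNS resolution failed: {exc}"
        return result

    # Step 2: plain TCP connect; this does not exercise gRPC or TLS.
    try:
        with socket.create_connection((result["resolved"], port), timeout=timeout):
            result["tcp_ok"] = True
    except OSError as exc:
        result["error"] = f"TCP connect failed: {exc}"
    return result
```

Running `print(check_endpoint("server2", 8102))` on the client machine separates the two cases: a resolution failure points at name setup on the client, while a connect failure points at the ingress or port exposure on the cluster side.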

YuanTingHsieh · 2023-05-19T21:26:04Z

Maintainer

@IsaacYangSLA, could you comment on this? Thanks in advance.

I will also try this when I get time.

chesterxgchen · 2023-06-06T23:13:00Z

Maintainer

Sorry for the late response; I only just noticed your question.
@IsaacYangSLA might know more about this.
I have seen several companies successfully deploy in K8s environments, but setting up the ingress-related services is a bit challenging and we haven't had free hands to address it.
I noticed that one user used the host port directly (I forgot the details).
