-
Notifications
You must be signed in to change notification settings - Fork 186
Deploy HCP agent clusters on vSphere #13858
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Deploy HCP agent clusters on vSphere #13858
Conversation
Signed-off-by: Jilju Joy <[email protected]>
Signed-off-by: Jilju Joy <[email protected]>
Signed-off-by: Jilju Joy <[email protected]>
Signed-off-by: Jilju Joy <[email protected]>
Signed-off-by: Jilju Joy <[email protected]>
Signed-off-by: Jilju Joy <[email protected]>
| DEPLOYMENT: | ||
| allow_lower_instance_requirements: false | ||
| ENV_DATA: | ||
| platform: "vsphere_agent" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use only "hosted_cluster_platform" value as an identifier for agent/kubevirt.
We are working on changing the platform values to represent as platform only, not the type of cluster configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cluster still must have a platform, if we do not put it here it will be aws by a default.
We continue use hosted_cluster_platform as a distinctive key-value pair to separate kubevirt from agent deployments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The platform should be only vsphere and the information that it is agent deployment should be specified somewhere else.
I think it will make more sense to set the deployment_type to agent or ai_agent, because it actually is not deployed via Assisted Installer service, event though the process is very similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed
ocs_ci/deployment/factory.py
Outdated
| elif self.deployment_platform in [ | ||
| constants.VSPHERE_PLATFORM, | ||
| constants.VSPHERE_AGENT_PLATFORM, | ||
| ]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jilju, referring to your comment. We must use a platform to identify deployer class with deployment factory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use only
| elif self.deployment_platform in [ | |
| constants.VSPHERE_PLATFORM, | |
| constants.VSPHERE_AGENT_PLATFORM, | |
| ]: | |
| elif self.deployment_platform in [ | |
| constants.VSPHERE_PLATFORM | |
| ]: |
?
The keys in self.cls_map.update is platform+_+deployment_type which will be vsphere_ai_agent.
Signed-off-by: Daniel Osypenko <[email protected]>
1d2f931 to
3c4ad90
Compare
| DEPLOYMENT: | ||
| allow_lower_instance_requirements: false | ||
| ENV_DATA: | ||
| platform: "vsphere_agent" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The platform should be only vsphere and the information that it is agent deployment should be specified somewhere else.
I think it will make more sense to set the deployment_type to agent or ai_agent, because it actually is not deployed via Assisted Installer service, event though the process is very similar.
| log_step("Boot machines for Agent hosted cluster with min image") | ||
| if not self.boot_machines_for_agent(): | ||
| # this cluster will not be added to the list of deployed clusters and ODF installation will be skipped | ||
| return "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure, if I didn't miss anything there - but wouldn't this silently skip any issue with the creating of the machines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it will log the issue with error and move to deploy rest of the hosted clusters. In a later step we are checking if all desired clusters were actually created and failing the job if not ->
ocs-ci/ocs_ci/deployment/hub_spoke.py
Line 628 in 3c4ad90
| logger.error("Some of the desired hosted OCP clusters were not created") |
ocs_ci/deployment/vmware.py
Outdated
| self.ai_cluster = assisted_installer.AssistedInstallerCluster( | ||
| name=self.cluster_name, | ||
| cluster_path=self.cluster_path, | ||
| openshift_version=str(version.get_semantic_ocp_version_from_config()), | ||
| base_dns_domain=config.ENV_DATA["base_domain"], | ||
| api_vip=self.api_vip, | ||
| ingress_vip=self.ingress_vip, | ||
| ssh_public_key=self.get_ssh_key(), | ||
| pull_secret=self.get_pull_secret(), | ||
| ) | ||
|
|
||
| log_step("create (register) cluster in Assisted Installer console") | ||
| self.ai_cluster.create_cluster() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this part relevant for the Agent deployment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not relevant, removed
Signed-off-by: Daniel Osypenko <[email protected]>
83c7b39 to
a97438b
Compare
| for agent_info in agents_obj.get()["items"]: | ||
| agents_obj.patch( | ||
| resource_name=agent_info["metadata"]["name"], | ||
| params='{"spec":{"approved":true}}', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is another function in this PR which does the approval
…s on vSphere Signed-off-by: Daniel Horak <[email protected]>
…s on vSphere Signed-off-by: Daniel Horak <[email protected]>
|
Based on yesterday's discussion, I would propose to handle the HCP Agent cluster deployment similarly as standalone OCP deployment via Assisted Installer - with the difference, that we will provide a reference to the Hub cluster, which will be used instead of the public Assisted Installer service. Here I created proof of concept update for creating the Hub cluster config context: 86292e7. @jilju WDYT? Related MR for enabling this in the main Deploy OCS Cluster job: https://url.corp.redhat.com/ade84c0 |
…s on vSphere Signed-off-by: Daniel Horak <[email protected]>
8793bb4 to
cb2e5a5
Compare
Signed-off-by: Daniel Osypenko <[email protected]>
cb2e5a5 to
cbccdc9
Compare
| self.terraform_data_dir, | ||
| "terraform.tfvars.json", | ||
| ) | ||
| # control plane specs removed by purpose. HCP spoke clusters can not have own control plane nodes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will have to explicitly set:
"control_plane_count": 0,
because this variable is by default set to 3 in the terraform:
ocs-ci/terraform/ai/vsphere/variables.tf
Lines 90 to 93 in 732d87b
| variable "control_plane_count" { | |
| type = string | |
| default = "3" | |
| } |
| """ | ||
| Request API and Ingress IPs from IPAM server |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need additional API IP for the hosted cluster? IIUC API is provided by the control plane/master nodes, which means that it is provided by the hosting cluster.
| Args: | ||
| api_ips (list(str)): IPs for api.<cluster>. | ||
| apps_ips (list(str)): IPs for *.apps.<cluster>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to use ingress instead of apps to align it with the other code and also with the naming from OCP.
| AZURE_WITH_LOGS_PLATFORM = "azure-with-logs" | ||
| GCP_PLATFORM = "gcp" | ||
| VSPHERE_PLATFORM = "vsphere" | ||
| VSPHERE_AGENT_PLATFORM = "vsphere_agent" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is not used anywhere (agent is part of deployment type, not platform).
| def download_with_retries(url, boot_image_path, max_retries=3): | ||
| """ | ||
| Download file with retries and proper error handling | ||
| Args: | ||
| url (str): URL of the file to download | ||
| boot_image_path (str): Path where to save the downloaded file | ||
| max_retries (int): Maximum number of retries for downloading the file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the function is quite common/universal, I would propose to change the boot_image_path (and other references to ISO/boot image) to some common name (like filename in the above download_file function).
I'm also thinking, if it is worth to keep both the old download_file and new download_with_retries functions, while they do the same and this new one is just enhancement of the old one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense
Signed-off-by: Daniel Osypenko <[email protected]>
|
New changes are detected. LGTM label has been removed. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: DanielOsypenko The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
1 similar comment
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: DanielOsypenko The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
This PR is a continuation of the Create hosted cluster with agents #13272 PR
With this PR we added: