Conversation

@DanielOsypenko (Contributor):
This PR is a continuation of PR #13272, Create hosted cluster with agents.

With this PR we added:

  • get_ip_list_by_cluster_id from the assisted installer
  • DNS record modification
  • booting machines with specs based on the hosted cluster configuration
  • a new VSPHEREAgentAI class
  • some restructuring to align the deployment class relations with the product logic (one Hosted Cluster attached to one Host Environment)
  • a fix for MCE mirror issues

@DanielOsypenko DanielOsypenko self-assigned this Dec 7, 2025
@DanielOsypenko DanielOsypenko requested a review from a team as a code owner December 7, 2025 17:23
@DanielOsypenko added labels provider-client (Provider-client solution) and provider mode deployment (Issues preventing Provider mode deployments on any stage) on Dec 7, 2025
DEPLOYMENT:
  allow_lower_instance_requirements: false
ENV_DATA:
  platform: "vsphere_agent"
Contributor:

Please use only the "hosted_cluster_platform" value as the identifier for agent/kubevirt.
We are working on changing the platform values to represent only the platform, not the type of cluster configuration.

Contributor (Author):

The cluster still must have a platform; if we do not set it here, it defaults to aws.

We continue to use hosted_cluster_platform as a distinctive key-value pair to separate kubevirt from agent deployments.

Contributor:

The platform should be only vsphere, and the information that it is an agent deployment should be specified somewhere else.
I think it will make more sense to set the deployment_type to agent or ai_agent, because it actually is not deployed via the Assisted Installer service, even though the process is very similar.
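
A sketch of the config layout proposed here, mirroring the snippet above (the deployment_type key and its value are this thread's proposal, not necessarily the final implementation):

DEPLOYMENT:
  allow_lower_instance_requirements: false
ENV_DATA:
  platform: "vsphere"
  deployment_type: "ai_agent"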

Contributor (Author):

Addressed.

Comment on lines 35 to 38
elif self.deployment_platform in [
    constants.VSPHERE_PLATFORM,
    constants.VSPHERE_AGENT_PLATFORM,
]:
Contributor (Author):

@jilju, referring to your comment: we must use a platform to identify the deployer class in the deployment factory.

Contributor:

Can we use only

Suggested change
elif self.deployment_platform in [
    constants.VSPHERE_PLATFORM,
    constants.VSPHERE_AGENT_PLATFORM,
]:
elif self.deployment_platform in [
    constants.VSPHERE_PLATFORM
]:

?
The keys in self.cls_map.update are platform + "_" + deployment_type, which will be vsphere_ai_agent.
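
A minimal runnable sketch of that lookup (names like cls_map are taken from this thread; the real wiring in ocs-ci differs):

# hypothetical sketch of the deployment-factory key described above
ENV_DATA = {"platform": "vsphere", "deployment_type": "ai_agent"}
cls_map = {"vsphere_ai_agent": object}  # stand-in for the real deployer class
factory_key = f"{ENV_DATA['platform']}_{ENV_DATA['deployment_type']}"
deployer_class = cls_map[factory_key]  # resolves the vsphere_ai_agent deployer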

Signed-off-by: Daniel Osypenko <[email protected]>

log_step("Boot machines for Agent hosted cluster with min image")
if not self.boot_machines_for_agent():
    # this cluster will not be added to the list of deployed clusters and ODF installation will be skipped
    return ""
Contributor:

Not sure if I'm missing something here, but wouldn't this silently skip any issue with creating the machines?

Contributor (Author):

Yes, it will log the issue as an error and move on to deploy the rest of the hosted clusters. In a later step we check whether all desired clusters were actually created, and fail the job if not:

logger.error("Some of the desired hosted OCP clusters were not created")
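
A minimal sketch of that later check, with hypothetical set names (the actual variables in ocs-ci differ):

import logging

logger = logging.getLogger(__name__)

# hypothetical sketch of the final verification step described above
desired = {"hcp1", "hcp2", "hcp3"}  # clusters requested in the config
deployed = {"hcp1", "hcp3"}  # clusters that actually came up
missing = desired - deployed
if missing:
    logger.error("Some of the desired hosted OCP clusters were not created")
    raise AssertionError(f"Missing hosted clusters: {sorted(missing)}")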

Comment on lines 2109 to 2121
self.ai_cluster = assisted_installer.AssistedInstallerCluster(
    name=self.cluster_name,
    cluster_path=self.cluster_path,
    openshift_version=str(version.get_semantic_ocp_version_from_config()),
    base_dns_domain=config.ENV_DATA["base_domain"],
    api_vip=self.api_vip,
    ingress_vip=self.ingress_vip,
    ssh_public_key=self.get_ssh_key(),
    pull_secret=self.get_pull_secret(),
)

log_step("create (register) cluster in Assisted Installer console")
self.ai_cluster.create_cluster()
Contributor:

Is this part relevant for the Agent deployment?

Contributor (Author):

Not relevant, removed.

Signed-off-by: Daniel Osypenko <[email protected]>
jilju previously approved these changes Dec 10, 2025
for agent_info in agents_obj.get()["items"]:
    agents_obj.patch(
        resource_name=agent_info["metadata"]["name"],
        params='{"spec":{"approved":true}}',
Contributor:

There is another function in this PR which does the approval.

@openshift-ci openshift-ci bot added the lgtm label Dec 10, 2025
dahorak added a commit to dahorak/ocs-ci that referenced this pull request Dec 11, 2025
dahorak added a commit to dahorak/ocs-ci that referenced this pull request Dec 11, 2025
@dahorak (Contributor) commented Dec 12, 2025:

Based on yesterday's discussion, I would propose handling the HCP Agent cluster deployment similarly to a standalone OCP deployment via Assisted Installer, with the difference that we will provide a reference to the Hub cluster, which will be used instead of the public Assisted Installer service.
In other words, each Agent cluster will be deployed using a separate Deploy OCS Cluster job, from the perspective of the agent cluster. The Hub cluster will be referenced in the configuration as DEPLOYMENT["hub_cluster_name"] and DEPLOYMENT["hub_cluster_path"], and based on this it will dynamically create the configuration context for the Hub cluster and use that context for the tasks performed on the Hub cluster (infrastructure creation, node approval, ...).
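
A rough sketch of the idea, assuming a hypothetical build_hub_context helper (see the actual proof of concept referenced below):

# hypothetical sketch: derive a Hub cluster config context from the
# agent cluster's DEPLOYMENT section
DEPLOYMENT = {"hub_cluster_name": "hub1", "hub_cluster_path": "/clusters/hub1"}

def build_hub_context(deployment):
    return {
        "cluster_name": deployment["hub_cluster_name"],
        "cluster_path": deployment["hub_cluster_path"],
    }

hub_ctx = build_hub_context(DEPLOYMENT)
# tasks performed on the Hub cluster (infrastructure creation, node
# approval, ...) would run with hub_ctx active instead of the agent
# cluster's own context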

Here I created a proof-of-concept update for creating the Hub cluster config context: 86292e7.

@jilju WDYT?

Related MR for enabling this in the main Deploy OCS Cluster job: https://url.corp.redhat.com/ade84c0

@openshift-ci openshift-ci bot removed the lgtm label Dec 14, 2025
@DanielOsypenko DanielOsypenko force-pushed the deploy-agents-420-1 branch 6 times, most recently from 8793bb4 to cb2e5a5 Compare December 14, 2025 16:52
Signed-off-by: Daniel Osypenko <[email protected]>
jilju previously approved these changes Jan 7, 2026
@openshift-ci openshift-ci bot added the lgtm label Jan 7, 2026
self.terraform_data_dir,
"terraform.tfvars.json",
)
# control plane specs removed on purpose. HCP spoke clusters cannot have their own control plane nodes
Contributor:

You will have to explicitly set:

"control_plane_count": 0,

because this variable is set to 3 by default in the terraform:

variable "control_plane_count" {
  type    = string
  default = "3"
}
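
A minimal sketch of the override on the ocs-ci side (the tfvars dict here is hypothetical; the real code assembling terraform.tfvars.json may differ):

import json

tfvars = {"cluster_id": "hcp1"}  # stand-in for the generated terraform vars
# HCP spoke clusters must not get their own control plane nodes,
# so override the terraform default of "3"
tfvars["control_plane_count"] = 0
with open("terraform.tfvars.json", "w") as f:
    json.dump(tfvars, f, indent=2)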

Comment on lines +2229 to +2230
"""
Request API and Ingress IPs from IPAM server
Contributor:

Do we need an additional API IP for the hosted cluster? IIUC the API is provided by the control plane/master nodes, which means it is provided by the hosting cluster.

Args:
    api_ips (list(str)): IPs for api.<cluster>.
    apps_ips (list(str)): IPs for *.apps.<cluster>.
Contributor:

Better to use ingress instead of apps, to align with the rest of the code and with the OCP naming.

AZURE_WITH_LOGS_PLATFORM = "azure-with-logs"
GCP_PLATFORM = "gcp"
VSPHERE_PLATFORM = "vsphere"
VSPHERE_AGENT_PLATFORM = "vsphere_agent"
Contributor:

I think this is not used anywhere (agent is part of deployment type, not platform).

Comment on lines +795 to +802
def download_with_retries(url, boot_image_path, max_retries=3):
    """
    Download file with retries and proper error handling

    Args:
        url (str): URL of the file to download
        boot_image_path (str): Path where to save the downloaded file
        max_retries (int): Maximum number of retries for downloading the file
Contributor:

Since the function is quite common/universal, I would propose changing boot_image_path (and other references to the ISO/boot image) to some common name (like filename in the download_file function above).

I'm also wondering whether it is worth keeping both the old download_file and the new download_with_retries functions, since they do the same thing and the new one is just an enhancement of the old one.
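
A minimal sketch of the generalized helper, assuming the requests library (the actual ocs-ci implementation may differ):

import time

import requests

def download_with_retries(url, filename, max_retries=3):
    """Download url to filename, retrying on transient failures."""
    for attempt in range(1, max_retries + 1):
        try:
            with requests.get(url, stream=True, timeout=60) as resp:
                resp.raise_for_status()
                with open(filename, "wb") as f:
                    for chunk in resp.iter_content(chunk_size=1024 * 1024):
                        f.write(chunk)
            return
        except requests.RequestException:
            if attempt == max_retries:
                raise
            time.sleep(2**attempt)  # simple exponential backoff between retries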

Contributor (Author):

Makes sense.

Signed-off-by: Daniel Osypenko <[email protected]>
@openshift-ci bot commented Jan 14, 2026:

New changes are detected. LGTM label has been removed.

@openshift-ci bot commented Jan 14, 2026:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: DanielOsypenko

The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

1 similar comment


Labels

provider mode deployment (Issues preventing Provider mode deployments on any stage), provider-client (Provider-client solution), size/XXL

3 participants