MGMT-20119: Create feature gate to allow installing TNA using the API #7523

Open
wants to merge 1 commit into
base: master

giladravid16
Contributor

@giladravid16 giladravid16 commented Apr 9, 2025

This is the first PR for adding support for TNA Clusters in assisted-service.
The purpose of this PR is to add the minimum required to install TNA clusters using the REST API.
The changes include:

  • A new feature gate that, when enabled, allows a host's role to be set to "arbiter" if the host belongs to a cluster whose OCP version is 4.19 or higher (TNA clusters are TP in 4.19).
  • Allowing the user to set a cluster's control plane count to 2 if its OCP version is 4.19 or higher.
  • Requiring a cluster whose control plane count is 2 to have at least 1 arbiter host.
  • Generating ignitions for arbiter hosts.
  • Updating the validations and transitions to check arbiter hosts as well.

To check this PR I used my build to install a few TNA clusters and verified that the clusters installed successfully.
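
The first three rules in the list above can be sketched roughly as follows. This is an illustration only, not the code in this PR: every name here (minArbiterVersion, supportsArbiter, validateArbiterRole, validateTwoControlPlanes) is hypothetical, and the version comparison is a simplified major.minor check rather than the service's own helper.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// minArbiterVersion is the 4.19 threshold named in the description.
const minArbiterVersion = "4.19"

// majorMinor parses the leading "major.minor" of a version such as "4.19.0".
func majorMinor(version string) (int, int, error) {
	parts := strings.SplitN(version, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("invalid version %q", version)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("invalid version %q: %w", version, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, fmt.Errorf("invalid version %q: %w", version, err)
	}
	return major, minor, nil
}

// supportsArbiter reports whether version is at least minArbiterVersion.
func supportsArbiter(version string) (bool, error) {
	major, minor, err := majorMinor(version)
	if err != nil {
		return false, err
	}
	minMajor, minMinor, err := majorMinor(minArbiterVersion)
	if err != nil {
		return false, err
	}
	return major > minMajor || (major == minMajor && minor >= minMinor), nil
}

// validateArbiterRole covers the first bullet: the gate must be enabled and
// the cluster version must be 4.19 or higher.
func validateArbiterRole(gateEnabled bool, clusterVersion string) error {
	if !gateEnabled {
		return errors.New("TNA clusters support is disabled")
	}
	ok, err := supportsArbiter(clusterVersion)
	if err != nil {
		return err
	}
	if !ok {
		return fmt.Errorf("role arbiter requires version %s or newer, got %s", minArbiterVersion, clusterVersion)
	}
	return nil
}

// validateTwoControlPlanes covers the second and third bullets.
func validateTwoControlPlanes(controlPlaneCount, arbiterHosts int, clusterVersion string) error {
	if controlPlaneCount != 2 {
		return nil // other counts are outside the scope of this sketch
	}
	ok, err := supportsArbiter(clusterVersion)
	if err != nil {
		return err
	}
	if !ok {
		return fmt.Errorf("control plane count 2 requires version %s or newer, got %s", minArbiterVersion, clusterVersion)
	}
	if arbiterHosts < 1 {
		return errors.New("a cluster with 2 control planes needs at least 1 arbiter host")
	}
	return nil
}

func main() {
	fmt.Println(validateArbiterRole(true, "4.19.0"))
	fmt.Println(validateTwoControlPlanes(2, 0, "4.19.0"))
}
```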

List all the issues related to this PR

Closes MGMT-20119

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
  • No tests needed

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see CONTRIBUTING guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 9, 2025
@openshift-ci-robot

openshift-ci-robot commented Apr 9, 2025

@giladravid16: This pull request references MGMT-20119 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.19.0" version, but no target version was set.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 9, 2025

openshift-ci bot commented Apr 9, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. api-review Categorizes an issue or PR as actively needing an API review. labels Apr 9, 2025

openshift-ci bot commented Apr 9, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giladravid16

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 9, 2025
@giladravid16 giladravid16 force-pushed the MGMT-20119 branch 2 times, most recently from 0597f75 to 3375cdd Compare April 21, 2025 12:57
@openshift-ci openshift-ci bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 21, 2025
@openshift-ci-robot

openshift-ci-robot commented Apr 21, 2025

@giladravid16: This pull request references MGMT-20119 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.19.0" version, but no target version was set.

@giladravid16 giladravid16 marked this pull request as ready for review April 22, 2025 07:27
@openshift-ci openshift-ci bot requested review from carbonin and pastequo April 22, 2025 07:28
@giladravid16 giladravid16 changed the title WIP: MGMT-20119: Create feature gate to allow installing TNA using the API MGMT-20119: Create feature gate to allow installing TNA using the API Apr 22, 2025
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 22, 2025

codecov bot commented Apr 22, 2025

Codecov Report

Attention: Patch coverage is 82.06897% with 26 lines in your changes missing coverage. Please review.

Project coverage is 67.37%. Comparing base (ecf4703) to head (91d052d).
Report is 23 commits behind head on master.

Files with missing lines                       Patch %   Lines
internal/ignition/installmanifests.go          68.00%    11 Missing and 5 partials ⚠️
internal/provider/baremetal/installConfig.go   37.50%    5 Missing ⚠️
internal/bminventory/inventory.go              89.74%    2 Missing and 2 partials ⚠️
internal/host/transition.go                    0.00%     1 Missing ⚠️

@@            Coverage Diff             @@
##           master    #7523      +/-   ##
==========================================
+ Coverage   67.30%   67.37%   +0.06%     
==========================================
  Files         334      335       +1     
  Lines       42293    42454     +161     
==========================================
+ Hits        28465    28603     +138     
- Misses      11256    11273      +17     
- Partials     2572     2578       +6     

Files with missing lines                 Coverage Δ
internal/cluster/cluster.go              65.56% <ø> (ø)
internal/cluster/common.go               76.35% <100.00%> (+0.82%) ⬆️
internal/cluster/transition.go           74.58% <100.00%> (+0.32%) ⬆️
internal/cluster/validator.go            94.98% <100.00%> (+0.07%) ⬆️
internal/common/common.go                32.02% <100.00%> (+1.04%) ⬆️
internal/host/common.go                  88.88% <100.00%> (ø)
internal/host/conditions.go              100.00% <100.00%> (ø)
internal/host/hostutil/host_utils.go     37.14% <100.00%> (ø)
internal/host/validator.go               83.43% <100.00%> (+0.03%) ⬆️
internal/installcfg/builder/builder.go   80.18% <100.00%> (+1.08%) ⬆️
... and 5 more

... and 10 files with indirect coverage changes

if highAvailabilityMode == models.ClusterCreateParamsHighAvailabilityModeFull {
	var minMasterHostsNeededForInstallation int64 = common.MinMasterHostsNeededForInstallationInHaMode
	minVersion := common.MinimumVersionForNonStandardHAOCPControlPlane
	if !arbiterClustersNotSupported {
Member

I'd prefer this get reversed in some way so we don't have a double negative here. It's harder to reason about.

Contributor Author

Changed it

	minVersion := common.MinimumVersionForNonStandardHAOCPControlPlane
	if !arbiterClustersNotSupported {
		minMasterHostsNeededForInstallation = common.MinMasterHostsNeededForInstallationInHaArbiterMode
		minVersion = common.MinimumVersionForArbiterClusters
Member

The use of version in this function is very confusing.

It seems like we're trying to reason about different cases for different versions, but I never see the version in a condition anywhere. Is that right?

Contributor Author

The minVersion variable is used only in the error message (returned if the controlPlaneCount is not allowed). The minMasterHostsNeededForInstallation variable depends on whether the version is at least 4.19 (it is either 2 or 3), and that value is what validates the controlPlaneCount.
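
A rough sketch of that relationship, assuming a positively named flag in place of the double negative. The constant values, the error wording, and the function name validateControlPlaneCount are placeholders for illustration; only the 4.19 threshold and the 2-or-3 minimum come from this thread.

```go
package main

import "fmt"

// Placeholder values; the real constants live in the service's common package.
const (
	minHosts          int64 = 3      // minimum without arbiter support
	minHostsArbiter   int64 = 2      // minimum with arbiter support
	minVersionNonStd        = "4.18" // assumed placeholder, not confirmed by this thread
	minVersionArbiter       = "4.19" // threshold named in the PR description
)

// validateControlPlaneCount checks count against a minimum that depends on
// whether arbiter clusters are supported for the cluster's version.
// minVersion never appears in a condition; it only ends up in the error text.
func validateControlPlaneCount(count int64, arbiterClustersSupported bool) error {
	minCount := minHosts
	minVersion := minVersionNonStd
	if arbiterClustersSupported {
		minCount = minHostsArbiter
		minVersion = minVersionArbiter
	}
	if count < minCount {
		return fmt.Errorf("control plane count %d is not allowed: at least %d are required (version %s or newer)",
			count, minCount, minVersion)
	}
	return nil
}

func main() {
	fmt.Println(validateControlPlaneCount(2, true))
	fmt.Println(validateControlPlaneCount(2, false))
}
```

The version influences the outcome only through the boolean computed by the caller, which is why it does not show up in any condition inside the function.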

Comment on lines 6433 to 6443
err = errors.Errorf("Cannot set role arbiter to host %s in infra-env %s, it must be bound to a cluster with openshift version %s or newer", host.ID, host.InfraEnvID, common.MinimumVersionForArbiterClusters)
if cluster != nil && cluster.OpenshiftVersion != "" {
	arbiterClustersNotSupported, err2 := common.BaseVersionLessThan(common.MinimumVersionForArbiterClusters, cluster.OpenshiftVersion)
	if err2 != nil {
		return err2
	}
	if !arbiterClustersNotSupported {
		err = nil
	}
}
Member

This logic feels a bit backwards. Why create the error first just to nil it out if we hit the success case?
Can you refactor this?

Contributor Author

Changed it

@giladravid16
Contributor Author

/retest-required

Comment on lines +6429 to +6448
var err error
if !b.TNAClustersSupport {
	err = errors.Errorf("TNA clusters support is disabled, cannot set role arbiter to host %s in infra-env %s", host.ID, host.InfraEnvID)
} else {
	if cluster != nil && cluster.OpenshiftVersion != "" {
		arbiterClustersSupported, err2 := common.BaseVersionGreaterOrEqual(common.MinimumVersionForArbiterClusters, cluster.OpenshiftVersion)
		if err2 != nil {
			return err2
		}
		if !arbiterClustersSupported {
			err = errors.Errorf("Cannot set role arbiter to host %s in infra-env %s, it must be bound to a cluster with openshift version %s or newer", host.ID, host.InfraEnvID, common.MinimumVersionForArbiterClusters)
		}
	} else {
		err = errors.Errorf("Cannot set role arbiter to host %s in infra-env %s, it must be bound to a cluster with openshift version %s or newer", host.ID, host.InfraEnvID, common.MinimumVersionForArbiterClusters)
	}
}
if err != nil {
	log.Error(err)
	return common.NewApiError(http.StatusBadRequest, err)
}
Member

Suggested change
if !b.TNAClustersSupport {
	err := errors.Errorf("TNA clusters support is disabled, cannot set role arbiter to host %s in infra-env %s", host.ID, host.InfraEnvID)
	log.Error(err)
	return common.NewApiError(http.StatusBadRequest, err)
}
if cluster == nil || cluster.OpenshiftVersion == "" {
	err := errors.Errorf("Cannot set role arbiter to host %s in infra-env %s, it must be bound to a cluster with openshift version %s or newer", host.ID, host.InfraEnvID, common.MinimumVersionForArbiterClusters)
	log.Error(err)
	return common.NewApiError(http.StatusBadRequest, err)
}
arbiterClustersSupported, err := common.BaseVersionGreaterOrEqual(common.MinimumVersionForArbiterClusters, cluster.OpenshiftVersion)
if err != nil {
	return err
}
if !arbiterClustersSupported {
	err := errors.Errorf("Cannot set role arbiter to host %s in infra-env %s, it must be bound to a cluster with openshift version %s or newer", host.ID, host.InfraEnvID, common.MinimumVersionForArbiterClusters)
	log.Error(err)
	return common.NewApiError(http.StatusBadRequest, err)
}

Not super urgent, but I generally prefer guards with early returns to a lot of "else" statements. I find something like this easier to read because you don't have to consider multiple conditions at the same time when thinking about why some code is running. Additionally you don't have to worry about error shadowing here.

Also if we want to deduplicate the logging I'd suggest logging the error from the caller rather than every error case in this function.


openshift-ci bot commented Apr 24, 2025

@giladravid16: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                                        Commit    Details   Required   Rerun command
ci/prow/edge-e2e-ai-operator-disconnected-capi   91d052d   link      true       /test edge-e2e-ai-operator-disconnected-capi
ci/prow/edge-e2e-nutanix-assisted-4-19           91d052d   link      true       /test edge-e2e-nutanix-assisted-4-19

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
