Add terraform for provisioning s390x build cluster on ibmcloud #8413
Conversation
Welcome @sudharshanibm!

Hi @sudharshanibm. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: sudharshanibm. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Force-pushed 167d452 to 2867a10, then 2867a10 to 1ef2ece.
```
  create_duration = "180s" # Wait 3 minutes for full initialization
}

resource "null_resource" "bastion_setup" {
```
this is not very declarative - is this truly required or can the verification be run as a one-off by someone? do we expect things to not be setup correct via Terraform (i.e. - verification here fails) periodically or at all? (if so - that seems problematic and who or what is intended to fix that 😅 )
The bastion instance’s initial setup and networking configuration are handled through a cloud-init script in user_data. Runtime verification of the bastion’s internal state is treated as an operational concern, and can be performed outside of Terraform via automation/manual checks.
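A minimal sketch of that shape (the resource attributes and variable names here are illustrative, not taken from the PR):

```
# Hypothetical sketch: bastion bootstrap via cloud-init in user_data,
# rather than a null_resource provisioner.
resource "ibm_is_instance" "bastion" {
  name    = "bastion"
  profile = var.bastion_profile  # assumed variable
  image   = var.bastion_image_id # assumed variable

  # cloud-init runs once at first boot; Terraform stays declarative and
  # never needs to connect to the instance to verify its state.
  user_data = <<-EOT
    #cloud-config
    package_update: true
    packages:
      - haproxy
  EOT
}
```

Runtime verification then stays outside Terraform, as described above.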
```
  vpc = data.ibm_is_vpc.vpc.id
}

data "ibm_is_security_group" "master_sg" {
```
can we use an improved alternate terminology? main, primary, control-plane?
updated all occurrences of master to control-plane
```
@@ -0,0 +1,129 @@
/*
```
the use of k8s-* throughout the file names, resource names, etc. seems to be redundant (i.e. - its defined in a K8s repo/org so its assumed to be for K8s, no need to re-state that in the names)
Hi @bryantbiggs, thanks for pointing that out.
I followed the same naming convention since the existing s390x setup is already connected to ArgoCD with k8s-* prefixed names (ref: #8332).
```
See the License for the specific language governing permissions and
limitations under the License.
*/
resource "ibm_is_lb" "k8s_load_balancer" {
```
the names should be viewed as a hierarchy level just below the resource type, but read together. So `ibm_is_lb.k8s_load_balancer` is essentially "IBM load balancer - Kubernetes load balancer".
what about something shorter and more descriptive, like `public` (`ibm_is_lb.public`)?
Updated k8s_load_balancer to public
```
limitations under the License.
*/
resource "ibm_is_lb" "k8s_load_balancer" {
  name = "k8s-s390x-lb"
```
again, the name is very redundant. Its in a K8s repo, is this infrastructure deployed in an account that hosts resources for things outside of K8s? If yes = fine, keep the k8s, if not, drop it. Also, the resource is a load balancer so no need to add that to the name (my name is Bryant Biggs not Bryant Biggs human 😅 ). Lastly, I assume this is only for testing on s390x platforms so is that adding anything to the name? Maybe it is, maybe it isn't - I don't have enough context to say. But what about something more descriptive as to its intent and aligned with resources already defined in the repo
```
kops-infra-ci-name = "kops-infra-ci"
```
what about s390x-ci? or if you desire the full name k8s-s390x-ci?
Note: this same thinking above applies throughout the change set
updated k8s-s390x-lb to k8s-s390x-ci
```
health_timeout      = 2
health_type         = "tcp"
health_monitor_url  = "/"
health_monitor_port = 6443
```
6443 appears in a number of places - good chance it should be a local variable to avoid mis-matches
refactored to use a variable api_server_port (default 6443)
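A sketch of what that refactor might look like (the variable name comes from the discussion; the description wording is assumed):

```
variable "api_server_port" {
  description = "Port the Kubernetes API server listens on"
  type        = number
  default     = 6443
}
```

Every former literal `6443` (listener port, pool member port, health monitor port) then references `var.api_server_port`, so the values cannot drift apart.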
```
  name = "k8s-s390x-subnet"
}

data "ibm_is_security_group" "bastion_sg" {
```
```
- data "ibm_is_security_group" "bastion_sg" {
+ data "ibm_is_security_group" "bastion" {
```
Security group is already in the resource type - don't repeat in the resource name
I already have `resource "ibm_is_instance" "bastion"` defined in bastion.tf, so reusing `bastion` here would cause a naming conflict
no - that's a different resource from this data source, so no conflict
renamed bastion_sg --> bastion
```
}

resource "ibm_is_instance" "control_plane" {
  count = var.control_plane.count
```
this should be a for_each so that we avoid amplifying disruptions
updated the control-plane to use for_each in nodes.tf
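The count-to-for_each change might look roughly like this (attribute names are illustrative, not the PR's exact code):

```
# Keyed by a stable name: removing one node no longer shifts the
# addresses of the nodes after it, as count's numeric indices would.
resource "ibm_is_instance" "control_plane" {
  for_each = local.control_plane_nodes # e.g. { "control-plane-1" = {...}, ... }

  name    = each.key
  profile = each.value.profile
}
```

With `count`, deleting instance 0 re-addresses every later instance and Terraform plans to destroy and recreate them; `for_each` scopes the disruption to the one key that changed.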
```
  default = "eu-de-1"
}

variable "resource_group_name" {
```
should this be a variable?
Removed the variable, thanks!
```
  }
}

variable "bastion" {
```
should this be a variable?
bastion now generated in locals
```
  }
}

variable "control_plane" {
```
should this be a variable?
control-plane now generated in locals
```
  }
}

variable "compute" {
```
should this be a variable?
compute now generated in locals
```
@@ -0,0 +1,136 @@
/*
```
variables are mostly used in modules - I would argue, are these required? Should they be local variables instead (if they are used in multiple places but not intended to be configured dynamically outside of whats stored in git)?
In this case, these values are intended to be configurable by automation (Ansible playbooks) rather than hard-coded in the repo
ansible to deploy Terraform - why?
Thanks for circling back on this. I realize my earlier explanation may have caused some confusion. What I meant is that while we do have downstream automation that consumes certain outputs for cluster setup, that doesn't mean all of the Terraform configuration itself needs to be exposed as variables.
After revisiting your feedback, I've updated the code so that only true external inputs like the API key, region, secrets manager ID, counts, profiles, etc. remain as variables. The structured definitions for bastion, control-plane, and compute nodes are now generated in locals, which keeps the interface cleaner and avoids over-exposing internal implementation details.
I still don't grok why most of these variables need to exist (ignoring the API key for now). for example, we can't simply change the region nor do I believe we intended on doing so. if we need to set the name of the region in multiple locations, thats what a local variable is good for.
variables are public inputs for a public interface
local variables are a private implementation detail
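The distinction being drawn could be sketched as (the specific variable and values are illustrative):

```
# Public interface: a caller is genuinely expected to supply this.
variable "ibmcloud_api_key" {
  description = "IBM Cloud API key used by the provider"
  type        = string
  sensitive   = true
}

# Private implementation detail: fixed values reused in several places,
# kept in sync via a local instead of an exposed input.
locals {
  region = "eu-de"
  name   = "k8s-s390x"
}
```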
```
}

resource "ibm_is_instance" "compute" {
  count = var.compute.count
```
likewise, use for_each
updated the compute to use for_each in nodes.tf
```
  }
}

resource "null_resource" "node_setup" {
```
this doesn't appear to be doing anything - can it be removed?
```
@@ -0,0 +1,43 @@
/*
```
outputs are intended to be consumed elsewhere - are all of these intended to be consumed somewhere else?
Yes, these outputs are consumed in our Ansible automation for installing Kubernetes. Specifically, the IPs and load balancer hostname are passed into the Ansible playbook (`ansible-playbook -v -i examples/k8s-build-cluster/hosts.yml install-k8s-ha.yaml -e @group_vars/bastion_configuration --extra-vars @group_vars/all`) to configure the bastion, control-plane, and worker nodes.
hmm, I don't think I fully follow this. So the plan would be to write these outputs to a file, or? How do you bridge the gap between an output in Terraform and said output landing in a file on disk that ansible then uses? Is that ansible definition going to live in this repo as well? Should we store these values somewhere that Ansible can readily retrieve - S3/SSM/etc?
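One possible bridge, assuming the `hashicorp/local` provider and an illustrative floating-IP resource name (neither is confirmed by the PR), is to render the known addresses straight into an inventory file:

```
# Hypothetical sketch: write Terraform-known addresses into a file
# that an Ansible playbook can consume as inventory.
resource "local_file" "ansible_inventory" {
  filename = "${path.module}/hosts.yml"
  content = yamlencode({
    all = {
      hosts = {
        bastion = { ansible_host = ibm_is_floating_ip.bastion.address }
      }
    }
  })
}
```

Publishing the same values to an external store that Ansible can query, as the comment suggests, avoids coupling playbook runs to the Terraform working directory.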
Force-pushed 03e9978 to 5af44e7, then 5af44e7 to 369d8da.
Hi @bryantbiggs
**2. Export COS Secrets**
<br> Export `access_key` and `secret_key` as environment variables.
```
export AWS_ACCESS_KEY_ID="<HMAC_ACCESS_KEY_ID>"
```
IAM roles and short lived/ephemeral credentials should be used in place of static access and secret keys
**4. Check the `variables.tf` file**
<br> Open the `variables.tf` file to review all the available variables. This file lists all customizable inputs for your Terraform configuration.
who or what is intended to modify these variables? variables are only warranted on re-usable artifacts like modules. otherwise things should be declared statically or with local variables if the value needs to be kept in sync in various locations
`ibmcloud_api_key` and `secrets_manager_id` are the only required variables that you must set in order to proceed. You can set them either by adding them to your `var.tfvars` file or by exporting them as environment variables.

**Option 1:** Set in `var.tfvars` file
we really need to think long and hard about how secrets are handled - I don't think any of these options (nit: why are there options?) are sufficient
```
}

resource "ibm_is_instance" "bastion" {
  for_each = local.bastion_nodes
```
seems a bit like YAGNI - usually one bastion host is sufficient
```
See the License for the specific language governing permissions and
limitations under the License.
*/
data "ibm_resource_group" "resource_group" {
```
all of these names throughout this file are redundant.
if you have to reference a 2nd resource group or a 2nd vpc, what will the data source name be - "resource_group_2"/"vpc2"? Using the value provided to `name` is usually a safe route.
```
- data "ibm_resource_group" "resource_group" {
+ data "ibm_resource_group" "build_cluster" {
```
```
  security_groups = [data.ibm_is_security_group.control_plane_sg.id]
}

resource "ibm_is_lb_pool" "k8s_api_pool" {
```
avoid the redundant naming where possible
```
- resource "ibm_is_lb_pool" "k8s_api_pool" {
+ resource "ibm_is_lb_pool" "k8s_api_server" {
```
```
  health_monitor_port = var.api_server_port
}

resource "ibm_is_lb_listener" "k8s_api_listener" {
```
```
- resource "ibm_is_lb_listener" "k8s_api_listener" {
+ resource "ibm_is_lb_listener" "k8s_api_server" {
```
```
  default_pool = ibm_is_lb_pool.k8s_api_pool.pool_id
}

resource "ibm_is_lb_pool_member" "k8s_api_members" {
```
```
- resource "ibm_is_lb_pool_member" "k8s_api_members" {
+ resource "ibm_is_lb_pool_member" "k8s_api_server" {
```
```
locals {
  control_plane_nodes = {
    for idx in range(1, var.control_plane_node_count + 1) :
    "${idx}" => {
```
this is the same as count #8413 (comment) FYI
I am not familiar with IBM cloud but this feels like they should be instance groups similar to an AWS autoscaling group instead of iterating over individual instances in Terraform. Do you agree? (cattle vs pets)
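In other words, a map keyed by a running counter gives `for_each` the same failure mode as `count`; the keys need to be stable identifiers. A sketch (node names and profile value are made up):

```
locals {
  control_plane_nodes = {
    # Stable names as keys: removing "cp-b" affects only "cp-b",
    # unlike numeric keys derived from range(), which shift on resize.
    for name in ["cp-a", "cp-b", "cp-c"] :
    name => { profile = "bz2-4x16" }
  }
}
```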
```
  description = "The instance ID of your secrets manager"
  default     = ""

  validation {
```
this is a redundant validation - simply drop the `default = ""` and you achieve the same outcome without needing to specify the validation block
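That is, a sketch of the trimmed-down form, which makes the input mandatory without any validation block:

```
variable "secrets_manager_id" {
  description = "The instance ID of your secrets manager"
  type        = string
  # No default: Terraform refuses to plan without a value,
  # which is the outcome the validation block was emulating.
}
```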
```
See the License for the specific language governing permissions and
limitations under the License.
*/
resource "ibm_resource_group" "build_resource_group" {
```
redundant naming
```
- resource "ibm_resource_group" "build_resource_group" {
+ resource "ibm_resource_group" "build_cluster" {
```
```
locals {
  secrets_manager_region = "eu-de"
  secrets_manager_name   = "k8s-s390x-secrets-manager"
```
this appears to only be used in one place so just use the value directly; no need for local variable
```
locals {
  secrets_manager_region     = "eu-de"
  secrets_manager_name       = "k8s-s390x-secrets-manager"
  z_service_cred_secret_name = "k8s-s390x-sm-service-credentials-secret"
```
this doesn't appear to be used anywhere - should we remove it?
```
See the License for the specific language governing permissions and
limitations under the License.
*/
variable "resource_group_id" {}
```
add a description, type, and default (if there is one)
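Applied to the empty declaration above, the fully specified form would look something like this (the description wording is an assumption):

```
variable "resource_group_id" {
  description = "ID of the resource group that owns the build cluster resources"
  type        = string
}
```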
```
limitations under the License.
*/

terraform {
```
```
See the License for the specific language governing permissions and
limitations under the License.
*/
resource "ibm_is_vpc" "vpc" {
```
again, all of these are redundant names. since this is a module, the convention is typically to just name the resource `this` and only use distinct names where 2 or more resources of the same type are utilized
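A sketch of the convention being described:

```
# Only one VPC in the module, so no distinguishing name is needed:
# it is addressed elsewhere as ibm_is_vpc.this.id
resource "ibm_is_vpc" "this" {
  name = "k8s-s390x"
}
```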
```
limitations under the License.
*/
resource "ibm_is_vpc" "vpc" {
  name = "k8s-s390x-vpc"
```
drop the redundant resource name from the naming scheme throughout - remove `-vpc` here
```
# VPC
resource "ibm_is_public_gateway" "public_gw" {
  name = "k8s-s390x-public-gw"
```
```
- name = "k8s-s390x-public-gw"
+ name = "k8s-s390x"
```
```
# Subnet
resource "ibm_is_subnet" "subnet" {
  name = "k8s-s390x-subnet"
```
most likely you can create a local variable of
```
locals {
  name = "k8s-s390x"
}
```
and for all these resource names, just update to use
```
name = local.name
```
Don't add the `-gw` or `-sg` or `-vpc`, etc.
```
# Security Groups
resource "ibm_is_security_group" "bastion_sg" {
  name = "k8s-vpc-s390x-bastion-sg"
```
this would become
```
- name = "k8s-vpc-s390x-bastion-sg"
+ name = "${local.name}-bastion"
```
```
@@ -0,0 +1 @@
ibmcloud_api_key = "<YOUR_API_KEY>"
```
FYI - I don't have approval authority on changes such as these in this repo; these are just my suggestions based on working with Terraform in various capacities over the years. The sig-k8s-infra-leads are the final approving authority.
Signed-off-by: Sudharshan Muralidharan <[email protected]>
Force-pushed 369d8da to dd30419.