Commit 731baa1

feat: [PAYMCLOUD-541] Update ClouDO module, add runbooks, and improve AKS integrations (#3635)
* Add `.terraform.lock.hcl` file to version control to lock provider versions.
  Signed-off-by: Fabio Felici <fabio.felici@pagopa.it>
* Initial setup for `ClouDO` infrastructure, including Terraform configurations, environment variables, and required modules.
  Signed-off-by: Fabio Felici <fabio.felici@pagopa.it>
* Update Terraform module source and Azure provider version
  Signed-off-by: Fabio Felici <fabio.felici@pagopa.it>
* Add new runbooks, update Terraform resources, and include encrypted secrets for ClouDO
  - Added multiple runbook scripts for AKS scaling, rollouts, system checks, and App Gateway metrics.
  - Updated Terraform modules and configurations, including new data sources and parameters.
  - Included encrypted `cloudo-slack-token` and `opsgenie_token` secrets.
  - Modified schemas and updated Docker image tags.
  Signed-off-by: Fabio Felici <fabio.felici@pagopa.it>
* Update `aks-increate-max-keda-pod-scaling.sh` to remove unused arguments and simplify usage.
  Signed-off-by: Fabio Felici <fabio.felici@pagopa.it>
* Adjust `aks-increate-max-keda-pod-scaling.sh` to conditionally decrement `maxReplicaCount` when `MONITOR_CONDITION` is resolved.
  Signed-off-by: Fabio Felici <fabio.felici@pagopa.it>
* Ensure `maxReplicaCount` does not decrease below 1 when `MONITOR_CONDITION` is resolved.
  Signed-off-by: Fabio Felici <fabio.felici@pagopa.it>
* Fix missing `fi` statement in `aks-increate-max-keda-pod-scaling.sh`.
  Signed-off-by: Fabio Felici <fabio.felici@pagopa.it>
* Update Azure provider to 4.50.0, add new AKS scaling runbook, and enhance node pool scaling logic
  - Upgraded `azurerm` provider in `.terraform.lock.hcl` to version `4.50.0`.
  - Introduced `scale-pagopa-d-aks-user01-nodepool` runbook for managing DEV environment node pools.
  - Enhanced `aks-scale-node-pool.sh` to handle autoscaling node pools.
  Signed-off-by: Fabio Felici <fabio.felici@pagopa.it>
* Log current node pool mode in `aks-scale-node-pool.sh`.
  Signed-off-by: Fabio Felici <fabio.felici@pagopa.it>
* Refactor `aks-scale-node-pool.sh` to enhance scaling logic and accommodate autoscaling adjustments.
  Signed-off-by: Fabio Felici <fabio.felici@pagopa.it>
* Simplify `aks-scale-node-pool.sh` by removing
  Signed-off-by: Fabio Felici <fabio.felici@pagopa.it>
* Align variable names in `aks-scale-node-pool
  Signed-off-by: Fabio Felici <fabio.felici@pagopa.it>
* Fix typo in `--min-count` parameter for `aks-scale-node-pool.sh`.
  Signed-off-by: Fabio Felici <fabio.felici@pagopa.it>
* Add log to display min and max node pool values during scaling
  Signed-off-by: Fabio Felici <fabio.felici@pagopa.it>
* Enable `--update-cluster-autoscaler` flag in `aks-scale-node-pool.sh`.
  Signed-off-by: Fabio Felici <fabio.felici@pagopa.it>
* Add log message to indicate node pool scaling operation in `aks-scale-node-pool.sh`.
  Signed-off-by: Fabio Felici <fabio.felici@pagopa.it>
* Add missing log messages to indicate node pool scaling operations in `aks-scale-node-pool.sh`.
  Signed-off-by: Fabio Felici <fabio.felici@pagopa.it>
* Add `aks-info.py` runbook to retrieve AKS namespace details and update DEV schemas configuration
  - Introduced a new runbook `aks-info.py` for fetching namespace details, resource quotas, pod counts, and deployment status in AKS.
  - Updated `schemas.json.tpl` to include the `aks-info-dev` runbook entry.
  Signed-off-by: Fabio Felici <fabio.felici@pagopa.it>
* Refactor `aks-info.py` to improve command execution with `run_kubectl` helper and enhance error handling and logging.
  Signed-off-by: Fabio Felici <fabio.felici@pagopa.it>
* Remove unnecessary print statements and unused namespace information parsing in `aks-info.py`.
  Signed-off-by: Fabio Felici <fabio.felici@pagopa.it>
* Remove unused resource quota handling in `aks-info.py`.
  Signed-off-by: Fabio Felici <fabio.felici@pagopa.it>
* Add return statement in `aks-info.py` to indicate successful execution.
  Signed-off-by: Fabio Felici <fabio.felici@pagopa.it>
* wip
  Signed-off-by: Fabio Felici <fabio.felici@pagopa.it>
* Update `aks-deployments-rollout.sh` and DEV schemas
  - Add fallback for deployment name and monitor condition check in `aks-deployments-rollout.sh`.
  - Introduce `restart-pod` runbook entry in DEV `schemas.json.tpl`.
  Signed-off-by: Fabio Felici <fabio.felici@pagopa.it>
* Introduce ClouDO container configuration variables
  - Add `cloudo_orchestrator` and `cloudo_worker` variables for container image and registry configuration.
  - Replace hardcoded values with variable references in Terraform definitions.
  - Update `terraform.tfvars`, `01_cloudo.tf`, and `99_variables.tf` accordingly.
  Signed-off-by: Fabio Felici <fabio.felici@pagopa.it>
* wip
  Signed-off-by: ffppa <fabio.felici@pagopa.it>
* wip
  Signed-off-by: ffppa <fabio.felici@pagopa.it>
* wip
  Signed-off-by: ffppa <fabio.felici@pagopa.it>
* wip
  Signed-off-by: ffppa <fabio.felici@pagopa.it>
* wip
  Signed-off-by: ffppa <fabio.felici@pagopa.it>
* wip
  Signed-off-by: ffppa <fabio.felici@pagopa.it>
* wip
  Signed-off-by: ffppa <fabio.felici@pagopa.it>
* wip
  Signed-off-by: ffppa <fabio.felici@pagopa.it>
* wip
  Signed-off-by: ffppa <fabio.felici@pagopa.it>
* wip
  Signed-off-by: ffppa <fabio.felici@pagopa.it>
* wip
  Signed-off-by: ffppa <fabio.felici@pagopa.it>
* wip
  Signed-off-by: ffppa <fabio.felici@pagopa.it>
* wip
  Signed-off-by: ffppa <fabio.felici@pagopa.it>
* Update schemas and dependencies for ClouDO deployment adjustments.
  Signed-off-by: ffppa <fabio.felici@pagopa.it>
* wip
  Signed-off-by: ffppa <fabio.felici@pagopa.it>
* wip
  Signed-off-by: ffppa <fabio.felici@pagopa.it>
* Update ClouDO configuration with UI parameters, network data sources, and Terraform dependencies adjustments.
  Signed-off-by: ffppa <fabio.felici@pagopa.it>
* Remove unused App Service configuration parameters in ClouDO module.
  Signed-off-by: ffppa <fabio.felici@pagopa.it>
* Update ClouDO module reference, remove unused UI tier, and add approval runbook configuration.
  Signed-off-by: ffppa <fabio.felici@pagopa.it>
* Add AKS pod crash analysis runbook, enable ClouDO UI, and update network data sources.
  Signed-off-by: ffppa <fabio.felici@pagopa.it>
* Add AKS runbooks for event retrieval, node status checks, pod logs, pod restarts, and enhance existing scripts with error handling.
  Signed-off-by: ffppa <fabio.felici@pagopa.it>
* Add AKS runbook to check pod CPU and Memory usage with threshold validations.
  Signed-off-by: ffppa <fabio.felici@pagopa.it>
* Enhance AKS resource usage runbook with fallback for missing metrics-server, detailed resource requests/limits, and improved validations.
  Signed-off-by: ffppa <fabio.felici@pagopa.it>
* Add runbooks for Azure Application Gateway health, Storage Account checks, and VPN Gateway connection status
  Signed-off-by: ffppa <fabio.felici@pagopa.it>
* Update Azure Application Gateway health runbook to support positional parameters for resource group and gateway name
  Signed-off-by: ffppa <fabio.felici@pagopa.it>
* Replace AKS namespace info script with FDR-specific health check runbook to consolidate Pod status, log checks, and mitigation via PostgreSQL cache restart.
  Signed-off-by: ffppa <fabio.felici@pagopa.it>
* Enhance AKS runbooks with CPU/memory unit parsing functions for consistent resource conversion and fix node status output formatting.
  Signed-off-by: ffppa <fabio.felici@pagopa.it>
* Refactor AKS resource usage runbook to replace `bc` dependency with `awk` for CPU/memory unit conversion, improving portability and efficiency.
  Signed-off-by: ffppa <fabio.felici@pagopa.it>
* Enhance AKS resource usage runbook to support filtering by specific pod, improve error messaging, and refine CPU/memory usage handling.
  Signed-off-by: ffppa <fabio.felici@pagopa.it>
* Update ClouDO module references, refine Slack channel naming, add Google SSO integration, and improve schema structure/clarity. Upgrade dependencies (Terraform provider, SOPs).
  Signed-off-by: ffppa <fabio.felici@pagopa.it>
* Update ClouDO module reference to latest commit hash
  Signed-off-by: ffppa <fabio.felici@pagopa.it>

---------

Signed-off-by: Fabio Felici <fabio.felici@pagopa.it>
Signed-off-by: ffppa <fabio.felici@pagopa.it>
1 parent 6c65d67 commit 731baa1

35 files changed: +1558 -25 lines changed

src/cloudo/.terraform.lock.hcl

Lines changed: 81 additions & 0 deletions
Some generated files are not rendered by default.

src/cloudo/00_data.tf

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+
+data "azurerm_key_vault" "key_vault" {
+  name                = "${local.product}-kv"
+  resource_group_name = "${local.product}-sec-rg"
+}
+
+data "azurerm_key_vault_secret" "github_pat" {
+  name         = "payments-cloud-github-bot-pat"
+  key_vault_id = data.azurerm_key_vault.key_vault.id
+}
+
+data "azurerm_key_vault_secret" "cloudo_slack_token" {
+  name         = "cloudo-slack-token"
+  key_vault_id = data.azurerm_key_vault.key_vault.id
+
+}
+
+data "azurerm_key_vault_secret" "opsgenie_token" {
+  count        = var.env_short == "p" ? 1 : 0
+  name         = "opsgenie-webhook-token"
+  key_vault_id = data.azurerm_key_vault.key_vault.id
+}
+
+
+data "azurerm_application_insights" "app_insight" {
+  name                = var.application_insisght_name
+  resource_group_name = var.application_insisght_resource_group_name
+}
+
+
+data "azurerm_kubernetes_cluster" "aks_weu" {
+  name                = "${local.product_region}-${var.env}-aks"
+  resource_group_name = "${local.product_region}-${var.env}-aks-rg"
+}
+
+data "azurerm_kubernetes_cluster" "aks_itn" {
+  name                = "${local.product_ita}-${var.env}-aks"
+  resource_group_name = "${local.product_ita}-${var.env}-aks-rg"
+}
+
+data "azurerm_virtual_network" "network_tools_vnet" {
+  name                = "${var.prefix}-${var.env_short}-${var.location_short_ita}-spoke-tools-vnet"
+  resource_group_name = "${var.prefix}-${var.env_short}-${var.location_short_ita}-network-hub-spoke-rg"
+}

src/cloudo/01_networks.tf

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+data "azurerm_subnet" "vpn_subnet" {
+  name                 = "GatewaySubnet"
+  resource_group_name  = "${local.product}-vnet-rg"
+  virtual_network_name = "${local.product}-vnet"
+}
+
+data "azurerm_private_dns_zone" "private_endpoint_dns_zone" {
+  name                = "privatelink.azurewebsites.net"
+  resource_group_name = "${local.product}-vnet-rg"
+}
+
+data "azurerm_subnet" "private_endpoint_snet" {
+  name                 = "${local.product}-common-private-endpoint-snet"
+  resource_group_name  = "${local.product}-vnet-rg"
+  virtual_network_name = "${local.product}-vnet"
+}

src/cloudo/01_secrets.tf

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+
+data "azurerm_key_vault_secret" "google_client_id" {
+  key_vault_id = data.azurerm_key_vault.key_vault.id
+  name         = "cloudo-google-client-id"
+}

src/cloudo/01_tags.tf

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+module "tag_config" {
+  source      = "../tag_config"
+  domain      = "cloudo"
+  environment = var.env
+}

src/cloudo/02_cloudo.tf

Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
+
+resource "azurerm_resource_group" "rg" {
+  name     = "${var.prefix}-${var.env_short}-${var.location_short_ita}-cloudo-rg"
+  location = var.location_ita
+
+  tags = module.tag_config.tags
+}
+
+module "cloudo" {
+  source = "git::https://github.com/pagopa/payments-ClouDO.git//src/core/iac?ref=47479c40b3161ff9348ae3f114ce19c7d4a8a7d9"
+
+  prefix                    = local.product
+  product_name              = var.prefix
+  env                       = var.env
+  location                  = var.location_ita
+  resource_group_name       = azurerm_resource_group.rg.name
+  application_insights_name = data.azurerm_application_insights.app_insight.name
+  application_insights_rg   = data.azurerm_application_insights.app_insight.resource_group_name
+  subscription_id           = data.azurerm_subscription.current.subscription_id
+  vnet_name                 = data.azurerm_virtual_network.network_tools_vnet.name
+  vnet_rg                   = data.azurerm_virtual_network.network_tools_vnet.resource_group_name
+
+  vpn_subnet_id                  = data.azurerm_subnet.vpn_subnet.id
+  private_endpoint_dns_zone_name = data.azurerm_private_dns_zone.private_endpoint_dns_zone.name
+
+  cloudo_google_sso_integration_client_id = data.azurerm_key_vault_secret.google_client_id.value
+
+  github_repo_info = {
+    repo_name    = "pagopa/pagopa-infra"
+    repo_branch  = "main"
+    runbook_path = "src/cloudo/runbooks"
+  }
+
+  aks_integration = {
+    weu = {
+      cluster_id = data.azurerm_kubernetes_cluster.aks_weu.id
+    },
+    itn = {
+      cluster_id = data.azurerm_kubernetes_cluster.aks_itn.id
+    }
+  }
+
+  approval_runbook = {
+    ttl_min = "120"
+  }
+
+  slack_integration = {
+    channel = "#cloudo-${var.prefix}-${var.env}"
+    token   = data.azurerm_key_vault_secret.cloudo_slack_token.value
+  }
+
+  opsgenie_api_key = var.env_short == "p" ? data.azurerm_key_vault_secret.opsgenie_token.0.value : ""
+
+  schemas = file("${path.module}/env/${var.env}/schemas.json.tpl")
+
+  orchestrator_image = {
+    image_name        = var.cloudo_orchestrator.image_name
+    image_tag         = var.cloudo_orchestrator.image_tag
+    registry_url      = var.cloudo_orchestrator.registry_url
+    registry_username = var.cloudo_orchestrator.registry_username
+    registry_password = data.azurerm_key_vault_secret.github_pat.value
+  }
+
+  workers_config = {
+    workers = {
+      "generic-worker" = "generic"
+    }
+    image_name        = var.cloudo_worker.image_name
+    image_tag         = var.cloudo_worker.image_tag
+    registry_url      = var.cloudo_worker.registry_url
+    registry_username = var.cloudo_worker.registry_username
+    registry_password = data.azurerm_key_vault_secret.github_pat.value
+  }
+
+  enable_ui = true
+  ui_image = {
+    image_name        = var.cloudo_ui.image_name
+    image_tag         = var.cloudo_ui.image_tag
+    registry_url      = var.cloudo_ui.registry_url
+    registry_username = var.cloudo_ui.registry_username
+    registry_password = data.azurerm_key_vault_secret.github_pat.value
+  }
+
+  tags = module.tag_config.tags
+}
+
src/cloudo/99_locals.tf

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+
+locals {
+  product        = "${var.prefix}-${var.env_short}"
+  product_region = "${var.prefix}-${var.env_short}-${var.location_short}"
+  product_ita    = "${var.prefix}-${var.env_short}-${var.location_short_ita}"
+}
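
Review note: these three locals drive every resource name looked up in `00_data.tf` and `01_networks.tf`, so it may help to see them expanded once. The sketch below assumes `prefix = "pagopa"`, `env = "dev"`, `env_short = "d"`, `location_short = "weu"` and `location_short_ita = "itn"`; those inputs are illustrative assumptions, and the `example_*` names are hypothetical.

```hcl
# Worked expansion under assumed inputs (not values taken from this commit):
#   prefix = "pagopa", env = "dev", env_short = "d",
#   location_short = "weu", location_short_ita = "itn"
locals {
  example_product        = "pagopa-d"     # local.product
  example_product_region = "pagopa-d-weu" # local.product_region
  example_product_ita    = "pagopa-d-itn" # local.product_ita

  # Names the data sources would then resolve to:
  example_key_vault_name = "pagopa-d-kv"                   # "${local.product}-kv"
  example_aks_weu_name   = "pagopa-d-weu-dev-aks"          # "${local.product_region}-${var.env}-aks"
  example_aks_itn_name   = "pagopa-d-itn-dev-aks"          # "${local.product_ita}-${var.env}-aks"
  example_tools_vnet     = "pagopa-d-itn-spoke-tools-vnet" # network_tools_vnet name
  example_cloudo_rg      = "pagopa-d-itn-cloudo-rg"        # azurerm_resource_group.rg name
}
```

If any of these resources follows a different naming scheme in a given environment, the corresponding data source lookup fails at plan time.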

src/cloudo/99_main.tf

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+terraform {
+  required_version = ">= 1.9.0"
+
+  required_providers {
+    azurerm = {
+      source  = "hashicorp/azurerm"
+      version = ">= 4"
+    }
+  }
+  backend "azurerm" {}
+}
+
+provider "azurerm" {
+  features {
+    key_vault {
+      purge_soft_delete_on_destroy = false
+    }
+    resource_group {
+      prevent_deletion_if_contains_resources = false
+    }
+  }
+}
+
+data "azurerm_subscription" "current" {}
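
Review note: `backend "azurerm" {}` is left empty, so the state location has to be supplied when running `terraform init` (partial backend configuration). A sketch of what such a backend settings file might contain is shown below; the file path and every value are placeholders, not taken from this repository.

```hcl
# Hypothetical file, e.g. env/dev/backend.tfvars. All values are placeholders.
resource_group_name  = "example-tfstate-rg"
storage_account_name = "exampletfstatesa"
container_name       = "terraform-state"
key                  = "cloudo.dev.terraform.tfstate"
```

Under that assumption it would be passed as `terraform init -backend-config=env/dev/backend.tfvars`.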

src/cloudo/99_outputs.tf

Whitespace-only changes.

src/cloudo/99_variables.tf

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
+variable "prefix" {
+  type = string
+  validation {
+    condition = (
+      length(var.prefix) <= 6
+    )
+    error_message = "Max length is 6 chars."
+  }
+}
+
+variable "env" {
+  type = string
+}
+
+variable "env_short" {
+  type = string
+  validation {
+    condition = (
+      length(var.env_short) == 1
+    )
+    error_message = "Length must be 1 chars."
+  }
+}
+
+#
+# location
+#
+variable "location" {
+  type        = string
+  description = "One of westeurope, northeurope"
+}
+
+variable "location_short" {
+  type = string
+  validation {
+    condition = (
+      length(var.location_short) == 3
+    )
+    error_message = "Length must be 3 chars."
+  }
+  description = "One of wue, neu"
+}
+
+### Italy location
+variable "location_ita" {
+  type        = string
+  description = "Main location"
+  default     = "italynorth"
+}
+
+variable "location_short_ita" {
+  type = string
+  validation {
+    condition = (
+      length(var.location_short_ita) == 3
+    )
+    error_message = "Length must be 3 chars."
+  }
+  description = "Location short for italy: itn"
+  default     = "itn"
+}
+
+
+variable "application_insisght_name" {
+  type        = string
+  description = "The name of the Application Insights resource for monitoring and alerting"
+}
+
+variable "application_insisght_resource_group_name" {
+  type        = string
+  description = "The name of the resource group where the Application Insights resource is located"
+}
+
+###################
+### ClouDO Vars ###
+###################
+variable "cloudo_orchestrator" {
+  type = object({
+    image_name        = optional(string, "pagopa/cloudo-orchestrator")
+    image_tag         = optional(string, "0.0.0")
+    registry_url      = optional(string, "https://ghcr.io")
+    registry_username = optional(string)
+    registry_password = optional(string)
+  })
+  description = "Configuration for the ClouDO orchestrator container, including container image details and registry authentication."
+}
+
+variable "cloudo_ui" {
+  type = object({
+    image_name        = optional(string, "pagopa/cloudo-ui")
+    image_tag         = optional(string, "0.0.0")
+    registry_url      = optional(string, "https://ghcr.io")
+    registry_username = optional(string)
+    registry_password = optional(string)
+  })
+  description = "Configuration for the ClouDO UI App Service, including container image details and registry authentication."
+}
+
+variable "cloudo_worker" {
+  type = object({
+    image_name        = optional(string, "pagopa/cloudo-worker")
+    image_tag         = optional(string, "0.0.0")
+    registry_url      = optional(string, "https://ghcr.io")
+    registry_username = optional(string)
+    registry_password = optional(string)
+  })
+  description = "Configuration for the ClouDO worker container, including container image details and registry authentication."
+}
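
Review note: the three `cloudo_*` object variables have no variable-level default, so each environment must set them, even if only to rely on the `optional()` attribute defaults. A sketch of a matching `terraform.tfvars` follows; every value is an illustrative assumption chosen to satisfy the validations, not the content of the file shipped in this commit.

```hcl
# Hypothetical terraform.tfvars for a DEV-like environment.
# All values are placeholders.
prefix         = "pagopa" # at most 6 characters
env            = "dev"
env_short      = "d" # exactly 1 character
location       = "westeurope"
location_short = "weu" # exactly 3 characters

# location_ita and location_short_ita are omitted and fall back to their
# defaults, "italynorth" and "itn".

application_insisght_name                = "example-appinsights"
application_insisght_resource_group_name = "example-monitor-rg"

# Omitted attributes take the optional() defaults declared in the variable
# (image_name, registry_url, and image_tag = "0.0.0").
cloudo_orchestrator = {
  image_tag         = "1.2.3"
  registry_username = "example-bot"
}

cloudo_worker = {
  image_tag         = "1.2.3"
  registry_username = "example-bot"
}

cloudo_ui = {
  image_tag         = "1.2.3"
  registry_username = "example-bot"
}
```

`registry_password` is left unset here because `02_cloudo.tf` takes the password from the `github_pat` Key Vault secret rather than from these variables, so a value supplied through tfvars would not be used by the module call as written.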
