Commit 36818cf

Avritt Rohwer authored and ludtkemorgan committed
feat: bloom-dev open tofu config (bloom-housing#5580)
1 parent ef4bf44 commit 36818cf

15 files changed

Lines changed: 1852 additions & 46 deletions

docker-compose.yml

Lines changed: 4 additions & 4 deletions
@@ -107,10 +107,10 @@ services:
107107
DATABASE_URL: "postgres://postgres:example@db:5432/bloom_prisma"
108108
healthcheck:
109109
test: ["CMD", "curl", "--fail", "http://127.0.0.1:3100/"]
110-
interval: "2s"
111-
timeout: "1s"
112-
retries: 20
113-
start_period: "1s"
110+
interval: "5s"
111+
timeout: "2s"
112+
retries: 10
113+
start_period: "5s"
114114
depends_on:
115115
db:
116116
condition: service_healthy
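
Review note: to compare the old and new healthcheck settings, the sketch below estimates how long Docker would keep probing a container that never becomes healthy. It assumes failures during `start_period` are not counted and that each probe takes at most `interval + timeout` seconds. Docker's actual scheduling differs slightly, so treat the numbers as rough upper bounds. `healthcheck_budget` is a name made up for this note.

```shell
# Rough worst-case time (seconds) before a container is marked unhealthy
# when every probe fails. All arguments are whole seconds.
healthcheck_budget() {
  interval=$1
  timeout=$2
  retries=$3
  start_period=$4
  # Failures inside start_period do not count; after that, `retries`
  # consecutive failures are needed, each costing up to interval + timeout.
  echo $(( start_period + retries * (interval + timeout) ))
}

healthcheck_budget 2 1 20 1   # old settings: prints 61
healthcheck_budget 5 2 10 5   # new settings: prints 75
```

Under these assumptions the new settings wait a little longer overall while issuing half as many probes.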

infra/README.md

Lines changed: 53 additions & 40 deletions
@@ -10,47 +10,53 @@ Cloud Native Computing Foundation.
1010
is a set of resources that are all managed together. Each root module has a state file that
1111
records the results of the latest apply operation.
1212

13+
- [bloom_dev](./tofu_root_modules/bloom_dev/README.md): Configures the bloom-dev AWS account.
1314
- [bloom_dev_deployer_permission_set](./tofu_root_modules/bloom_dev_deployer_permission_set/README.md):
1415
Configures the bloom-dev-deployer permission set that is assigned on the bloom-dev account.
1516

17+
- [tofu_importable_modules](./tofu_importable_modules): Contains all the Open Tofu importable
18+
modules. An importable module is a reusable set of resources configured through input
19+
parameters. Root modules import importable modules.
20+
21+
- [bloom_deployment](./tofu_importable_modules/bloom_deployment/): Configures all the resources
22+
needed for a Bloom deployment in a single AWS account.
23+
24+
25+
1626
## Infrastructure-as-code mental model
1727

1828
Let's say that you need to deploy Bloom to an AWS account. A straight-forward way of achieving this
1929
would be to log into the AWS web console and create all the required resources. The downside of this
20-
approach is that unless you take really good notes, it is a difficult process to replicate. Even if
21-
you take really good notes, the process might have a lot of steps which are all opportunities for
22-
making mistakes. Additionally, it is not possible to automate such a process (well, maybe if you
23-
have one of those neat AIs that can control your browser. But the AI could also make mistakes just
24-
like a human).
30+
approach is that unless you take really good notes, it is a difficult process to replicate.
2531

2632
Another approach could be to write a bash script that calls a bunch of AWS CLI commands that create
2733
the resources. This improves on the web-based approach because all the steps are explicitly written
2834
down. However, the script only works on a fresh AWS account - if you run it again there will be
2935
a bunch of errors because the resources will have already been created. If you need to change how
30-
the account is configured, you need to write more scripts. That reminds me too much of database
31-
evolutions to seem like an good idea...
32-
33-
Enter infrastructure-as-code tools like Open Tofu. I like to think of them as CLI scripting with
34-
a bunch of functionality already built in. We have `.tf` files that contain [resource
35-
descriptions](https://opentofu.org/docs/language/resources/) for everything we want to configure. We
36-
can run the 'script' by running [`tofu apply`](https://opentofu.org/docs/cli/commands/apply/). If
37-
the AWS account already matches our desired configuration, Tofu gives us a nice message that the
38-
infrastructure matches the configuration. Otherwise, Tofu presents us with a list of planned changes
39-
it thinks it needs to make and asks if it should go forward with the plan.
40-
41-
These tools are not magic, however. It is still possible to misconfigure resources and get errors
42-
from the AWS API. These cases are not always handled gracefully and sometimes require deleting or
43-
configuring things manually to unbork the tool. Using a infrastructure-as-code still requires manual
44-
testing and knowledge of the underlying systems you are configuring. It is a heck of a lot better
45-
than shell scripting, however, at least in my experience :)
46-
47-
_Monologue by Avritt Rohwer_
36+
the account is configured, you need to write more scripts.
37+
38+
Enter infrastructure-as-code tools like Terraform and Open Tofu. I like to think of them as CLI
39+
scripting with a bunch of functionality already built in. `.tf` files contain [resource
40+
descriptions](https://opentofu.org/docs/language/resources/). Run the 'script' by running [`tofu
41+
apply`](https://opentofu.org/docs/cli/commands/apply/). If the AWS account already matches the
42+
desired configuration, Tofu prints a nice message that the infrastructure matches the
43+
configuration. Otherwise, Tofu presents a list of planned changes it wants to make and asks if it
44+
should go forward with the plan.
45+
46+
It is still possible to misconfigure resources and get errors from the AWS API. These cases are not
47+
always handled gracefully and sometimes require deleting or configuring things manually to unblock
48+
the tool. Using a infrastructure-as-code still requires manual testing and knowledge of the
49+
underlying systems you are configuring. It is a heck of a lot better than shell scripting, however,
50+
at least in my experience :)
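
Review note: the difference between the "bash script" approach and the "desired state" approach described above can be sketched with a toy model. Here a directory stands in for the AWS account and each file stands in for a resource. `STATE_DIR`, `create_resource`, and `ensure_resource` are illustrative names, not part of Open Tofu or the AWS CLI.

```shell
# Toy stand-in for a cloud account: each "resource" is a file in STATE_DIR.
STATE_DIR="${STATE_DIR:-$(mktemp -d)}"

# Imperative create: errors if the resource already exists, which is why
# re-running a plain creation script on a configured account fails.
create_resource() {
  if [ -e "$STATE_DIR/$1" ]; then
    echo "ERROR: $1 already exists" >&2
    return 1
  fi
  : > "$STATE_DIR/$1"
}

# Desired-state converge: compare what is wanted with what exists and
# act only on the difference, so repeated runs are safe.
ensure_resource() {
  if [ -e "$STATE_DIR/$1" ]; then
    echo "no changes: $1"
  else
    create_resource "$1" && echo "created: $1"
  fi
}
```

Calling `ensure_resource bucket` twice reports a creation and then no changes, while calling `create_resource bucket` twice fails on the second call.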
4851

4952
## Developer setup
5053

51-
1. Install required tools:
52-
1. Open Tofu: https://opentofu.org/docs/intro/install/
53-
2. AWS CLI: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html
54+
1. Install required CLI tools:
55+
1. bash: `which bash`
56+
2. openssl: `which openssl`
57+
3. tr: `which tr`
58+
4. Open Tofu: https://opentofu.org/docs/intro/install/
59+
5. AWS CLI: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html
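
Review note: the per-tool `which` checks in the list above could also be run in one pass. The sketch below is one possible way to do that, using the POSIX `command -v` builtin. `missing_tools` is a hypothetical helper name, and `tofu` and `aws` are assumed to be the binary names installed by the linked guides.

```shell
# Print the name of every listed tool that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Check the tools named in the setup list and report any that are absent.
missing=$(missing_tools bash openssl tr tofu aws)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing >&2
fi
```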
5460

5561
After installing, edit your `~/.aws/config` file for SSO authentication:
5662

@@ -94,34 +100,41 @@ tofu init
94100
```
95101

96102
Once the provider dependences have been downloaded, you will not have to run `tofu init` again
97-
unless you add a provider. In that case, run:
103+
unless you add a provider.
98104

99-
```bash
100-
tofu init
101-
```
102-
103-
To update a required version for a provider, change the version in the relevant `main.tf` file then
104-
run:
105+
To update a required version for a provider, change the version then run:
105106

106107
```bash
107108
tofu init -upgrade
108109
```
109110

110-
Both of these command will download the new dependencies and update the `.terraform.lock.hcl` file in the root module
111-
directory.
111+
Both of these commands will download the new dependencies and update the `.terraform.lock.hcl` file
112+
in the root module directory.
112113

113114
### Applying changes
114115

115116
1. Open a shell and change directory to the root module.
116-
2. Run `aws sso login` to authenticate to AWS. After 1 hour, you will need to re-authenticate using
117-
the same command.
118-
3. Edit the `main.tf` file to update the desired configuration.
117+
2. Run `aws sso login --profile bloom-dev-deployer` to authenticate to AWS.
118+
3. Edit the `.tf` files.
119119
4. Run `tofu apply` and review the planned changes. If there are unexpected planned changes, go back
120-
to step 1. If all the changes are expected, approve the apply.
120+
to step 3. If all the changes are expected, approve the apply.
121121
5. Inspect the relevant AWS resources via the CLI or the AWS web console
122122
(Log in via https://d-9067ac8222.awsapps.com/start). If there are unexpected results, go back to
123-
step 1. In some cases you may have to manually modify or delete resources directly to 'unstick'
123+
step 3. In some cases you may have to manually modify or delete resources directly to 'unstick'
124124
Open Tofu.
125+
6. To delete only the resources provisioned by the bloom_deployment module, run `tofu destroy
126+
-target=module.bloom_deployment`.
127+
128+
#### Forcing resource recreation
129+
130+
To force Tofu to replace a resource, run `tofu apply -replace=ADDRESS`. For example:
131+
132+
```
133+
tofu apply -replace=module.bloom_deployment.aws_secretsmanager_secret.api_jwt_signing_key
134+
```
135+
136+
This is helpful when testing the local-exec provisioner because the provisioner only runs on
137+
resource creation.
125138

126139
## AWS setup done manually
127140

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
1+
# Create a database.
2+
resource "aws_db_subnet_group" "bloom" {
3+
region = var.aws_region
4+
name = "bloom"
5+
subnet_ids = [for s in aws_subnet.private : s.id]
6+
}
7+
resource "aws_db_instance" "bloom" {
8+
identifier = "bloom"
9+
deletion_protection = local.is_prod
10+
engine = "postgres"
11+
engine_version = "17"
12+
instance_class = local.database_config.instance_class
13+
multi_az = var.high_availability
14+
enabled_cloudwatch_logs_exports = ["postgresql", "upgrade", "iam-db-auth-error"]
15+
username = "master"
16+
manage_master_user_password = true
17+
18+
# Monitoring
19+
performance_insights_enabled = "true"
20+
performance_insights_retention_period = 7 # minimum
21+
database_insights_mode = "standard"
22+
23+
# Networking
24+
vpc_security_group_ids = [aws_security_group.db.id]
25+
iam_database_authentication_enabled = true
26+
db_subnet_group_name = aws_db_subnet_group.bloom.id
27+
28+
# Updates
29+
apply_immediately = true # If false, any changes are applied in the next maintenance window instead of when tofu apply runs.
30+
engine_lifecycle_support = "open-source-rds-extended-support"
31+
allow_major_version_upgrade = false
32+
auto_minor_version_upgrade = true
33+
34+
# Storage
35+
storage_encrypted = true
36+
storage_type = "gp2"
37+
allocated_storage = local.database_config.starting_storage_gb
38+
max_allocated_storage = local.database_config.max_storage_gb
39+
backup_retention_period = local.database_config.backup_retention_days
40+
final_snapshot_identifier = "bloom-db-finalsnapshot"
41+
skip_final_snapshot = !local.is_prod
42+
}
Lines changed: 189 additions & 0 deletions
@@ -0,0 +1,189 @@
1+
# Create an ECS cluster and services for each Bloom binary.
2+
resource "aws_iam_service_linked_role" "ecs" {
3+
aws_service_name = "ecs.amazonaws.com"
4+
}
5+
resource "aws_ecs_cluster" "bloom" {
6+
region = var.aws_region
7+
name = "bloom"
8+
depends_on = [aws_iam_service_linked_role.ecs]
9+
setting {
10+
name = "containerInsights"
11+
value = "enhanced"
12+
}
13+
}
14+
15+
# Set up service discovery for the sites to talk to the api.
16+
resource "aws_service_discovery_http_namespace" "bloom" {
17+
region = var.aws_region
18+
name = "bloom"
19+
description = "Service namespace the bloom services use."
20+
}
21+
22+
# Create logs groups.
23+
resource "aws_cloudwatch_log_group" "task_logs" {
24+
for_each = toset([
25+
"bloom-api",
26+
"bloom-site-partners",
27+
"bloom-site-public"
28+
])
29+
region = var.aws_region
30+
name = each.value
31+
log_group_class = "STANDARD"
32+
retention_in_days = local.ecs_logs_retention_days
33+
}
34+
35+
# Create a secret key used by the API to sign JWTs.
36+
resource "aws_secretsmanager_secret" "api_jwt_signing_key" {
37+
region = var.aws_region
38+
description = "Key used by the Bloom API to sign JWTs"
39+
name_prefix = "bloom-api-jwt-signing-key" # avoids 'you can't create this secret because a secret with this name is already scheduled for deletion' issue when re-deploying an account.
40+
recovery_window_in_days = 7 # minimum
41+
42+
# TODO: use an ephemeral resource instead of local-exec:
43+
# https://github.com/bloom-housing/bloom/issues/5637.
44+
#
45+
# The provisioner block runs after the resource has been created, and never again.
46+
provisioner "local-exec" {
47+
interpreter = ["/usr/bin/env", "bash", "-c"]
48+
# We need to be very careful that any errors result in a non-zero exit code. Otherwise tofu will
49+
# think this block succeeded and not error.
50+
command = <<-EOT
51+
if ! type -P aws &>/dev/null; then
52+
echo 'ERROR: aws required'
53+
exit 1
54+
fi
55+
if ! type -P openssl &>/dev/null; then
56+
echo 'ERROR: openssl required'
57+
exit 1
58+
fi
59+
if ! type -P tr &>/dev/null; then
60+
echo 'ERROR: tr required'
61+
exit 1
62+
fi
63+
64+
if ! s=$(openssl rand -base64 256 | tr -d '\n'); then
65+
echo 'ERROR: failed to generate random value'
66+
exit 1
67+
fi
68+
69+
if ! aws secretsmanager put-secret-value \
70+
--profile ${var.aws_profile} \
71+
--region ${var.aws_region} \
72+
--secret-id ${self.id} \
73+
--secret-string "$s"
74+
then
75+
echo 'ERROR: failed to put secret value'
76+
exit 1
77+
fi
78+
EOT
79+
}
80+
}
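
Review note: one thing worth checking in the provisioner above. `bash -c` does not enable `pipefail` by default, so the exit status of `openssl rand -base64 256 | tr -d '\n'` is the status of `tr`. A failing `openssl` would therefore not trip the `if !` guard. The sketch below reproduces this, with `false` standing in for a failing first stage. The function names are hypothetical, and whether to add `set -o pipefail` to the heredoc is the author's call.

```shell
# Runs the guard the same way the provisioner does (through `bash -c`),
# with a first pipeline stage that always fails.
run_without_pipefail() {
  bash -c 'if ! s=$(false | tr -d "\n"); then exit 1; fi; exit 0'
}

# Same guard, but with pipefail enabled so the failing stage is detected.
run_with_pipefail() {
  bash -c 'set -o pipefail; if ! s=$(false | tr -d "\n"); then exit 1; fi; exit 0'
}

if run_without_pipefail; then
  echo "without pipefail: failure in the first stage went unnoticed"
fi
if ! run_with_pipefail; then
  echo "with pipefail: failure in the first stage was caught"
fi
```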
81+
82+
locals {
83+
roles = {
84+
"api" = {
85+
task_execution_policy_extra_statements = [
86+
{
87+
Action = "secretsmanager:GetSecretValue"
88+
Effect = "Allow"
89+
Resource = aws_db_instance.bloom.master_user_secret[0].secret_arn
90+
},
91+
{
92+
Action = "secretsmanager:GetSecretValue"
93+
Effect = "Allow"
94+
Resource = aws_secretsmanager_secret.api_jwt_signing_key.arn
95+
}
96+
]
97+
}
98+
"site-partners" = {
99+
task_execution_policy_extra_statements = []
100+
}
101+
"site-public" = {
102+
task_execution_policy_extra_statements = []
103+
}
104+
}
105+
}
106+
107+
# Create roles for the ECS task executor and the tasks.
108+
resource "aws_iam_role" "bloom_ecs" {
109+
for_each = local.roles
110+
name = "bloom-${each.key}-ecs"
111+
description = "Role the ECS service uses when launching Bloom ${each.key} tasks."
112+
# https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html#create_task_iam_policy_and_role
113+
assume_role_policy = jsonencode({
114+
Version = "2012-10-17"
115+
Statement = [{
116+
Action = "sts:AssumeRole"
117+
Effect = "Allow"
118+
Principal = {
119+
Service = "ecs-tasks.amazonaws.com"
120+
}
121+
Condition = {
122+
ArnLike = {
123+
"aws:SourceArn" = "arn:aws:ecs:${var.aws_region}:${var.aws_account_number}:*"
124+
}
125+
StringEquals = {
126+
"aws:SourceAccount" = var.aws_account_number
127+
}
128+
}
129+
}]
130+
})
131+
}
132+
resource "aws_iam_role_policy" "bloom_ecs" {
133+
for_each = local.roles
134+
name = "bloom-${each.key}-ecs"
135+
role = aws_iam_role.bloom_ecs[each.key].id
136+
policy = jsonencode({
137+
Version = "2012-10-17"
138+
Statement = concat(
139+
[
140+
{
141+
Action = [
142+
"logs:CreateLogStream",
143+
"logs:PutLogEvents",
144+
]
145+
Effect = "Allow"
146+
Resource = "${aws_cloudwatch_log_group.task_logs["bloom-${each.key}"].arn}:log-stream:*"
147+
},
148+
],
149+
each.value.task_execution_policy_extra_statements
150+
)
151+
})
152+
}
153+
resource "aws_iam_role" "bloom_container" {
154+
for_each = local.roles
155+
name = "bloom-${each.key}-container"
156+
description = "Role the Bloom ${each.key} container runs as."
157+
# https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html#create_task_iam_policy_and_role
158+
assume_role_policy = jsonencode({
159+
Version = "2012-10-17"
160+
Statement = [{
161+
Action = "sts:AssumeRole"
162+
Effect = "Allow"
163+
Principal = {
164+
Service = "ecs-tasks.amazonaws.com"
165+
}
166+
Condition = {
167+
ArnLike = {
168+
"aws:SourceArn" = "arn:aws:ecs:${var.aws_region}:${var.aws_account_number}:*"
169+
}
170+
StringEquals = {
171+
"aws:SourceAccount" = var.aws_account_number
172+
}
173+
}
174+
}]
175+
})
176+
}
177+
resource "aws_iam_role_policy" "bloom_container" {
178+
for_each = local.roles
179+
name = "bloom-${each.key}-container"
180+
role = aws_iam_role.bloom_container[each.key].id
181+
policy = jsonencode({
182+
Version = "2012-10-17"
183+
Statement = [{
184+
Action = "*"
185+
Effect = "Deny"
186+
Resource = "*"
187+
}]
188+
})
189+
}
