Commit 9bb9b3a

author
Avritt Rohwer
committed
feat: bloom-dev open tofu config
1 parent f642fc1 commit 9bb9b3a

14 files changed

Lines changed: 1816 additions & 38 deletions

docker-compose.yml

Lines changed: 4 additions & 4 deletions
@@ -107,10 +107,10 @@ services:
       DATABASE_URL: "postgres://postgres:example@db:5432/bloom_prisma"
     healthcheck:
       test: ["CMD", "curl", "--fail", "http://127.0.0.1:3100/"]
-      interval: "2s"
-      timeout: "1s"
-      retries: 20
-      start_period: "1s"
+      interval: "5s"
+      timeout: "2s"
+      retries: 10
+      start_period: "5s"
     depends_on:
       db:
         condition: service_healthy
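
As a rough sanity check on the new healthcheck values, the sketch below estimates how long a container that never becomes healthy could take to be marked unhealthy. It assumes that failures during `start_period` are not counted and that each failed probe costs at most `interval + timeout`. The function name `worst_case_unhealthy_seconds` is made up for this illustration, and real Docker scheduling may differ by a few seconds.

```shell
#!/usr/bin/env bash
# Rough estimate only (assumption): start_period + retries * (interval + timeout).
# Arguments are whole seconds: START_PERIOD INTERVAL TIMEOUT RETRIES
worst_case_unhealthy_seconds() {
  local start_period=$1 interval=$2 timeout=$3 retries=$4
  echo $(( start_period + retries * (interval + timeout) ))
}

old=$(worst_case_unhealthy_seconds 1 2 1 20)  # values removed by this hunk
new=$(worst_case_unhealthy_seconds 5 5 2 10)  # values added by this hunk
echo "old=${old}s new=${new}s"
```

Under these assumptions the window grows from about 61 seconds to about 75 seconds, while the number of probes is halved.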

infra/README.md

Lines changed: 49 additions & 32 deletions
@@ -10,47 +10,53 @@ Cloud Native Computing Foundation.
1010
is a set of resources that are all managed together. Each root module has a state file that
1111
records the results of the latest apply operation.
1212

13+
- [bloom_dev](./tofu_root_modules/bloom_dev/README.md): Configures the bloom-dev AWS account.
1314
- [bloom_dev_deployer_permission_set](./tofu_root_modules/bloom_dev_deployer_permission_set/README.md):
1415
Configures the bloom-dev-deployer permission set that is assigned on the bloom-dev account.
1516

17+
- [tofu_importable_modules](./tofu_importable_modules): Contains all the Open Tofu importable
18+
modules. An importable module is a reusable set of resources configured through input
19+
parameters. Root modules import importable modules.
20+
21+
- [bloom_deployment](./tofu_importable_modules/bloom_deployment/): Configures all the resources
22+
needed for a Bloom deployment in a single AWS account.
23+
24+
25+
1626
## Infrastructure-as-code mental model
1727

1828
Let's say that you need to deploy Bloom to an AWS account. A straightforward way of achieving this
1929
would be to log into the AWS web console and create all the required resources. The downside of this
20-
approach is that unless you take really good notes, it is a difficult process to replicate. Even if
21-
you take really good notes, the process might have a lot of steps which are all opportunities for
22-
making mistakes. Additionally, it is not possible to automate such a process (well, maybe if you
23-
have one of those neat AIs that can control your browser. But the AI could also make mistakes just
24-
like a human).
30+
approach is that unless you take really good notes, it is a difficult process to replicate.
2531

2632
Another approach could be to write a bash script that calls a bunch of AWS CLI commands that create
2733
the resources. This improves on the web-based approach because all the steps are explicitly written
2834
down. However, the script only works on a fresh AWS account - if you run it again there will be
2935
a bunch of errors because the resources will have already been created. If you need to change how
30-
the account is configured, you need to write more scripts. That reminds me too much of database
31-
evolutions to seem like an good idea...
32-
33-
Enter infrastructure-as-code tools like Open Tofu. I like to think of them as CLI scripting with
34-
a bunch of functionality already built in. We have `.tf` files that contain [resource
35-
descriptions](https://opentofu.org/docs/language/resources/) for everything we want to configure. We
36-
can run the 'script' by running [`tofu apply`](https://opentofu.org/docs/cli/commands/apply/). If
37-
the AWS account already matches our desired configuration, Tofu gives us a nice message that the
38-
infrastructure matches the configuration. Otherwise, Tofu presents us with a list of planned changes
39-
it thinks it needs to make and asks if it should go forward with the plan.
40-
41-
These tools are not magic, however. It is still possible to misconfigure resources and get errors
42-
from the AWS API. These cases are not always handled gracefully and sometimes require deleting or
43-
configuring things manually to unbork the tool. Using a infrastructure-as-code still requires manual
44-
testing and knowledge of the underlying systems you are configuring. It is a heck of a lot better
45-
than shell scripting, however, at least in my experience :)
46-
47-
_Monologue by Avritt Rohwer_
36+
the account is configured, you need to write more scripts.
37+
38+
Enter infrastructure-as-code tools like Terraform and Open Tofu. I like to think of them as CLI
39+
scripting with a bunch of functionality already built in. `.tf` files contain [resource
40+
descriptions](https://opentofu.org/docs/language/resources/). Run the 'script' by running [`tofu
41+
apply`](https://opentofu.org/docs/cli/commands/apply/). If the AWS account already matches the
42+
desired configuration, Tofu prints a nice message that the infrastructure matches the
43+
configuration. Otherwise, Tofu presents us with a list of planned changes it thinks it needs to make
44+
and asks if it should go forward with the plan.
45+
46+
It is still possible to misconfigure resources and get errors from the AWS API. These cases are not
47+
always handled gracefully and sometimes require deleting or configuring things manually to unblock
48+
the tool. Using an infrastructure-as-code tool still requires manual testing and knowledge of the
49+
underlying systems you are configuring. It is a heck of a lot better than shell scripting, however,
50+
at least in my experience :)
4851

4952
## Developer setup
5053

51-
1. Install required tools:
52-
1. Open Tofu: https://opentofu.org/docs/intro/install/
53-
2. AWS CLI: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html
54+
1. Install required CLI tools:
55+
1. bash: `which bash`
56+
2. openssl: `which openssl`
57+
3. tr: `which tr`
58+
4. Open Tofu: https://opentofu.org/docs/intro/install/
59+
5. AWS CLI: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html
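
The tool list above can be checked in one pass with a small script. This is a minimal sketch, not part of the commit: the helper name `check_tools` is hypothetical, and it assumes the Open Tofu and AWS CLI binaries are named `tofu` and `aws`, as in the commands used elsewhere in this README.

```shell
#!/usr/bin/env bash
# Report every required tool that is not on PATH.
# Returns non-zero if at least one tool is missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "MISSING: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

check_tools bash openssl tr tofu aws \
  || echo "Install the missing tools before continuing." >&2
```

Reporting all missing tools at once, rather than stopping at the first, saves a round trip when setting up a new machine.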
5460

5561
After installing, edit your `~/.aws/config` file for SSO authentication:
5662

@@ -100,8 +106,7 @@ unless you add a provider. In that case, run:
100106
tofu init
101107
```
102108

103-
To update a required version for a provider, change the version in the relevant `main.tf` file then
104-
run:
109+
To update a required version for a provider, change the version then run:
105110

106111
```bash
107112
tofu init -upgrade
@@ -113,15 +118,27 @@ directory.
113118
### Applying changes
114119

115120
1. Open a shell and change directory to the root module.
116-
2. Run `aws sso login` to authenticate to AWS. After 1 hour, you will need to re-authenticate using
117-
the same command.
118-
3. Edit the `main.tf` file to update the desired configuration.
121+
2. Run `aws sso login` to authenticate to AWS.
122+
3. Edit the tofu files for the desired configuration.
119123
4. Run `tofu apply` and review the planned changes. If there are unexpected planned changes, go back
120124
to step 1. If all the changes are expected, approve the apply.
121125
5. Inspect the relevant AWS resources via the CLI or the AWS web console
122126
(Log in via https://d-9067ac8222.awsapps.com/start). If there are unexpected results, go back to
123-
step 1. In some cases you may have to manually modify or delete resources directly to 'unstick'
127+
step 3. In some cases you may have to manually modify or delete resources directly to 'unstick'
124128
Open Tofu.
129+
6. To delete only the resources provisioned by the bloom_deployment module, run `tofu destroy
130+
-target=module.bloom_deployment`.
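
The login, review, and apply steps above can be sketched as a script. Note that this is a variation, not the documented workflow: instead of approving an interactive `tofu apply`, it saves a plan with `tofu plan -out` and then applies exactly that saved plan. The profile name `bloom-dev-deployer` and the helper names `run` and `apply_root_module` are assumptions made for illustration. With `DRY_RUN=1` (the default here) the commands are only printed.

```shell
#!/usr/bin/env bash
# Set DRY_RUN=0 to actually execute the commands instead of printing them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# $1: root module directory, $2: AWS CLI profile (hypothetical name below)
apply_root_module() {
  local dir=$1 profile=$2
  run aws sso login --profile "$profile" || return 1
  # Save the plan so that the apply step runs exactly what was reviewed.
  run tofu -chdir="$dir" plan -out=tfplan || return 1
  run tofu -chdir="$dir" apply tfplan || return 1
}

apply_root_module infra/tofu_root_modules/bloom_dev bloom-dev-deployer
```

Applying a saved plan does not prompt for approval, so review the `tofu plan` output before letting the script continue in a non-dry run.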
131+
132+
#### Forcing resource recreation
133+
134+
To force Tofu to replace a resource, run `tofu apply -replace=ADDRESS`. For example:
135+
136+
```
137+
tofu apply -replace=module.bloom_deployment.aws_secretsmanager_secret.api_jwt_signing_key
138+
```
139+
140+
This is helpful when testing the local-exec provisioner because the provisioner only runs on
141+
resource creation.
125142

126143
## AWS setup done manually
127144

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
1+
# Create a database.
2+
locals {
3+
# Pricing: https://aws.amazon.com/rds/postgresql/pricing/?pg=pr&loc=3
4+
# Machine specs: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.DBInstanceClass.Summary.html#hardware-specifications.burstable-inst-classes
5+
db_instance_class = local.is_prod ? "db.t4g.medium" : "db.t4g.micro"
6+
db_multi_az = local.is_prod
7+
8+
# https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html#gp2-storage
9+
# Unit: GiB
10+
db_start_storage = local.is_prod ? 10 : 5
11+
db_max_storage = local.is_prod ? 50 : 10
12+
13+
# Unit: days
14+
db_backup_retention = local.is_prod ? 30 : 7
15+
}
16+
resource "aws_db_subnet_group" "bloom" {
17+
region = var.aws_region
18+
name = "bloom"
19+
subnet_ids = [for s in aws_subnet.private : s.id]
20+
}
21+
resource "aws_db_instance" "bloom" {
22+
identifier = "bloom"
23+
deletion_protection = local.is_prod
24+
engine = "postgres"
25+
engine_version = "17"
26+
instance_class = local.db_instance_class
27+
multi_az = local.db_multi_az
28+
enabled_cloudwatch_logs_exports = ["postgresql", "upgrade", "iam-db-auth-error"]
29+
username = "master"
30+
manage_master_user_password = true
31+
32+
# Networking
33+
vpc_security_group_ids = [aws_security_group.db.id]
34+
iam_database_authentication_enabled = true
35+
db_subnet_group_name = aws_db_subnet_group.bloom.id
36+
37+
# Updates
38+
apply_immediately = true # If false, any changes are applied in the next maintenance window instead of when tofu apply runs.
39+
engine_lifecycle_support = "open-source-rds-extended-support"
40+
allow_major_version_upgrade = false
41+
auto_minor_version_upgrade = true
42+
43+
# Storage
44+
storage_encrypted = true
45+
storage_type = "gp2"
46+
allocated_storage = local.db_start_storage
47+
max_allocated_storage = local.db_max_storage
48+
backup_retention_period = local.db_backup_retention
49+
final_snapshot_identifier = "bloom-db-finalsnapshot"
50+
skip_final_snapshot = !local.is_prod
51+
}
Lines changed: 189 additions & 0 deletions
@@ -0,0 +1,189 @@
1+
# Create an ECS cluster and services for each Bloom binary.
2+
resource "aws_iam_service_linked_role" "ecs" {
3+
aws_service_name = "ecs.amazonaws.com"
4+
}
5+
resource "aws_ecs_cluster" "bloom" {
6+
region = var.aws_region
7+
name = "bloom"
8+
depends_on = [aws_iam_service_linked_role.ecs]
9+
setting {
10+
name = "containerInsights"
11+
value = "enabled"
12+
}
13+
}
14+
15+
# Set up service discovery for the sites to talk to the api.
16+
resource "aws_service_discovery_http_namespace" "bloom" {
17+
region = var.aws_region
18+
name = "bloom"
19+
description = "Service namespace the bloom services use."
20+
}
21+
22+
# Create log groups.
23+
locals {
24+
logs_retention = local.is_prod ? 30 : 7
25+
}
26+
resource "aws_cloudwatch_log_group" "task_logs" {
27+
for_each = toset([
28+
"bloom-api",
29+
"bloom-site-partners",
30+
"bloom-site-public"
31+
])
32+
region = var.aws_region
33+
name = each.value
34+
log_group_class = "STANDARD"
35+
retention_in_days = local.logs_retention
36+
}
37+
38+
# Create a secret key used by the API to sign JWTs.
39+
resource "aws_secretsmanager_secret" "api_jwt_signing_key" {
40+
region = var.aws_region
41+
description = "Key used by the Bloom API to sign JWTs"
42+
name_prefix = "bloom-api-jwt-signing-key" # avoids 'you can't create this secret because a secret with this name is already scheduled for deletion' issue when re-deploying an account.
43+
recovery_window_in_days = 7 # minimum
44+
45+
# The provisioner block runs after the resource has been created, and never again.
46+
provisioner "local-exec" {
47+
interpreter = ["/usr/bin/env", "bash", "-c"]
48+
# We need to be very careful that any errors result in a non-zero exit code. Otherwise tofu will
49+
# think this block succeeded and not error.
50+
command = <<-EOT
51+
if ! type -P aws &>/dev/null; then
52+
echo 'ERROR: aws required'
53+
exit 1
54+
fi
55+
if ! type -P openssl &>/dev/null; then
56+
echo 'ERROR: openssl required'
57+
exit 1
58+
fi
59+
if ! type -P tr &>/dev/null; then
60+
echo 'ERROR: tr required'
61+
exit 1
62+
fi
63+
64+
if ! s=$(set -o pipefail; openssl rand -base64 256 | tr -d '\n'); then
65+
echo 'ERROR: failed to generate random value'
66+
exit 1
67+
fi
68+
69+
if ! aws secretsmanager put-secret-value \
70+
--profile ${var.aws_profile} \
71+
--region ${var.aws_region} \
72+
--secret-id ${self.id} \
73+
--secret-string "$s"
74+
then
75+
echo 'ERROR: failed to put secret value'
76+
exit 1
77+
fi
78+
EOT
79+
}
80+
}
81+
82+
locals {
83+
roles = {
84+
"api" = {
85+
task_execution_policy_extra_statements = [
86+
{
87+
Action = "secretsmanager:GetSecretValue"
88+
Effect = "Allow"
89+
Resource = aws_db_instance.bloom.master_user_secret[0].secret_arn
90+
},
91+
{
92+
Action = "secretsmanager:GetSecretValue"
93+
Effect = "Allow"
94+
Resource = aws_secretsmanager_secret.api_jwt_signing_key.arn
95+
}
96+
]
97+
}
98+
"site-partners" = {
99+
task_execution_policy_extra_statements = []
100+
}
101+
"site-public" = {
102+
task_execution_policy_extra_statements = []
103+
}
104+
}
105+
}
106+
107+
# Create roles for the ECS task executor and the tasks.
108+
resource "aws_iam_role" "bloom_ecs" {
109+
for_each = local.roles
110+
name = "bloom-${each.key}-ecs"
111+
description = "Role the ECS service uses when launching Bloom ${each.key} tasks."
112+
# https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html#create_task_iam_policy_and_role
113+
assume_role_policy = jsonencode({
114+
Version = "2012-10-17"
115+
Statement = [{
116+
Action = "sts:AssumeRole"
117+
Effect = "Allow"
118+
Principal = {
119+
Service = "ecs-tasks.amazonaws.com"
120+
}
121+
Condition = {
122+
ArnLike = {
123+
"aws:SourceArn" = "arn:aws:ecs:${var.aws_region}:${var.aws_account_number}:*"
124+
}
125+
StringEquals = {
126+
"aws:SourceAccount" = var.aws_account_number
127+
}
128+
}
129+
}]
130+
})
131+
}
132+
resource "aws_iam_role_policy" "bloom_ecs" {
133+
for_each = local.roles
134+
name = "bloom-${each.key}-ecs"
135+
role = aws_iam_role.bloom_ecs[each.key].id
136+
policy = jsonencode({
137+
Version = "2012-10-17"
138+
Statement = concat(
139+
[
140+
{
141+
Action = [
142+
"logs:CreateLogStream",
143+
"logs:PutLogEvents",
144+
]
145+
Effect = "Allow"
146+
Resource = "${aws_cloudwatch_log_group.task_logs["bloom-${each.key}"].arn}:log-stream:*"
147+
},
148+
],
149+
each.value.task_execution_policy_extra_statements
150+
)
151+
})
152+
}
153+
resource "aws_iam_role" "bloom_container" {
154+
for_each = local.roles
155+
name = "bloom-${each.key}-container"
156+
description = "Role the Bloom ${each.key} container runs as."
157+
# https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html#create_task_iam_policy_and_role
158+
assume_role_policy = jsonencode({
159+
Version = "2012-10-17"
160+
Statement = [{
161+
Action = "sts:AssumeRole"
162+
Effect = "Allow"
163+
Principal = {
164+
Service = "ecs-tasks.amazonaws.com"
165+
}
166+
Condition = {
167+
ArnLike = {
168+
"aws:SourceArn" = "arn:aws:ecs:${var.aws_region}:${var.aws_account_number}:*"
169+
}
170+
StringEquals = {
171+
"aws:SourceAccount" = var.aws_account_number
172+
}
173+
}
174+
}]
175+
})
176+
}
177+
resource "aws_iam_role_policy" "bloom_container" {
178+
for_each = local.roles
179+
name = "bloom-${each.key}-container"
180+
role = aws_iam_role.bloom_container[each.key].id
181+
policy = jsonencode({
182+
Version = "2012-10-17"
183+
Statement = [{
184+
Action = "*"
185+
Effect = "Deny"
186+
Resource = "*"
187+
}]
188+
})
189+
}
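
The comment in the `local-exec` provisioner stresses that every error must produce a non-zero exit code. One bash detail worth keeping in mind for pipelines such as `openssl rand ... | tr -d '\n'`: by default a pipeline reports the status of its last command only. The sketch below illustrates this, using `false` as a stand-in for a failing first stage. The function names `masked` and `strict` are hypothetical.

```shell
#!/usr/bin/env bash
# masked: default behavior; the pipeline reports the status of tr, its last command.
masked() {
  false | tr -d '\n'
}

# strict: with pipefail the pipeline fails if any stage fails.
# The subshell keeps the option from leaking into the caller.
strict() {
  (
    set -o pipefail
    false | tr -d '\n'
  )
}

if masked; then
  echo "without pipefail: failure of the first stage is hidden"
fi
if ! strict; then
  echo "with pipefail: failure of the first stage is reported"
fi
```

Scoping `set -o pipefail` to a subshell or command substitution is one way to get the stricter behavior for a single pipeline without changing the rest of the script.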
