|
| 1 | +# ADR: Enable SMS Notifications in Strata AWS Infrastructure Template via AWS End User Messaging Phone Number Pool |
| 2 | + |
| 3 | +- **Status:** Proposed (Draft) |
| 4 | +- **Date:** 2025-01-28 |
| 5 | +- **Author:** Johan Robles |
| 6 | +- **Related Ticket:** #976 |
| 7 | + |
| 8 | +--- |
| 9 | + |
| 10 | +## Context and Problem Statement |
| 11 | + |
| 12 | +The Nava Strata AWS Infrastructure template currently supports email notifications. To expand its multi-channel communication capabilities, the team is adding SMS notifications to the Strata offering. |
| 13 | + |
| 14 | +At the infrastructure layer, this feature leverages AWS End User Messaging (SMS), which introduces several operational and provisioning considerations that impact how SMS is enabled within the Nava Strata Infrastructure Template: |
| 15 | +- Arbitrary phone numbers cannot be used. Phone number registration and approval is a manual AWS-managed external process that may take approximately 1–15 days depending on region and use case. |
| 16 | +- AWS recommends managing approved originators through a Phone Number Pool. Benefits for this solution: |
| 17 | + - Automatic failover if an originator fails (originator phone numbers are controlled and managed by external phone carriers, not AWS) |
| 18 | + - Rotation of numbers without application code changes |
| 19 | + - Ability to temporarily include simulator numbers for development |
| 20 | +- The current Terraform AWS Provider has limited support for SMS Voice v2 resources: |
| 21 | + - Event Destinations (e.g., `TEXT_DELIVERED`, `TEXT_BLOCKED`, `TEXT_FAILURE`) cannot be fully configured and linked to a Configuration Set using Terraform. Carrier-level delivery events are necessary to understand actual delivery outcomes (which differ from API-level success responses). |
| 22 | + - Phone Number Pools cannot be provisioned using the Terraform AWS Provider. |
| 23 | + - Because of these provider limitations, certain resources must be provisioned using AWS CloudFormation, invoked from Terraform via `aws_cloudformation_stack`. |
| 24 | + |
| 25 | +## Decision Outcome |
| 26 | + |
| 27 | +Enable SMS notifications using an **Infrastructure-Provisioned Phone Number Pool, defined in CloudFormation and invoked from Terraform (Option 4)**, including: |
| 28 | + |
| 29 | +1. A new infrastructure module: `notifications-sms` |
| 30 | +2. CloudFormation-managed SMS resources (via Terraform) |
| 31 | +3. A Phone Number Pool module `notifications-phone-pool` with associated phone numbers |
| 32 | +4. Carrier-level delivery event logging to CloudWatch |
| 33 | +5. IAM policies scoped to the Phone Pool ARN (least privilege) |
| 34 | +6. Conditional VPC Interface Endpoint for `sms-voice` |
| 35 | +7. Standardized outputs for application integration |
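
Item 5 above (IAM policies scoped to the Phone Pool ARN) can be illustrated with a minimal sketch. It assumes the application only needs `sms-voice:SendTextMessage` and that the pool ARN is exposed as a module output. The function name `build_sms_send_policy` and the example ARN are hypothetical placeholders, not names from the template, and a real policy may need additional statements.

```python
import json


def build_sms_send_policy(pool_arn: str) -> dict:
    """Build an IAM policy document that allows sending SMS only through one phone pool.

    Hypothetical helper for illustration; not part of the Strata template.
    """
    # Reject anything that does not look like a phone pool ARN so the policy
    # cannot silently be scoped to the wrong kind of resource.
    if not pool_arn.startswith("arn:") or ":pool/" not in pool_arn:
        raise ValueError(f"expected a phone pool ARN, got: {pool_arn!r}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "SendSmsViaPhonePool",
                "Effect": "Allow",
                "Action": ["sms-voice:SendTextMessage"],
                # Least privilege: the only permitted resource is the pool itself.
                "Resource": [pool_arn],
            }
        ],
    }


# Placeholder ARN (assumed region, account ID, and pool ID).
EXAMPLE_POOL_ARN = "arn:aws:sms-voice:us-east-1:123456789012:pool/pool-0123456789abcdef"
policy = build_sms_send_policy(EXAMPLE_POOL_ARN)
print(json.dumps(policy, indent=2))
```

Because the policy references the pool rather than an individual originator, numbers could be added to or removed from the pool without changing the application's permissions.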
| 36 | + |
| 37 | +## Considered Options |
| 38 | + |
| 39 | +### Option 1 — Basic SMS Enablement |
| 40 | + |
| 41 | +Provision: |
| 42 | +- VPC Interface Endpoint |
| 43 | +- Configuration Set |
| 44 | +- IAM permission for `sms-voice:SendTextMessage` |
| 45 | + |
| 46 | +**Pros** |
| 47 | +- Simplest implementation |
| 48 | +- Fully supported via Terraform AWS Provider |
| 49 | + |
| 50 | +**Cons** |
| 51 | +- Application teams manage phone number resources |
| 52 | +- Requires broad IAM access policy permissions |
| 53 | +- No carrier-level delivery visibility |
| 54 | + |
| 55 | +### Option 2 — Add Carrier-Level Delivery Monitoring |
| 56 | + |
| 57 | +Builds on Option 1 and adds: |
| 58 | +- Event destinations for: |
| 59 | + - `TEXT_DELIVERED` |
| 60 | + - `TEXT_BLOCKED` |
| 61 | + - `TEXT_FAILURE` |
| 62 | + - Other asynchronous carrier responses |
| 63 | + |
| 64 | +**Pros** |
| 65 | +- Delivery visibility |
| 66 | +- Enables reliability improvements (e.g., adding a message retry mechanism) |
| 67 | + |
| 68 | +**Cons** |
| 69 | +- Requires CloudFormation integration, which increases implementation complexity |
| 70 | + |
| 71 | +### Option 3 — Infrastructure-Provisioned Single Phone Number |
| 72 | + |
| 73 | +Builds on Option 2 and provisions a single originator number. |
| 74 | + |
| 75 | +**Pros** |
| 76 | +- Application teams do not manage phone number resources (only the external registration process) |
| 77 | +- Application IAM access policy can be restricted to one number, which improves security posture |
| 78 | + |
| 79 | +**Cons** |
| 80 | +- All testing is blocked until the originator phone number is approved |
| 81 | +- Originator phone number rotation requires Terraform changes |
| 82 | +- Single number increases operational risk |
| 83 | + |
| 84 | +### Option 4 — Infrastructure-Provisioned Phone Number Pool (Selected) |
| 85 | + |
| 86 | +Builds on Option 2 and provisions: |
| 87 | + |
| 88 | +- Phone Number Pool |
| 89 | +- Associated phone number (once phone number registration is approved) |
| 90 | +- Optional simulator phone number, provisioned for development purposes |
| 91 | + |
| 92 | +**Pros** |
| 93 | +- Aligns with AWS best practices |
| 94 | +- Supports number rotation without code changes |
| 95 | +- Application IAM access policy scoped to the pool ARN (least privilege) |
| 96 | +- Simulator phone number support for development: basic testing does not have to wait for originator phone number approval |
| 97 | +- Reduced operational risk: multiple phone numbers can be added to the pool |
| 98 | + |
| 99 | +**Cons** |
| 100 | +- Higher infrastructure complexity |
| 101 | +- Requires CloudFormation integration |
| 102 | + |
| 103 | +## Rationale |
| 104 | + |
| 105 | +Option 4 provides: |
| 106 | + |
| 107 | +- Strong security posture through least-privilege IAM |
| 108 | +- Improved operational resilience via number pooling |
| 109 | +- Carrier-level delivery observability |
| 110 | +- Development/testing flexibility via simulator phone number |
| 111 | +- Alignment with AWS best practices |
| 112 | + |
| 113 | +It balances reliability, security, and observability while managing Terraform provider limitations. |