Cross‐Region Replication and Disaster Recovery Design Report
Scope: Users data structure → replication method selection → disaster recovery (with authentication continuity)
Primary constraints: GDPR residency (EU user data must not leave EU), Users table is referenced by many tables (FK integrity), mixed authentication, region-unique IDs (SID-… vs EU-…).
Our platform operates on two PostgreSQL instances: Virginia and Ireland, across multiple environments (dev, qa, stage, prod). We want operational continuity with multi-region resilience, but our system contains regulated personal data. The main requirement is that EU user personal data must not leave the EU. At the same time, our data model heavily references users.user_id from many tables. This means we cannot treat users as “optional” in replication without causing integrity or usability issues in downstream tables.
Our framework revolves around the following questions:
- How do we represent users in each region so that referential integrity is preserved without replicating restricted PII?
- Which replication mechanism allows us to selectively control what moves across regions (table/column-level)?
- How do we provide disaster recovery behavior where Virginia users can still authenticate in Ireland if Virginia is down, without replicating full profiles?
Our Virginia table looks like:
CREATE TABLE IF NOT EXISTS virginia_dev_saayam_rdbms.users (
user_id VARCHAR(255) PRIMARY KEY,
full_name VARCHAR(255) NULL,
first_name VARCHAR(255) NULL,
last_name VARCHAR(255) NULL,
primary_email_address VARCHAR(255) NULL,
primary_phone_number VARCHAR(255) NULL,
addr_ln1 VARCHAR(255) NULL,
city_name VARCHAR(255) NULL,
zip_code VARCHAR(255) NULL,
last_location point,
time_zone VARCHAR(255) NULL,
profile_picture_path VARCHAR(255) NULL,
gender VARCHAR(255) NULL,
language_1 VARCHAR(255) NULL,
language_2 VARCHAR(255) NULL,
language_3 VARCHAR(255) NULL,
promotion_wizard_stage INT NULL,
promotion_wizard_last_update_date TIMESTAMP,
last_update_date TIMESTAMP,
...
);
In a single table, we mix authentication identifiers (email), profile PII (names, phone, address, location), and operational attributes (time zone, wizard fields). If we replicate this table as is, we replicate restricted fields. If we exclude the table entirely, we break foreign keys and destroy the operational usefulness of the replicated data. The table schema therefore forces us into a selective replication strategy.
Our user_id is not a plain integer key. Instead, Virginia generates SID-... IDs via sequence and trigger:
CREATE SEQUENCE virginia_dev_saayam_rdbms.user_id_seq START WITH 1 INCREMENT BY 1;
CREATE FUNCTION virginia_dev_saayam_rdbms.generate_sid()
RETURNS TRIGGER AS $$
DECLARE
seq_id INT;
new_id VARCHAR(20);
BEGIN
seq_id := nextval('virginia_dev_saayam_rdbms.user_id_seq'); -- schema-qualified so the trigger does not depend on search_path
new_id := 'SID-00-' || LPAD(FLOOR(seq_id / 1000000)::TEXT, 3, '0') || '-' ||
LPAD(FLOOR((seq_id % 1000000) / 1000)::TEXT, 3, '0') || '-' ||
LPAD((seq_id % 1000)::TEXT, 3, '0');
NEW.user_id := new_id;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER before_insert_users
BEFORE INSERT ON virginia_dev_saayam_rdbms.users
FOR EACH ROW EXECUTE FUNCTION virginia_dev_saayam_rdbms.generate_sid();
Ireland uses a separate generator that produces EU-... IDs. This prefix gives us an embedded “region namespace”: our Ireland database can safely store SID-... rows alongside EU-... rows with no primary key collisions. Without it, we would have needed offset ranges, odd/even sequences, or a full UUID migration.
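To make the namespace argument concrete, here is a minimal Python sketch that ports the generate_sid() arithmetic, plus the 'EU-00-' || nextval(...) format used by the Ireland trigger in Step 2.2, and checks that identical sequence values in both regions produce disjoint keys. The function names are illustrative, not part of our codebase. The port also surfaces one edge case: PostgreSQL's LPAD truncates strings longer than the target length, so from sequence value 1,000,000,000 onward the trigger would emit IDs that collide with earlier ones. The sketch raises instead.

```python
def sid_from_seq(seq_id: int) -> str:
    """Python port of virginia_dev_saayam_rdbms.generate_sid() for one sequence value."""
    if not 1 <= seq_id <= 999_999_999:
        # PostgreSQL's LPAD(x, 3, '0') truncates longer strings, so larger values
        # would silently reuse earlier IDs; refuse them instead.
        raise ValueError(f"seq_id {seq_id} is outside the range the SID format supports")
    millions = seq_id // 1_000_000
    thousands = (seq_id % 1_000_000) // 1_000
    units = seq_id % 1_000
    return f"SID-00-{millions:03d}-{thousands:03d}-{units:03d}"


def eu_from_seq(seq_id: int) -> str:
    """Format produced by the Ireland trigger in Step 2.2: 'EU-00-' || nextval(...)."""
    return f"EU-00-{seq_id}"


if __name__ == "__main__":
    print(sid_from_seq(1))          # SID-00-000-000-001
    print(sid_from_seq(1_234_567))  # SID-00-001-234-567
    # Identical sequence values in both regions still yield disjoint key sets.
    virginia = {sid_from_seq(n) for n in range(1, 1001)}
    ireland = {eu_from_seq(n) for n in range(1, 1001)}
    print(virginia.isdisjoint(ireland))  # True
```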
What it is: Replicating all operational tables (Orders, Logs) but skipping the users table.
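Why it failed: It breaks referential integrity. If an order for user_id SID-123 is replicated to Ireland but Ireland's users table has no SID-123 row, the insert fails its foreign key (FK) check. Where FK checks are bypassed (the PostgreSQL logical replication apply worker runs with session_replication_role = replica, under which FK triggers do not fire), the row lands as an orphan instead.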
Decision: Rejected. It leads to orphaned rows and breaks all analytics and support features in the secondary region.
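A minimal sketch of both failure modes, using Python's built-in sqlite3 module as a stand-in for PostgreSQL (tables trimmed to the key columns). With foreign key enforcement on, the replicated order for an unknown SID user is rejected. With enforcement off, as in any load path that skips FK checks, the row is accepted and only an anti-join exposes the orphan.

```python
import sqlite3


def replicate_order(enforce_fk: bool) -> list:
    """Apply a replicated order whose user was never replicated; return orphan rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"PRAGMA foreign_keys = {'ON' if enforce_fk else 'OFF'}")
    conn.execute("CREATE TABLE users (user_id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
        "user_id TEXT NOT NULL REFERENCES users(user_id))"
    )
    conn.execute("INSERT INTO users VALUES ('EU-00-1')")  # only EU users exist in Ireland
    conn.execute("INSERT INTO orders VALUES (1, 'EU-00-1')")
    try:
        # Order replicated from Virginia for a user whose row was skipped.
        conn.execute("INSERT INTO orders VALUES (2, 'SID-00-000-000-123')")
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
    # Anti-join: orders pointing at a user_id that has no users row.
    orphans = conn.execute(
        "SELECT o.order_id, o.user_id FROM orders o "
        "LEFT JOIN users u ON u.user_id = o.user_id "
        "WHERE u.user_id IS NULL"
    ).fetchall()
    conn.close()
    return orphans


if __name__ == "__main__":
    print(replicate_order(enforce_fk=True))   # []
    print(replicate_order(enforce_fk=False))  # [(2, 'SID-00-000-000-123')]
```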
Option A: Native Replication
PostgreSQL native logical replication is a database engine-level feature that uses the publication/subscription model, where:
- Virginia RDS acts as the publisher (source)
- Ireland RDS acts as the subscriber (target)
- Changes flow through PostgreSQL's Write-Ahead Log (WAL) in real-time
- Works at table-level granularity only on PostgreSQL 14 and earlier (the postgres14 family used below): it cannot filter individual columns. PostgreSQL 15 added publication column lists and row filters.
Key Limitation: On PostgreSQL 14, you cannot replicate the users table while excluding specific columns. It's all-or-nothing per table.
The Core Challenge, GDPR Compliance: Since you cannot filter columns natively, you must architect around this limitation by creating a "shadow table" (or "stub table") in Virginia:
Virginia RDS:
├── users (full PII - NOT replicated)
├── users_global_stub (only user_id - REPLICATED)
├── orders (REPLICATED)
└── transactions (REPLICATED)
↓ (Logical Replication)
Ireland RDS:
├── users (EU users only - full PII)
├── users_global_stub (SID users - minimal)
├── orders (both EU and SID)
└── transactions (both EU and SID)
AWS Cognito (Global):
└── Handles ALL authentication for both regions
The shadow table contains only the "safe" columns and none of the PII fields such as name, address, or phone.
Every time a Virginia user row is inserted, updated, or deleted (for example, when a user updates their profile):
- The main users table is updated
- A trigger on the users table fires
- The trigger updates the corresponding row in users_global_stub
- PostgreSQL's logical replication then sends the users_global_stub change to Ireland
- Ireland receives only the filtered data
This means you're maintaining two versions of user data in Virginia at all times, synchronized via triggers.
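Prerequisites: PostgreSQL 10+ (both regions), permission to modify wal_level and other replication parameters (on RDS, through the rds.logical_replication parameter), the REPLICATION privilege (the rds_replication role on RDS), the pglogical extension (only if bidirectional replication is needed), a database migration tool, Amazon CloudWatch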
Phase 1: Virginia Configuration (Source Database)
Step 1.1: Enable Logical Replication by Parameter Group
AWS RDS Console:
1. Navigate to Parameter Groups
2. Create parameter group
- Family: postgres14 (or your version)
- Name: virginia-logical-replication
3. Edit parameters:
- rds.logical_replication = 1 (RDS does not expose wal_level directly; this setting sets wal_level = logical)
- max_wal_senders = 10
- max_replication_slots = 10
- max_wal_size = 2048 (MB)
4. Modify Virginia RDS instance to use this parameter group
5. Reboot the instance (required for wal_level change)
OR, on self-managed PostgreSQL (RDS does not allow ALTER SYSTEM):
ALTER SYSTEM SET wal_level = 'logical';
ALTER SYSTEM SET max_wal_senders = 10;
ALTER SYSTEM SET max_replication_slots = 10;
-- Restart the PostgreSQL server so the new wal_level takes effect
Step 1.2: Create the Shadow Table
CREATE TABLE virginia_dev_saayam_rdbms.users_global_stub (
user_id VARCHAR(50) PRIMARY KEY,
user_status_id INTEGER DEFAULT 1,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Step 1.3: Create the Synchronization Trigger
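Create a trigger function on users that mirrors every INSERT, UPDATE, and DELETE into users_global_stub, copying only the stub columns (user_id, user_status_id) and never the PII columns.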
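The trigger logic can be sketched with SQLite through Python's sqlite3 module so it runs anywhere. This is an analogue, not the production PL/pgSQL; in PostgreSQL the same idea would be a single AFTER INSERT OR UPDATE OR DELETE ... FOR EACH ROW trigger whose function branches on TG_OP. The schemas are trimmed to a few illustrative columns.

```python
import sqlite3

# Trimmed schemas: two PII columns stand in for the full users table.
SCHEMA = """
CREATE TABLE users (
    user_id              TEXT PRIMARY KEY,
    full_name            TEXT,
    primary_phone_number TEXT,
    user_status_id       INTEGER DEFAULT 1
);
CREATE TABLE users_global_stub (
    user_id        TEXT PRIMARY KEY,
    user_status_id INTEGER DEFAULT 1,
    updated_at     TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Each trigger copies only non-PII columns into the stub.
CREATE TRIGGER users_stub_ins AFTER INSERT ON users BEGIN
    INSERT INTO users_global_stub (user_id, user_status_id)
    VALUES (NEW.user_id, NEW.user_status_id);
END;
-- Fires only for UPDATEs that set a stub column, so PII-only edits
-- produce no stub change (and therefore nothing to replicate).
CREATE TRIGGER users_stub_upd AFTER UPDATE OF user_id, user_status_id ON users BEGIN
    UPDATE users_global_stub
       SET user_id = NEW.user_id,
           user_status_id = NEW.user_status_id,
           updated_at = CURRENT_TIMESTAMP
     WHERE user_id = OLD.user_id;
END;
CREATE TRIGGER users_stub_del AFTER DELETE ON users BEGIN
    DELETE FROM users_global_stub WHERE user_id = OLD.user_id;
END;
"""


def build_virginia() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    conn = build_virginia()
    conn.execute(
        "INSERT INTO users (user_id, full_name, primary_phone_number) "
        "VALUES ('SID-00-000-000-001', 'Ada', '555-0100')"
    )
    conn.execute("UPDATE users SET user_status_id = 2 WHERE user_id = 'SID-00-000-000-001'")
    print(conn.execute("SELECT user_id, user_status_id FROM users_global_stub").fetchall())
    # [('SID-00-000-000-001', 2)]
```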
Step 1.4: Initial Data Sync
Backfill users_global_stub from the existing users rows, then verify that every users row has a matching stub row.
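A sketch of the backfill and its verification, again on SQLite through Python's sqlite3 with trimmed tables. INSERT OR IGNORE plays the role that INSERT ... SELECT ... ON CONFLICT (user_id) DO NOTHING would play in PostgreSQL, so the backfill can be re-run safely even after the sync trigger has created some stub rows. The anti-join count is the verification.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Two existing users; one already has a stub row (e.g. written by the trigger)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (user_id TEXT PRIMARY KEY, full_name TEXT,
                            user_status_id INTEGER DEFAULT 1);
        CREATE TABLE users_global_stub (user_id TEXT PRIMARY KEY,
                                        user_status_id INTEGER DEFAULT 1);
        INSERT INTO users (user_id, full_name) VALUES
            ('SID-00-000-000-001', 'Ada'), ('SID-00-000-000-002', 'Grace');
        INSERT INTO users_global_stub (user_id) VALUES ('SID-00-000-000-001');
    """)
    return conn


def backfill_and_verify(conn: sqlite3.Connection) -> dict:
    # Copy every existing user into the stub; rows that already exist are skipped.
    conn.execute(
        "INSERT OR IGNORE INTO users_global_stub (user_id, user_status_id) "
        "SELECT user_id, user_status_id FROM users"
    )
    conn.commit()

    def count(sql: str) -> int:
        return conn.execute(sql).fetchone()[0]

    return {
        "users": count("SELECT COUNT(*) FROM users"),
        "stubs": count("SELECT COUNT(*) FROM users_global_stub"),
        # Users with no stub row: must be 0 after the backfill.
        "missing": count(
            "SELECT COUNT(*) FROM users u LEFT JOIN users_global_stub s "
            "ON s.user_id = u.user_id WHERE s.user_id IS NULL"
        ),
    }


if __name__ == "__main__":
    print(backfill_and_verify(make_demo_db()))  # {'users': 2, 'stubs': 2, 'missing': 0}
```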
Step 1.5: Create the Replication User and Publication
-- Create dedicated replication user
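-- On RDS, use the rds_replication role: the REPLICATION attribute requires a superuser, which RDS does not provide
CREATE USER replication_user WITH PASSWORD 'your-secure-password-here';
GRANT rds_replication TO replication_user;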
GRANT CONNECT ON DATABASE virginia_prod TO replication_user;
GRANT USAGE ON SCHEMA virginia_dev_saayam_rdbms TO replication_user;
GRANT SELECT ON ALL TABLES IN SCHEMA virginia_dev_saayam_rdbms TO replication_user;
-- Create publication - NOTE: users_global_stub, NOT users
CREATE PUBLICATION global_pub FOR TABLE
virginia_dev_saayam_rdbms.orders,
virginia_dev_saayam_rdbms.users_global_stub,
virginia_dev_saayam_rdbms.transactions,
virginia_dev_saayam_rdbms.logs,
virginia_dev_saayam_rdbms.tickets;
-- Verify: 'users' must NOT appear in the output
SELECT * FROM pg_publication_tables WHERE pubname = 'global_pub';
Phase 2: Ireland Configuration (Target Database)
Set Up Network Connectivity
AWS Console:
1. Create VPC Peering Connection between Virginia and Ireland VPCs
2. Update route tables in both regions
3. Update Security Groups:
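- Virginia RDS: Allow inbound 5432 from the Ireland VPC CIDR (the Ireland subscriber opens the connection to the Virginia publisher)
- Ireland RDS: Allow outbound 5432 to Virginia (the default outbound rule already permits it)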
Step 2.1: Prepare the Target Schema
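Ireland's users table must be able to accept stub rows. Because Virginia replicates users_global_stub rather than users, create a local users_global_stub table to receive the replicated rows, then a trigger on it that merges each stub row into the main users table in Ireland. Two PostgreSQL behaviours matter here. First, the subscription's apply worker runs with session_replication_role = replica, so an ordinary trigger on the receiving table does not fire; enable the merge trigger with ALTER TABLE ... ENABLE ALWAYS TRIGGER. Second, logical replication matches tables by schema-qualified name, so the receiving tables in Ireland must live in a schema with the publisher's name (virginia_dev_saayam_rdbms) rather than ireland_rdbms, unless the schema names are aligned across regions.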
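The merge step is essentially an upsert. This Python sketch models Ireland's users table as a dict keyed by user_id; the PII column list is taken from the ALTER TABLE statements that follow. In PostgreSQL the equivalent would be an INSERT ... ON CONFLICT (user_id) DO UPDATE SET user_status_id = EXCLUDED.user_status_id inside a row trigger on the receiving stub table. The property worth preserving is that a stub row can create a SID row with NULL PII or refresh its status, but never writes to PII columns.

```python
# PII columns made nullable by the ALTER TABLE statements in Step 2.1.
PII_COLUMNS = ("full_name", "primary_phone_number", "addr_ln1", "city_name", "zip_code")


def merge_stub_row(users: dict, stub_row: dict) -> None:
    """Apply one replicated users_global_stub row to Ireland's users table.

    `users` models the Ireland users table as {user_id: row_dict}.
    """
    user_id = stub_row["user_id"]
    if not user_id.startswith("SID-"):
        raise ValueError(f"unexpected non-SID id in stub feed: {user_id}")
    row = users.get(user_id)
    if row is None:
        # First sighting: create a stub user whose PII columns are all NULL (None).
        users[user_id] = {
            "user_id": user_id,
            "user_status_id": stub_row["user_status_id"],
            **{col: None for col in PII_COLUMNS},
        }
    else:
        # Known user: refresh the non-PII status only; PII columns are never touched.
        row["user_status_id"] = stub_row["user_status_id"]


if __name__ == "__main__":
    ireland_users: dict = {}
    merge_stub_row(ireland_users, {"user_id": "SID-00-000-000-123", "user_status_id": 1})
    merge_stub_row(ireland_users, {"user_id": "SID-00-000-000-123", "user_status_id": 3})
    print(ireland_users["SID-00-000-000-123"]["user_status_id"])  # 3
```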
ALTER TABLE ireland_rdbms.users ALTER COLUMN full_name DROP NOT NULL;
ALTER TABLE ireland_rdbms.users ALTER COLUMN primary_phone_number DROP NOT NULL;
ALTER TABLE ireland_rdbms.users ALTER COLUMN addr_ln1 DROP NOT NULL;
ALTER TABLE ireland_rdbms.users ALTER COLUMN city_name DROP NOT NULL;
ALTER TABLE ireland_rdbms.users ALTER COLUMN zip_code DROP NOT NULL;
CREATE TABLE ireland_rdbms.users_global_stub (
user_id VARCHAR(50) PRIMARY KEY,
user_status_id INTEGER DEFAULT 1,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Step 2.2: Harden the ID Generator Trigger
Prevent the EU generator from overwriting incoming SID IDs
-- Prevent EU ID generator from overwriting SID IDs
CREATE OR REPLACE FUNCTION ireland_rdbms.generate_eu_if_missing()
RETURNS TRIGGER AS $$
BEGIN
IF NEW.user_id IS NOT NULL THEN
RETURN NEW; -- Preserve SID-xxx from replication
END IF;
NEW.user_id := 'EU-00-' || nextval('user_id_seq')::TEXT;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER before_insert_users
BEFORE INSERT ON ireland_rdbms.users
FOR EACH ROW
EXECUTE FUNCTION ireland_rdbms.generate_eu_if_missing();
CREATE TRIGGER before_insert_users_stub
BEFORE INSERT ON ireland_rdbms.users_global_stub
FOR EACH ROW
EXECUTE FUNCTION ireland_rdbms.generate_eu_if_missing();
Step 2.3: Create the Subscription
-- Create subscription to Virginia publication
CREATE SUBSCRIPTION global_sub
CONNECTION 'host=virginia000.com
port=5432
dbname=virginia_prod
user=replication_user
password=your-secure-password-here
sslmode=require'
PUBLICATION global_pub
WITH (copy_data = true);
SELECT * FROM pg_stat_subscription;
-- Should show: subname='global_sub', pid (active process), received_lsn (incrementing)
Scenario: Virginia Region Goes Down
Normal Operations - Before Outage
Status:
- Virginia RDS: ✅ Active (serving SID users)
- Ireland RDS: ✅ Active (serving EU users)
- Cognito: ✅ Global (serves both regions)
- Replication: ✅ Virginia → Ireland (lag ~2-5 seconds)
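The lag figure above can be measured rather than estimated. On Ireland, pg_stat_subscription reports received_lsn; on Virginia, pg_current_wal_lsn() reports the publisher's current position. Both are pg_lsn values written as two hexadecimal 32-bit halves. A minimal Python sketch that turns them into a byte lag, assuming both values have already been fetched. On Virginia alone, pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) from pg_replication_slots gives a comparable number without a second connection.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a pg_lsn text value such as '16/B374D848' to an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(publisher_lsn: str, received_lsn: str) -> int:
    """WAL bytes the subscriber has not received yet.

    publisher_lsn: result of SELECT pg_current_wal_lsn(); on Virginia.
    received_lsn:  received_lsn column of pg_stat_subscription on Ireland.
    The two reads happen at different moments, so clamp a negative result to 0.
    """
    return max(0, lsn_to_int(publisher_lsn) - lsn_to_int(received_lsn))


if __name__ == "__main__":
    print(lag_bytes("16/B374D848", "16/B3700000"))  # 317512
```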
User Flow:
SID-123 user → Cognito login → Virginia app → Virginia RDS (full profile)
EU-456 user → Cognito login → Ireland app → Ireland RDS (full profile)
What Happens Automatically when Virginia goes down:
- 10:00:00 AM - Virginia RDS becomes unreachable
- Replication stops (last data received at 09:59:58 AM)
- Virginia application errors (cannot connect to the database)
- Route53 health check fails
- Automatic DNS failover begins (if configured)
DR Failover:
- Update DNS to route traffic to Ireland and enable DR mode in the application
- Handle user login during the outage at the application and UI level
Shows the dashboard with limitations:
✅ Can view requests
✅ Can create new requests (writes to Ireland)
⚠️ Profile shows "Limited Mode" (no name/address visible)
⚠️ Cannot edit profile
- Monitor Replication Status
- What Cognito Provides:
- The application can query Cognito for the user's email to display if needed
- Password changes still work (Cognito handles it)
- OAuth logins still work (Google, Yahoo)
During the Outage
All requests created in Ireland during the outage must be synced back to Virginia when it recovers.
Failback: Returning to Virginia
Step 1: Verify Virginia Health
Step 2: Identify Data to Sync Back
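Use a WHERE clause like the following so that only Virginia (SID) rows are synced back. It selects rows created during the outage window; rows that were only updated during the outage need the same filter on their update timestamp: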
WHERE user_id LIKE 'SID-%' AND created_at BETWEEN '2025-12-18 10:00:00' AND '2025-12-18 14:00:00';
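The same selection can be applied in Python when post-processing an export. A sketch, assuming rows arrive as dicts with user_id and created_at, plus an optional updated_at (an assumed column; not every table may have one) so that rows edited during the outage are not missed. The window bounds are the example timestamps above and, like BETWEEN, are inclusive on both ends.

```python
from datetime import datetime

# Outage window from the example WHERE clause (inclusive, like BETWEEN).
OUTAGE_START = datetime(2025, 12, 18, 10, 0, 0)
OUTAGE_END = datetime(2025, 12, 18, 14, 0, 0)


def needs_sync_back(row: dict, start: datetime = OUTAGE_START,
                    end: datetime = OUTAGE_END) -> bool:
    """True if a row exported from Ireland must be replayed into Virginia."""
    if not row["user_id"].startswith("SID-"):  # same as LIKE 'SID-%'
        return False  # EU rows stay in Ireland
    # updated_at is an assumed column; rows without it are judged on created_at alone.
    stamps = (row.get("created_at"), row.get("updated_at"))
    return any(ts is not None and start <= ts <= end for ts in stamps)


if __name__ == "__main__":
    rows = [
        {"user_id": "SID-00-000-000-123", "created_at": datetime(2025, 12, 18, 11, 5)},
        {"user_id": "EU-00-456", "created_at": datetime(2025, 12, 18, 11, 6)},
        {"user_id": "SID-00-000-000-124", "created_at": datetime(2025, 12, 1),
         "updated_at": datetime(2025, 12, 18, 12, 30)},
    ]
    print([r["user_id"] for r in rows if needs_sync_back(r)])
    # ['SID-00-000-000-123', 'SID-00-000-000-124']
```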
The Problem:
This new data exists in Ireland but NOT in Virginia.
PostgreSQL logical replication, as configured here, is unidirectional (Virginia → Ireland) and has no built-in conflict resolution.
Manual Failback Process:
- Step 1: Stop Ireland Writes
- Step 2: Extract changed data: export the rows selected by the WHERE clause above
- Step 3: Manually sync back to Virginia: import the exported rows (for example with COPY, or pg_dump and pg_restore)
- Step 4: Verify sync integrity and resume normal operations by disabling DR mode in Systems Manager
- Step 5: Resume Traffic to Virginia
The Risk:
Human error in the manual sync script can cause data loss or corruption. There's no automatic conflict detection if both regions modified the same record differently.
Ongoing Maintenance Burden:
Every time we add a column to users in Virginia, we must:
- Decide if it's PII (should it be replicated?)
- If safe to replicate: Add it to users_global_stub
- Update the sync trigger to include the new column
- Update the Ireland merge trigger
Every time we add a new table to the database, we must always:
- Alter the publication
- Create an identical table in Ireland
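Because the PII decision is the step most likely to be forgotten, it helps to turn it into a check that runs against the live schema. A sketch, assuming the live column names come from a query such as SELECT column_name FROM information_schema.columns WHERE table_schema = 'virginia_dev_saayam_rdbms' AND table_name = 'users' ORDER BY ordinal_position, and that decisions are kept in a hypothetical dict mapping each column to "replicate" or "pii". Any column without a decision is reported, so it cannot reach users_global_stub (or be silently left out of it) unreviewed.

```python
def review_columns(live_columns, decisions):
    """Split the live users columns into (stub_columns, unclassified).

    live_columns: names in table order, e.g. from information_schema.columns.
    decisions: hypothetical mapping of column name -> "replicate" or "pii".
    """
    unknown = {v for v in decisions.values() if v not in ("replicate", "pii")}
    if unknown:
        raise ValueError(f"unknown decision values: {sorted(unknown)}")
    stub_columns = [c for c in live_columns if decisions.get(c) == "replicate"]
    unclassified = [c for c in live_columns if c not in decisions]
    return stub_columns, unclassified


if __name__ == "__main__":
    live = ["user_id", "full_name", "user_status_id", "nickname"]  # nickname: newly added
    decisions = {"user_id": "replicate", "full_name": "pii", "user_status_id": "replicate"}
    stub, todo = review_columns(live, decisions)
    print(stub)  # ['user_id', 'user_status_id']
    print(todo)  # ['nickname']
```

A CI job could fail while the unclassified list is non-empty, which forces the "is it PII?" decision before the trigger and stub table drift apart.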
Option B: AWS DMS
AWS Database Migration Service (DMS) is a fully managed replication service that:
- Sits between Virginia and Ireland RDS as a middle layer
- Reads changes from Virginia RDS using Change Data Capture (CDC)
- Applies transformation rules to filter/remove columns in transit
- Writes filtered data to Ireland RDS
Key Advantage: DMS can remove specific columns before they reach Ireland - solving the GDPR problem without shadow tables or triggers.
Virginia RDS:
├── users (full PII - DMS reads this directly)
├── orders
└── transactions
↓
[DMS Instance]
- Reads from Virginia
- Removes: full_name, phone, address (via JSON rules)
- Keeps: user_id, user_status_id
↓
Ireland RDS:
├── users (EU users + SID stubs with filtered data)
├── orders
└── transactions
AWS Cognito (Global):
└── Handles ALL authentication for both regions
What we must do:
- Create a DMS Replication Instance (managed compute for replication)
- Configure Source Endpoint (Virginia RDS)
- Configure Target Endpoint (Ireland RDS)
- Define Table Mapping JSON with transformation rules (your privacy filter)
- DMS handles everything else automatically
What DMS Does Automatically:
- Creates a replication slot in Virginia
- Tracks CDC position with checkpoints
- Filters columns based on JSON rules
- Handles initial data load + continuous replication
- Supports failback through a separately created reverse replication task
- Logs all operations to CloudWatch
The Big Difference from Option A:
- No shadow tables - replicate the actual users table
- No triggers - DMS handles sync automatically
- Column-level filtering - JSON rules remove PII in transit
- Managed failback - create a reverse DMS task for syncing back
AWS Services
- Amazon RDS for PostgreSQL 10+ (both regions)
- AWS DMS Replication Instance (compute for replication engine)
- Instance class: dms.c5.large to dms.c5.4xlarge (based on data volume)
- Storage: 100GB minimum for CDC caching
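- AWS Cognito User Pool (a single pool used by both regions; user pools are regional resources, so the pool must not be hosted in the Virginia region if authentication is to survive a Virginia outage)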
- Amazon CloudWatch (automatic - DMS logs and metrics)
- AWS Route 53 (DNS failover between regions)
RDS Configuration Requirements
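- Virginia RDS: Parameter group with rds.logical_replication = 1 (which sets wal_level = logical)
- Ireland RDS: Standard configuration for forward replication; it also needs rds.logical_replication = 1 if it will act as the DMS source for the reverse failback task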
- Security Groups: DMS instance must reach both RDS instances (port 5432)
- VPC Peering/Transit Gateway: If DMS instance in different VPC
Network Requirements
- DMS instance needs network path to both Virginia and Ireland RDS
- Typically place DMS instance in Virginia region
- Use VPC peering or Transit Gateway for cross-region connectivity
Phase 1: Virginia Database Preparation
Step 1.1: Enable Logical Replication
-- Self-managed PostgreSQL only: enable WAL logical decoding (requires restart)
ALTER SYSTEM SET wal_level = 'logical';
ALTER SYSTEM SET max_wal_senders = 10;
ALTER SYSTEM SET max_replication_slots = 10;
-- For RDS (ALTER SYSTEM is not allowed): modify the parameter group, then reboot the instance
OR
AWS RDS Console:
1. Parameter Groups → Create parameter group
- Family: postgres14
- Name: virginia-dms-source
2. Edit parameters:
- rds.logical_replication = 1 (sets wal_level = logical)
- max_wal_senders = 10
- max_replication_slots = 10
3. Modify Virginia RDS instance to use this parameter group
4. Reboot instance (required)
Step 1.2: Create DMS Replication User
-- Connect to Virginia RDS
CREATE USER dms_user WITH PASSWORD 'secure-password-123';
GRANT rds_replication TO dms_user;
GRANT CONNECT ON DATABASE virginia_prod TO dms_user;
GRANT USAGE ON SCHEMA virginia_dev_saayam_rdbms TO dms_user;
GRANT SELECT ON ALL TABLES IN SCHEMA virginia_dev_saayam_rdbms TO dms_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA virginia_dev_saayam_rdbms
GRANT SELECT ON TABLES TO dms_user;
Phase 2: Ireland Database Preparation
Step 2.1: Adjust Schema Constraints (DMS will not send these PII columns, so they must accept NULL)
ALTER TABLE ireland_rdbms.users ALTER COLUMN full_name DROP NOT NULL;
ALTER TABLE ireland_rdbms.users ALTER COLUMN first_name DROP NOT NULL;
ALTER TABLE ireland_rdbms.users ALTER COLUMN last_name DROP NOT NULL;
ALTER TABLE ireland_rdbms.users ALTER COLUMN primary_phone_number DROP NOT NULL;
ALTER TABLE ireland_rdbms.users ALTER COLUMN addr_ln1 DROP NOT NULL;
ALTER TABLE ireland_rdbms.users ALTER COLUMN city_name DROP NOT NULL;
ALTER TABLE ireland_rdbms.users ALTER COLUMN zip_code DROP NOT NULL;
ALTER TABLE ireland_rdbms.users ALTER COLUMN profile_picture_path DROP NOT NULL;
ALTER TABLE ireland_rdbms.users ALTER COLUMN gender DROP NOT NULL;
ALTER TABLE ireland_rdbms.users ALTER COLUMN date_of_birth DROP NOT NULL;
Step 2.2: Harden ID Generator Trigger
CREATE OR REPLACE FUNCTION ireland_rdbms.generate_eu_if_missing()
RETURNS TRIGGER AS $$
BEGIN
IF NEW.user_id IS NOT NULL THEN
RETURN NEW; -- Preserve SID-xxx from DMS
END IF;
NEW.user_id := 'EU-00-' || nextval('user_id_seq')::TEXT;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER before_insert_users
BEFORE INSERT ON ireland_rdbms.users
FOR EACH ROW EXECUTE FUNCTION ireland_rdbms.generate_eu_if_missing();
Step 2.3: Create Target User for DMS
CREATE USER dms_target_user WITH PASSWORD 'secure-password-456';
GRANT CONNECT ON DATABASE ireland_prod TO dms_target_user;
GRANT USAGE ON SCHEMA ireland_rdbms TO dms_target_user;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA ireland_rdbms TO dms_target_user;
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA ireland_rdbms TO dms_target_user;
Phase 3: AWS DMS Setup
Step 3.1: Create Replication Instance
Via AWS Console:
- Navigate to AWS DMS Console
- Click "Replication instances" → "Create replication instance"
- Configure:
- Name: virginia-to-ireland-repl
- Instance class: dms.c5.xlarge (choose based on data volume)
- Engine version: 3.5.2 or latest
- Allocated storage: 100 GB (can be increased later by modifying the instance)
- VPC: Select your VPC
- Multi-AZ: Yes (for production)
- Publicly accessible: No
- Click "Create"
Via AWS CLI:
aws dms create-replication-instance \
--replication-instance-identifier virginia-to-ireland-repl \
--replication-instance-class dms.c5.xlarge \
--engine-version 3.5.2 \
--allocated-storage 100 \
--vpc-security-group-ids sg-0123456789abcdef0 \
--multi-az \
--no-publicly-accessible
The instance needs at least 10 minutes to be active.
Step 3.2: Create Source Endpoint (Virginia) and test it
aws dms create-endpoint \
--endpoint-identifier virginia-source \
--endpoint-type source \
--engine-name postgres \
--server-name virginia000.amazonaws.com \
--port 5432 \
--database-name virginia_prod \
--username dms_user \
--password 'secure-password-123' \
--ssl-mode require \
--postgres-settings '{
"PluginName": "pglogical",
"SlotName": "dms_cdc_slot"
}'
Test endpoint connection → Select replication instance → Run test → Success
Step 3.3: Create Target Endpoint (Ireland)
aws dms create-endpoint \
--endpoint-identifier ireland-target \
--endpoint-type target \
--engine-name postgres \
--server-name ireland000.amazonaws.com \
--port 5432 \
--database-name ireland_prod \
--username dms_target_user \
--password 'secure-password-456' \
--ssl-mode require
Test endpoint connection → Success
Step 3.4: Create the Table Mapping (Privacy Filter)
Save this as dms-mapping.json:
{
"rules": [
{
"rule-type": "selection",
"rule-id": "1",
"rule-name": "include-all-tables",
"object-locator": {
"schema-name": "virginia_dev_saayam_rdbms",
"table-name": "%"
},
"rule-action": "include"
},
{
"rule-type": "transformation",
"rule-id": "10",
"rule-name": "remove-full-name",
"rule-target": "column",
"object-locator": {
"schema-name": "virginia_dev_saayam_rdbms",
"table-name": "users",
"column-name": "full_name"
},
"rule-action": "remove-column"
},
{
"rule-type": "transformation",
"rule-id": "11",
"rule-name": "remove-first-name",
"rule-target": "column",
"object-locator": {
"schema-name": "virginia_dev_saayam_rdbms",
"table-name": "users",
"column-name": "first_name"
},
"rule-action": "remove-column"
},
{
"rule-type": "transformation",
"rule-id": "12",
"rule-name": "remove-last-name",
"rule-target": "column",
"object-locator": {
"schema-name": "virginia_dev_saayam_rdbms",
"table-name": "users",
"column-name": "last_name"
},
"rule-action": "remove-column"
},
{
"rule-type": "transformation",
"rule-id": "13",
"rule-name": "remove-phone",
"rule-target": "column",
"object-locator": {
"schema-name": "virginia_dev_saayam_rdbms",
"table-name": "users",
"column-name": "primary_phone_number"
},
"rule-action": "remove-column"
}
.
.
.
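IMPORTANT: This JSON is our compliance boundary. Every column that is not explicitly removed WILL replicate, so the remove list must name every PII column in users (the rules above are an excerpt); only user_id, user_status_id, created_at, and updated_at should survive. Everything explicitly removed will never reach Ireland. Note also that DMS writes to a target schema with the same name as the source schema (virginia_dev_saayam_rdbms) unless a schema rename transformation rule points it at ireland_rdbms.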
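Since every PII column needs its own remove-column rule, generating the mapping from one list is less error-prone than hand-editing JSON. A Python sketch: the rule shapes follow the dms-mapping.json above, while the schema rename rule and the TARGET_SCHEMA value are additions based on DMS's default of writing to a same-named schema, so check them against the DMS table-mapping documentation before use. The usage line lists only the four columns shown above, not the full PII list.

```python
import json

SOURCE_SCHEMA = "virginia_dev_saayam_rdbms"
TARGET_SCHEMA = "ireland_rdbms"  # assumption: target schema name in Ireland


def build_mapping(pii_columns, table="users", rename_schema=True):
    """Build a DMS table-mapping document that drops every listed PII column."""
    if len(set(pii_columns)) != len(pii_columns):
        raise ValueError("duplicate column in PII list")
    rules = [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-all-tables",
        "object-locator": {"schema-name": SOURCE_SCHEMA, "table-name": "%"},
        "rule-action": "include",
    }]
    if rename_schema:
        # Without this, DMS writes to a target schema named like the source schema.
        rules.append({
            "rule-type": "transformation",
            "rule-id": "2",
            "rule-name": "rename-schema",
            "rule-target": "schema",
            "object-locator": {"schema-name": SOURCE_SCHEMA},
            "rule-action": "rename",
            "value": TARGET_SCHEMA,
        })
    for rule_id, column in enumerate(pii_columns, start=10):
        rules.append({
            "rule-type": "transformation",
            "rule-id": str(rule_id),
            "rule-name": "remove-" + column.replace("_", "-"),
            "rule-target": "column",
            "object-locator": {
                "schema-name": SOURCE_SCHEMA,
                "table-name": table,
                "column-name": column,
            },
            "rule-action": "remove-column",
        })
    return {"rules": rules}


if __name__ == "__main__":
    # Only the four columns shown above; the real list must cover every PII column.
    shown = ["full_name", "first_name", "last_name", "primary_phone_number"]
    print(json.dumps(build_mapping(shown), indent=2))
```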
Step 3.5: Create Replication Task
Task identifier: virginia-to-ireland-task
Replication instance: virginia-to-ireland-repl
Source endpoint: virginia-source
Target endpoint: ireland-target
Migration type: Migrate existing data and replicate ongoing changes
Task settings:
- Target table preparation: Do nothing
- Stop task after full load: No
- Enable CloudWatch logs: Yes
- Enable validation: Yes (optional, adds 5-10% overhead)
Table mappings:
- Wizard mode: Switch to JSON editor
- Paste the JSON from Step 3.4
Migration task startup: Automatically start on create
Create task → Wait for status "Running" (full load then CDC)
Step 3.6: Monitor initial load
DMS Console → Task: virginia-to-ireland-task → Table statistics
Monitor:
- Full load progress (% complete per table)
- Row counts
- CDC latency (should be < 10 seconds once full load completes)
CloudWatch Logs:
- Check for transformation logs showing columns being removed
Scenario: Virginia Region Outage
Normal Operations
Virginia RDS: ✅ Active (serving SID users)
DMS: ✅ Replicating with ~2-5 second lag
Ireland RDS: ✅ Receiving filtered data
Cognito: ✅ Global (both regions authenticate)
User Flow:
SID-123 user → Cognito login → Virginia app → Virginia RDS (full profile)
EU-456 user → Cognito login → Ireland app → Ireland RDS (full profile)
Outage Begins
Virginia PostgreSQL: ❌ Unreachable
DMS Replication Instance: ⚠️ Task stops with an error (source unreachable); it resumes from its last CDC checkpoint later
Ireland PostgreSQL: ✅ Fully operational
Cognito continues working (provided the user pool is not hosted in the Virginia region)
What Ireland Has:
All SID user stubs (user_id + status only, no PII)
All requests and notifications with valid foreign keys
All EU users with full profiles
Failover Actions
Step 1: Route Traffic to Ireland
- Update Route53 DNS: api.yourapp.com → Ireland ALB
- Or let health check failover trigger automatically
- Traffic now hits the Ireland application
Step 2: Enable Application DR Mode
- Update Systems Manager Parameter: /app/dr_mode = "true"
- Application detects SID users and shows limited profile mode
Step 3: Verify Data Integrity
- Check DMS Console for last successful replication timestamp
- Verify FK integrity in the Ireland database
- Confirm user authentication works via Cognito
User Experience:
✅ Login works (Cognito)
✅ View requests
✅ Create new request (writes to Ireland)
Virginia Recovers
Step 1: Verify Virginia Health
- Check RDS is responsive
- Verify database integrity
- Confirm tables are accessible
Step 2: Resume Forward Replication
- DMS Console → Task: virginia-to-ireland-task → Resume
- DMS automatically resumes from checkpoint (09:59:58 AM)
- Catches up on any changes in Virginia during recovery
Step 3: Create Reverse Replication (Failback)
We could use the manual process described for recovering the Virginia region in Option A, but DMS offers a managed alternative: a reverse replication task.
- DMS Console → Create new task: ireland-to-virginia-failback
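- Source: a new DMS source endpoint for Ireland; Target: a new DMS target endpoint for Virginia (the existing ireland-target and virginia-source endpoints are typed as target and source, so they cannot be reused in reverse)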
- Migration type: CDC only
- CDC start time: 2025-12-18 10:00:00 (outage start)
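- Table mapping: Filter for SID users only. Caution: DMS treats "ste" as "less than or equal to", not as a pattern match, so the "SID-%" value below does not select SID- IDs; replace it with a range condition (for example "between") and test the filter before relying on it.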
{
"rules": [
{
"rule-type": "selection",
"rule-id": "1",
"rule-name": "sync-sid-data",
"object-locator": {
"schema-name": "ireland_rdbms",
"table-name": "%"
},
"rule-action": "include",
"filters": [
{
"filter-type": "source",
"column-name": "user_id",
"filter-conditions": [
{
"filter-operator": "ste",
"value": "SID-%"
}
]
}
]
}
]
}
Step 4: Run Failback Task
- Start task and monitor progress
- DMS syncs all SID-related data created during the outage
- Verify row counts match between regions
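Before trusting the row counts, it is worth checking what the failback mapping's source filter actually selects. DMS documents "ste" as "less than or equal to", so the condition is user_id <= 'SID-%', not a LIKE. This Python sketch compares the two using plain code-point ordering (database collations can order punctuation differently, so confirm against the real data): the comparison keeps EU- IDs and drops SID- IDs, the opposite of the intent, while a prefix test or a "SID-" to "SID-~" range behaves as intended for IDs built from digits and dashes.

```python
def dms_ste(value: str, bound: str) -> bool:
    """What a DMS "ste" (less than or equal) filter tests, approximated with
    Python's code-point string ordering."""
    return value <= bound


def is_sid(value: str) -> bool:
    """The intended condition: user_id LIKE 'SID-%'."""
    return value.startswith("SID-")


def in_sid_range(value: str, low: str = "SID-", high: str = "SID-~") -> bool:
    """Range form of the prefix test; '~' sorts after digits, letters and '-' in ASCII."""
    return low <= value <= high


if __name__ == "__main__":
    for uid in ("SID-00-000-000-123", "EU-00-456"):
        print(uid, dms_ste(uid, "SID-%"), is_sid(uid), in_sid_range(uid))
    # SID-00-000-000-123 False True True
    # EU-00-456 True False False
```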
Step 5: Resume Normal Operations
- Stop the reverse task
- Update Route53 → Virginia
- Disable DR mode in the application
- Monitor forward replication for 1 hour
Ongoing Maintenance
Adding New PII Column:
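Step 1: Stop the replication task, then add the column to the users table in both regions (while the task runs, DMS would start replicating the new column before its remove rule exists)
Step 2: Add a new transformation rule to dms-mapping.json: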
{
"rule-type": "transformation",
"rule-id": "24",
"rule-name": "remove-new-pii-column",
"rule-target": "column",
"object-locator": {
"schema-name": "virginia_dev_saayam_rdbms",
"table-name": "users",
"column-name": "new_pii_column"
},
"rule-action": "remove-column"
}
Step 3: DMS Console → Modify task → Update table mappings
Step 4: Resume task
Adding Non-PII Column:
Just add to both databases. DMS automatically replicates it (not in the remove list)
Adding New Table:
- Create a table in both regions
- Already included by the "include all tables" rule
- DMS starts replicating automatically
Schema Changes:
Most DDL changes are replicated automatically by DMS. Complex changes (like splitting columns) may require a task restart
Monitoring
CloudWatch Metrics to Track:
- CDCLatencySource - replication lag (alert if > 60 seconds)
- FullLoadThroughputRowsSource - during initial load
- CPUUtilization - DMS instance health
DMS Console Monitoring:
- Table statistics (row counts, inserts/updates/deletes)
- Task status (running, stopped, error)
- Validation results (if enabled)
Note: This method replicates in one direction only by default: Ireland's local data does NOT automatically replicate back to Virginia. We either sync manually, as described for Option A, or run the temporary reverse DMS task.
Application Configuration:
- Configure Cognito User Pool with custom attribute: custom:user_id
- During registration: Generate user_id → Store in Cognito + Database
- During login: Extract user_id from JWT → Query database
- During DR: Check user_id prefix (SID vs EU) → Enable appropriate mode
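The DR decision can be a small pure function. A Python sketch, assuming the ID token's claims have already been verified and decoded (Cognito places custom attributes in the ID token as custom:<name> claims) and that dr_mode comes from the /app/dr_mode parameter. The ProfileMode fields mirror the DR dashboard limitations described for Option A and are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfileMode:
    show_pii: bool
    can_edit_profile: bool
    can_create_requests: bool


FULL = ProfileMode(show_pii=True, can_edit_profile=True, can_create_requests=True)
# Ireland holds only an ID stub for Virginia users, so their profile is read-only.
LIMITED = ProfileMode(show_pii=False, can_edit_profile=False, can_create_requests=True)


def profile_mode(claims: dict, dr_mode: bool) -> ProfileMode:
    """Pick the UI mode from verified ID-token claims and the DR flag."""
    user_id = claims["custom:user_id"]
    if user_id.startswith("EU-"):
        return FULL  # EU users are served from their home region either way
    if user_id.startswith("SID-"):
        return LIMITED if dr_mode else FULL
    raise ValueError(f"unrecognised user_id prefix: {user_id!r}")


if __name__ == "__main__":
    print(profile_mode({"custom:user_id": "SID-00-000-000-123"}, dr_mode=True))
    # ProfileMode(show_pii=False, can_edit_profile=False, can_create_requests=True)
```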
Cognito During DR (provided the user pool is not hosted in the Virginia region):
- Authentication continues working
- Password resets work
- OAuth providers (Google, Yahoo) work
- MFA continues functioning
- No database dependency for authentication
DMS Costs:
- Replication Instance (dms.c5.xlarge): ~$0.384/hour = ~$280/month (single-AZ; the Multi-AZ setup recommended in Step 3.1 roughly doubles this line)
- Data transfer (cross-region): ~$0.02/GB
- CloudWatch logs: ~0/month
- Total Monthly Cost: ~$350-400
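The instance line is simple arithmetic: hourly rate × about 730 hours per month (8,760 hours / 12). A Python sketch for redoing the estimate with our own numbers; the rates are the ones quoted above, not current AWS pricing, the 1,000 GB transfer volume is a made-up example, and the Multi-AZ flag assumes the standby is billed at the same rate as the primary.

```python
HOURS_PER_MONTH = 730  # 8,760 hours per year / 12, the usual AWS monthly approximation


def monthly_dms_cost(hourly_rate: float, cross_region_gb: float,
                     per_gb: float = 0.02, multi_az: bool = False,
                     other: float = 0.0) -> float:
    """Rough monthly DMS bill: instance hours + cross-region transfer + other items."""
    instances = 2 if multi_az else 1  # assumption: standby billed like the primary
    instance_cost = hourly_rate * HOURS_PER_MONTH * instances
    return round(instance_cost + cross_region_gb * per_gb + other, 2)


if __name__ == "__main__":
    print(monthly_dms_cost(0.384, 0))                    # 280.32
    print(monthly_dms_cost(0.384, 1000))                 # 300.32
    print(monthly_dms_cost(0.384, 1000, multi_az=True))  # 580.64
```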
Compare to Option A:
- No AWS service costs
- But ~8-12 hours/month engineering time for maintenance
- Risk of manual failback errors
| Feature | Option A: Native Replication | Option B: AWS DMS |
|---|---|---|
| Setup Complexity | High (shadow tables + triggers) | Medium (DMS configuration) |
| Column Filtering | Manual (shadow table) | Native (JSON rules) |
| Maintenance | High (trigger updates on schema changes) | Low (DMS handles most) |
| Failback | Manual (export/import scripts) | Automated (reverse DMS task) |
| Failback Time | ~60-90 minutes | ~30-45 minutes |
| Audit Trail | Database triggers (hard to audit) | CloudWatch logs (clear trail) |
| Cost | $0 AWS services | ~$400/month AWS services |
| Risk | High (manual processes) | Low (managed service) |
| Flexibility | PostgreSQL only | Multi-engine support |