-
Notifications
You must be signed in to change notification settings - Fork 14
** Architectural Decision Record: User Identity & Multi‐Region Data Residency
The platform Saayam For All operates across three geographic regions:
Virginia, USA - the primary/base region
Ireland, EU - the EU-specific region
Stocklohm, EU - the second EU-specific region
The platform manages beneficiaries and volunteers, and helps with requests globally. The core schema challenge was how to build a unified global system while ensuring EU user data complied with GDPR and data residency laws.
Existing Setup:
- A user's table in Virginia with PII (name, email, phone, address, etc.)
- A sequence generator generate_sid() that creates IDs in the format SID-00-XXX-XXX-XXX
- Virginia handles Non-EU users; Ireland handles EU users (SID-EU-XXX-XXX-XXX)
- All other tables (requests, actions, skills, volunteers_assigned, etc.) referenced user_id as a foreign key
The Problem:
The existing user_id is both a database primary key and a regional identity marker. There was no concept of a "global" identifier that could safely travel across regions without carrying PII or violating compliance rules.
Issue 1 — Data Residency & GDPR: EU user PII (name, email, address) cannot legally reside in a US-based database. If you store an EU user's record in Virginia, you're violating data residency rules. But if Virginia can't see EU user data, how does it process help requests for those users?
Issue 2 — Foreign Key Constraints: All operational tables (requests, actions, comments) use user_id as a foreign key. If an EU user's user_id lives only in Ireland, Virginia can't link a help request to that user without breaking referential integrity.
Issue 3 — The Sequence Generator Clash Virginia's sequence generates SID-00-... and Ireland's generates SID-EU-.... You cannot insert one region's ID format into the other region's tables without breaking the sequence format validation.
Issue 4 — Disaster Recovery (DR) If Virginia goes down, Ireland must temporarily register all users (including Non-EU ones). But Ireland's sequence generator doesn't know where Virginia stopped. When Virginia comes back, how do you sync without creating duplicate IDs?
Let's think of dealing with these core issues one by one.
Approach 1: Direct Unique ID
Just create a unique ID and insert it across all regions - no restriction on the ID, so that it can live anywhere.
But it doesn't address how the ID is formatted to satisfy the existing sequence generators, how the foreign key constraints on user_id are handled. Vague - doesn't specify what happens structurally to dozens of tables.
Approach 2: Composite Primary Key (global_id, user_id) in the Users table
Alter all operational tables to use a combination of two keys - the new global ID and the existing regional user_id. For EU users in Virginia, the user_id field would be filled with a static placeholder like EXTERNAL_EU. Keeps database constraints satisfied. Virginia can store EU user requests using the composite key.
But it requires altering every operational table in every region - massive DDL refactoring. The placeholder string EXTERNAL_EU is messy; it doesn't play well with typed sequence generators. Every query now requires awareness of whether the row is a "real" user or a "placeholder" user, and adds complexity to joins and debugging.
Approach 3: Shadow Profile Approach
When an EU user is created, Virginia's sequence generator still runs and creates a "shadow" user record (e.g., SID-00-001) — but this record contains zero PII. It only holds the user_id and a global_id column pointing to Ireland. All existing tables continue using user_id as normal. Zero changes to existing table structures. All foreign key relationships stay intact. The system functions exactly as it does today.
But if the shadow record gets out of sync with Ireland, data consistency breaks silently.
Approach 4: Complete Decoupling (Microservices Way)
Remove user_id as a foreign key from all global/operational tables entirely. Replace it with global_id. Keep user_id only in PII-heavy regional tables. Clean separation. Global tables do not know regional sequences. Virginia can accept records for any user in the world without needing their profile stored locally.
Most invasive refactor — foreign key constraints must be dropped from many tables. Moves validation from the database layer to the application layer, increasing application-side risk, and requires careful handling to avoid "orphan" records in operational tables.
Approach 5: The Global ID (GID) in a new table called global_users mapping table
A GID generated at the Volunteer Microservice level (not in the database). This is the universal token that represents a user across all regions. Contains zero PII. Is not tied to any sequence generator. Can safely live in Virginia, Ireland, or anywhere without compliance issues. Is generated before the database transaction begins, so it's available to insert into both users and the mapping table atomically.
Why not generate it in the database? If you use a DB trigger to generate the GID, you create a hidden dependency. If you ever change databases, migrate schemas, or debug a DR incident, you'd need to account for that trigger. Service-layer generation is transparent and database-agnostic.
But what about the situations of disaster recovery? How are we handling the sync process? There has to be a clear line of flow for the administrators and engineers, even before the implementation of the schema. There is no rethinking of maybe.
For every operational record, there is a user_id link to connect to the details of that particular user. There is no possible way to store the record if that user_id isn't present anywhere for the link in the instance. Either we have to create a table (users, in our case) that has all the user_id of all the instances, even if the other attributes aren't stored, or have a globally recognized identifier.
The existing generate_sid() sequence logic is:
us-east-1: Virginia region generates SID-00-XXX-XXX-XXX
eu-west-1: Ireland region generates SID-EU-XXX-XXX-XXX
In a normal execution, the user_id would be created with their respective sequence generators in their respective instance. Then what happens in a disaster? How will the new traffic from Virginia to Ireland be handled? The answer to this is that the Ireland region always contains the Virginia user data, so the user creation to request handling of an existing Virginia user can be handled.
But what about the new Virginia users created in the Ireland region? One of the best approaches is having a dual sequence generator in the Ireland region. The existing generate_sid() sequence logic is kept intact but made smarter.
We can include an is_eu flag in the existing users table that is given by the volunteer microservice, which comes with the user details that need to be inserted.
In the generator of the Ireland region:
If is_eu = TRUE, it generates SID-EU-XXX-XXX-XXX
If is_eu = FALSE, it generates SID-00-XXX-XXX-XXX
This avoids creating the user record as an EU type, and when the Virginia region is up again, during the sync up within that timeline, all the newly created Virginia users in Ireland will be copied to the Virginia region, and if any new records of Ireland users are created, their id's need to be figured out! The UID remains the regional primary key for PII-heavy tables.
But there is an issue here: how would the database know what the last countdown for the Virginia user who was just created was when Virginia is down? If we don't know the last count, the sequence generator in Ireland would start from the beginning, creating duplicate records during sync-up.
To prevent ID collisions during a disaster (when Ireland must register non-EU users), each region's sequence can be pre-allocated to a non-overlapping range.
Virginia: sequence starts at 1–1,000,000,000,000 (1 trillion)
Ireland: sequence starts at 2,000,000,000,000-n (2 trillion)
This means,
Virginia users in Virginia: SID-00-001-XXX-XXX
Virginia users in Ireland (registered during DR): SID-00-002-XXX-XXX
They can never collide, no matter how long Virginia is offline
The generate_sid() function's math also needs to be updated to support trillion-scale ranges by increasing the base divisors from 1,000,000 to 1,000,000,000.
This clears the creation of Virginia users in the Ireland region during a disaster. After the issue, during the sync-up, the Virginia users are copied back to the Virginia region using the Sequence Range Partitioning concept.
Now coming back to Issue 1, if we entirely remove the user_id creation and have GID creation at the microservice level, all the regionally dependent tables would be left alone, need a different method to differentiate the users.
That's why we are considering adding only the user_id of the Ireland region in the Virginia region during disaster for our schema-level implementation, considering the schema ID is not PII data that comes into GDPR. This would simplify the process, eliminating issue 2, backed by the sequence generators, instead of going around on the GID, their creation and maintenance.
NOTE: The same concept can be applied to Ireland Disaster Recovery process. When the Ireland region is down, all the newly created users' records should be stored in the Stocklohm first. Instead of having a dual sequence generator, we can have an Ireland region sequence generator with the Sequence Range Partitioning concept.
We are not going to remove the user_id as the primary key. Keep the user_id as our key point with an is_eu flag for support.
In the Virginia region, the user_id will have a sequence generator that creates the non-EU users. And all the Ireland region users' user_ids will be stored in the Virginia Users' table for the operational table to work. When the Virginia region is down, the new Virginia users are created in the Ireland region, which has a dual sequence generator and a different starting count of user_ids. When the region is up again, the details are copied back to the Virginia region. Any newly created Ireland users' user_ids will also be synced. When the Ireland region is down, the Stocklohm region created the EU users with a different starting count of EU IDs. When the region is up, all the EU user records created would be copied to the Ireland region, and their IDs are stored in the Users table of the Virginia region.
The is_eu check is really only meaningful during user creation routing, and that routing decision is made by the Volunteer Microservice — not the database.
Normal operation: non-EU 'A' user
- The volunteer team gets the user data with the region confirmation.
- Service checks is_eu = FALSE → goes for DB connection - Virginia
- Opens a database transaction (BEGIN)
- Inserts 'A' into the users table → trigger generates user_id and flag as FALSE.
- Commits (COMMIT)
- During the sync-up, this record is stored in the Ireland region and the Stocklohm region at their own Asynchronous Replication.
Normal operation: EU 'B' user
- The volunteer team gets the user data with the region confirmation.
- Service checks is_eu = TRUE → goes for DB connection - Ireland
- Opens a database transaction (BEGIN)
- Inserts 'A' into the users table → trigger generates user_id and flag as TRUE.
- Commits (COMMIT)
- Insert the user_id created in Ireland into the Virginia users table using API/ another transaction, maintaining the CAP theorem.
- During the asynchronous replication, this record is stored in the Stocklohm region.
Virginia goes offline: non-EU 'C' user
- The volunteer team gets the user data with the region confirmation.
- Service checks is_eu = FALSE → goes for DB connection - Ireland as Virginia is unreachable
- Opens a database transaction (BEGIN)
- Inserts 'A' into the users table with Ireland's trigger generating SID-00-002-XXX (from its 2-trillion range)
- Commits (COMMIT)
- During the asynchronous replication, this record is stored in the Stocklohm region.
- When the Virginia region is up, sync back the record to the Virginia region.
Based on the full DDL list, every table was classified into three categories:
Static Reference tables (no user link needed): country, state, city, help_categories, request_status, request_priority, sla, notification_types, identity_type, and others — these are global lookup tables with no user identity.
PII related (stay regional, contain PII or sensitive operational data): user_locations, volunteer_locations, user_availability, volunteer_applications, volunteer_details, user_additional_details, user_signoff, volunteer_rating, user_notification_preferences, user_notification_status, notifications
Operational tables (global operational tables, cross-region safe): request, action, volunteers_assigned, user_skills, org_skills, help_categories_map, user_category, fraud_requests, req_add_info, req_comments, request_for, request_isleadvol, user_org_map, organizations, volunteer_organizations
This is the most critical change.
Virginia: This table will now serve as a "Global Registry" of sorts. While it only stores PII for non-EU users, it will now also store the user_id records for EU users (without their PII) to satisfy foreign key constraints for operational tables.
Ireland: This table remains the primary store for EU user PII.
Schema Change: Implementing an is_eu boolean flag in all the regions of the Users table. This flag is essential for the application layer to determine whether to route a record to Virginia or Ireland during user creation.
ALTER TABLE virginia_dev_saayam_rdbms.users
ADD COLUMN is_eu BOOLEAN DEFAULT FALSE;
ALTER TABLE ireland_dev_saayam_rdbms.users
ADD COLUMN is_eu BOOLEAN DEFAULT TRUE;
ALTER TABLE stocklohm_dev_saayam_rdbms.users
ADD COLUMN is_eu BOOLEAN DEFAULT TRUE;
Tables such as request, action, volunteers_assigned, and req_comments are classified as Global Operational Tables.
Change: Because we are keeping user_id as the primary key and foreign key, these tables do not require a massive DDL refactor.
Workflow: To maintain referential integrity, when an EU user is created in Ireland, the user_id must be asynchronously (or via API) inserted into the Virginia users table. This ensures that when a global admin in Virginia looks at an action or request record, the database doesn't throw a foreign key violation error, even though the actual PII for that user resides securely in Ireland.
To prevent the "Sequence Generator Clash" (Issue 3 & 4), the sequence logic inside ddl_users.sql must be updated to support Sequence Range Partitioning
The platform runs across three regions. Each region has a non-overlapping sequence range to prevent ID collisions,
especially during Disaster Recovery of Virginia:
Virginia Sequence Range: 1 to 1,000,000,000,000$ (1 Trillion)
Ireland Sequence Range: 2,000,000,000,000 to 4,000,000,000,000 (2-4 Trillion)
Stocklohm Sequence Range: 4,000,000,000,000 and above (if stocklohm also creates user_id of virginia region)
Impact: If Virginia goes down, Ireland takes over registration of non-EU users using the 2 Trillion+ range. This prevents ID collisions when the regions eventually sync back up.
especially during Disaster Recovery of Virginia:
Ireland Sequence Range: 1 to 1,000,000,000,000$ (1 Trillion)
Stocklohm Sequence Range: 2,000,000,000,000 to 4,000,000,000,000 (2-4 Trillion)
The sequence generator code is in progress.
The user_id format is: STD-00-[seg1]-[seg2]-[seg3]-[seg4]-[seg5] (5 segments, 3 digits each)
The region identity is explicitly written into the function. The sequence value is stripped of its trillion base using % 1,000,000,000,000 before being split into the remaining 4 segments.
CREATE OR REPLACE FUNCTION generate_sid()
RETURNS TRIGGER AS $$
DECLARE
seq_id BIGINT;
new_id TEXT;
region_prefix TEXT;
BEGIN
-- Hardcoded per region. Change only this line per instance.
region_prefix := '002'; 002 = Ireland | 003 = Stockholm
seq_id := nextval('seq_non_eu_user_id');
new_id := 'STD-00-' || region_prefix || '-' ||
LPAD(FLOOR((seq_id % 1000000000000) / 1000000000)::TEXT, 3, '0') || '-' ||
LPAD(FLOOR((seq_id % 1000000000) / 1000000)::TEXT, 3, '0') || '-' ||
LPAD(FLOOR((seq_id % 1000000) / 1000)::TEXT, 3, '0') || '-' ||
LPAD((seq_id % 1000)::TEXT, 3, '0');
NEW.user_id := new_id;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
| Step | Operation | Value |
|---|---|---|
| region_prefix | hardcoded | 002 |
| seq_id % 1T | strips trillion base | 4,010,003 |
| % 1T / 1B | seg2 | 000 |
| % 1B / 1M | seg3 | 004 |
| % 1M / 1K | seg4 | 010 |
| % 1K | seg5 | 003 |
Output: STD-00-002-000-004-010-003 Why: You essentially tell the trigger: "Any non-EU user created here is an Ireland-DR user, so prefix with 002."
The region identity is derived from the sequence value itself. Since the sequence starts at an exact trillion multiple, dividing by 1T naturally produces the region segment. No prefix string is written anywhere in the function.
CREATE OR REPLACE FUNCTION generate_sid()
RETURNS TRIGGER AS $$
DECLARE
seq_id BIGINT;
new_id TEXT;
BEGIN
seq_id := nextval('seq_non_eu_user_id');
new_id := 'STD-00-' ||
LPAD(FLOOR(seq_id / 1000000000000)::TEXT, 3, '0') || '-' ||
LPAD(FLOOR((seq_id % 1000000000000) / 1000000000)::TEXT, 3, '0') || '-' ||
LPAD(FLOOR((seq_id % 1000000000) / 1000000)::TEXT, 3, '0') || '-' ||
LPAD(FLOOR((seq_id % 1000000) / 1000)::TEXT, 3, '0') || '-' ||
LPAD((seq_id % 1000)::TEXT, 3, '0');
NEW.user_id := new_id;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
| Step | Operation | Value |
|---|---|---|
| seq_id / 1T | extracts region digit | 002 |
| seq_id % 1T / 1B | seg2 | 000 |
| % 1B / 1M | seg3 | 004 |
| % 1M / 1K | seg4 | 010 |
| % 1K | seg5 | 003 |
Output: STD-00-002-000-004-010-003
| Approach 1 (Hardcoded) | Approach 2 (Math Extract) | |
|---|---|---|
| Output for same seq_id | STD-00-002-000-004-010-003 | STD-00-002-000-004-010-003 |
| Region identity source | Explicit string in function | Derived from sequence start |
| Same function across regions? | No — one copy per region with different prefix | Yes — identical function everywhere |
| If sequence start drifts slightly | No impact on prefix, counter just looks odd | seg1 slightly off but still unique |
| If sequence overflows its trillion range | Silent ID collision — prefix lies about region, counter wraps | seg1 naturally increments to 003, no collision |
| Readability | Immediately clear which region | Must trace math to understand |
| Adding a new region | Edit one line per function | Just set correct sequence start |
| Risk level | Low | Medium — hidden contract on sequence start |
This is the original generate_sid() logic that was already in place, unchanged. It uses 1,000,000 as the base divisor and produces 3 segments.
If the same logic is used, the trillions part (2,000,000) is stripped away by the modulo/division math used to fit the number into the 3-digit segments.
new_id := 'STD-00-' ||
LPAD(FLOOR(seq_id / 1000000)::TEXT, 3, '0') || '-' || -- WITH NO TRILLION AWARENESS
LPAD(FLOOR((seq_id % 1000000) / 1000)::TEXT, 3, '0') || '-' ||
LPAD((seq_id % 1000)::TEXT, 3, '0');
During DR (Virginia is down, Ireland is registering non-EU users):
- Ireland's sequence starts at 2,000,000,000,000
- User created → seq_id = 2,000,000,004,010,003
- Version 3 generates → STD-00-004-010-003
Virginia comes back up, sync begins:
- Ireland DR record STD-00-004-010-003 is copied to Virginia
- Virginia already has a user STD-00-004-010-003 created from seq_id = 1,000,000,004,010,003
- Primary key violation. Sync fails.
The two sequences were designed to be non-overlapping, but the math throws away the information. The trillion-range partitioning becomes completely useless. This only works if the logic is for a Virginia-only operation.
Even at 100 million users — which would make Saayam one of the largest volunteer platforms in the world — we're well within the billion range.
The trillion base for user_id stays as is, not because we need that many users, but because the gap between regions needs to be large enough that it can never accidentally be crossed. Billions would actually work too, but a trillion is a clean, readable boundary that makes the partitioning obvious at a glance.
For user_id, we have the trillion range partitioning specifically because of DR copy-back collision risk. For requests, there's no replication issue — all requests live in all regions. So the question is purely:
How many requests can this platform realistically generate?
For this, we can go with the prefix approach. Simple counter per region, no trillion base needed. Each region's sequence starts at 1. No collision risk because the region prefix already differentiates them.
'REQ-' ||
LPAD(FLOOR(seq_id / 100000000)::TEXT, 2, '0') || '-' || -- 2 digits
LPAD(FLOOR((seq_id % 100000000) / 100000)::TEXT, 3, '0') || '-' || -- 3 digits
LPAD(FLOOR((seq_id % 100000) / 1000)::TEXT, 3, '0') || '-' || -- 3 digits
LPAD((seq_id % 1000)::TEXT, 4, '0') -- 4 digits
So the original format is:
REQ-XX-XXX-XXX-XXXX
The 01 is the Virginia region prefix (hardcoded). So the question is — how many X's do we need after it?
REQ-01-XXXXXXXXX - 9 count - 999,999,999 ~1 billion
REQ-01-XXXXXXXXXXXX - 12 count - 999,999,999,999 ~1 trillion, future proof
Given the platform scale, 9 digits is realistic, and 12 digits is future-proof.
So the format would be:
REQ-01-000-000-001 ← split into 3x3 for readability in Virginia region
CREATE OR REPLACE FUNCTION virginia_dev_saayam_rdbms.generate_request_id()
RETURNS TRIGGER AS $$
DECLARE
seq_id BIGINT;
new_id TEXT;
region_prefix TEXT;
BEGIN
region_prefix := '01'; -- 01 = Virginia | 02 = Ireland | 03 = Stockholm
seq_id := nextval('virginia_dev_saayam_rdbms.request_id_seq');
new_id := 'REQ-' || region_prefix || '-' ||
LPAD(FLOOR(seq_id / 1000000)::TEXT, 3, '0') || '-' ||
LPAD(FLOOR((seq_id % 1000000) / 1000)::TEXT, 3, '0') || '-' ||
LPAD((seq_id % 1000)::TEXT, 3, '0');
NEW.req_id := new_id;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
NOTE: INT in PostgreSQL caps at 2,147,483,647 — so if the sequence ever crosses 2.1 billion, nextval would overflow the seq_id variable and throw an error, even though the column itself is VARCHAR and could hold any string.
If we change to BIGINT, the sequence variable can do
FLOOR(seq_id / 100000000) -- seg1, 2 digits, max 99
FLOOR((seq_id % 100000000) / 100000) -- seg2, 3 digits, max 999
FLOOR((seq_id % 100000) / 1000) -- seg3, 3 digits, max 99 (only 2 digits, 100000/1000 = 100, not 999)
(seq_id % 1000) -- seg4, 4 digits, max 999 (LPAD 4 but value only goes to 999)
So changing to BIGINT gives you up to 9.9 billion requests per region with the existing format. (Caps: 99 * 100,000,000 = 9,900,000,000 → ~9.9 billion)
CHANGE TRILLION TO MAX OF 10 BILLION BUT NEED TO CHECK THE REQUEST LIMITS