** Architectural Decision Record: User Identity & Multi‐Region Data Residency

The platform Saayam For All operates across three geographic regions:
Virginia, USA - the primary/base region
Ireland, EU - the EU-specific region
Stocklohm, EU - the second EU-specific region

The platform manages beneficiaries and volunteers, and helps with requests globally. The core schema challenge was how to build a unified global system while ensuring EU user data complied with GDPR and data residency laws.

What is Already There?

Existing Setup:

A user's table in Virginia with PII (name, email, phone, address, etc.)
A sequence generator generate_sid() that creates IDs in the format SID-00-XXX-XXX-XXX
Virginia handles Non-EU users; Ireland handles EU users (SID-EU-XXX-XXX-XXX)
All other tables (requests, actions, skills, volunteers_assigned, etc.) referenced user_id as a foreign key

The Problem:
The existing user_id is both a database primary key and a regional identity marker. There was no concept of a "global" identifier that could safely travel across regions without carrying PII or violating compliance rules.

The Core Issues Identified

Issue 1 — Data Residency & GDPR: EU user PII (name, email, address) cannot legally reside in a US-based database. If you store an EU user's record in Virginia, you're violating data residency rules. But if Virginia can't see EU user data, how does it process help requests for those users?
Issue 2 — Foreign Key Constraints: All operational tables (requests, actions, comments) use user_id as a foreign key. If an EU user's user_id lives only in Ireland, Virginia can't link a help request to that user without breaking referential integrity.
Issue 3 — The Sequence Generator Clash Virginia's sequence generates SID-00-... and Ireland's generates SID-EU-.... You cannot insert one region's ID format into the other region's tables without breaking the sequence format validation.
Issue 4 — Disaster Recovery (DR) If Virginia goes down, Ireland must temporarily register all users (including Non-EU ones). But Ireland's sequence generator doesn't know where Virginia stopped. When Virginia comes back, how do you sync without creating duplicate IDs?

Let's think of dealing with these core issues one by one.

Issue 1 — Data Residency & GDPR of EU users

Approach 1: Direct Unique ID
Just create a unique ID and insert it across all regions - no restriction on the ID, so that it can live anywhere.
But it doesn't address how the ID is formatted to satisfy the existing sequence generators, how the foreign key constraints on user_id are handled. Vague - doesn't specify what happens structurally to dozens of tables.

Approach 2: Composite Primary Key (global_id, user_id) in the Users table
Alter all operational tables to use a combination of two keys - the new global ID and the existing regional user_id. For EU users in Virginia, the user_id field would be filled with a static placeholder like EXTERNAL_EU. Keeps database constraints satisfied. Virginia can store EU user requests using the composite key.
But it requires altering every operational table in every region - massive DDL refactoring. The placeholder string EXTERNAL_EU is messy; it doesn't play well with typed sequence generators. Every query now requires awareness of whether the row is a "real" user or a "placeholder" user, and adds complexity to joins and debugging.

Approach 3: Shadow Profile Approach
When an EU user is created, Virginia's sequence generator still runs and creates a "shadow" user record (e.g., SID-00-001) — but this record contains zero PII. It only holds the user_id and a global_id column pointing to Ireland. All existing tables continue using user_id as normal. Zero changes to existing table structures. All foreign key relationships stay intact. The system functions exactly as it does today.
But if the shadow record gets out of sync with Ireland, data consistency breaks silently.

Approach 4: Complete Decoupling (Microservices Way)
Remove user_id as a foreign key from all global/operational tables entirely. Replace it with global_id. Keep user_id only in PII-heavy regional tables. Clean separation. Global tables do not know regional sequences. Virginia can accept records for any user in the world without needing their profile stored locally.
Most invasive refactor — foreign key constraints must be dropped from many tables. Moves validation from the database layer to the application layer, increasing application-side risk, and requires careful handling to avoid "orphan" records in operational tables.

Approach 5: The Global ID (GID) in a new table called global_users mapping table
A GID generated at the Volunteer Microservice level (not in the database). This is the universal token that represents a user across all regions. Contains zero PII. Is not tied to any sequence generator. Can safely live in Virginia, Ireland, or anywhere without compliance issues. Is generated before the database transaction begins, so it's available to insert into both users and the mapping table atomically.
Why not generate it in the database? If you use a DB trigger to generate the GID, you create a hidden dependency. If you ever change databases, migrate schemas, or debug a DR incident, you'd need to account for that trigger. Service-layer generation is transparent and database-agnostic.
But what about the situations of disaster recovery? How are we handling the sync process? There has to be a clear line of flow for the administrators and engineers, even before the implementation of the schema. There is no rethinking of maybe.

Issue 2 — Foreign Key Constraints

For every operational record, there is a user_id link to connect to the details of that particular user. There is no possible way to store the record if that user_id isn't present anywhere for the link in the instance. Either we have to create a table (users, in our case) that has all the user_id of all the instances, even if the other attributes aren't stored, or have a globally recognized identifier.

Issue 3 & 4 — The Sequence Generator Clash and the Disaster Recovery Process

The existing generate_sid() sequence logic is:
us-east-1: Virginia region generates SID-00-XXX-XXX-XXX
eu-west-1: Ireland region generates SID-EU-XXX-XXX-XXX

In a normal execution, the user_id would be created with their respective sequence generators in their respective instance. Then what happens in a disaster? How will the new traffic from Virginia to Ireland be handled? The answer to this is that the Ireland region always contains the Virginia user data, so the user creation to request handling of an existing Virginia user can be handled.

But what about the new Virginia users created in the Ireland region? One of the best approaches is having a dual sequence generator in the Ireland region. The existing generate_sid() sequence logic is kept intact but made smarter.

We can include an is_eu flag in the existing users table that is given by the volunteer microservice, which comes with the user details that need to be inserted.

In the generator of the Ireland region:
If is_eu = TRUE, it generates SID-EU-XXX-XXX-XXX
If is_eu = FALSE, it generates SID-00-XXX-XXX-XXX

This avoids creating the user record as an EU type, and when the Virginia region is up again, during the sync up within that timeline, all the newly created Virginia users in Ireland will be copied to the Virginia region, and if any new records of Ireland users are created, their id's need to be figured out! The UID remains the regional primary key for PII-heavy tables.

But there is an issue here: how would the database know what the last countdown for the Virginia user who was just created was when Virginia is down? If we don't know the last count, the sequence generator in Ireland would start from the beginning, creating duplicate records during sync-up.

To prevent ID collisions during a disaster (when Ireland must register non-EU users), each region's sequence can be pre-allocated to a non-overlapping range.
Virginia: sequence starts at 1–1,000,000,000,000 (1 trillion)
Ireland: sequence starts at 2,000,000,000,000-n (2 trillion)

This means,
Virginia users in Virginia: SID-00-001-XXX-XXX
Virginia users in Ireland (registered during DR): SID-00-002-XXX-XXX
They can never collide, no matter how long Virginia is offline

The generate_sid() function's math also needs to be updated to support trillion-scale ranges by increasing the base divisors from 1,000,000 to 1,000,000,000.

This clears the creation of Virginia users in the Ireland region during a disaster. After the issue, during the sync-up, the Virginia users are copied back to the Virginia region using the Sequence Range Partitioning concept.

Issue 1 — Data Residency & GDPR of EU users — Continued

Now coming back to Issue 1, if we entirely remove the user_id creation and have GID creation at the microservice level, all the regionally dependent tables would be left alone, need a different method to differentiate the users.
That's why we are considering adding only the user_id of the Ireland region in the Virginia region during disaster for our schema-level implementation, considering the schema ID is not PII data that comes into GDPR. This would simplify the process, eliminating issue 2, backed by the sequence generators, instead of going around on the GID, their creation and maintenance.

NOTE: The same concept can be applied to Ireland Disaster Recovery process. When the Ireland region is down, all the newly created users' records should be stored in the Stocklohm first. Instead of having a dual sequence generator, we can have an Ireland region sequence generator with the Sequence Range Partitioning concept.

Conclusion:

We are not going to remove the user_id as the primary key. Keep the user_id as our key point with an is_eu flag for support.

In the Virginia region, the user_id will have a sequence generator that creates the non-EU users. And all the Ireland region users' user_ids will be stored in the Virginia Users' table for the operational table to work. When the Virginia region is down, the new Virginia users are created in the Ireland region, which has a dual sequence generator and a different starting count of user_ids. When the region is up again, the details are copied back to the Virginia region. Any newly created Ireland users' user_ids will also be synced. When the Ireland region is down, the Stocklohm region created the EU users with a different starting count of EU IDs. When the region is up, all the EU user records created would be copied to the Ireland region, and their IDs are stored in the Users table of the Virginia region.

The is_eu check is really only meaningful during user creation routing, and that routing decision is made by the Volunteer Microservice — not the database.

Complete User creation workflow

Normal operation: non-EU 'A' user

The volunteer team gets the user data with the region confirmation.
Service checks is_eu = FALSE → goes for DB connection - Virginia
Opens a database transaction (BEGIN)
Inserts 'A' into the users table → trigger generates user_id and flag as FALSE.
Commits (COMMIT)
During the sync-up, this record is stored in the Ireland region and the Stocklohm region at their own Asynchronous Replication.

Normal operation: EU 'B' user

The volunteer team gets the user data with the region confirmation.
Service checks is_eu = TRUE → goes for DB connection - Ireland
Opens a database transaction (BEGIN)
Inserts 'A' into the users table → trigger generates user_id and flag as TRUE.
Commits (COMMIT)
Insert the user_id created in Ireland into the Virginia users table using API/ another transaction, maintaining the CAP theorem.
During the asynchronous replication, this record is stored in the Stocklohm region.

Virginia goes offline: non-EU 'C' user

The volunteer team gets the user data with the region confirmation.
Service checks is_eu = FALSE → goes for DB connection - Ireland as Virginia is unreachable
Opens a database transaction (BEGIN)
Inserts 'A' into the users table with Ireland's trigger generating SID-00-002-XXX (from its 2-trillion range)
Commits (COMMIT)
During the asynchronous replication, this record is stored in the Stocklohm region.
When the Virginia region is up, sync back the record to the Virginia region.

Table Classification

Based on the full DDL list, every table was classified into three categories:
Static Reference tables (no user link needed): country, state, city, help_categories, request_status, request_priority, sla, notification_types, identity_type, and others — these are global lookup tables with no user identity.

PII related (stay regional, contain PII or sensitive operational data): user_locations, volunteer_locations, user_availability, volunteer_applications, volunteer_details, user_additional_details, user_signoff, volunteer_rating, user_notification_preferences, user_notification_status, notifications

Operational tables (global operational tables, cross-region safe): request, action, volunteers_assigned, user_skills, org_skills, help_categories_map, user_category, fraud_requests, req_add_info, req_comments, request_for, request_isleadvol, user_org_map, organizations, volunteer_organizations

Things to be noted

The users Table (Regional PII Management)

This is the most critical change.
Virginia: This table will now serve as a "Global Registry" of sorts. While it only stores PII for non-EU users, it will now also store the user_id records for EU users (without their PII) to satisfy foreign key constraints for operational tables.
Ireland: This table remains the primary store for EU user PII.
Schema Change: Implementing an is_eu boolean flag in all the regions of the Users table. This flag is essential for the application layer to determine whether to route a record to Virginia or Ireland during user creation.

ALTER TABLE virginia_dev_saayam_rdbms.users
ADD COLUMN is_eu BOOLEAN DEFAULT FALSE;

ALTER TABLE ireland_dev_saayam_rdbms.users
ADD COLUMN is_eu BOOLEAN DEFAULT TRUE;

ALTER TABLE stocklohm_dev_saayam_rdbms.users
ADD COLUMN is_eu BOOLEAN DEFAULT TRUE;

Operational Tables (Foreign Key Integrity)

Tables such as request, action, volunteers_assigned, and req_comments are classified as Global Operational Tables.
Change: Because we are keeping user_id as the primary key and foreign key, these tables do not require a massive DDL refactor.
Workflow: To maintain referential integrity, when an EU user is created in Ireland, the user_id must be asynchronously (or via API) inserted into the Virginia users table. This ensures that when a global admin in Virginia looks at an action or request record, the database doesn't throw a foreign key violation error, even though the actual PII for that user resides securely in Ireland.

Disaster Recovery (DR) and Sequence Ranges:

To prevent the "Sequence Generator Clash" (Issue 3 & 4), the sequence logic inside ddl_users.sql must be updated to support Sequence Range Partitioning
The platform runs across three regions. Each region has a non-overlapping sequence range to prevent ID collisions,
especially during Disaster Recovery of Virginia:
Virginia Sequence Range: 1 to 1,000,000,000,000$ (1 Trillion)
Ireland Sequence Range: 2,000,000,000,000 to 4,000,000,000,000 (2-4 Trillion)
Stocklohm Sequence Range: 4,000,000,000,000 and above (if stocklohm also creates user_id of virginia region)
Impact: If Virginia goes down, Ireland takes over registration of non-EU users using the 2 Trillion+ range. This prevents ID collisions when the regions eventually sync back up.

especially during Disaster Recovery of Virginia:
Ireland Sequence Range: 1 to 1,000,000,000,000$ (1 Trillion)
Stocklohm Sequence Range: 2,000,000,000,000 to 4,000,000,000,000 (2-4 Trillion)

The sequence generator code is in progress.

Version 1: Hardcoded Prefix (SID-00-002-...)

The user_id format is: STD-00-[seg1]-[seg2]-[seg3]-[seg4]-[seg5] (5 segments, 3 digits each)
The region identity is explicitly written into the function. The sequence value is stripped of its trillion base using % 1,000,000,000,000 before being split into the remaining 4 segments.

CREATE OR REPLACE FUNCTION generate_sid()
RETURNS TRIGGER AS $$
DECLARE
seq_id BIGINT;
new_id TEXT;
region_prefix TEXT;
BEGIN
-- Hardcoded per region. Change only this line per instance.
region_prefix := '002'; 002 = Ireland | 003 = Stockholm

seq_id := nextval('seq_non_eu_user_id');

new_id := 'STD-00-' || region_prefix || '-' ||
LPAD(FLOOR((seq_id % 1000000000000) / 1000000000)::TEXT, 3, '0') || '-' ||
LPAD(FLOOR((seq_id % 1000000000) / 1000000)::TEXT, 3, '0') || '-' ||
LPAD(FLOOR((seq_id % 1000000) / 1000)::TEXT, 3, '0') || '-' ||
LPAD((seq_id % 1000)::TEXT, 3, '0');

NEW.user_id := new_id;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;

Step	Operation	Value
region_prefix	hardcoded	002
seq_id % 1T	strips trillion base	4,010,003
% 1T / 1B	seg2	000
% 1B / 1M	seg3	004
% 1M / 1K	seg4	010
% 1K	seg5	003

Output: STD-00-002-000-004-010-003 Why: You essentially tell the trigger: "Any non-EU user created here is an Ireland-DR user, so prefix with 002."

Version 2: Keep the Trillion

The region identity is derived from the sequence value itself. Since the sequence starts at an exact trillion multiple, dividing by 1T naturally produces the region segment. No prefix string is written anywhere in the function.

CREATE OR REPLACE FUNCTION generate_sid()
RETURNS TRIGGER AS $$
DECLARE
seq_id BIGINT;
new_id TEXT;
BEGIN
seq_id := nextval('seq_non_eu_user_id');

new_id := 'STD-00-' ||
LPAD(FLOOR(seq_id / 1000000000000)::TEXT, 3, '0') || '-' ||
LPAD(FLOOR((seq_id % 1000000000000) / 1000000000)::TEXT, 3, '0') || '-' ||
LPAD(FLOOR((seq_id % 1000000000) / 1000000)::TEXT, 3, '0') || '-' ||
LPAD(FLOOR((seq_id % 1000000) / 1000)::TEXT, 3, '0') || '-' ||
LPAD((seq_id % 1000)::TEXT, 3, '0');

NEW.user_id := new_id;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;

Step	Operation	Value
seq_id / 1T	extracts region digit	002
seq_id % 1T / 1B	seg2	000
% 1B / 1M	seg3	004
% 1M / 1K	seg4	010
% 1K	seg5	003

Output: STD-00-002-000-004-010-003

	Approach 1 (Hardcoded)	Approach 2 (Math Extract)
Output for same seq_id	STD-00-002-000-004-010-003	STD-00-002-000-004-010-003
Region identity source	Explicit string in function	Derived from sequence start
Same function across regions?	No — one copy per region with different prefix	Yes — identical function everywhere
If sequence start drifts slightly	No impact on prefix, counter just looks odd	seg1 slightly off but still unique
If sequence overflows its trillion range	Silent ID collision — prefix lies about region, counter wraps	seg1 naturally increments to 003, no collision
Readability	Immediately clear which region	Must trace math to understand
Adding a new region	Edit one line per function	Just set correct sequence start
Risk level	Low	Medium — hidden contract on sequence start

What actually happens?

This is the original generate_sid() logic that was already in place, unchanged. It uses 1,000,000 as the base divisor and produces 3 segments.
If the same logic is used, the trillions part (2,000,000) is stripped away by the modulo/division math used to fit the number into the 3-digit segments.

new_id := 'STD-00-' ||
LPAD(FLOOR(seq_id / 1000000)::TEXT, 3, '0') || '-' || -- WITH NO TRILLION AWARENESS
LPAD(FLOOR((seq_id % 1000000) / 1000)::TEXT, 3, '0') || '-' ||
LPAD((seq_id % 1000)::TEXT, 3, '0');

During DR (Virginia is down, Ireland is registering non-EU users):

Ireland's sequence starts at 2,000,000,000,000
User created → seq_id = 2,000,000,004,010,003
Version 3 generates → STD-00-004-010-003

Virginia comes back up, sync begins:

Ireland DR record STD-00-004-010-003 is copied to Virginia
Virginia already has a user STD-00-004-010-003 created from seq_id = 1,000,000,004,010,003
Primary key violation. Sync fails.

The two sequences were designed to be non-overlapping, but the math throws away the information. The trillion-range partitioning becomes completely useless. This only works if the logic is for a Virginia-only operation.

Why Trillion?

Even at 100 million users — which would make Saayam one of the largest volunteer platforms in the world — we're well within the billion range.
The trillion base for user_id stays as is, not because we need that many users, but because the gap between regions needs to be large enough that it can never accidentally be crossed. Billions would actually work too, but a trillion is a clean, readable boundary that makes the partitioning obvious at a glance.

Sequence Generators in Request and Organization tables

For user_id, we have the trillion range partitioning specifically because of DR copy-back collision risk. For requests, there's no replication issue — all requests live in all regions. So the question is purely:
How many requests can this platform realistically generate?

For this, we can go with the prefix approach. Simple counter per region, no trillion base needed. Each region's sequence starts at 1. No collision risk because the region prefix already differentiates them.

What is actually happening

'REQ-' ||
LPAD(FLOOR(seq_id / 100000000)::TEXT, 2, '0') || '-' || -- 2 digits
LPAD(FLOOR((seq_id % 100000000) / 100000)::TEXT, 3, '0') || '-' || -- 3 digits
LPAD(FLOOR((seq_id % 100000) / 1000)::TEXT, 3, '0') || '-' || -- 3 digits
LPAD((seq_id % 1000)::TEXT, 4, '0') -- 4 digits

So the original format is:
REQ-XX-XXX-XXX-XXXX

For REQ-01-XXXXX Prefix

The 01 is the Virginia region prefix (hardcoded). So the question is — how many X's do we need after it?
REQ-01-XXXXXXXXX - 9 count - 999,999,999 ~1 billion
REQ-01-XXXXXXXXXXXX - 12 count - 999,999,999,999 ~1 trillion, future proof
Given the platform scale, 9 digits is realistic, and 12 digits is future-proof.

So the format would be:
REQ-01-000-000-001 ← split into 3x3 for readability in Virginia region

CREATE OR REPLACE FUNCTION virginia_dev_saayam_rdbms.generate_request_id()
RETURNS TRIGGER AS $$
DECLARE
seq_id BIGINT;
new_id TEXT;
region_prefix TEXT;
BEGIN
region_prefix := '01'; -- 01 = Virginia | 02 = Ireland | 03 = Stockholm
seq_id := nextval('virginia_dev_saayam_rdbms.request_id_seq');
new_id := 'REQ-' || region_prefix || '-' ||
LPAD(FLOOR(seq_id / 1000000)::TEXT, 3, '0') || '-' ||
LPAD(FLOOR((seq_id % 1000000) / 1000)::TEXT, 3, '0') || '-' ||
LPAD((seq_id % 1000)::TEXT, 3, '0');
NEW.req_id := new_id;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;

NOTE: INT in PostgreSQL caps at 2,147,483,647 — so if the sequence ever crosses 2.1 billion, nextval would overflow the seq_id variable and throw an error, even though the column itself is VARCHAR and could hold any string.

If we change to BIGINT, the sequence variable can do
FLOOR(seq_id / 100000000) -- seg1, 2 digits, max 99
FLOOR((seq_id % 100000000) / 100000) -- seg2, 3 digits, max 999
FLOOR((seq_id % 100000) / 1000) -- seg3, 3 digits, max 99 (only 2 digits, 100000/1000 = 100, not 999)
(seq_id % 1000) -- seg4, 4 digits, max 999 (LPAD 4 but value only goes to 999)
So changing to BIGINT gives you up to 9.9 billion requests per region with the existing format. (Caps: 99 * 100,000,000 = 9,900,000,000 → ~9.9 billion)

CHANGE TRILLION TO MAX OF 10 BILLION BUT NEED TO CHECK THE REQUEST LIMITS

Uh oh!

** Architectural Decision Record: User Identity & Multi‐Region Data Residency

What is Already There?

The Core Issues Identified

Issue 1 — Data Residency & GDPR of EU users

Issue 2 — Foreign Key Constraints

Issue 3 & 4 — The Sequence Generator Clash and the Disaster Recovery Process

Issue 1 — Data Residency & GDPR of EU users — Continued

Conclusion:

Complete User creation workflow

Table Classification

Things to be noted

The users Table (Regional PII Management)

Operational Tables (Foreign Key Integrity)

Disaster Recovery (DR) and Sequence Ranges:

Version 1: Hardcoded Prefix (SID-00-002-...)

Version 2: Keep the Trillion

What actually happens?

Why Trillion?

Sequence Generators in Request and Organization tables

What is actually happening

For REQ-01-XXXXX Prefix

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally