| 1 | +--- |
| 2 | +name: adr-016-tenant-id-naming-strategy |
| 3 | +description: Keep user-defined tenant IDs for schema naming; document namespace limitations as acceptable trade-off |
| 4 | +triggers: |
| 5 | + - Designing tenant provisioning workflows |
| 6 | + - Evaluating namespace exhaustion concerns |
| 7 | + - Considering UUID vs human-readable identifiers for multi-tenancy |
| 8 | + - Deprovisioning tenants and ID reuse questions |
| 9 | +instructions: | |
| 10 | + Use user-defined alphanumeric tenant IDs (pattern: ^[a-zA-Z0-9_]{1,50}$) that map directly |
| 11 | + to PostgreSQL schema names (org_{tenant_id}). Accept namespace exhaustion as a documented |
| 12 | + limitation acceptable for infrastructure multi-tenancy. Revisit if scale exceeds 1,000 |
| 13 | + tenants or commercial SaaS model is adopted. |
| 14 | +--- |
| 15 | + |
| 16 | +# 16. Tenant ID Naming Strategy |
| 17 | + |
| 18 | +Date: 2025-12-13 |
| 19 | + |
| 20 | +## Status |
| 21 | + |
| 22 | +Accepted |
| 23 | + |
| 24 | +## Context |
| 25 | + |
| 26 | +The Tenant Service uses **user-defined alphanumeric IDs** (e.g., `acme_bank`) as both the |
| 27 | +public API identifier and the PostgreSQL schema name (`org_acme_bank`). This creates a
| 28 | +potential **namespace exhaustion problem**: a deprovisioned tenant ID is never released,
| 29 | +so the pool of short, human-readable IDs available to new tenants shrinks over time.
| 30 | + |
| 31 | +### Current Implementation |
| 32 | + |
| 33 | +| Aspect | Implementation | |
| 34 | +|--------|---------------| |
| 35 | +| **ID Validation** | `^[a-zA-Z0-9_]{1,50}$` | |
| 36 | +| **Schema Naming** | `org_{lowercase(tenant_id)}` | |
| 37 | +| **Public Exposure** | API paths, JWT claims (`x-tenant-id`), subdomains | |
| 38 | +| **Status Lifecycle** | ACTIVE → SUSPENDED → DEPROVISIONED (terminal) | |
| 39 | +| **ID Reuse** | Not supported (deprovisioned IDs are consumed forever) | |
| 40 | + |
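The validation and naming rules in the table above can be sketched as follows. This is a minimal illustration, not the Tenant Service's actual code: the function names and constants are hypothetical, and only the regex, the `org_` prefix, and the lowercasing come from this ADR.

```python
import re

# Pattern from the "ID Validation" row. fullmatch() is used instead of ^...$
# so that a trailing newline is rejected as well.
TENANT_ID_PATTERN = re.compile(r"[a-zA-Z0-9_]{1,50}")
SCHEMA_PREFIX = "org_"  # from the "Schema Naming" row


def is_valid_tenant_id(tenant_id: str) -> bool:
    """Return True if tenant_id matches the documented validation pattern."""
    return TENANT_ID_PATTERN.fullmatch(tenant_id) is not None


def schema_name_for(tenant_id: str) -> str:
    """Derive the schema name as org_{lowercase(tenant_id)}."""
    if not is_valid_tenant_id(tenant_id):
        raise ValueError(f"invalid tenant ID: {tenant_id!r}")
    return SCHEMA_PREFIX + tenant_id.lower()
```

Because the schema name is lowercased, `Acme_Bank` and `acme_bank` derive the same schema, so an implementation along these lines would presumably need to enforce ID uniqueness case-insensitively. The longest possible schema name under this rule is 54 characters (`org_` plus 50).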
| 41 | +### The Problem |
| 42 | + |
| 43 | +Once a tenant is deprovisioned, its ID (e.g., `acme_bank`) is permanently consumed: |
| 44 | + |
| 45 | +1. A new organization cannot claim this desirable name |
| 46 | +2. Namespace pollution accumulates over time |
| 47 | +3. Schema names remain visible in logs, connection strings, and error messages |
| 48 | + |
| 49 | +### Relevant Context |
| 50 | + |
| 51 | +- **Deployment Model**: Meridian is infrastructure, not commercial SaaS. Organizations |
| 52 | + own and operate their own instances with data sovereignty requirements. |
| 53 | +- **Primary Use Case**: Demonstration infrastructure where multiple tenants share a |
| 54 | + cluster for cost efficiency (Post Office, Motive, UN WFP scenarios). |
| 55 | +- **Expected Scale**: Tens to low hundreds of tenants per deployment, not thousands. |
| 56 | +- **Debuggability Priority**: Operators rely heavily on human-readable schema names for |
| 57 | + troubleshooting (`org_post_office` vs `org_550e8400_e29b_41d4`). |
| 58 | + |
| 59 | +## Decision Drivers |
| 60 | + |
| 61 | +* **Operational debuggability**: Schema names visible in logs, query plans, error messages |
| 62 | +* **Implementation simplicity**: Avoid migration complexity for current deployments |
| 63 | +* **Namespace sustainability**: Long-term ID pool viability |
| 64 | +* **Privacy**: Tenant identity exposure in technical artifacts |
| 65 | +* **API consistency**: Alignment with existing Party Service patterns |
| 66 | +* **Migration cost**: Effort to rename schemas, update JWT claims, modify routing |
| 67 | + |
| 68 | +## Considered Options |
| 69 | + |
| 70 | +### Option 1: Keep Current User-Defined Approach |
| 71 | + |
| 72 | +Maintain status quo with documented limitations. |
| 73 | + |
| 74 | +**Implementation**: No changes to existing codebase. |
| 75 | + |
| 76 | +### Option 2: System-Generated UUIDs for Schema Naming |
| 77 | + |
| 78 | +Use UUIDv7 (time-ordered) internally for schemas while keeping `display_name` for human |
| 79 | +readability. |
| 80 | + |
| 81 | +**Implementation Sketch**: |
| 82 | + |
| 83 | +```protobuf |
| 84 | +message Tenant { |
| 85 | + string tenant_id = 1; // Internal UUID: "550e8400-e29b-41d4-a716-446655440000" |
| 86 | + string display_name = 2; // Human-friendly: "Acme Bank" |
| 87 | + string slug = 3; // URL-safe: "acme-bank" (optional for subdomains) |
| 88 | +} |
| 89 | +``` |
| 90 | + |
| 91 | +**Schema naming**: `org_550e8400_e29b_41d4` (normalized UUID prefix) |
| 92 | + |
| 93 | +### Option 3: Hybrid Approach (Internal UUID + External Slug) |
| 94 | + |
| 95 | +Separate internal identifier (UUID for schemas) from external identifier (slug for API/JWT). |
| 96 | + |
| 97 | +**Implementation Sketch**: |
| 98 | + |
| 99 | +```protobuf |
| 100 | +message Tenant { |
| 101 | + string internal_id = 1; // Internal UUID (not exposed in API responses) |
| 102 | + string slug = 2; // External identifier for API, JWT (e.g., "acme_bank") |
| 103 | + string display_name = 3; // Human-readable name |
| 104 | +} |
| 105 | +``` |
| 106 | + |
| 107 | +**Schema naming**: `org_{uuid}` (internal) |
| 108 | +**API surface**: `/v1/tenants/{slug}`, JWT claim `x-tenant-id=acme_bank` |
| 109 | + |
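The slug-to-UUID resolution that Option 3 would require in middleware might look roughly like the sketch below. Everything here is hypothetical: `TenantRegistry` is an in-memory stand-in for a database lookup, `uuid4` stands in for the UUIDv7 this option proposes, and the schema name uses the UUID's 32-character hex form as one possible normalization.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tenant:
    slug: str          # external identifier used in API paths and JWT claims
    display_name: str  # human-readable name
    # uuid4 is a stand-in for the UUIDv7 proposed by this option (assumption).
    internal_id: uuid.UUID = field(default_factory=uuid.uuid4)


class TenantRegistry:
    """Hypothetical in-memory lookup standing in for a tenants table."""

    def __init__(self) -> None:
        self._by_slug: dict[str, Tenant] = {}

    def register(self, slug: str, display_name: str) -> Tenant:
        """Create a tenant, rejecting slugs that are already in use."""
        if slug in self._by_slug:
            raise ValueError(f"slug already in use: {slug!r}")
        tenant = Tenant(slug=slug, display_name=display_name)
        self._by_slug[slug] = tenant
        return tenant

    def schema_for_slug(self, slug: str) -> str:
        """Resolve an external slug to the internal UUID-based schema name."""
        tenant = self._by_slug[slug]  # raises KeyError for unknown slugs
        return "org_" + tenant.internal_id.hex
```

Every request would pay for this extra lookup (or a cache in front of it), which is the routing overhead this option introduces.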
| 110 | +## Decision Outcome |
| 111 | + |
| 112 | +Chosen option: **Option 1 (Keep Current User-Defined Approach)**, because the namespace |
| 113 | +exhaustion problem is theoretical for the current deployment model and expected scale, |
| 114 | +while the debugging and operational benefits of human-readable schema names are immediate |
| 115 | +and significant. |
| 116 | + |
| 117 | +### Rationale |
| 118 | + |
| 119 | +1. **Scale Reality**: Demonstration infrastructure with tens of tenants will not exhaust
| 120 | + the namespace for years. Even at 100 new tenants per year, issuing 10,000 IDs takes
| 121 | + 100 years, and with 10% churn only a fraction of those are ever deprovisioned.
| 122 | + |
| 123 | +2. **Debugging Value**: Schema names like `org_post_office` in query plans, connection |
| 124 | + strings, and error logs provide immediate context. UUID-based names require constant |
| 125 | + lookup to correlate with tenant identity. |
| 126 | + |
| 127 | +3. **Migration Cost**: Options 2 and 3 require: |
| 128 | + - Schema renames for existing tenants |
| 129 | + - JWT claim format changes |
| 130 | + - Middleware updates for slug → UUID resolution |
| 131 | + - API breaking changes or dual-identifier periods |
| 132 | + - Test suite updates across all services |
| 133 | + |
| 134 | +4. **Not Currently SaaS**: Meridian is infrastructure for organizations to operate,
| 135 | + not a commercial SaaS platform. Multi-tenancy is for demonstration, not production |
| 136 | + customer isolation with billing and churn. |
| 137 | + |
| 138 | +5. **Industry Precedent**: Stripe uses type-prefixed IDs (`cus_`, `pi_`) rather than bare
| 139 | + UUIDs, which keeps identifiers recognizable when debugging even at their scale.
| 140 | + Auth0 recommends UUIDs for portability but acknowledges the debugging trade-off.
| 141 | + |
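The scale estimate in the first rationale item can be checked with a short calculation. This sketch assumes a constant number of new tenants per year and treats churn as the fraction of issued IDs that are eventually deprovisioned. Both are simplifying assumptions, and the function names are hypothetical.

```python
def years_to_issue(target_ids: int, new_tenants_per_year: int) -> float:
    """Years until target_ids tenant IDs have been issued.

    Every issued ID occupies the namespace permanently, so churn does not
    affect this figure.
    """
    if new_tenants_per_year <= 0:
        raise ValueError("new_tenants_per_year must be positive")
    return target_ids / new_tenants_per_year


def years_to_deprovision(target_ids: int, new_tenants_per_year: int,
                         churn_rate: float) -> float:
    """Years until target_ids IDs are deprovisioned, assuming a fixed
    fraction (churn_rate) of issued IDs ends up deprovisioned."""
    if not 0 < churn_rate <= 1:
        raise ValueError("churn_rate must be in (0, 1]")
    return years_to_issue(target_ids, new_tenants_per_year) / churn_rate
```

With the figures quoted above, `years_to_issue(10_000, 100)` gives 100 years, and `years_to_deprovision(10_000, 100, 0.10)` gives roughly 1,000 years. On either reading, the expected scale stays far from the revisit thresholds.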
| 142 | +### Documented Limitations |
| 143 | + |
| 144 | +The following limitations are explicitly accepted: |
| 145 | + |
| 146 | +| Limitation | Mitigation | |
| 147 | +|------------|------------| |
| 148 | +| **Namespace exhaustion** | Monitor deprovisioned count; revisit if approaching 5,000 | |
| 149 | +| **ID reuse impossible** | Document that names are consumed permanently | |
| 150 | +| **Schema name privacy** | Accepted for infrastructure (not end-user-facing SaaS) | |
| 151 | +| **Tenant renames** | Not supported (display_name can change, ID cannot) | |
| 152 | + |
| 153 | +### Reconsidering This Decision |
| 154 | + |
| 155 | +Revisit Option 2 or 3 if: |
| 156 | + |
| 157 | +- Tenant count exceeds 1,000 active tenants per deployment |
| 158 | +- Commercial SaaS model is adopted with high customer churn |
| 159 | +- Privacy requirements emerge (GDPR concern about schema name exposure) |
| 160 | +- Cross-deployment tenant portability becomes a requirement |
| 161 | + |
| 162 | +## Pros and Cons of the Options |
| 163 | + |
| 164 | +### Option 1: Keep Current User-Defined Approach |
| 165 | + |
| 166 | +**Description**: Maintain existing `^[a-zA-Z0-9_]{1,50}$` tenant IDs that map directly |
| 167 | +to PostgreSQL schema names (`org_{tenant_id}`). |
| 168 | + |
| 169 | +* Good, because zero implementation effort required |
| 170 | +* Good, because schema names are immediately debuggable (`org_post_office` is self-explanatory) |
| 171 | +* Good, because consistent with existing JWT claims, API paths, subdomain routing |
| 172 | +* Good, because aligns with Party Service pattern (party_id is also user-facing) |
| 173 | +* Bad, because deprovisioned IDs cannot be reused (namespace exhaustion) |
| 174 | +* Bad, because schema names are visible in logs/errors (privacy trade-off) |
| 175 | +* Bad, because tenant renames require schema rename (complex, risky) |
| 176 | + |
| 177 | +### Option 2: System-Generated UUIDs for Schema Naming |
| 178 | + |
| 179 | +**Description**: Generate UUIDv7 internally for schema isolation while keeping |
| 180 | +`display_name` for human readability. |
| 181 | + |
| 182 | +* Good, because effectively unlimited namespace (UUID collisions are negligibly unlikely)
| 183 | +* Good, because privacy improved (schema names are opaque) |
| 184 | +* Good, because display names can be reused (they are decoupled from schema names, so deprovisioned schemas can be dropped)
| 185 | +* Bad, because breaking change requiring migration of existing tenants |
| 186 | +* Bad, because debugging complexity (correlating `org_550e8400` to "Post Office" requires lookup) |
| 187 | +* Bad, because JWT claims lose human-readability |
| 188 | +* Bad, because inconsistency with Party Service (party_id is user-facing, not UUID-based) |
| 189 | + |
| 190 | +### Option 3: Hybrid Approach (Internal UUID + External Slug) |
| 191 | + |
| 192 | +**Description**: Separate internal identifier (UUID for schemas) from external identifier |
| 193 | +(slug for API/JWT). |
| 194 | + |
| 195 | +* Good, because best of both worlds (slugs for APIs, UUIDs for isolation) |
| 196 | +* Good, because namespace reuse possible (slugs reclaimed after grace period) |
| 197 | +* Good, because API backward compatibility (slugs remain stable) |
| 198 | +* Good, because privacy improved (internal schema names opaque) |
| 199 | +* Bad, because highest complexity (dual-identifier system requires careful indexing) |
| 200 | +* Bad, because slug conflicts possible (must enforce uniqueness + grace periods) |
| 201 | +* Bad, because migration challenge (existing tenant_id serves both roles) |
| 202 | +* Bad, because schema routing overhead (middleware must resolve slug → UUID) |
| 203 | +* Bad, because most implementation effort and risk |
| 204 | + |
| 205 | +## Industry Research |
| 206 | + |
| 207 | +### Stripe's Approach |
| 208 | + |
| 209 | +Stripe uses **type-prefixed IDs** (e.g., `cus_xyz123`, `pi_abc456`):
| 210 | + |
| 211 | +- Type prefix makes IDs self-documenting for debugging |
| 212 | +- Random suffix provides uniqueness without full UUID length |
| 213 | +- Stripe stores the full prefixed ID as primary key (not separated) |
| 214 | + |
| 215 | +This pattern prioritizes debuggability even at Stripe's scale; because the suffix is system-generated, Stripe avoids the name-reuse problem that user-defined IDs have.
| 216 | + |
| 217 | +### Auth0's Approach |
| 218 | + |
| 219 | +Auth0 recommends **UUIDs for portability**: |
| 220 | + |
| 221 | +- If tenants migrate between Auth0 accounts, UUID-based associations don't break |
| 222 | +- User IDs are affected by IdP configuration, so separate UUIDs are more stable |
| 223 | + |
| 224 | +However, Auth0 acknowledges this adds debugging complexity. |
| 225 | + |
| 226 | +### AWS Multi-Tenant Guidance |
| 227 | + |
| 228 | +AWS emphasizes **tenant isolation over ID strategy**: |
| 229 | + |
| 230 | +- Focus on access control and policy enforcement |
| 231 | +- ID format is secondary to isolation boundaries |
| 232 | +- Recommends identity providers (Cognito, Auth0) for tenant management |
| 233 | + |
| 234 | +### PostgreSQL Considerations |
| 235 | + |
| 236 | +- **Schema name limit**: 63 bytes (NAMEDATALEN - 1); `org_` plus a 50-character tenant ID uses at most 54
| 237 | +- **Performance**: No significant difference between short names and UUID-based names
| 238 | +- **UUIDv7**: PostgreSQL 18 adds a native `uuidv7()` function; reported gains over v4 (about 33%) depend on workload
| 239 | +- **Identifier case**: PostgreSQL folds unquoted identifiers to lowercase |
| 240 | + |
| 241 | +## Implementation Notes |
| 242 | + |
| 243 | +### If Option 2 or 3 Were Chosen (Future Reference) |
| 244 | + |
| 245 | +**Migration Steps**:
| 246 | + |
| 247 | +1. Add new `internal_id` (UUID) column to `platform.tenants` table |
| 248 | +2. Populate with UUIDv7 for existing tenants |
| 249 | +3. Create new schemas with UUID-based names (`org_{uuid_prefix}`) |
| 250 | +4. Migrate data from old schemas to new schemas |
| 251 | +5. Update middleware to resolve slug → UUID |
| 252 | +6. Update JWT claim format (or add dual-claim period) |
| 253 | +7. Deprecate old schema names after grace period |
| 254 | +8. Update test suites across all services |
| 255 | + |
| 256 | +**Estimated Effort**: 3-5 story points per service, plus 8-13 points for platform changes. |
| 257 | +Total: 30-50 story points with significant risk. |
| 258 | + |
| 259 | +### Monitoring Recommendations |
| 260 | + |
| 261 | +Track the following to detect when reconsideration is needed: |
| 262 | + |
| 263 | +```sql |
| 264 | +-- Namespace consumption query (status literals must match the stored casing; the lifecycle table shows ACTIVE/DEPROVISIONED)
| 265 | +SELECT |
| 266 | + COUNT(*) FILTER (WHERE status = 'active') AS active_tenants, |
| 267 | + COUNT(*) FILTER (WHERE status = 'deprovisioned') AS consumed_ids, |
| 268 | + COUNT(*) AS total_consumed_namespace |
| 269 | +FROM platform.tenants; |
| 270 | +``` |
| 271 | + |
| 272 | +Alert if `consumed_ids` exceeds 1,000 or the ratio `consumed_ids / active_tenants` exceeds 5:1 (guard against division by zero when there are no active tenants).
| 273 | + |
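The alert rule above could be evaluated against the query result along these lines. This is a hedged sketch: `should_alert` and its parameter names are hypothetical, and how to treat a deployment with zero active tenants is an assumption the ADR does not specify.

```python
def should_alert(active_tenants: int, consumed_ids: int,
                 max_consumed: int = 1_000, max_ratio: float = 5.0) -> bool:
    """Return True when either documented threshold is exceeded."""
    if consumed_ids > max_consumed:
        return True
    if active_tenants == 0:
        # The ratio is undefined here; flagging any consumed ID is an assumption.
        return consumed_ids > 0
    return consumed_ids / active_tenants > max_ratio
```

For example, 10 active tenants with 51 deprovisioned IDs would trigger the ratio alert, while 10 active with 50 deprovisioned (exactly 5:1) would not.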
| 274 | +## Links |
| 275 | + |
| 276 | +* [Stripe Object IDs Design](https://dev.to/4thzoa/designing-apis-for-humans-object-ids-3o5a) - Prefixed ID best practices |
| 277 | +* [Auth0 Multi-Tenant Best Practices](https://auth0.com/docs/get-started/auth0-overview/create-tenants/multi-tenant-apps-best-practices) |
| 278 | +* [AWS Multi-Tenant Authorization](https://docs.aws.amazon.com/prescriptive-guidance/latest/saas-multitenant-api-access-authorization/introduction.html) |
| 279 | +* [PostgreSQL UUID Documentation](https://www.postgresql.org/docs/current/datatype-uuid.html) |
| 280 | +* GitHub Issue: Multi-tenancy namespace strategy evaluation (Task 51) |
| 281 | + |
| 282 | +## Notes |
| 283 | + |
| 284 | +This ADR explicitly documents the trade-off between namespace sustainability and |
| 285 | +operational debuggability. The decision favors the latter based on: |
| 286 | + |
| 287 | +1. Current deployment model (infrastructure, not SaaS) |
| 288 | +2. Expected scale (tens to low hundreds of tenants, not thousands)
| 289 | +3. Operational priority (debugging ease over theoretical namespace concerns) |
| 290 | +4. Migration cost (high effort for uncertain benefit) |
| 291 | + |
| 292 | +The decision should be reconsidered if the deployment model shifts toward commercial |
| 293 | +SaaS with high tenant churn, or if regulatory requirements emerge around tenant |
| 294 | +identifier privacy. |