Commit 8394fce

jhf and claude committed
doc: Add power group design document and naming rationale

Document current implementation: primary_influencer_only, constraint model, hierarchy algorithm, BRREG roles, import flow.

Add naming rationale: "Power Group" replaces "Enterprise Group" because the EU term confuses Enterprise (economic grouping) with Enterprise Group (political control hierarchy). Control defines the tree, ownership enriches the graph.

Document future directions: multi-root PGs, set-import, activity aggregation, public/private influence analysis.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

1 parent 982933c commit 8394fce

7 files changed, with 340 additions and 61 deletions.

doc/DOMAIN-MODELS.md

Lines changed: 61 additions & 3 deletions
@@ -1,10 +1,68 @@
11
# STATBUS Domain Models
22

3-
The Formal Economy is modeled as
4-
<img src="./diagrams/domains-formal.svg" alt="Domain Models Formal Economy" style="max-width:100%; max-height:300px;">
3+
## Overview
4+
5+
STATBUS organizes data into distinct domains based on what we **observe** versus what we **derive/label**:
6+
7+
| Domain | Nature | Temporal? | Example |
8+
|--------|--------|-----------|---------|
9+
| **Legal** | Observed from registries, contracts | Yes | Legal Unit, Legal Relationship |
10+
| **Physical** | Observed in the real world | Yes | Establishment |
11+
| **Statistical** | Derived artifact for statistics | No | Enterprise |
12+
| **Political** | Derived from ownership structures | No | Power Group |
13+
14+
### Key Insight
15+
16+
**Temporal entities** (Legal Unit, Establishment, Legal Relationship) represent things we observe over time - they have `valid_from`, `valid_to`, and history.
17+
18+
**Non-temporal entities** (Enterprise, Power Group) are statistical/analytical artifacts that **emerge from** temporal observations. They don't have independent temporal existence - their "state at time T" is derived from the temporal entities that reference them.
19+
20+
## Formal Economy
21+
22+
<img src="./diagrams/domains-formal.svg" alt="Domain Models Formal Economy" style="max-width:100%; max-height:400px;">
23+
24+
### Relationships
25+
26+
- **Legal Unit → Enterprise**: Every LU belongs to exactly one EN (always)
27+
- **Legal Relationship → Power Group**: Each controlling relationship (≥50%) belongs to a PG
28+
- **Legal Relationship → Legal Unit**: Tracks ownership/control between LUs (`influencing_id` owns/controls `influenced_id`)
29+
- **Establishment → Legal Unit**: Every ES belongs to exactly one LU
30+
31+
Note: Power group membership of a Legal Unit is derived through `legal_relationship` - an LU is in a power group if it participates in any relationship (as influencer or influenced) that has a `power_group_id`.
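
The derivation described in this note can be sketched in a few lines of Python. This is a minimal illustration rather than the STATBUS implementation: the `LegalRelationship` dataclass and the `power_group_members` function are hypothetical names, and only the three columns named in the note are modeled.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LegalRelationship:
    # Hypothetical in-memory stand-in for a legal_relationship row.
    influencing_id: int
    influenced_id: int
    power_group_id: Optional[int]  # None when the relationship is in no PG


def power_group_members(relationships):
    """Map each power_group_id to the legal unit ids that take part in it."""
    members = defaultdict(set)
    for rel in relationships:
        if rel.power_group_id is None:
            continue  # relationship is not part of a power group
        # Both ends of the relationship count as members.
        members[rel.power_group_id].add(rel.influencing_id)
        members[rel.power_group_id].add(rel.influenced_id)
    return dict(members)


rels = [
    LegalRelationship(influencing_id=1, influenced_id=2, power_group_id=10),
    LegalRelationship(influencing_id=2, influenced_id=3, power_group_id=10),
    LegalRelationship(influencing_id=4, influenced_id=5, power_group_id=None),
]
membership = power_group_members(rels)
```

Here legal units 1, 2 and 3 end up in power group 10, while units 4 and 5 belong to no power group because their relationship carries no `power_group_id`.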
32+
33+
### Domain Details
34+
35+
**Legal Domain** (observed, temporal):
36+
- `legal_unit` - A legally registered entity (company, organization)
37+
- `legal_relationship` - Ownership/control relationship between two legal units
38+
- `influencing_id` → `influenced_id` (influencer owns/controls influenced)
39+
- `percentage` - ownership/control percentage (≥50% = controlling interest)
40+
- `power_group_id` - assigned by worker when relationship forms a controlling cluster
41+
42+
**Physical Domain** (observed, temporal):
43+
- `establishment` - A physical location where economic activity occurs
44+
45+
**Statistical Domain** (derived, non-temporal):
46+
- `enterprise` - The smallest combination of legal units that is an organizational unit producing goods or services
47+
48+
**Political Domain** (derived, non-temporal):
49+
- `power_group` - A hierarchy of legal units connected by controlling ownership (≥50%)
50+
- Power groups are **TIMELESS** - once created, they exist forever as a registry entry
51+
- Active status is **derived** at query time from `legal_relationship.valid_range`
52+
- The `power_group_id` lives on `legal_relationship`, not `legal_unit`
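
As a rough sketch of "active status derived at query time", the Python below treats a power group as active on a date when at least one of its relationships is valid on that date. The names `RelationshipPeriod` and `active_power_groups` are hypothetical, and the half-open interval `[valid_from, valid_until)` is an assumption about how `valid_range` behaves, not something this document states.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RelationshipPeriod:
    # Hypothetical stand-in for legal_relationship.valid_range,
    # assumed here to be half-open: [valid_from, valid_until).
    power_group_id: int
    valid_from: date
    valid_until: date


def active_power_groups(periods, on):
    """Return ids of power groups with at least one relationship valid on `on`."""
    return {
        p.power_group_id
        for p in periods
        if p.valid_from <= on < p.valid_until
    }


periods = [
    RelationshipPeriod(1, date(2020, 1, 1), date(2023, 1, 1)),
    RelationshipPeriod(2, date(2022, 1, 1), date(2025, 1, 1)),
]
active_2021 = active_power_groups(periods, date(2021, 6, 1))
active_2024 = active_power_groups(periods, date(2024, 6, 1))
```

Power group 1 still exists as a registry entry in 2024; it simply no longer appears in the active set because none of its relationships cover that date.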
53+
54+
## Informal Economy
555

6-
The Informal Economy is modeled as
756
<img src="./diagrams/domains-informal.svg" alt="Domain Models Informal Economy" style="max-width:100%; max-height:300px;">
857

958
In the informal economy the establishment is directly assigned to an enterprise, because
1059
stable identifiers, such as a tax registration, are lacking.
60+
61+
## Why This Matters
62+
63+
The domain model clarifies:
64+
65+
1. **What we observe**: Legal registrations, physical locations, ownership records (temporal)
66+
2. **What we derive**: Statistical groupings, power hierarchies (non-temporal)
67+
3. **Data integrity**: Temporal entities need history tracking; derived entities need consistency with their sources
68+
4. **Query patterns**: Historical queries traverse temporal entities; current-state queries can use derived entities directly

doc/data-architecture.md

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
1+
# STATBUS Data Architecture
2+
3+
## Three Layers
4+
5+
1. **Import** — writes data to base tables (`legal_unit`, `establishment`, `legal_relationship`, `power_group`, `enterprise`, etc.)
6+
7+
2. **statistical_unit** — derived high-speed view of base table data, convenient for queries and aggregations. Built from `timepoints → timesegments → timeline_* → statistical_unit`.
8+
9+
3. **Reports** — high-speed reporting derived from `statistical_unit`: `statistical_*_history`, `statistical_*_facet`, etc.
10+
11+
## Key Invariant
12+
13+
All derived tables (layers 2 and 3) can be **cleared and recreated** from base tables at any time. The derive pipeline is idempotent — running it produces the same result regardless of prior state.
14+
15+
## Consequence for Pipeline Design
16+
17+
- Only **import** modifies base tables.
18+
- `derive_statistical_unit` **reads** base tables and writes derived tables.
19+
- `derive_reports` **reads** `statistical_unit` and writes report tables.
20+
21+
Any operation that creates or modifies base table records (e.g., creating `power_group` records, setting `legal_relationship.power_group_id`) belongs in the **import** layer, not in the analytics pipeline.
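
The invariant can be illustrated with a toy clear-and-rebuild step in Python. This is only a sketch under simplifying assumptions: tables are plain dictionaries, and the function body is invented for illustration and shares only its name with the real `derive_statistical_unit`.

```python
def derive_statistical_unit(base_tables, derived_tables):
    """Toy derive step: clear the derived layer and rebuild it from base tables."""
    derived_tables.clear()  # prior derived state never influences the result
    derived_tables["statistical_unit"] = sorted(
        ("legal_unit", lu_id) for lu_id in base_tables["legal_unit"]
    ) + sorted(
        ("establishment", es_id) for es_id in base_tables["establishment"]
    )
    return derived_tables


base = {"legal_unit": [2, 1], "establishment": [7]}
derived = {"statistical_unit": [("stale", 0)]}  # leftover from an earlier run

# Snapshot the result of two consecutive runs.
first = dict(derive_statistical_unit(base, derived))
second = dict(derive_statistical_unit(base, derived))
```

Both runs yield the same derived rows, the stale row is gone, and `base` is only read. A step that needed to write to `base` would, by the rule above, belong in the import layer instead.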

doc/data-model.md

Lines changed: 27 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -8,26 +8,37 @@ This document provides a compact overview of the StatBus database schema, focusi
88
## Core Statistical Units (Hierarchy)
99
The system revolves around four main statistical units, often with temporal validity (`valid_from`, `valid_to`, `valid_until`):
1010

11-
- `enterprise_group(id, short_name, name, enterprise_group_type_id, reorg_type_id, edit_by_user_id, unit_size_id, data_source_id, foreign_participation_id, valid_range, valid_from, valid_to, valid_until, edit_at, contact_person, edit_comment, reorg_references, reorg_date)` (EG) (temporal)
12-
- Key FKs: data_source_id, edit_by_user_id, enterprise_group_type_id, foreign_participation_id, reorg_type_id, unit_size_id.
13-
- `enterprise(id, short_name, edit_by_user_id, edit_at, enabled, edit_comment)` (EN)
14-
- Key FKs: edit_by_user_id.
15-
- `legal_unit(id, short_name, name, sector_id, status_id, legal_form_id, edit_by_user_id, unit_size_id, foreign_participation_id, data_source_id, enterprise_id, image_id, valid_range, valid_from, valid_to, valid_until, edit_at, birth_date, death_date, free_econ_zone, edit_comment, primary_for_enterprise)` (LU) (temporal)
16-
- Key FKs: data_source_id, edit_by_user_id, enterprise_id, foreign_participation_id, image_id, legal_form_id, sector_id, status_id, unit_size_id.
1711
- `establishment(id, short_name, name, sector_id, status_id, edit_by_user_id, unit_size_id, data_source_id, enterprise_id, legal_unit_id, image_id, valid_range, valid_from, valid_to, valid_until, edit_at, birth_date, death_date, free_econ_zone, edit_comment, primary_for_legal_unit, primary_for_enterprise)` (EST) (temporal)
1812
- Key FKs: data_source_id, edit_by_user_id, enterprise_id, image_id, legal_unit_id, sector_id, status_id, unit_size_id, valid_range.
13+
- `legal_unit(id, short_name, name, sector_id, status_id, legal_form_id, edit_by_user_id, unit_size_id, foreign_participation_id, data_source_id, enterprise_id, image_id, valid_range, valid_from, valid_to, valid_until, edit_at, birth_date, death_date, free_econ_zone, edit_comment, primary_for_enterprise)` (LU) (temporal)
14+
- Key FKs: data_source_id, edit_by_user_id, enterprise_id, foreign_participation_id, image_id, legal_form_id, sector_id, status_id, unit_size_id.
15+
- `enterprise(id, short_name, edit_by_user_id, edit_at, enabled, edit_comment)` (EN)
16+
- Key FKs: edit_by_user_id.
17+
- `power_group(id, ident, short_name, name, type_id, unit_size_id, data_source_id, foreign_participation_id, edit_by_user_id, edit_at, contact_person, edit_comment)` (PG)
18+
- Key FKs: data_source_id, edit_by_user_id, foreign_participation_id, type_id, unit_size_id.
19+
20+
## Legal Unit Ownership & Control
21+
Tables and views for tracking ownership/control relationships between legal units:
22+
23+
- `legal_relationship(id, type_id, reorg_type_id, power_group_id, influencing_id, influenced_id, edit_by_user_id, valid_range, valid_from, valid_to, valid_until, edit_at, primary_influencer_only, percentage, edit_comment)` (temporal)
24+
- Key FKs: edit_by_user_id, influenced_id, influencing_id, power_group_id, primary_influencer_only, reorg_type_id, type_id, type_id, valid_range, valid_range.
25+
- `legal_unit_power_hierarchy(path, legal_unit_id, root_legal_unit_id, valid_range, power_level, is_cycle)`
26+
- `power_group_def(root_legal_unit_id, depth, width, reach)`
27+
- `legal_relationship_cluster(legal_relationship_id, root_legal_unit_id)`
28+
- `power_group_active(id, ident, short_name, name, type_id)`
29+
- `power_group_membership(power_group_ident, power_group_id, legal_unit_id, valid_range, power_level)`
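
To give a feel for the `root_legal_unit_id`, `power_level` and `is_cycle` columns of `legal_unit_power_hierarchy`, the Python sketch below walks controlling relationships upward from each unit. It is not the algorithm used by the view: the function name is hypothetical, it assumes each influenced unit has at most one controlling influencer, and it assumes the root sits at power level 1.

```python
def power_hierarchy(controlling_edges):
    """Compute (root, power_level, is_cycle) per legal unit.

    controlling_edges: iterable of (influencing_id, influenced_id) pairs.
    Assumes at most one controlling influencer per influenced unit.
    Units caught in a cycle get root and power_level set to None.
    """
    parent = {influenced: influencing for influencing, influenced in controlling_edges}
    units = set(parent) | set(parent.values())
    result = {}
    for unit in units:
        seen = {unit}
        current, level, is_cycle = unit, 1, False
        while current in parent:
            current = parent[current]
            if current in seen:
                is_cycle = True  # walked back onto our own path: no true root
                break
            seen.add(current)
            level += 1
        result[unit] = (None, None, True) if is_cycle else (current, level, False)
    return result


tree = power_hierarchy([(1, 2), (2, 3), (1, 4)])
loop = power_hierarchy([(5, 6), (6, 5)])
```

In `tree`, unit 1 is the root and unit 3 sits two steps below it. In `loop`, units 5 and 6 control each other, so neither can serve as a root and both are flagged as cyclic.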
1930

20-
## Common Links for Core Units (EG, EN, LU, EST)
31+
## Common Links for Core Units (PG, EN, LU, EST)
2132
These tables link to any of the four core statistical units:
2233

23-
- `external_ident(id, type_id, ident, idents, labels, establishment_id, legal_unit_id, enterprise_id, enterprise_group_id, edit_by_user_id, edit_at, shape, edit_comment)`
34+
- `external_ident(id, type_id, ident, idents, labels, establishment_id, legal_unit_id, enterprise_id, power_group_id, edit_by_user_id, edit_at, shape, edit_comment)`
2435
- Key FKs: edit_by_user_id, enterprise_id, type_id.
2536
- Enums: `shape` (`public.external_ident_shape`).
2637
- `image(id, type, uploaded_by_user_id, uploaded_at, data)`
2738
- Key FKs: uploaded_by_user_id.
28-
- `tag_for_unit(id, tag_id, establishment_id, legal_unit_id, enterprise_id, enterprise_group_id, edit_by_user_id, created_at, edit_at, edit_comment)`
39+
- `tag_for_unit(id, tag_id, establishment_id, legal_unit_id, enterprise_id, power_group_id, edit_by_user_id, created_at, edit_at, edit_comment)`
2940
- Key FKs: edit_by_user_id, enterprise_id, tag_id.
30-
- `unit_notes(id, notes, establishment_id, legal_unit_id, enterprise_id, enterprise_group_id, edit_by_user_id, created_at, edit_at, edit_comment)`
41+
- `unit_notes(id, notes, establishment_id, legal_unit_id, enterprise_id, power_group_id, edit_by_user_id, created_at, edit_at, edit_comment)`
3142
- Key FKs: edit_by_user_id, enterprise_id.
3243
- `enterprise_external_idents(unit_type, external_idents, unit_id, valid_from, valid_to, valid_until)` (temporal)
3344
- Enums: `unit_type` (`public.statistical_unit_type`).
@@ -79,13 +90,13 @@ These tables link to any of the four core statistical units:
7990
These tables typically store codes, names, and flags for `custom` and `enabled` status.
8091

8192
- `data_source(id, code, name, created_at, updated_at, enabled, custom)`
82-
- `enterprise_group_type(id, code, name, created_at, updated_at, enabled, custom)`
83-
- `enterprise_group_role(id, code, name, created_at, updated_at, enabled, custom)`
93+
- `power_group_type(id, code, name, created_at, updated_at, enabled, custom)`
8494
- `external_ident_type(id, code, name, labels, enabled, shape, description, priority)`
8595
- Enums: `shape` (`public.external_ident_shape`).
8696
- `foreign_participation(id, code, name, created_at, updated_at, enabled, custom)`
8797
- `legal_form(id, code, name, created_at, updated_at, enabled, custom)`
88-
- `reorg_type(id, code, name, created_at, updated_at, enabled, description, custom)`
98+
- `legal_reorg_type(id, code, name, created_at, updated_at, enabled, description, custom)`
99+
- `legal_rel_type(id, code, name, created_at, updated_at, enabled, description, primary_influencer_only, custom)`
89100
- `sector(id, path, label, code, name, parent_id, created_at, updated_at, enabled, description, custom)`
90101
- `status(id, code, name, created_at, updated_at, enabled, assigned_by_default, used_for_counting, priority, custom)`
91102
- `tag(id, path, label, code, name, type, parent_id, context_valid_on, created_at, updated_at, enabled, level, description, context_valid_from, context_valid_to, context_valid_until)`
@@ -106,7 +117,7 @@ Enumerated types used across the schema, with their possible values.
106117
- **`public.import_data_column_purpose`**: `source_input`, `internal`, `pk_id`, `metadata`
107118
- **`public.import_data_state`**: `pending`, `analysing`, `analysed`, `processing`, `processed`, `error`
108119
- **`public.import_job_state`**: `waiting_for_upload`, `upload_completed`, `preparing_data`, `analysing_data`, `waiting_for_review`, `approved`, `rejected`, `processing_data`, `failed`, `finished`
109-
- **`public.import_mode`**: `legal_unit`, `establishment_formal`, `establishment_informal`, `generic_unit`
120+
- **`public.import_mode`**: `legal_unit`, `establishment_formal`, `establishment_informal`, `generic_unit`, `legal_relationship`
110121
- **`public.import_row_action_type`**: `use`, `skip`
111122
- **`public.import_row_operation_type`**: `insert`, `replace`, `update`
112123
- **`public.import_source_expression`**: `now`, `default`
@@ -121,7 +132,7 @@ Enumerated types used across the schema, with their possible values.
121132
- **`public.stat_frequency`**: `daily`, `weekly`, `biweekly`, `monthly`, `bimonthly`, `quarterly`, `semesterly`, `yearly`
122133
- **`public.stat_type`**: `int`, `float`, `string`, `bool`
123134
- **`public.statbus_role`**: `admin_user`, `regular_user`, `restricted_user`, `external_user`
124-
- **`public.statistical_unit_type`**: `establishment`, `legal_unit`, `enterprise`, `enterprise_group`
135+
- **`public.statistical_unit_type`**: `establishment`, `legal_unit`, `enterprise`, `power_group`
125136
- **`public.tag_type`**: `custom`, `system`
126137
- **`public.time_context_type`**: `relative_period`, `tag`, `year`
127138
- **`worker.pipeline_phase`**: `is_deriving_statistical_units`, `is_deriving_reports`
@@ -199,7 +210,7 @@ Handles background processing. A long-running worker process calls `worker.proce
199210
- Key FKs: queue.
200211
- Enums: `phase` (`worker.pipeline_phase`).
201212
- `queue_registry(queue, description, default_concurrency)`
202-
- `pipeline_progress(updated_at, phase, step, total, completed, affected_establishment_count, affected_legal_unit_count, affected_enterprise_count)`
213+
- `pipeline_progress(updated_at, phase, step, total, completed, affected_establishment_count, affected_legal_unit_count, affected_enterprise_count, affected_power_group_count)`
203214
- Enums: `phase` (`worker.pipeline_phase`).
204215
- `base_change_log(establishment_ids, legal_unit_ids, enterprise_ids, edited_by_valid_range)`
205216
- `base_change_log_has_pending(has_pending)`
