| 1 | +# OpenCRVS Analytics Documentation |
| 2 | + |
| 3 | +<!-- START doctoc generated TOC please keep comment here to allow auto update --> |
| 4 | +<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE --> |
| 5 | +**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)* |
| 6 | + |
| 7 | +- [Overview](#overview) |
| 8 | +- [Architecture](#architecture) |
| 9 | +- [Data Sources](#data-sources) |
| 10 | +- [How Data Flows to Metabase](#how-data-flows-to-metabase) |
| 11 | +- [Available Data in Example Setup](#available-data-in-example-setup) |
| 12 | +- [Deployment](#deployment) |
| 13 | +- [Development and Dashboard Management](#development-and-dashboard-management) |
| 14 | +- [Why Analytics Data is Not Backed Up](#why-analytics-data-is-not-backed-up) |
| 15 | +- [Configuration](#configuration) |
| 16 | +- [Development Commands](#development-commands) |
| 17 | + |
| 18 | +<!-- END doctoc generated TOC please keep comment here to allow auto update --> |
| 19 | + |
| 20 | +## Overview |
| 21 | + |
| 22 | +OpenCRVS provides comprehensive analytics capabilities through **Metabase**, an open-source business intelligence platform. This system allows stakeholders to visualize and analyze civil registration data through interactive dashboards, charts, and reports. |
| 23 | + |
| 24 | +The analytics system is designed to: |
| 25 | +- Track vital events (births, deaths, etc.) and registration statistics |
| 26 | +- Provide insights for decision-making and reporting |
| 27 | +- Support data-driven improvements to civil registration processes |
| 28 | +- Export non-PII datasets as CSV |
| 29 | + |
| 30 | +## Architecture |
| 31 | + |
| 32 | +The analytics system consists of several key components: |
| 33 | + |
| 34 | +``` |
| 35 | +┌──────────────────┐ ┌──────────────────┐ ┌─────────────────┐ |
| 36 | +│ OpenCRVS Core │───▶│ Country config │───▶│ Metabase │ |
| 37 | +│ Action trigger │ │ Analytics DB │ │ Dashboards │ |
| 38 | +└──────────────────┘ └──────────────────┘ └─────────────────┘ |
| 39 | +``` |
| 40 | + |
| 41 | +1. **OpenCRVS Core Services**: Submit every performed event action to the country config. |
| 42 | +2. **PostgreSQL Analytics Database**: Stores the processed analytics data in a country-defined database. By default, this is the `analytics` schema in PostgreSQL. |
| 43 | +3. **Metabase**: Connects to PostgreSQL and provides visualization capabilities. |
| 44 | + |
| 45 | +## Data Sources |
| 46 | + |
| 47 | +OpenCRVS collects analytics data from multiple sources: |
| 48 | + |
| 49 | +### Event Data |
| 50 | +- **Birth registrations**: Demographics, locations, registration timing |
| 51 | +- **Death registrations**: Cause of death, demographics, locations |
| 52 | +- **Custom events**: Additional event types (e.g., tennis club memberships in the example) |
| 53 | +- **Registration status**: Draft, registered, certified states |
| 54 | +- **Action history**: Workflow actions and state transitions |
| 55 | + |
| 56 | +The country config receives full event documents and decides which fields are written to the analytics database and which are treated as PII. |
| 57 | + |
| 58 | +### Configuration-Driven Analytics |
| 59 | +The example country config implementation uses configurable analytics fields marked with `analytics: true` in form configurations: |
| 60 | + |
| 61 | +```typescript |
| 62 | +// Only fields marked with analytics: true are included |
| 63 | +field: { |
| 64 | + id: 'childDetails.firstName', |
| 65 | + analytics: true, // This field will be tracked |
| 66 | + // ... other field properties |
| 67 | +} |
| 68 | +``` |
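
The selection rule above can be sketched as a small filter. This is only an illustration under stated assumptions: `FieldConfig`, `Declaration`, `pickAnalyticsFields`, and the field ids are hypothetical names, and the actual types in the example country config may look different.

```typescript
// Hypothetical shapes for illustration; the real form configuration types may differ.
interface FieldConfig {
  id: string
  analytics?: boolean
}

type Declaration = Record<string, unknown>

// Copy only the values whose field is flagged with analytics: true.
function pickAnalyticsFields(
  fields: FieldConfig[],
  declaration: Declaration
): Declaration {
  const result: Declaration = {}
  for (const field of fields) {
    if (field.analytics === true && field.id in declaration) {
      result[field.id] = declaration[field.id]
    }
  }
  return result
}

const exampleFields: FieldConfig[] = [
  { id: 'child.dob', analytics: true },
  { id: 'child.name' } // not flagged, so it never reaches the analytics database
]

const analyticsValues = pickAnalyticsFields(exampleFields, {
  'child.dob': '2024-01-15',
  'child.name': 'Example Name'
})
// analyticsValues contains only the 'child.dob' entry
```

Treating the flag as an allow-list means a newly added form field stays out of analytics until someone opts it in, which is the safer default for PII.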
| 69 | + |
| 70 | +## How Data Flows to Metabase |
| 71 | + |
| 72 | +1. **Event Processing**: When an event receives an action in OpenCRVS (for example a registration or a status change), the country config HTTP action hooks are triggered. The action is then processed by the analytics pipeline defined in `src/analytics/analytics.ts`. |
| 73 | + |
| 74 | +2. **Data Extraction**: The analytics service extracts relevant fields from event documents based on form configuration (only fields marked with `analytics: true`) |
| 75 | + |
| 76 | +3. **Data Transformation**: Raw event data is transformed and enriched with additional calculated metrics (e.g., registration delays, demographic statistics). |
| 77 | + |
| 78 | +4. **Database Storage**: Processed data is stored in PostgreSQL in the `analytics` schema, specifically in the `event_actions` table |
| 79 | + |
| 80 | +5. **Metabase Connection**: Metabase connects directly to the PostgreSQL database using configured credentials and queries the analytics schema |
| 81 | + |
| 82 | +6. **Dashboard Rendering**: Pre-configured dashboards and queries in Metabase visualize the data through charts, tables, and maps |
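
As an illustration of the transformation step, a calculated metric such as the registration delay could be derived as below. This is a minimal sketch, not the code in `src/analytics/analytics.ts`: the function name `registrationDelayDays` and its inputs are assumptions made for this example.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000

// Whole days elapsed between the vital event and its registration.
// Both arguments are expected to be ISO 8601 date or date-time strings.
function registrationDelayDays(eventDate: string, registeredAt: string): number {
  const start = Date.parse(eventDate)
  const end = Date.parse(registeredAt)
  if (isNaN(start) || isNaN(end)) {
    throw new Error('Both dates must be valid ISO 8601 strings')
  }
  return Math.floor((end - start) / MS_PER_DAY)
}

// A birth on 15 January 2024 registered on 14 February 2024
const delay = registrationDelayDays('2024-01-15', '2024-02-14T10:30:00Z')
console.log(delay) // 30
```

Storing the delay alongside each action keeps the Metabase queries simple, because dashboards can aggregate a plain numeric column instead of recomputing date differences.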
| 83 | + |
| 84 | +## Deployment |
| 85 | + |
| 86 | +### Production Deployment |
| 87 | +Metabase is deployed as a Docker service defined in `infrastructure/docker-compose.deploy.yml`: |
| 88 | + |
| 89 | +```yaml |
| 90 | +dashboards: |
| 91 | + image: metabase/metabase:v0.56.4 |
| 92 | + volumes: |
| 93 | + - /opt/opencrvs/infrastructure/metabase/metabase.init.db.sql:/metabase.init.db.sql |
| 94 | + - /opt/opencrvs/infrastructure/metabase/run.sh:/run.sh |
| 95 | + # ... other configuration files |
| 96 | + environment: |
| 97 | + # Database connection settings |
| 98 | + - METABASE_DATABASE_HOST=${METABASE_DATABASE_HOST:-postgres} |
| 99 | + - METABASE_DATABASE_NAME=${METABASE_DATABASE_NAME:-events} |
| 100 | + - METABASE_DATABASE_USER=${METABASE_DATABASE_USER:-events_analytics} |
| 101 | + # ... other environment variables |
| 102 | +``` |
| 103 | + |
| 104 | +### Environment Configuration |
| 105 | +The deployment uses environment-specific configurations: |
| 106 | +- `infrastructure/metabase/environment-configuration.sql`: Database connections and admin users. **This file does not need to be changed**. |
| 107 | +- Environment variables for database credentials and site settings |
| 108 | +- Map configurations for geographic visualizations |
| 109 | + |
| 110 | +## Development and Dashboard Management |
| 111 | + |
| 112 | +### ⚠️ Critical: Development vs Production Changes |
| 113 | + |
| 114 | +**Important**: Changes made directly in deployed Metabase environments (staging, production) **WILL NOT PERSIST** and will be reset during the next deployment. |
| 115 | + |
| 116 | +To make persistent dashboard changes: |
| 117 | + |
| 118 | +1. **Work in Development Mode**: Always make changes in your local development environment first |
| 119 | +2. **Modify the Source**: Update `infrastructure/metabase/metabase.init.db.sql` by running and using Metabase locally. When you stop the local Metabase process, the full Metabase configuration is written to this file. |
| 120 | +3. **Version Control**: Commit changes to ensure they're deployed to all environments |
| 121 | +4. **Deploy**: Changes will only persist across environments when included in the initialization database |
| 122 | + |
| 123 | +### Development Workflow |
| 124 | + |
| 125 | +```bash |
| 126 | +# Start Metabase in development mode |
| 127 | +yarn metabase |
| 128 | + |
| 129 | +# Access at http://localhost:4444 |
| 130 | +# Default credentials: |
| 131 | + |
| 132 | +# Password: m3tabase |
| 133 | +``` |
| 134 | + |
| 135 | +### Making Dashboard Changes |
| 136 | + |
| 137 | +1. **Create/modify dashboards** in development Metabase UI |
| 138 | +2. **Export the changes** by stopping the local Metabase process, which writes the configuration to `infrastructure/metabase/metabase.init.db.sql` |
| 139 | +3. **Commit changes** to version control |
| 140 | +4. **Deploy** to propagate changes to all environments |
| 141 | + |
| 142 | +### Development Commands |
| 143 | + |
| 144 | +```bash |
| 145 | +# Start Metabase in development |
| 146 | +yarn metabase |
| 147 | + |
| 148 | +# Clear all analytics data |
| 149 | +yarn db:clear:all |
| 150 | +``` |
| 151 | +## Why Analytics Data is Not Backed Up |
| 152 | + |
| 153 | +Analytics data in OpenCRVS is **intentionally not included in backup procedures** for two reasons: |
| 154 | + |
| 155 | +### 1. **Regenerative Nature** |
| 156 | +Analytics data can be completely regenerated from the primary data sources. The analytics tables are derived views of the operational data, not the source of truth. |
| 157 | + |
| 158 | +### 2. **Performance Considerations** |
| 159 | +- Analytics databases can be very large (GB to TB scale) |
| 160 | +- Including them in backups would significantly increase backup time and storage requirements |
| 161 | +- Restore operations would be much slower |
| 162 | + |
| 163 | +## Configuration |
| 164 | + |
| 165 | +### Database Connection |
| 166 | +Metabase connects to PostgreSQL using these environment variables: |
| 167 | +- `METABASE_DATABASE_HOST`: Database host (default: postgres) |
| 168 | +- `METABASE_DATABASE_PORT`: Database port (default: 5432) |
| 169 | +- `METABASE_DATABASE_NAME`: Database name (default: events) |
| 170 | +- `METABASE_DATABASE_USER`: Database user (default: events_analytics) |
| 171 | +- `METABASE_DATABASE_PASSWORD`: Database password |
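
To make the defaults concrete, the sketch below resolves these variables the same way the `${VAR:-default}` syntax in the compose file does, falling back when a variable is unset or empty. `resolveMetabaseDbConfig` and `MetabaseDbConfig` are hypothetical names for this example, and treating a missing password as an error is an assumption rather than documented behaviour.

```typescript
interface MetabaseDbConfig {
  host: string
  port: number
  name: string
  user: string
  password: string
}

type Env = Record<string, string | undefined>

// Resolve the connection settings, using the documented defaults for unset or empty values.
function resolveMetabaseDbConfig(env: Env): MetabaseDbConfig {
  const password = env['METABASE_DATABASE_PASSWORD']
  if (!password) {
    // Assumption: no default is documented for the password, so fail early.
    throw new Error('METABASE_DATABASE_PASSWORD must be set')
  }
  const port = Number(env['METABASE_DATABASE_PORT'] || '5432')
  if (!isFinite(port) || port <= 0 || Math.floor(port) !== port) {
    throw new Error('METABASE_DATABASE_PORT must be a positive integer')
  }
  return {
    host: env['METABASE_DATABASE_HOST'] || 'postgres',
    port,
    name: env['METABASE_DATABASE_NAME'] || 'events',
    user: env['METABASE_DATABASE_USER'] || 'events_analytics',
    password
  }
}

// Only the password is provided, so every other setting falls back to its default.
const dbConfig = resolveMetabaseDbConfig({ METABASE_DATABASE_PASSWORD: 'example-secret' })
console.log(dbConfig.host, dbConfig.port, dbConfig.name, dbConfig.user)
```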
| 172 | + |
| 173 | +### Analytics Schema |
| 174 | +The analytics data is stored in the PostgreSQL `analytics` schema, primarily in: |
| 175 | +- `analytics.event_actions`: Main table containing processed event data and action histories |
| 176 | + |
| 177 | +### Map Visualizations |
| 178 | +Geographic visualizations use: |
| 179 | +- `OPENCRVS_METABASE_MAP_NAME`: Map display name |
| 180 | +- `OPENCRVS_METABASE_MAP_URL`: GeoJSON source URL |
| 181 | +- `OPENCRVS_METABASE_MAP_REGION_KEY`: Key field for geographic regions |
| 182 | +- `OPENCRVS_METABASE_MAP_REGION_NAME`: Display name for regions |