HMPPS Creating Future Opportunities (CFO) - Data Management System (DMS). The DMS is intended for internal use only. It processes PNOMIS and NDelius offender data to supply CATS (Case Assessment and Tracking System, also used by HMPPS CFO) with accurate offender movements and updates.
CFO DMS is built as a distributed microservices architecture. Data flows through the following pipeline:
File Ingestion → Parsing/Cleaning → Staging → Import → Running Picture → Blocking/Matching → Clustering → Data Consumption
- File Ingestion - FileSync monitors MinIO/S3/FileSystem storage and syncs incoming files
- Parsing/Cleaning - Offloc.Parser, Offloc.Cleaner, Delius.Parser transform raw p-NOMIS and nDelius files into structured records
- Staging/Import/Running Picture - Import validates and migrates data from staging to running picture databases
- Blocking/Matching - Blocking generates candidate record pairs, Matching.Engine identifies and links related offender records across systems
- Clustering - Matching.Engine groups related records into clusters representing unique individuals
- Data Consumption - API exposes the processed data via REST endpoints for downstream consumers (e.g., CATS), Visualiser provides a web UI for exploring and visualising relationships between offender data
- Cleanup - Performs data maintenance tasks
- DbInteractions - Handles complex database operations
- Logging - Centralised logging service
Services communicate asynchronously via RabbitMQ message queues. See the Message Flow Diagram below for detailed service interactions.
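To make the blocking, matching and clustering stages more concrete, here is a toy Python sketch. It is not the DMS implementation: the record fields, the blocking key (surname initial plus date of birth) and the exact-equality match rule are all assumptions chosen for illustration, whereas the real Matching.Engine compares and scores candidates with its own rules.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records from both source systems (fields are illustrative only).
RECORDS = [
    {"id": "N1", "source": "p-NOMIS", "surname": "Smith", "dob": "1980-01-02"},
    {"id": "D1", "source": "nDelius", "surname": "Smith", "dob": "1980-01-02"},
    {"id": "D2", "source": "nDelius", "surname": "Smyth", "dob": "1980-01-02"},
    {"id": "N2", "source": "p-NOMIS", "surname": "Jones", "dob": "1975-06-30"},
]


def blocking_key(record):
    # Assumed key: first letter of the surname plus the date of birth.
    return (record["surname"][0].upper(), record["dob"])


def candidate_pairs(records):
    """Blocking: only records that share a key are paired for comparison."""
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record)
    for block in blocks.values():
        yield from combinations(block, 2)


def is_match(a, b):
    # Stand-in for comparison and scoring: exact surname and DOB equality.
    return a["surname"].lower() == b["surname"].lower() and a["dob"] == b["dob"]


def cluster(records):
    """Clustering: union-find over matched pairs, one cluster per individual."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in candidate_pairs(records):
        if is_match(a, b):
            parent[find(a["id"])] = find(b["id"])

    clusters = defaultdict(set)
    for r in records:
        clusters[find(r["id"])].add(r["id"])
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    print(cluster(RECORDS))
```

Blocking keeps the number of comparisons small: in this sample only three pairs are compared instead of all six, and the two "Smith" records end up in the same cluster.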
- .NET 10 SDK
- Visual Studio Code users:
- (Optional) To use the Visualiser app, you must configure secret(s) for applications in the src directory:
- Visualiser.csproj → Manage User Secrets
{ "AzureAd:ClientSecret": "<ENTRA_CLIENT_SECRET>" }
- Visualiser.csproj → Manage User Secrets
If you are deploying to a test environment, drop the existing databases before continuing.
```sql
-- Databases to drop before a fresh deployment to test.
DECLARE @Databases TABLE (DbName sysname);

INSERT INTO @Databases (DbName)
VALUES
    ('ClusterDb'),
    ('DeliusRunningPictureDb'),
    ('DeliusStagingDb'),
    ('MatchingDb'),
    ('OfflocRunningPictureDb'),
    ('OfflocStagingDb'),
    ('AuditDb');

-- Build a single batch that drops each listed database if it exists.
DECLARE @sql nvarchar(max) = N'';

SELECT @sql += '
IF EXISTS (SELECT 1 FROM sys.databases WHERE name = ''' + DbName + ''')
BEGIN
    -- Disconnect other sessions so the drop is not blocked.
    ALTER DATABASE [' + DbName + '] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE [' + DbName + '];
END;'
FROM @Databases;

EXEC sys.sp_executesql @sql;
```

The publish_db.py script is provided for deploying database projects to test environments.
| Requirement | How to check | How to install (bash) |
|---|---|---|
| Python 3 | `python3 --version` | macOS (Homebrew): `brew install python`<br>Ubuntu / Debian: `sudo apt update && sudo apt install -y python3 python3-pip` |
| .NET SDK 8.x | `dotnet --list-sdks` | macOS (Homebrew): `brew install --cask dotnet-sdk@8`<br>Ubuntu / Debian: `sudo apt update && sudo apt install -y dotnet-sdk-8.0` |
| .NET SDK 10.x | `dotnet --list-sdks` | macOS (Homebrew): `brew install --cask dotnet-sdk@10`<br>Ubuntu / Debian: `sudo apt update && sudo apt install -y dotnet-sdk-10.0` |
| sqlpackage (dotnet tool) | `dotnet tool list -g` | `dotnet tool install -g microsoft.sqlpackage` |
- Set the required environment variables before running:

  ```bash
  export SERVER="your-test-sql-server-address"
  export DB_USER="your-database-username"
  export DB_PASS="your-database-password"
  ```

- Run the script from the project root directory:

  ```bash
  # Preview changes without deploying
  python3 publish_db.py --dry-run

  # Deploy to test environment (Release build - recommended)
  python3 publish_db.py

  # Deploy using Debug build
  python3 publish_db.py --config Debug
  ```
The script will build and publish all database projects (AuditDb, OfflocStagingDb, DeliusStagingDb, OfflocRunningPictureDb, DeliusRunningPictureDb, MatchingDb, ClusterDb) to the specified test server. You will be prompted to confirm before deployment begins.
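Because the script depends on `SERVER`, `DB_USER` and `DB_PASS`, it can help to confirm they are set before starting a deployment. The following is a minimal pre-flight sketch; the helper `missing_vars` is hypothetical and is not part of publish_db.py, whose own validation behaviour is not described here.

```python
import os

# Variable names taken from the export commands above.
REQUIRED_VARS = ("SERVER", "DB_USER", "DB_PASS")


def missing_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All required environment variables are set.")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching the real environment.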
If you want to seed the test data, you can run the Fake Data Seeder project.
```bash
export ConnectionStrings__ClusterDb="Server=$SERVER;Database=ClusterDb;User Id=$DB_USER;Password=$DB_PASS;TrustServerCertificate=True;"
dotnet run --project ./src/FakeDataSeeder/FakeDataSeeder.csproj
```

The recommended way to run and debug these apps is using .NET Aspire.
- Using Visual Studio Code: open the project and press `F5`, selecting the Default Configuration.
- Using Visual Studio or other IDEs: from the debug configuration dropdown, select `Aspire.AppHost` and start the application.
When running via Aspire, the following services are available:
| Service | Purpose | Access | Credentials |
|---|---|---|---|
| API | REST endpoints for querying offender data, searches, and clustering operations | https://localhost:7013/swagger | API Key: password |
| MinIO | S3-compatible file storage | random port (check Aspire) | Username: minioadmin<br>Password: minioadmin |
| MSSQL | Application databases (staging, running picture, matching, cluster) | 127.0.0.1,61749 | Username: sa<br>Password: P@ssword123! |
| RabbitMQ | Message broker for inter-service communication | http://localhost:15672 | Username: guest<br>Password: guest |
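The MSSQL address in the table uses SQL Server's `host,port` notation. If a local tool needs the host and port separately, or a connection string in the same shape as the one used for the Fake Data Seeder, a small helper along these lines may be useful. This is a sketch only: both function names are illustrative, and the port and credentials are simply the local development values listed in the table.

```python
def split_sql_server_address(address, default_port=1433):
    """Split 'host,port' into (host, port); 1433 is SQL Server's default port."""
    host, sep, port = address.partition(",")
    host = host.strip()
    if not host:
        raise ValueError("address has no host part")
    return host, (int(port) if sep else default_port)


def build_connection_string(database, server, user, password):
    """Build a connection string in the same shape as the seeder example."""
    return (
        f"Server={server};Database={database};"
        f"User Id={user};Password={password};TrustServerCertificate=True;"
    )


if __name__ == "__main__":
    print(split_sql_server_address("127.0.0.1,61749"))
    print(build_connection_string("ClusterDb", "127.0.0.1,61749", "sa", "P@ssword123!"))
```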
```text
┌─────────────────────────────────────────────────────────────────────────────┐
│                           INITIAL FILE DETECTION                            │
└─────────────────────────────────────────────────────────────────────────────┘
                                ┌──────────────┐
                                │   FileSync   │  (Monitors storage for new files)
                                └──────┬───────┘
                                       │
                  ┌────────────────────┴──────────────────┐
                  │                                       │
                  ▼                                       ▼
    DeliusDownloadFinishedMessage              OfflocDownloadFinished
                  │                                       │
                  ▼                                       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                          PARSING & CLEANING STAGE                           │
└─────────────────────────────────────────────────────────────────────────────┘
         ┌──────────────────┐                    ┌─────────────────┐
         │  Delius.Parser   │                    │ Offloc.Cleaner  │
         │                  │                    │                 │
         │ (Parses Delius   │                    │ (Cleans Offloc  │
         │  files into      │                    │  files, removes │
         │  structured      │                    │  redundant      │
         │  records)        │                    │  fields)        │
         └────────┬─────────┘                    └────────┬────────┘
                  │                                       │
                  │ Sends DB requests:                    │
                  │ - StartDeliusFileProcessingRequest    │
                  │                                       │
                  ▼                                       ▼
     DeliusParserFinishedMessage            OfflocCleanerFinishedMessage
                  │                                       │
                  │                                       ▼
                  │                              ┌─────────────────┐
                  │                              │  Offloc.Parser  │
                  │                              │                 │
                  │                              │ (Parses cleaned │
                  │                              │  Offloc files   │
                  │                              │  into structured│
                  │                              │  records)       │
                  │                              └────────┬────────┘
                  │                                       │
                  │                                       │ Sends DB requests:
                  │                                       │ - StartOfflocFileProcessingRequest
                  │                                       │
                  │                                       ▼
                  │                          OfflocParserFinishedMessage
                  │                                       │
                  └────────────────────┬──────────────────┘
                                       │
                                       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                           STAGING & IMPORT STAGE                            │
└─────────────────────────────────────────────────────────────────────────────┘
                               ┌────────────────┐
                               │     Import     │
                               │                │
                               │ (Coordinates   │
                               │  staging and   │
                               │  merging of    │
                               │  both data     │
                               │  sources)      │
                               └───────┬────────┘
                                       │
                                       │ Sends DB requests:
                                       │ - StageDeliusRequest
                                       │ - MergeDeliusRequest
                                       │ - StageOfflocRequest
                                       │ - MergeOfflocRequest
                                       │
                                       ▼
                               ┌────────────────┐
                               │ DbInteractions │
                               │                │
                               │ (Stages data   │
                               │  from parsers, │
                               │  merges into   │
                               │  running       │
                               │  picture DB)   │
                               └───────┬────────┘
                                       │
                                       │ Sends responses:
                                       │ - StageDeliusResponse
                                       │ - MergeDeliusResponse
                                       │ - StageOfflocResponse
                                       │ - MergeOfflocResponse
                                       │ - DeliusFilesCleanupMessage
                                       │ - OfflocFilesCleanupMessage
                                       │
                                       ▼
                             ImportFinishedMessage
                                       │
                                       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                          MATCHING & BLOCKING STAGE                          │
└─────────────────────────────────────────────────────────────────────────────┘
                               ┌────────────────┐
                               │    Blocking    │
                               │                │
                               │ (Generates     │
                               │  candidate     │
                               │  pairs of      │
                               │  records that  │
                               │  may match)    │
                               └───────┬────────┘
                                       │
                                       ▼
                            BlockingFinishedMessage
                                       │
                                       ▼
                       ┌───────────────────────────────┐
                       │        Matching.Engine        │
                       │      (ComparatorService)      │
                       │                               │
                       │ (Compares candidate pairs     │
                       │  using matching rules to      │
                       │  identify potential matches)  │
                       └───────────────┬───────────────┘
                                       │
                                       ▼
                        MatchingScoreCandidatesMessage
                                       │
                                       ▼
                       ┌───────────────────────────────┐
                       │        Matching.Engine        │
                       │        (ScorerService)        │
                       │                               │
                       │ (Scores comparisons using     │
                       │  Bayesian probability to      │
                       │  determine match likelihood)  │
                       └───────────────┬───────────────┘
                                       │
                                       ▼
                    MatchingScoreCandidatesFinishedMessage
                                       │
                                       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                              CLUSTERING STAGE                               │
└─────────────────────────────────────────────────────────────────────────────┘
                       ┌───────────────────────────────┐
                       │        Matching.Engine        │
                       │      (ClusteringService)      │
                       │                               │
                       │ (Pre-processes clustering:    │
                       │  prepares data for grouping)  │
                       └───────────────┬───────────────┘
                                       │
                                       ▼
                     ClusteringPreProcessingStartedMessage
                                       │
                                       ▼
                       ┌───────────────────────────────┐
                       │        Matching.Engine        │
                       │      (ComparatorService)      │
                       │                               │
                       │ (Compares outstanding edges   │
                       │  for clustering)              │
                       └───────────────┬───────────────┘
                                       │
                                       ▼
                     MatchingScoreOutstandingEdgesMessage
                                       │
                                       ▼
                       ┌───────────────────────────────┐
                       │        Matching.Engine        │
                       │        (ScorerService)        │
                       │                               │
                       │ (Scores outstanding edges)    │
                       └───────────────┬───────────────┘
                                       │
                                       ▼
                    ClusteringPreProcessingFinishedMessage
                                       │
                                       ▼
                       ┌───────────────────────────────┐
                       │        Matching.Engine        │
                       │      (ClusteringService)      │
                       │                               │
                       │ (Post-processes clustering:   │
                       │  groups related records into  │
                       │  clusters representing        │
                       │  unique individuals)          │
                       └───────────────┬───────────────┘
                                       │
                                       ▼
                    ClusteringPostProcessingFinishedMessage
                                       │
                                       ▼
                               ┌────────────────┐
                               │    FileSync    │
                               │                │
                               │ (Triggers next │
                               │  processing    │
                               │  cycle if      │
                               │  configured)   │
                               └────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│                              DATA CONSUMPTION                               │
└─────────────────────────────────────────────────────────────────────────────┘
                               ┌────────────────┐
                               │      API       │
                               │                │
                               │ (Exposes REST  │
                               │  endpoints for │
                               │  querying      │
                               │  processed     │
                               │  data)         │
                               └───────┬────────┘
                                       │
                                       ▼
                              External Consumers
                              (e.g., CATS system)
```
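The diagram above can also be read as a routing table from each message to the service that consumes it. The Python sketch below encodes the main path under that reading. The message and service names are copied from the diagram; the `FLOW` dictionary and the `trace` helper are illustrative and not part of the codebase. It assumes each message has a single consumer and leaves out the DbInteractions request/response exchanges, which is a simplification.

```python
# message -> (consuming service, message that service emits next)
FLOW = {
    "DeliusDownloadFinishedMessage": ("Delius.Parser", "DeliusParserFinishedMessage"),
    "OfflocDownloadFinished": ("Offloc.Cleaner", "OfflocCleanerFinishedMessage"),
    "OfflocCleanerFinishedMessage": ("Offloc.Parser", "OfflocParserFinishedMessage"),
    "DeliusParserFinishedMessage": ("Import", "ImportFinishedMessage"),
    "OfflocParserFinishedMessage": ("Import", "ImportFinishedMessage"),
    "ImportFinishedMessage": ("Blocking", "BlockingFinishedMessage"),
    "BlockingFinishedMessage": (
        "Matching.Engine (ComparatorService)", "MatchingScoreCandidatesMessage"),
    "MatchingScoreCandidatesMessage": (
        "Matching.Engine (ScorerService)", "MatchingScoreCandidatesFinishedMessage"),
    "MatchingScoreCandidatesFinishedMessage": (
        "Matching.Engine (ClusteringService)", "ClusteringPreProcessingStartedMessage"),
    "ClusteringPreProcessingStartedMessage": (
        "Matching.Engine (ComparatorService)", "MatchingScoreOutstandingEdgesMessage"),
    "MatchingScoreOutstandingEdgesMessage": (
        "Matching.Engine (ScorerService)", "ClusteringPreProcessingFinishedMessage"),
    "ClusteringPreProcessingFinishedMessage": (
        "Matching.Engine (ClusteringService)", "ClusteringPostProcessingFinishedMessage"),
    "ClusteringPostProcessingFinishedMessage": ("FileSync", None),
}


def trace(message):
    """Follow a message through the table and list the services it reaches."""
    consumers = []
    while message is not None:
        consumer, message = FLOW[message]
        consumers.append(consumer)
    return consumers


if __name__ == "__main__":
    for step, service in enumerate(trace("OfflocDownloadFinished"), start=1):
        print(step, service)
```

Tracing from either download message ends at FileSync, which matches the diagram's final step of triggering the next processing cycle.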