
Commit be5bde4

Add copilot instructions (#99)
- Add copilot instructions
- Add filesystem constraints to copilot instructions
1 parent 7e433ad commit be5bde4

1 file changed

Lines changed: 161 additions & 0 deletions


.github/copilot-instructions.md

# GitHub Copilot Instructions

## Project Overview

HMPPS CFO Data Management System (DMS) is a .NET 10 distributed microservices application that processes p-NOMIS (Offloc) and nDelius offender data to supply CATS (Case Assessment and Tracking System) with accurate offender records. It targets Windows Server on EC2, where each service is deployed as a Windows Service, and runs locally via .NET Aspire.

## Build, Test & Run

```bash
# Build the solution
dotnet build

# Run all tests
dotnet test --configuration Release

# Run tests for a specific project
dotnet test tests/Matching.Engine.Tests/Matching.Engine.Tests.csproj --configuration Release

# Run a single test
dotnet test tests/Api.Tests/Api.Tests.csproj --filter "FullyQualifiedName~SearchAsync_WithNoMatches_ReturnsNotFound"

# Run locally via Aspire (recommended — starts all services + dependencies)
# VS Code: F5 → Default Configuration
# Visual Studio: select "Aspire.AppHost" debug config

# Deploy databases to a test environment
export SERVER="..." DB_USER="..." DB_PASS="..."
python3 publish_db.py # deploy
python3 publish_db.py --dry-run # preview only

# Seed test data
dotnet run --project ./src/FakeDataSeeder/FakeDataSeeder.csproj
```

## Architecture

### Data Pipeline

```
FileSync → [Offloc.Cleaner → Offloc.Parser | Delius.Parser] → Import → DbInteractions → Blocking → Matching.Engine → API / Visualiser
```

All inter-service communication is **asynchronous via RabbitMQ**, using the [Rebus](https://github.com/rebus-org/Rebus) library. Each stage publishes a `*FinishedMessage` that triggers the next stage.
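
The sketch below runs the "finished message triggers the next stage" hand-off in a single process, without Rebus or RabbitMQ. The message names (`FileSyncFinished` and so on) are hypothetical placeholders for the real `*FinishedMessage` types. Only the Offloc branch of the Data Pipeline diagram is modelled, and the stage order is copied from that diagram.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical routing for the Offloc branch: message -> (stage that handles it, message it publishes when done).
// In DMS this routing goes through RabbitMQ via Rebus; these names are placeholders, not real message types.
var routes = new Dictionary<string, (string Stage, string Publishes)>
{
    ["FileSyncFinished"] = ("Offloc.Cleaner", "CleanerFinished"),
    ["CleanerFinished"] = ("Offloc.Parser", "ParserFinished"),
    ["ParserFinished"] = ("Import", "ImportFinished"),
    ["ImportFinished"] = ("DbInteractions", "DbInteractionsFinished"),
    ["DbInteractionsFinished"] = ("Blocking", "BlockingFinished"),
    ["BlockingFinished"] = ("Matching.Engine", "MatchingFinished"),
};

// Follow the chain of "finished" messages until no stage subscribes to the latest one.
List<string> FollowChain(string message)
{
    var stages = new List<string>();
    while (routes.TryGetValue(message, out var route))
    {
        stages.Add(route.Stage);
        message = route.Publishes; // each stage announces completion with its own message
    }
    return stages;
}

Console.WriteLine(string.Join(" -> ", FollowChain("FileSyncFinished")));
// Offloc.Cleaner -> Offloc.Parser -> Import -> DbInteractions -> Blocking -> Matching.Engine
```

Because stages are linked only by messages, adding or replacing a stage only changes who subscribes to which message. The upstream publisher does not need to change.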

### Services

| Service | Type | Role |
|---------|------|------|
| `FileSync` | Worker | Monitors MinIO/S3/filesystem for incoming files |
| `Offloc.Cleaner` | Worker | Cleans raw Offloc (p-NOMIS) files |
| `Offloc.Parser` | Worker | Parses cleaned Offloc files into DB records |
| `Delius.Parser` | Worker | Parses nDelius files into DB records |
| `Import` | Worker | Coordinates staging → running picture migration |
| `DbInteractions` | Worker | Executes DB staging/merge operations (runs in SQL container) |
| `Blocking` | Worker | Generates candidate record pairs for matching |
| `Matching.Engine` | Worker | Compares candidate pairs (Comparator), scores them (Scorer, Bayesian) and clusters the results |
| `Cleanup` | Worker | Data maintenance |
| `Logging` | Worker | Centralised log aggregation |
| `Meow` | Worker | CATS RabbitMQ integration (different broker config) |
| `API` | ASP.NET Core | REST endpoints for downstream consumers |
| `Visualiser` | ASP.NET Core | Blazor web UI for exploring offender relationships |

### Databases (SQL Server)

Seven separate databases: `OfflocStagingDb`, `OfflocRunningPictureDb`, `DeliusStagingDb`, `DeliusRunningPictureDb`, `MatchingDb`, `ClusterDb`, `AuditDb`. Database schemas are managed as SQL Database Projects under `src/Database/`.

### Shared Libraries (`src/Libraries/`)

- **`Messaging`**: RabbitMQ integration via Rebus; all message types; `Exchanges` constants; `AddDmsRabbitMQ()` extension
- **`Infrastructure`**: EF Core `DbContext`s (`OfflocContext`, `DeliusContext`, `ClusteringContext`, `AuditContext`), entity models, repositories, shared DTOs
- **`Matching.Core`**: `IMatcher<T, Result>` interface and concrete matchers (Jaro-Winkler, Levenshtein, Caver, Date, Postcode, Equality); `[Matcher("key")]` attribute for dynamic discovery
- **`EnvironmentSetup`**: `AddDmsCoreWorkerService()` and `UseDmsSerilog()` extension methods shared by all worker services; `FileLocations` / `FilePatterns`
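
For reference, this is a self-contained sketch of textbook Jaro-Winkler similarity, one of the string matchers in `Matching.Core`. It uses a match window of `max(len)/2 - 1`, a prefix scale of 0.1 and a common prefix capped at 4 characters. It is not the DMS implementation. The real matcher sits behind `IMatcher<T, Result>`, and its result type and parameters are not shown here.

```csharp
using System;

// Textbook Jaro-Winkler similarity in [0, 1]; 1 means identical strings.
double JaroWinkler(string s1, string s2, double prefixScale = 0.1)
{
    if (s1 == s2) return 1.0;
    if (s1.Length == 0 || s2.Length == 0) return 0.0;

    // Two characters "match" if they are equal and no further apart than this window.
    int window = Math.Max(0, Math.Max(s1.Length, s2.Length) / 2 - 1);
    var matched1 = new bool[s1.Length];
    var matched2 = new bool[s2.Length];
    int matches = 0;

    for (int i = 0; i < s1.Length; i++)
    {
        int start = Math.Max(0, i - window);
        int end = Math.Min(s2.Length - 1, i + window);
        for (int j = start; j <= end; j++)
        {
            if (matched2[j] || s1[i] != s2[j]) continue;
            matched1[i] = matched2[j] = true;
            matches++;
            break;
        }
    }
    if (matches == 0) return 0.0;

    // Transpositions: matched characters that appear in a different order (counted in halves).
    int k = 0, outOfOrder = 0;
    for (int i = 0; i < s1.Length; i++)
    {
        if (!matched1[i]) continue;
        while (!matched2[k]) k++;
        if (s1[i] != s2[k]) outOfOrder++;
        k++;
    }
    double m = matches;
    double jaro = (m / s1.Length + m / s2.Length + (m - outOfOrder / 2.0) / m) / 3.0;

    // Winkler boost for a shared prefix of up to 4 characters.
    int prefix = 0;
    int maxPrefix = Math.Min(4, Math.Min(s1.Length, s2.Length));
    while (prefix < maxPrefix && s1[prefix] == s2[prefix]) prefix++;
    return jaro + prefix * prefixScale * (1 - jaro);
}

Console.WriteLine(JaroWinkler("MARTHA", "MARHTA")); // ≈ 0.961
```

The prefix boost is why Jaro-Winkler suits names: typos late in a surname cost less than disagreement in the first few letters.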

## Key Conventions

### Service Bootstrap Pattern

All worker services follow the same bootstrap pattern in `Program.cs`:

```csharp
var builder = Host.CreateApplicationBuilder(args);
builder.AddDmsCoreWorkerService(); // Serilog + Windows Service + file locations
builder.Services.AddDmsRabbitMQ(builder.Configuration); // Rebus over RabbitMQ, from the Messaging library
// ... register additional services
var app = builder.Build();
await app.RunAsync();
```

`Meow` and `API`/`Visualiser` are exceptions: they configure messaging or hosting differently.

### Messaging

- All messages implement `IMessage` from `Messaging.Messages`
- Messages are grouped by pipeline stage: `BlockingMessages`, `DbMessages`, `ImportMessages`, `MatchingMessages`, `StagingMessages`, `MergingMessages`, `StatusMessages`
- Exchange names are string constants in `Messaging.Exchanges` (lowercase: `staging`, `merging`, `database`, etc.)
- The RabbitMQ connection string is read from `ConnectionStrings:RabbitMQ` in configuration

### Dependency Injection

- Worker services use the standard `Microsoft.Extensions.DependencyInjection` container
- `Matching.Engine` additionally uses **Autofac** (via `AutofacServiceProviderFactory`) to register matchers and scorers dynamically via reflection

### Matching Engine

- The `[Matcher("key")]` attribute decorates matcher classes for dynamic registration
- Three hosted services run in parallel within one process: `ComparatorService`, `ScorerService`, `ClusteringService`
- `MatchingQueue` is a singleton in-memory queue between the comparator and the scorer
- Scoring uses Bayesian probability; matchers include string similarity algorithms (Jaro-Winkler, Levenshtein) and phonetic matching (Caver/Soundex)
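
The comparator → `MatchingQueue` → scorer hand-off can be sketched in one process, with a `System.Threading.Channels` channel standing in for the queue. The scoring step assumes a naive-Bayes (Fellegi-Sunter style) model, in which each field comparison contributes an independent likelihood ratio `m/u`. The real `ScorerService` may combine evidence differently. The prior, the field names and the m/u probabilities are all invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical m/u probabilities per compared field:
// M = P(field agrees | same person), U = P(field agrees | different people).
var fieldModel = new Dictionary<string, (double M, double U)>
{
    ["Surname"] = (0.95, 0.01),
    ["DateOfBirth"] = (0.97, 0.001),
    ["Postcode"] = (0.80, 0.05),
};
const double PriorMatch = 0.001; // assumed P(same person) for a candidate pair from Blocking

// Naive Bayes update: posterior odds = prior odds x product of per-field likelihood ratios.
double Score(IReadOnlyDictionary<string, bool> agreements)
{
    double odds = PriorMatch / (1 - PriorMatch);
    foreach (var (field, agrees) in agreements)
    {
        var (m, u) = fieldModel[field];
        odds *= agrees ? m / u : (1 - m) / (1 - u);
    }
    return odds / (1 + odds); // odds -> probability
}

// Stand-in for the singleton MatchingQueue: the comparator writes, the scorer reads.
var queue = Channel.CreateUnbounded<(string PairId, Dictionary<string, bool> Agreements)>();

async Task CompareAsync()
{
    // A real comparator would run matchers (Jaro-Winkler, Date, Postcode, ...) over each candidate pair.
    await queue.Writer.WriteAsync(("A-B", new Dictionary<string, bool> { ["Surname"] = true, ["DateOfBirth"] = true, ["Postcode"] = false }));
    await queue.Writer.WriteAsync(("A-C", new Dictionary<string, bool> { ["Surname"] = false, ["DateOfBirth"] = false, ["Postcode"] = true }));
    queue.Writer.Complete(); // signal that no more pairs are coming
}

async Task<Dictionary<string, double>> ScoreAllAsync()
{
    var scores = new Dictionary<string, double>();
    await foreach (var item in queue.Reader.ReadAllAsync())
        scores[item.PairId] = Score(item.Agreements);
    return scores;
}

// Run both sides concurrently, as the hosted services do inside one process.
var scoring = ScoreAllAsync();
await CompareAsync();
var results = await scoring;

foreach (var (pair, probability) in results)
    Console.WriteLine($"{pair}: {probability:F4}");
```

With these made-up numbers, pair `A-B` (surname and date of birth agree) scores about 0.95. Pair `A-C` (only the postcode agrees) stays close to 0, because a single weak agreement cannot outweigh two strong disagreements.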

### API Authentication

The API supports two authentication schemes behind a `"Smart"` policy scheme:

- **JWT Bearer** (Entra ID / Microsoft Identity): used for the `dms.read`, `dms.write`, `visualiser.read` and `visualiser.write` scopes
- **Legacy API Key** (`X-API-KEY` header): kept for backward compatibility

Swagger UI is only enabled when `IsDevelopment=true` (passed in as an Aspire parameter).
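
The `"Smart"` policy scheme has to choose one handler for each request. The sketch below guesses at that decision as a pure function over request headers, so it runs without ASP.NET Core. The scheme names `"ApiKey"` and `"Bearer"`, the header-based rule and the scope helper are all assumptions, not the DMS code. In ASP.NET Core, this kind of selector is normally supplied through `AddPolicyScheme(...)` and `PolicySchemeOptions.ForwardDefaultSelector`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical scheme names; the real API registers its own.
const string ApiKeyScheme = "ApiKey";
const string JwtScheme = "Bearer";

// Pick the handler the "Smart" policy scheme forwards to, based only on request headers.
// Assumption: a request carrying X-API-KEY is a legacy client; everything else goes to JWT Bearer.
string SelectScheme(IReadOnlyDictionary<string, string> headers) =>
    headers.ContainsKey("X-API-KEY") ? ApiKeyScheme : JwtScheme;

// Entra ID access tokens list delegated scopes, space-separated, in the "scp" claim.
bool HasScope(string scpClaim, string requiredScope) =>
    scpClaim.Split(' ', StringSplitOptions.RemoveEmptyEntries).Contains(requiredScope);

// HTTP header names are case-insensitive, so use a case-insensitive dictionary.
var legacy = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase) { ["x-api-key"] = "secret" };
var modern = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase) { ["Authorization"] = "Bearer eyJ..." };

Console.WriteLine(SelectScheme(legacy));                               // ApiKey
Console.WriteLine(SelectScheme(modern));                               // Bearer
Console.WriteLine(HasScope("dms.read visualiser.read", "dms.write"));  // False
```

Keeping the selector a plain function of the request makes it easy to unit-test separately from the JWT and API-key handlers themselves.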

### Package Management

All NuGet package versions are centrally managed in `Directory.Packages.props`; never specify versions in individual `.csproj` files.

### Test Patterns

- Framework: **xUnit**, with the EF Core InMemory provider for repository and endpoint tests
- Test classes implement `IDisposable` and call `context.Database.EnsureDeleted()` in `Dispose()` for teardown
- Each test creates a database with a unique name (`$"TestDb_{Guid.NewGuid()}"`) to avoid cross-test contamination
- Messaging integration tests use **Testcontainers** (RabbitMQ)
- Arrange/Act/Assert comment blocks are used consistently

### Aspire Configuration

`Parameters:startCoreServices` (bool) controls whether RabbitMQ, MinIO, and all worker services are started; set it to `false` to run only the API and Visualiser. `Parameters:seedData` runs `FakeDataSeeder` on startup.

### Filesystem Layout & Constraints

All file I/O is rooted at `DMSFilesBasePath` (read from config; `~` is expanded to the user profile). The `IFileLocations` / `FileLocations` abstraction (registered by `AddDmsFileLocations()`) exposes four derived paths:

| Property | Path |
|----------|------|
| `deliusInput` | `{basePath}/Delius/Input/` |
| `deliusOutput` | `{basePath}/Delius/Output/` |
| `offlocInput` | `{basePath}/Offloc/Input/` |
| `offlocOutput` | `{basePath}/Offloc/Output/` |
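
The sketch below shows how the four paths in the table might be derived from `DMSFilesBasePath`. The helpers `ExpandHome` and `ResolvePaths` are hypothetical, and the exact `~` handling is an assumption. This is not the real `FileLocations` implementation. `Path.Combine` also drops the trailing separators shown in the table.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Expand a leading "~" to the user profile, as described for DMSFilesBasePath.
string ExpandHome(string path)
{
    var home = Environment.GetFolderPath(Environment.SpecialFolder.UserProfile);
    if (path == "~") return home;
    if (path.StartsWith("~/") || path.StartsWith("~\\")) return Path.Combine(home, path[2..]);
    return path;
}

// Hypothetical equivalent of the four IFileLocations properties.
Dictionary<string, string> ResolvePaths(string basePath)
{
    var root = ExpandHome(basePath);
    return new Dictionary<string, string>
    {
        ["deliusInput"] = Path.Combine(root, "Delius", "Input"),
        ["deliusOutput"] = Path.Combine(root, "Delius", "Output"),
        ["offlocInput"] = Path.Combine(root, "Offloc", "Input"),
        ["offlocOutput"] = Path.Combine(root, "Offloc", "Output"),
    };
}

foreach (var (name, dir) in ResolvePaths("~/dms-files"))
    Console.WriteLine($"{name}: {dir}");
```

Using `Path.Combine` rather than string concatenation keeps the separators correct on both the Windows Server hosts and a local Linux or macOS machine.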

**File naming patterns** (enforced by `FileConstants` / `FilePatterns`):

| File type | Regex |
|-----------|-------|
| Delius extract | `cfoextract_\d{1,5}_(full\|diff)_\d{14}\.txt` |
| Offloc data | `C_NOMIS_OFFENDER_\d{8}_.+\.dat` |
| Offloc archive | `\d{8}\.zip` |
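
In the Delius row, `\|` is only markdown table escaping; the regex itself uses a plain `|` for the `full`/`diff` alternation. The sketch below classifies file names with these patterns. Adding `^`/`$` anchors is an assumption made here so that only whole names match. The real `FilePatterns` may anchor differently or use `Regex` options.

```csharp
using System;
using System.Text.RegularExpressions;

// Patterns from the table above, anchored here (an assumption) so that only whole names match.
var patterns = new (string Kind, Regex Pattern)[]
{
    ("Delius extract", new Regex(@"^cfoextract_\d{1,5}_(full|diff)_\d{14}\.txt$")),
    ("Offloc data", new Regex(@"^C_NOMIS_OFFENDER_\d{8}_.+\.dat$")),
    ("Offloc archive", new Regex(@"^\d{8}\.zip$")),
};

// Return the file type for a name, or null if it matches none of the patterns.
string? Classify(string fileName)
{
    foreach (var (kind, pattern) in patterns)
        if (pattern.IsMatch(fileName)) return kind;
    return null;
}

Console.WriteLine(Classify("cfoextract_123_full_20240131093000.txt")); // Delius extract
Console.WriteLine(Classify("C_NOMIS_OFFENDER_20240131_01.dat"));       // Offloc data
Console.WriteLine(Classify("20240131.zip"));                            // Offloc archive
Console.WriteLine(Classify("notes.txt") ?? "(no match)");               // (no match)
```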

**Processing flow:**

1. `FileSync` downloads raw files into `*Input/` and publishes trigger messages.
2. `Offloc.Cleaner` / `Delius.Parser` parse the files and write pipe-delimited (`|`) output files with CRLF line endings into `*Output/{fileNameWithoutExtension}/`, one `.txt` file per entity (e.g. `Offenders.txt`, `EventDetails.txt`).
3. `DbInteractions` calls the staging stored procedures (`DeliusStaging.StageDelius`, `OfflocStaging.Import`), which execute dynamic SQL `BULK INSERT` statements that read these output files **directly from disk into SQL Server**.
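
The sketch below follows the output layout described in step 2: pipe-delimited rows with CRLF line endings, one `.txt` file per entity under `*Output/{fileNameWithoutExtension}/`. The helper `WriteEntityFile`, the sample values and the encoding (UTF-8 without BOM, the `File.WriteAllText` default) are assumptions. The sketch also does not escape a `|` that appears inside a value, so check the real parsers before relying on that behaviour.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper: write one entity's rows as pipe-delimited lines terminated by CRLF.
string WriteEntityFile(string outputRoot, string sourceFile, string entity, IEnumerable<string[]> rows)
{
    // e.g. {offlocOutput}/C_NOMIS_OFFENDER_20240131_01/Offenders.txt
    var dir = Path.Combine(outputRoot, Path.GetFileNameWithoutExtension(sourceFile));
    Directory.CreateDirectory(dir);
    var target = Path.Combine(dir, entity + ".txt");

    var sb = new StringBuilder();
    foreach (var row in rows)
        sb.Append(string.Join('|', row)).Append("\r\n"); // explicit CRLF, not Environment.NewLine
    File.WriteAllText(target, sb.ToString());
    return target;
}

var root = Path.Combine(Path.GetTempPath(), "dms-output-sketch");
var written = WriteEntityFile(root, "C_NOMIS_OFFENDER_20240131_01.dat", "Offenders",
    new[] { new[] { "A1234BC", "SMITH", "JOHN" }, new[] { "B2345CD", "JONES", "MARY" } });
Console.WriteLine(written);
```

The code writes `"\r\n"` explicitly instead of `Environment.NewLine`, so the files get the same line endings whether the parser runs on Windows or not.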

**Critical BULK INSERT constraint:** `BULK INSERT` opens the file itself and resolves the path from the perspective of the SQL Server instance (the machine or container it runs on), not from the service that issues the command. The parsed output files in `*Output/` **must therefore be on a path that SQL Server can access directly**. The `RUNNING_IN_CONTAINER` config flag exists in `DbInteractionService` but is currently unused. Never move staging output files or change the path format without ensuring SQL Server can still reach them.
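
The path embedded in a `BULK INSERT` is a string that SQL Server opens itself, so it has to be valid on the SQL Server host. The builder below is a hypothetical stand-in for the dynamic SQL inside the staging procedures. The table name and the `WITH` options are assumptions chosen to match the pipe-delimited, CRLF output described above.

```csharp
using System;

// Hypothetical: build the BULK INSERT statement for one entity file.
// SQL Server opens filePath itself, so it must exist on (or be shared with) the SQL Server host.
string BuildBulkInsert(string table, string filePath)
{
    // Double single quotes so the path is a valid T-SQL string literal.
    var literal = filePath.Replace("'", "''");
    // FIELDTERMINATOR matches the '|' delimiter; for BULK INSERT a '\n' row terminator means CRLF.
    return $"BULK INSERT {table} FROM '{literal}' WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\\n', TABLOCK);";
}

var sql = BuildBulkInsert("[OfflocStaging].[Offenders]",
    @"D:\DMS\Offloc\Output\C_NOMIS_OFFENDER_20240131_01\Offenders.txt");
Console.WriteLine(sql);
```

If SQL Server runs in a container, the path in the statement must be the one visible inside that container (for example, through a mounted volume), even if the parser wrote the file under a different host path.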

**Matching/Clustering bulk inserts** (in `MatchingRepository` and `ClusteringRepository`) work differently: they use batched, parameterised Dapper `ExecuteAsync` calls (batch size 1000, concurrency 16) rather than `BULK INSERT`, so they have no filesystem dependency.
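
The batching shape can be sketched without Dapper: split the rows into batches of 1000 and run at most 16 batches at once. In this sketch a `SemaphoreSlim` limits concurrency, and a hypothetical `executeBatch` delegate stands in for Dapper's `ExecuteAsync(sql, batch)`. The source does not say whether the repositories use a semaphore, `Parallel.ForEachAsync` or something else.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Run executeBatch over fixed-size batches, with at most maxConcurrency batches in flight.
async Task<int> InsertInBatchesAsync<T>(IReadOnlyList<T> items, Func<T[], Task> executeBatch,
    int batchSize = 1000, int maxConcurrency = 16)
{
    using var gate = new SemaphoreSlim(maxConcurrency);
    var tasks = items.Chunk(batchSize).Select(async batch =>
    {
        await gate.WaitAsync();
        try
        {
            await executeBatch(batch); // with Dapper this might be connection.ExecuteAsync(sql, batch)
        }
        finally
        {
            gate.Release();
        }
    }).ToList(); // ToList starts every batch task; the semaphore holds back all but maxConcurrency
    await Task.WhenAll(tasks);
    return tasks.Count; // number of batches executed
}

// Demo with 25,000 fake rows and a fake 5 ms "database call"; track peak parallelism.
var sync = new object();
int inFlight = 0, peak = 0, rowsSeen = 0;
var fakeRows = Enumerable.Range(0, 25_000).ToList();

int batches = await InsertInBatchesAsync(fakeRows, async batch =>
{
    lock (sync) { inFlight++; peak = Math.Max(peak, inFlight); rowsSeen += batch.Length; }
    await Task.Delay(5);
    lock (sync) { inFlight--; }
});

Console.WriteLine($"{batches} batches, {rowsSeen} rows, peak concurrency {peak}");
```

Capping concurrency protects the connection pool and SQL Server from a burst of simultaneous inserts. The batch size trades round trips against per-command parameter counts.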
