Commit 0eb4fa0

Add copilot instructions
1 parent 2c49f03 commit 0eb4fa0

1 file changed

Lines changed: 132 additions & 0 deletions

.github/copilot-instructions.md

@@ -0,0 +1,132 @@
# GitHub Copilot Instructions

## Project Overview

HMPPS CFO Data Management System (DMS) is a .NET 10 distributed microservices application that processes p-NOMIS (Offloc) and nDelius offender data to supply CATS (Case Assessment and Tracking System) with accurate offender records. It targets Windows Server EC2 instances, where the services are deployed as Windows Services, and runs locally via .NET Aspire.

## Build, Test & Run

```bash
# Build the solution
dotnet build

# Run all tests
dotnet test --configuration Release

# Run tests for a specific project
dotnet test tests/Matching.Engine.Tests/Matching.Engine.Tests.csproj --configuration Release

# Run a single test
dotnet test tests/Api.Tests/Api.Tests.csproj --filter "FullyQualifiedName~SearchAsync_WithNoMatches_ReturnsNotFound"

# Run locally via Aspire (recommended: starts all services + dependencies)
# VS Code: F5 → Default Configuration
# Visual Studio: select "Aspire.AppHost" debug config

# Deploy databases to a test environment
export SERVER="..." DB_USER="..." DB_PASS="..."
python3 publish_db.py            # deploy
python3 publish_db.py --dry-run  # preview only

# Seed test data
dotnet run --project ./src/FakeDataSeeder/FakeDataSeeder.csproj
```

## Architecture

### Data Pipeline

```
FileSync → [Offloc.Cleaner → Offloc.Parser | Delius.Parser] → Import → DbInteractions → Blocking → Matching.Engine → API / Visualiser
```

All inter-service communication is **asynchronous via RabbitMQ** using the [Rebus](https://github.com/rebus-org/Rebus) library. Each stage publishes a `*FinishedMessage` that triggers the next stage.
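
The chaining convention can be illustrated with a minimal sketch that swaps Rebus and RabbitMQ for an in-memory dispatcher. `InMemoryBus`, `ParserFinishedMessage`, `ImportFinishedMessage`, and `PipelineDemo` are hypothetical names chosen for illustration and are not taken from the codebase; the real services publish and subscribe through Rebus.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical message types; the real ones live in the Messaging library.
public interface IMessage { }
public sealed record ParserFinishedMessage(string FileName) : IMessage;
public sealed record ImportFinishedMessage(string FileName) : IMessage;

// In-memory stand-in for the bus: routes each published message to the
// handlers subscribed to its runtime type. Not how Rebus works internally.
public sealed class InMemoryBus
{
    private readonly Dictionary<Type, List<Action<IMessage>>> _handlers = new();

    public void Subscribe<T>(Action<T> handler) where T : IMessage
    {
        if (!_handlers.TryGetValue(typeof(T), out var list))
        {
            list = new List<Action<IMessage>>();
            _handlers[typeof(T)] = list;
        }
        list.Add(message => handler((T)message));
    }

    public void Publish(IMessage message)
    {
        if (_handlers.TryGetValue(message.GetType(), out var list))
        {
            foreach (var handler in list)
            {
                handler(message);
            }
        }
    }
}

public static class PipelineDemo
{
    // Wires two stages together and returns the order in which they ran.
    public static List<string> Run(string fileName)
    {
        var bus = new InMemoryBus();
        var log = new List<string>();

        // The import stage reacts to the parser finishing, then announces its own completion.
        bus.Subscribe<ParserFinishedMessage>(m =>
        {
            log.Add($"Import started for {m.FileName}");
            bus.Publish(new ImportFinishedMessage(m.FileName));
        });

        // The next stage reacts to the import finishing.
        bus.Subscribe<ImportFinishedMessage>(m => log.Add($"Blocking started for {m.FileName}"));

        bus.Publish(new ParserFinishedMessage(fileName));
        return log;
    }
}
```

Calling `PipelineDemo.Run("offloc.dat")` yields two log entries, the import entry followed by the blocking entry, because each handler only runs once the previous stage's finished message has been published.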

### Services

| Service | Type | Role |
|---------|------|------|
| `FileSync` | Worker | Monitors MinIO/S3/filesystem for incoming files |
| `Offloc.Cleaner` | Worker | Cleans raw Offloc (p-NOMIS) files |
| `Offloc.Parser` | Worker | Parses cleaned Offloc files into DB records |
| `Delius.Parser` | Worker | Parses nDelius files into DB records |
| `Import` | Worker | Coordinates staging → running picture migration |
| `DbInteractions` | Worker | Executes DB staging/merge operations (runs in SQL container) |
| `Blocking` | Worker | Generates candidate record pairs for matching |
| `Matching.Engine` | Worker | Compares pairs (Comparator), scores (Scorer, Bayesian), clusters |
| `Cleanup` | Worker | Data maintenance |
| `Logging` | Worker | Centralised log aggregation |
| `Meow` | Worker | CATS RabbitMQ integration (different broker config) |
| `API` | ASP.NET Core | REST endpoints for downstream consumers |
| `Visualiser` | ASP.NET Core | Blazor web UI for exploring offender relationships |

### Databases (SQL Server)

Seven separate databases: `OfflocStagingDb`, `OfflocRunningPictureDb`, `DeliusStagingDb`, `DeliusRunningPictureDb`, `MatchingDb`, `ClusterDb`, `AuditDb`. Database schemas are managed as SQL Database Projects under `src/Database/`.

### Shared Libraries (`src/Libraries/`)

- **`Messaging`**: RabbitMQ integration via Rebus; all message types; `Exchanges` constants; `AddDmsRabbitMQ()` extension
- **`Infrastructure`**: EF Core `DbContext`s (`OfflocContext`, `DeliusContext`, `ClusteringContext`, `AuditContext`), entity models, repositories, shared DTOs
- **`Matching.Core`**: `IMatcher<T, Result>` interface and concrete matchers (Jaro-Winkler, Levenshtein, Caver, Date, Postcode, Equality); `[Matcher("key")]` attribute for dynamic discovery
- **`EnvironmentSetup`**: `AddDmsCoreWorkerService()` and `UseDmsSerilog()` extension methods shared by all worker services; `FileLocations` / `FilePatterns`

## Key Conventions

### Service Bootstrap Pattern

All worker services follow the same bootstrap pattern in `Program.cs`:

```csharp
var builder = Host.CreateApplicationBuilder(args);
builder.AddDmsCoreWorkerService(); // Serilog + Windows Service + file locations
builder.Services.AddDmsRabbitMQ(builder.Configuration);
// ... register additional services
var app = builder.Build();
await app.RunAsync();
```

`Meow` and `API`/`Visualiser` are exceptions: they configure messaging or hosting differently.

### Messaging

- All messages implement `IMessage` from `Messaging.Messages`
- Messages are grouped by pipeline stage: `BlockingMessages`, `DbMessages`, `ImportMessages`, `MatchingMessages`, `StagingMessages`, `MergingMessages`, `StatusMessages`
- Exchange names are string constants in `Messaging.Exchanges` (lowercase: `staging`, `merging`, `database`, etc.)
- The RabbitMQ connection string is read from `ConnectionStrings:RabbitMQ` in configuration

### Dependency Injection

- Worker services use standard `Microsoft.Extensions.DependencyInjection`
- `Matching.Engine` additionally uses **Autofac** (via `AutofacServiceProviderFactory`) for registering matchers and scorers dynamically via reflection
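
As a rough illustration of reflection-based discovery, the sketch below scans an assembly for classes carrying a `[Matcher("key")]` attribute and maps each key to its type. It uses plain reflection rather than Autofac. The attribute's `Key` property, the `Match` method on `IMatcher<T, Result>`, and the `EqualityMatcher` and `MatcherDiscovery` classes are assumptions made for the example; the real definitions in `Matching.Core` may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Assumed shape of the attribute; the real one is defined in Matching.Core.
[AttributeUsage(AttributeTargets.Class)]
public sealed class MatcherAttribute : Attribute
{
    public MatcherAttribute(string key) => Key = key;
    public string Key { get; }
}

// Assumed shape of the interface; the method name Match is hypothetical.
public interface IMatcher<T, Result>
{
    Result Match(T left, T right);
}

// Hypothetical matcher used only to give the scan something to find.
[Matcher("equality")]
public sealed class EqualityMatcher : IMatcher<string, bool>
{
    public bool Match(string left, string right) =>
        string.Equals(left, right, StringComparison.OrdinalIgnoreCase);
}

public static class MatcherDiscovery
{
    // Scans an assembly for concrete classes carrying [Matcher] and maps key -> type.
    public static IReadOnlyDictionary<string, Type> Discover(Assembly assembly) =>
        assembly.GetTypes()
            .Where(t => t.IsClass && !t.IsAbstract)
            .Select(t => (Type: t, Attr: t.GetCustomAttribute<MatcherAttribute>()))
            .Where(x => x.Attr is not null)
            .ToDictionary(x => x.Attr!.Key, x => x.Type);
}
```

A caller could then resolve a matcher by key, for example `MatcherDiscovery.Discover(typeof(EqualityMatcher).Assembly)["equality"]`, and hand the resulting type to whichever container performs the registration.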

### Matching Engine

- `[Matcher("key")]` attribute decorates matcher classes for dynamic registration
- Three hosted services run in parallel within one process: `ComparatorService`, `ScorerService`, `ClusteringService`
- `MatchingQueue` is a singleton in-memory queue between comparator and scorer
- Scoring uses Bayesian probability; matchers include string similarity algorithms (Jaro-Winkler, Levenshtein) and phonetic matching (Caver/Soundex)
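
One way a singleton in-memory queue between a producer and a consumer could be built is on top of `System.Threading.Channels`. The sketch below is an assumption about the general technique, not the actual `MatchingQueue` implementation; the `ComparisonResult` record, the method names, and `QueueDemo` are hypothetical.

```csharp
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical payload passed from the comparator to the scorer.
public sealed record ComparisonResult(int LeftId, int RightId, double Similarity);

// Intended to be registered once (singleton) so both hosted services share one instance.
public sealed class MatchingQueue
{
    private readonly Channel<ComparisonResult> _channel =
        Channel.CreateUnbounded<ComparisonResult>();

    public ValueTask EnqueueAsync(ComparisonResult item) => _channel.Writer.WriteAsync(item);

    public IAsyncEnumerable<ComparisonResult> DequeueAllAsync() => _channel.Reader.ReadAllAsync();

    // Signals that no more items will be written, which ends the reader's loop.
    public void Complete() => _channel.Writer.Complete();
}

public static class QueueDemo
{
    // Runs a producer (comparator role) and a consumer (scorer role) against one queue.
    public static async Task<List<double>> RunAsync()
    {
        var queue = new MatchingQueue();

        var producer = Task.Run(async () =>
        {
            await queue.EnqueueAsync(new ComparisonResult(1, 2, 0.92));
            await queue.EnqueueAsync(new ComparisonResult(1, 3, 0.41));
            queue.Complete();
        });

        var similarities = new List<double>();
        await foreach (var item in queue.DequeueAllAsync())
        {
            similarities.Add(item.Similarity);
        }

        await producer;
        return similarities;
    }
}
```

An unbounded channel keeps the sketch short; a bounded channel would add back-pressure if the comparator can outpace the scorer.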

### API Authentication

The API supports two auth schemes via a `"Smart"` policy scheme:
- **JWT Bearer** (Entra ID / Microsoft Identity): used for `dms.read`, `dms.write`, `visualiser.read`, `visualiser.write` scopes
- **Legacy API Key** (`X-API-KEY` header): for backward compatibility

Swagger UI is only enabled when `IsDevelopment=true` (passed via an Aspire parameter).

### Package Management

All NuGet package versions are centrally managed in `Directory.Packages.props`; never specify versions in individual `.csproj` files.

### Test Patterns

- Framework: **xunit** with `EF Core InMemory` for repository/endpoint tests
- Tests use `IDisposable` to call `context.Database.EnsureDeleted()` in teardown
- Each test creates a database with a unique name (`$"TestDb_{Guid.NewGuid()}"`) to avoid cross-test contamination
- Integration tests for messaging use **Testcontainers** (RabbitMQ)
- Arrange/Act/Assert comment blocks are used consistently
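
Put together, a test following these conventions might look like the sketch below. It assumes the `xunit` and `Microsoft.EntityFrameworkCore.InMemory` packages are referenced by the test project. `Offender`, `SampleContext`, and `OffenderStorageTests` are hypothetical stand-ins for the real `Infrastructure` types and test classes.

```csharp
using System;
using System.Linq;
using Microsoft.EntityFrameworkCore;
using Xunit;

// Hypothetical entity and context, standing in for the real Infrastructure types.
public sealed class Offender
{
    public int Id { get; set; }
    public string Surname { get; set; } = "";
}

public sealed class SampleContext : DbContext
{
    public SampleContext(DbContextOptions<SampleContext> options) : base(options) { }

    public DbSet<Offender> Offenders => Set<Offender>();
}

public sealed class OffenderStorageTests : IDisposable
{
    private readonly SampleContext _context;

    public OffenderStorageTests()
    {
        // A unique database name per test instance avoids cross-test contamination.
        var options = new DbContextOptionsBuilder<SampleContext>()
            .UseInMemoryDatabase($"TestDb_{Guid.NewGuid()}")
            .Options;
        _context = new SampleContext(options);
    }

    [Fact]
    public void Add_WithValidOffender_PersistsRecord()
    {
        // Arrange
        var offender = new Offender { Surname = "Smith" };

        // Act
        _context.Offenders.Add(offender);
        _context.SaveChanges();

        // Assert
        var stored = _context.Offenders.Single();
        Assert.Equal("Smith", stored.Surname);
    }

    public void Dispose()
    {
        // Teardown: drop the in-memory database, then release the context.
        _context.Database.EnsureDeleted();
        _context.Dispose();
    }
}
```

xunit creates a new instance of the test class for every test method, so the constructor and `Dispose` act as per-test setup and teardown.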

### Aspire Configuration

`Parameters:startCoreServices` (bool) controls whether RabbitMQ, MinIO, and all worker services are started; set it to `false` to run API + Visualiser only. `Parameters:seedData` triggers `FakeDataSeeder` on startup.
