
Commit a1f6118

Add filesystem constraints to copilot instructions
1 parent 0eb4fa0 commit a1f6118

1 file changed

Lines changed: 29 additions & 0 deletions


.github/copilot-instructions.md

@@ -130,3 +130,32 @@ All NuGet package versions are centrally managed in `Directory.Packages.props`
### Aspire Configuration

`Parameters:startCoreServices` (bool) controls whether RabbitMQ, MinIO, and all worker services are started; set it to `false` to run only the API and Visualiser. `Parameters:seedData` triggers `FakeDataSeeder` on startup.

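The effect of the two parameters can be modelled as a small decision function. This is a Python sketch only (the AppHost itself is C#): `plan_startup` and the returned labels are hypothetical, taken from the names in the paragraph above, and it assumes `seedData` acts independently of `startCoreServices`.

```python
from typing import List


def plan_startup(start_core_services: bool, seed_data: bool) -> List[str]:
    """Hypothetical model of what the Aspire AppHost starts for a given parameter set."""
    plan = ["API", "Visualiser"]  # always run
    if start_core_services:
        # Parameters:startCoreServices=true adds the messaging/storage stack and the workers.
        plan += ["RabbitMQ", "MinIO", "worker services"]
    if seed_data:
        # Parameters:seedData runs FakeDataSeeder on startup (assumed independent of the flag above).
        plan.append("FakeDataSeeder")
    return plan


print(plan_startup(start_core_services=False, seed_data=False))  # ['API', 'Visualiser']
```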
### Filesystem Layout & Constraints
All file I/O is rooted at `DMSFilesBasePath`, read from configuration (`~` is expanded to the user's profile directory). The `IFileLocations` / `FileLocations` abstraction, registered by `AddDmsFileLocations()`, exposes four derived paths:

| Property | Path |
|----------|------|
| `deliusInput` | `{basePath}/Delius/Input/` |
| `deliusOutput` | `{basePath}/Delius/Output/` |
| `offlocInput` | `{basePath}/Offloc/Input/` |
| `offlocOutput` | `{basePath}/Offloc/Output/` |

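The path derivation amounts to expanding `~` and joining four fixed sub-paths onto the base. Below is a Python sketch of that, assuming the paths are plain joins onto the expanded base; `FileLocationsSketch` and `make_file_locations` are hypothetical stand-ins for the C# `IFileLocations` / `FileLocations` types, not their actual implementation.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FileLocationsSketch:
    """Illustrative Python analogue of the four paths exposed by IFileLocations."""
    delius_input: str
    delius_output: str
    offloc_input: str
    offloc_output: str


def make_file_locations(dms_files_base_path: str) -> FileLocationsSketch:
    # "~" in the configured value is expanded to the current user's home/profile directory.
    base = os.path.expanduser(dms_files_base_path)

    def under_base(*parts: str) -> str:
        # The trailing separator mirrors the trailing "/" in the table above.
        return os.path.join(base, *parts) + os.sep

    return FileLocationsSketch(
        delius_input=under_base("Delius", "Input"),
        delius_output=under_base("Delius", "Output"),
        offloc_input=under_base("Offloc", "Input"),
        offloc_output=under_base("Offloc", "Output"),
    )
```

Deriving every path from one expanded base keeps the four locations consistent: changing `DMSFilesBasePath` moves all of them together.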
**File naming patterns** (enforced by `FileConstants` / `FilePatterns`):

| File type | Regex |
|-----------|-------|
| Delius extract | `cfoextract_\d{1,5}_(full\|diff)_\d{14}\.txt` |
| Offloc data | `C_NOMIS_OFFENDER_\d{8}_.+\.dat` |
| Offloc archive | `\d{8}\.zip` |

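The patterns can be exercised directly. This Python sketch makes two assumptions: the whole file name must match (`re.fullmatch`), and the `\|` in the table is Markdown table escaping, so the actual alternation is `(full|diff)`. `FILE_PATTERNS` and `classify` are hypothetical names; the real definitions live in `FileConstants` / `FilePatterns`.

```python
import re
from typing import Optional

# Patterns from the table above. The "\|" in the table is presumably Markdown escaping,
# so the alternation here is "(full|diff)". Whole-name matching is an assumption.
FILE_PATTERNS = {
    "delius_extract": re.compile(r"cfoextract_\d{1,5}_(full|diff)_\d{14}\.txt"),
    "offloc_data": re.compile(r"C_NOMIS_OFFENDER_\d{8}_.+\.dat"),
    "offloc_archive": re.compile(r"\d{8}\.zip"),
}


def classify(file_name: str) -> Optional[str]:
    """Return the first file type whose pattern matches the entire name, else None."""
    for file_type, pattern in FILE_PATTERNS.items():
        if pattern.fullmatch(file_name):
            return file_type
    return None
```

Full matching matters most for the archive pattern: with `re.search`, a name such as `x20240101.zip` would also be accepted.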
**Processing flow:**

1. `FileSync` downloads raw files into `*Input/` and publishes trigger messages.
2. `Offloc.Cleaner` / `Delius.Parser` parse those files and write pipe-delimited (`|`) output files with CRLF line endings into `*Output/{fileNameWithoutExtension}/`, one `.txt` file per entity (e.g. `Offenders.txt`, `EventDetails.txt`).
3. `DbInteractions` calls the staging stored procedures (`DeliusStaging.StageDelius`, `OfflocStaging.Import`), which execute dynamic SQL `BULK INSERT` statements that read these output files **directly from disk into SQL Server**.

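The output layout from step 2 can be sketched as follows. This is Python rather than the project's C#; `write_entity_files` is a hypothetical name, and the UTF-8 encoding and the absence of escaping for `|` inside field values are assumptions of the sketch, not documented behaviour.

```python
import os
import tempfile
from typing import Dict, List


def write_entity_files(output_root: str, source_file: str,
                       entities: Dict[str, List[List[str]]]) -> List[str]:
    """Write one pipe-delimited, CRLF-terminated .txt file per entity into
    {output_root}/{fileNameWithoutExtension}/ and return the paths written."""
    stem = os.path.splitext(os.path.basename(source_file))[0]
    target_dir = os.path.join(output_root, stem)
    os.makedirs(target_dir, exist_ok=True)

    written = []
    for entity, rows in entities.items():
        path = os.path.join(target_dir, f"{entity}.txt")
        # newline="" turns off newline translation, so "\r\n" is written as-is on every OS.
        with open(path, "w", encoding="utf-8", newline="") as fh:
            for row in rows:
                fh.write("|".join(row) + "\r\n")  # no escaping of "|" inside values (assumption)
        written.append(path)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = write_entity_files(tmp, "cfoextract_1_full_20240101120000.txt",
                                 {"Offenders": [["X1", "Smith"]], "EventDetails": [["X1", "E1"]]})
        for p in out:
            print(os.path.relpath(p, tmp))
```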
**Critical BULK INSERT constraint:** `BULK INSERT` resolves file paths from the SQL Server process's point of view, not the calling application's, so the parsed output files in `*Output/` **must be on a path that SQL Server can access directly** (for example, if SQL Server runs in a container, the path must exist inside that container). The `RUNNING_IN_CONTAINER` config flag exists in `DbInteractionService` but is currently unused. Never move staging output files or change the path format without confirming that SQL Server can still reach them.

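To make the constraint concrete, the statements the staging procedures build look roughly like the one produced below. This is a hypothetical Python sketch: the schema/table names and the `WITH` options are assumptions, and the real dynamic SQL lives in `DeliusStaging.StageDelius` and `OfflocStaging.Import`. The point it illustrates is that the path is embedded as a string literal and opened by the SQL Server process itself.

```python
def quote_ident(name: str) -> str:
    """Bracket-quote a T-SQL identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def build_bulk_insert(schema: str, table: str, server_side_path: str) -> str:
    """Hypothetical sketch of one dynamic BULK INSERT for a pipe-delimited output file.

    `server_side_path` is opened by the SQL Server process, so it must be a path that
    SQL Server can reach (e.g. inside its container), not merely one the app can see.
    """
    path_literal = server_side_path.replace("'", "''")  # escape for a T-SQL string literal
    return (
        f"BULK INSERT {quote_ident(schema)}.{quote_ident(table)} "
        f"FROM '{path_literal}' "
        # For BULK INSERT, a '\n' row terminator is treated as CRLF, matching the parser output.
        "WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\\n');"
    )
```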
**Matching/Clustering bulk inserts** (in `MatchingRepository` and `ClusteringRepository`) work differently: they use batched, parameterised Dapper `ExecuteAsync` calls (batch size 1000, concurrency 16) rather than `BULK INSERT`, so they have no filesystem dependency.
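The batching shape can be sketched in Python with `asyncio`: split the rows into batches of 1000 and cap in-flight calls at 16 with a semaphore. The names are hypothetical, `execute` is an assumed stand-in for a single parameterised Dapper `ExecuteAsync` call per batch, and how the C# repositories actually enforce the concurrency limit is not stated here.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")


def chunk(rows: Sequence[T], size: int) -> List[Sequence[T]]:
    """Split rows into consecutive batches of at most `size` items."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


async def insert_batched(rows: Sequence[T],
                         execute: Callable[[Sequence[T]], Awaitable[int]],
                         batch_size: int = 1000,
                         max_concurrency: int = 16) -> int:
    """Run `execute` once per batch with at most `max_concurrency` calls in flight.

    `execute` stands in for one parameterised database call and returns rows affected.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run(batch: Sequence[T]) -> int:
        async with semaphore:  # caps concurrent database calls
            return await execute(batch)

    results = await asyncio.gather(*(run(b) for b in chunk(rows, batch_size)))
    return sum(results)
```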
