`.github/copilot-instructions.md` (29 additions, 0 deletions)
@@ -130,3 +130,32 @@ All NuGet package versions are centrally managed in `Directory.Packages.props`
### Aspire Configuration

`Parameters:startCoreServices` (bool) controls whether RabbitMQ, MinIO, and all worker services are started; set it to `false` to run only the API and Visualiser. `Parameters:seedData` triggers `FakeDataSeeder` on startup.

### Filesystem Layout & Constraints
All file I/O is rooted at `DMSFilesBasePath` (from config, `~` is expanded to the user profile). The `IFileLocations` / `FileLocations` abstraction (registered by `AddDmsFileLocations()`) exposes four derived paths:

| Property | Path |
|----------|------|
| `deliusInput` | `{basePath}/Delius/Input/` |
| `deliusOutput` | `{basePath}/Delius/Output/` |
| `offlocInput` | `{basePath}/Offloc/Input/` |
| `offlocOutput` | `{basePath}/Offloc/Output/` |

**File naming patterns** (enforced by `FileConstants` / `FilePatterns`):
1. `FileSync` downloads raw files into `*Input/` and publishes trigger messages.
2. `Offloc.Cleaner` / `Delius.Parser` parse files and write pipe-delimited (`|`) output files with CRLF line endings into `*Output/{fileNameWithoutExtension}/`, one `.txt` file per entity (e.g. `Offenders.txt`, `EventDetails.txt`).
3. `DbInteractions` calls the staging stored procedures (`DeliusStaging.StageDelius`, `OfflocStaging.Import`) which execute dynamic SQL `BULK INSERT` statements that read these output files **directly from disk into SQL Server**.
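
As a language-neutral illustration of the layout table and of step 2 above, here is a minimal Python sketch. The repository's own code is .NET, so this is not the actual implementation. The function names `derive_locations` and `write_entity_file`, the UTF-8 encoding, the sample file name `extract_01.dat`, and the assumption that field values never contain `|` are all hypothetical; only the directory names, the pipe delimiter, the CRLF line endings, and the `{fileNameWithoutExtension}/` subfolder come from the description above.

```python
import tempfile
from pathlib import Path


def derive_locations(base_path: str) -> dict[str, Path]:
    """Expand `~` and derive the four directories listed in the table."""
    base = Path(base_path).expanduser()
    return {
        "deliusInput": base / "Delius" / "Input",
        "deliusOutput": base / "Delius" / "Output",
        "offlocInput": base / "Offloc" / "Input",
        "offlocOutput": base / "Offloc" / "Output",
    }


def write_entity_file(
    output_root: Path, source_file: str, entity: str, rows: list[list[str]]
) -> Path:
    """Write one pipe-delimited, CRLF-terminated .txt file for one entity."""
    # Output goes into a folder named after the source file minus its extension.
    target_dir = output_root / Path(source_file).stem
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{entity}.txt"
    # newline="" disables newline translation, so "\r\n" is written verbatim.
    # Assumption: values contain no "|" and need no quoting or escaping.
    with target.open("w", newline="", encoding="utf-8") as handle:
        for row in rows:
            handle.write("|".join(row) + "\r\n")
    return target


if __name__ == "__main__":
    # Use a throwaway directory in place of the real DMSFilesBasePath.
    with tempfile.TemporaryDirectory() as tmp:
        locations = derive_locations(tmp)
        written = write_entity_file(
            locations["offlocOutput"],
            "extract_01.dat",  # hypothetical raw file name
            "Offenders",
            [["A1234BC", "Smith"], ["D5678EF", "Jones"]],
        )
        print(written)
```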

**Critical BULK INSERT constraint:** `BULK INSERT` resolves file paths from the perspective of the SQL Server process, not the calling application, so the parsed output files in `*Output/` **must be on a path that SQL Server can access directly**. The `RUNNING_IN_CONTAINER` config flag exists in `DbInteractionService` but is currently unused. Never move staging output files or change the path format without ensuring SQL Server can still reach them.
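
To make the constraint concrete, the sketch below composes the kind of dynamic statement a staging procedure might run. It is written in Python purely for illustration and is an assumption, not the repository's stored-procedure code: the helper `build_bulk_insert`, the table name `OfflocStaging.Offenders`, and the example path are hypothetical. `FIELDTERMINATOR` and `ROWTERMINATOR` are standard `BULK INSERT` options, chosen here to match the pipe-delimited, CRLF-terminated files described above. The file path ends up as a literal inside the SQL text, which is why it is SQL Server, not the caller, that opens the file.

```python
import re

# One- or two-part identifier such as "Offenders" or "OfflocStaging.Offenders".
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def build_bulk_insert(table: str, file_path: str) -> str:
    """Compose a T-SQL BULK INSERT for a pipe-delimited, CRLF-terminated file."""
    # The table name is spliced into the SQL text, so reject anything unusual.
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"unexpected table name: {table!r}")
    # Double any single quotes so the path stays a valid T-SQL string literal.
    escaped_path = file_path.replace("'", "''")
    # The doubled backslashes emit the literal text '\r\n', which BULK INSERT
    # itself interprets as a carriage return plus line feed.
    return (
        f"BULK INSERT {table} FROM '{escaped_path}' "
        "WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\\r\\n');"
    )


if __name__ == "__main__":
    # Hypothetical table and path, for illustration only. The path must be
    # valid on the machine (or container) where SQL Server itself runs.
    print(
        build_bulk_insert(
            "OfflocStaging.Offenders",
            "/data/Offloc/Output/extract_01/Offenders.txt",
        )
    )
```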

**Matching/Clustering bulk inserts** (in `MatchingRepository` and `ClusteringRepository`) are different: they use batched, parameterised Dapper `ExecuteAsync` calls (batch size 1000, concurrency 16) rather than `BULK INSERT`, so they have no filesystem dependency.
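
The batching pattern described above can be sketched as follows. This is a Python illustration under stated assumptions, not the Dapper code in `MatchingRepository` or `ClusteringRepository`: the names `chunked`, `insert_in_batches`, and `execute_batch` are hypothetical, and a semaphore is only one plausible way to cap concurrency. Only the batch size of 1000 and the concurrency limit of 16 are taken from the text.

```python
import asyncio
from collections.abc import Awaitable, Callable

BATCH_SIZE = 1000      # rows per parameterised insert call (from the text above)
MAX_CONCURRENCY = 16   # batches allowed in flight at once (from the text above)


def chunked(rows: list, size: int) -> list[list]:
    """Split rows into consecutive batches of at most `size` items."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


async def insert_in_batches(
    rows: list,
    execute_batch: Callable[[list], Awaitable[None]],
    batch_size: int = BATCH_SIZE,
    max_concurrency: int = MAX_CONCURRENCY,
) -> int:
    """Send every batch through `execute_batch`, capping concurrent calls."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run(batch: list) -> int:
        # At most `max_concurrency` batches pass this point at the same time.
        async with semaphore:
            await execute_batch(batch)
            return len(batch)

    counts = await asyncio.gather(*(run(b) for b in chunked(rows, batch_size)))
    return sum(counts)


if __name__ == "__main__":
    async def fake_execute(batch: list) -> None:
        # Stand-in for a parameterised database call; no database is touched.
        await asyncio.sleep(0)

    total = asyncio.run(insert_in_batches(list(range(2500)), fake_execute))
    print(total)
```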