
Commit aac648c

docs: record unconstrained design
1 parent a04b333 commit aac648c

3 files changed

Lines changed: 103 additions & 17 deletions


README.md

Lines changed: 25 additions & 8 deletions
````diff
@@ -64,25 +64,27 @@ flowchart LR
 ## Prerequisites
 
 - [mise](https://mise.jdx.dev/getting-started.html)
-
-- [Docker](https://www.docker.com/) (for SQL Server + LocalStack)
+  - manages .NET 10, pkl, and hk
+- [Docker](https://www.docker.com/)
+  - SQL Server (Azure SQL Edge for aarch64) + LocalStack
 
 ## Quick Start
 
 See **[docs/local-setup.md](docs/local-setup.md)** for the full setup walkthrough. In brief:
 
 ```shell
-mise i # Install all dependencies incl dotnet
-dotnet test # Running Tests
-dotnet build # Debug build
-dotnet publish -c Release # Native AOT binary
-docker-compose up -d
+mise i # Install .NET 10, pkl, hk
+mise run test # Run tests with 80% coverage gate
+mise run build # Debug build
+mise run publish # Native AOT binary
+docker-compose up -d # Start SQL Server + LocalStack
 dotnet user-secrets set "ScanEventApi:BaseUrl" "https://your-api-host" --project src/ScanEventWorker
 dotnet run --project src/ScanEventWorker/ScanEventWorker.csproj
 ```
 
 ## Assumptions
 
+0. Scan Event API pre-exists and _all_ events are retained indefinitely — if violated, see [Known Limitation 1](#known-limitations).
 1. ~~Events are returned ordered by `EventId` ascending~~
 2. ~~`EventId` is monotonically increasing — querying `FromEventId=X` reliably returns all events with ID ≥ X~~
 3. ~~The API returns an empty `ScanEvents` array when no more events exist (end-of-feed signal)~~
````
````diff
@@ -111,7 +113,21 @@ dotnet run --project src/ScanEventWorker/ScanEventWorker.csproj
     - the last processed event is re-fetched on every cycle to avoid missing events if IDs have gaps
     - the idempotent MERGE absorbs the duplicate without side effects
 14. The event feed starts at `EventId=1`
-    - `ProcessingState` is seeded with `LastEventId=1` on first run d
+    - `ProcessingState` is seeded with `LastEventId=1` on first run
+
+## Known Limitations
+
+1. **Event retention** (violates [Assumption 0](#assumptions))
+   - If the API enforces a rolling retention window (e.g. 7 days), downtime longer than that window causes permanent data loss with no recovery path.
+2. **No API authentication/authorisation**
+   - `ScanEventApiClient` sends unauthenticated HTTP requests; the spec sample shows no auth header
+   - if the production API requires one, it must be added before deployment
+3. **Poller cannot scale horizontally**
+   - `ProcessingState` is a single-row table and the startup `Mutex` enforces one poller instance
+   - Scaling the poller requires distributed locking and a different cursor model
+4. **DLQ is silent**
+   - Messages that fail 3 times move to the DLQ and are never reprocessed
+   - Without monitoring they accumulate undetected
 
 ## Potential Improvements
 
````
````diff
@@ -134,6 +150,7 @@ dotnet run --project src/ScanEventWorker/ScanEventWorker.csproj
 - [Infrastructure](docs/infrastructure.md)
   - CDK stack
   - downstream fan-out architecture
+  - spec constraint: why polling exists, and the event-driven alternative
 - [Design Rationale](docs/design-rationale.md)
   - Domain model
   - error handling
````
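Known Limitation 3 in the README turns on the single-instance guard being local to one machine: a named OS mutex, assuming that is what the startup `Mutex` is, only excludes processes on the same host, so two hosts would each acquire their own. The sketch below substitutes a POSIX `flock` file lock (Unix-only) for the C# named `Mutex` to show the same idea; the lock path and function names are hypothetical.

```python
from __future__ import annotations

import fcntl
import os
import tempfile
from typing import Optional, TextIO

# Hypothetical lock location; only processes that can see this file are excluded.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "scan-event-poller.lock")


def try_acquire_single_instance(path: str = LOCK_PATH) -> Optional[TextIO]:
    """Return an open lock handle if no other poller holds the lock on this host, else None."""
    handle = open(path, "w")
    try:
        # Exclusive, non-blocking: fails at once if another open handle already holds the lock.
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle  # keep it open for the process lifetime; closing it releases the lock


def release(handle: TextIO) -> None:
    fcntl.flock(handle.fileno(), fcntl.LOCK_UN)
    handle.close()
```

A second poller on another host would see its own lock file and start happily, which is why the README says horizontal scaling needs distributed locking (for example, a lease row in the shared database) rather than a process-level guard.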

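Known Limitation 4 in the README describes SQS redrive behaviour: once a message's receive count would exceed the queue's `maxReceiveCount`, SQS moves it to the dead-letter queue instead of delivering it again. The toy model below assumes `maxReceiveCount = 3` to match "fail 3 times"; it is not the AWS SDK. It shows a poison message being attempted three times and then parked, plus the kind of depth check whose absence the limitation describes.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field

MAX_RECEIVE_COUNT = 3  # assumed redrive policy value


@dataclass
class Message:
    body: str
    receive_count: int = 0


@dataclass
class SimulatedQueue:
    """Toy model of an SQS queue with a redrive policy."""
    messages: deque = field(default_factory=deque)
    dlq: list = field(default_factory=list)

    def send(self, body: str) -> None:
        self.messages.append(Message(body))

    def receive(self) -> Message | None:
        while self.messages:
            msg = self.messages.popleft()
            if msg.receive_count >= MAX_RECEIVE_COUNT:
                # Another receive would exceed maxReceiveCount: move to the DLQ instead.
                self.dlq.append(msg)
                continue
            msg.receive_count += 1
            return msg
        return None

    def release(self, msg: Message) -> None:
        """Handler failed: the message becomes visible again after its visibility timeout."""
        self.messages.append(msg)


def drain(queue: SimulatedQueue, handler) -> int:
    """Process until the queue is empty; returns the number of successful messages."""
    processed = 0
    while (msg := queue.receive()) is not None:
        try:
            handler(msg.body)
            processed += 1  # success: the message would be deleted
        except Exception:
            queue.release(msg)
    return processed


def dlq_needs_attention(queue: SimulatedQueue, threshold: int = 0) -> bool:
    """The missing monitoring: alert as soon as DLQ depth exceeds the threshold."""
    return len(queue.dlq) > threshold
```

In AWS the equivalent of `dlq_needs_attention` is typically a CloudWatch alarm on the dead-letter queue's `ApproximateNumberOfMessagesVisible` metric.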
docs/infrastructure.md

Lines changed: 59 additions & 0 deletions
````diff
@@ -41,3 +41,62 @@ flowchart LR
 
 - **Outbox pattern** — write events to an outbox table, dedicated publisher reads and publishes to SNS/SQS. Stronger consistency guarantee but adds DB coupling and latency.
 - **CDC (Change Data Capture)** — enable SQL Server CDC on `ParcelSummary`, stream changes via Kafka Connect. Best for consumers that need the DB state, not the raw events.
+
+---
+
+## Spec Constraint: Why Polling Exists
+
+The entire `ApiPollerWorker` exists solely because the spec defines a **pull-only GET endpoint**. This is the root architectural constraint, and it has a cascading effect on every design decision in this codebase.
+
+### What Polling Costs
+
+| Property | Polling (current) | Event-driven (ideal) |
+| ------------------- | ----------------------------------------------- | ----------------------------------- |
+| Scaling | Single-instance (shared `LastEventId` cursor) | Stateless consumers — scale to zero |
+| Latency | Up to `PollingIntervalSeconds` (5 s) | Sub-second |
+| Fan-out | Requires this worker to forward to SNS/SQS | Native via EventBridge rules |
+| Fault model | Long-running process; crash = gap until restart | Lambda retries per-invocation |
+| Infrastructure cost | Always-on ECS/EC2 task | Pay-per-invocation |
+| Operational burden | Cursor state in SQL, DLQ monitoring | Managed by the platform |
+
+Polling is a workaround, not a design choice. If the upstream offered a push mechanism, `ApiPollerWorker`, `ProcessingState`, and `SqsMessageQueue` could all be deleted.
+
+### The Unconstrained Design
+
+If the scan event source could emit events (webhook, DynamoDB Streams, EventBridge, or S3 notifications), the architecture collapses into a fully serverless, event-driven pipeline:
+
+```mermaid
+architecture-beta
+service source(internet)[Scan Event Source]
+
+group aws(logos:aws)[AWS]
+
+service eb(logos:aws-eventbridge)[EventBridge or SNS] in aws
+service sfn(logos:aws-stepfunctions)[Step Functions] in aws
+service fn(logos:aws-lambda)[Lambda durable functions] in aws
+service db(logos:aws-dynamodb)[DynamoDB] in aws
+service s3(logos:aws-s3)[S3] in aws
+
+source:R --> L:eb
+eb:R --> L:sfn
+sfn:T --> T:fn
+eb:B --> T:fn
+sfn:R --> L:db
+sfn:B --> T:s3
+```
+
+**Each component earns its place:**
+
+- **EventBridge / SNS** — zero-config fan-out; downstream consumers subscribe without any changes to the producer
+- **AWS Step Functions** — replaces `EventProcessorWorker`'s manual retry/DLQ logic with durable, visual orchestration; retries, catch blocks, and compensating transactions are declared, not coded
+- **DynamoDB** — replaces SQL Server with a serverless, horizontally scaled store; DynamoDB Streams can trigger further Lambdas for free, enabling second-order fan-out with no extra infrastructure
+- **S3** — each raw event lands in an S3 object on arrival; satisfies compliance audit requirements without any schema migration
+- **Lambda** — each invocation is independent; a crash affects one event, not the entire feed; cold-start latency is acceptable because the current polling design already tolerates up to 5 seconds of delay
+
+### Why This Matters for Microservices
+
+The polling model creates a centralised bottleneck: exactly one process owns the cursor, owns the queue writes, and owns the fan-out decision. In a microservices context this is an anti-pattern — every downstream team depends on this worker staying healthy.
+
+An event-driven source eliminates the coupling entirely. Downstream services subscribe to EventBridge or SNS directly; the scan event producer has no knowledge of consumers, and neither does this worker.
+
+The current SQS abstraction (`IMessageQueue`) was designed with this migration in mind: replacing `SqsMessageQueue` with an EventBridge publisher means adding one new `IMessageQueue` implementation and changing its single DI registration in `Program.cs`.
````
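The claim that swapping transports is mostly a registration change rests on every caller depending on the interface rather than the concrete class. Below is a minimal Python analogue, assuming an interface with a single `publish` method (the real `IMessageQueue` members are not shown in this commit). Both implementations are in-memory fakes that record calls instead of talking to AWS, and `PollerSketch` and `build_poller` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


class MessageQueue(Protocol):
    """Python analogue of IMessageQueue (method name is an assumption)."""
    def publish(self, payload: dict) -> None: ...


@dataclass
class FakeSqsMessageQueue:
    """Stand-in for SqsMessageQueue: records what would be sent to SQS."""
    sent: list = field(default_factory=list)

    def publish(self, payload: dict) -> None:
        self.sent.append(("sqs", payload))


@dataclass
class FakeEventBridgePublisher:
    """Hypothetical EventBridge-backed implementation of the same interface."""
    bus_name: str = "scan-events"  # hypothetical event bus name
    sent: list = field(default_factory=list)

    def publish(self, payload: dict) -> None:
        # A real version would call EventBridge PutEvents (Source, DetailType, Detail, EventBusName).
        self.sent.append((self.bus_name, payload))


class PollerSketch:
    """Depends only on the abstraction, never on a concrete transport."""
    def __init__(self, queue: MessageQueue) -> None:
        self.queue = queue

    def forward(self, event: dict) -> None:
        self.queue.publish(event)


def build_poller(use_eventbridge: bool) -> PollerSketch:
    """Composition root: the analogue of the single DI registration in Program.cs."""
    queue: MessageQueue = FakeEventBridgePublisher() if use_eventbridge else FakeSqsMessageQueue()
    return PollerSketch(queue)
```

Only `build_poller` knows which implementation is in use; `PollerSketch` is unchanged by the swap, which is the property the migration depends on.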

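The Step Functions bullet says retries and catch blocks are declared rather than coded. In Amazon States Language, a Task state's `Retry` field takes `IntervalSeconds`, `MaxAttempts` and `BackoffRate`, and the wait before the n-th retry is `IntervalSeconds * BackoffRate ** (n - 1)`. The sketch below computes that schedule and shows a hypothetical Task state. The function name `process-scan-event`, the `RecordFailure` state and the chosen values are illustrative assumptions, not part of this repository.

```python
from __future__ import annotations

import json


def retry_delays(interval_seconds: float = 1, max_attempts: int = 3, backoff_rate: float = 2.0) -> list[float]:
    """Wait before each retry n = 1..MaxAttempts (defaults match the ASL defaults)."""
    return [interval_seconds * backoff_rate ** (n - 1) for n in range(1, max_attempts + 1)]


# Hypothetical Task state for processing one scan event (JSONPath-style ASL).
# MaxAttempts counts *retries*, so MaxAttempts = 2 means 3 total attempts,
# mirroring today's "fail 3 times, then DLQ"; Catch replaces the DLQ hand-off.
process_event_state = {
    "Type": "Task",
    "Resource": "arn:aws:states:::lambda:invoke",
    "Parameters": {"FunctionName": "process-scan-event", "Payload.$": "$"},
    "Retry": [
        {"ErrorEquals": ["States.ALL"], "IntervalSeconds": 2, "MaxAttempts": 2, "BackoffRate": 2.0}
    ],
    "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "RecordFailure"}],
    "End": True,
}

if __name__ == "__main__":
    rule = process_event_state["Retry"][0]
    print(retry_delays(rule["IntervalSeconds"], rule["MaxAttempts"], rule["BackoffRate"]))
    print(json.dumps(process_event_state, indent=2))
```

The whole retry policy is data: changing backoff or attempt counts is an edit to the state machine definition, not to worker code.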
docs/local-setup.md

Lines changed: 19 additions & 9 deletions
````diff
@@ -2,12 +2,22 @@
 
 ## Prerequisites
 
-- [.NET 10 SDK](https://dotnet.microsoft.com/download)
-- [Docker](https://www.docker.com/) (for SQL Server + LocalStack)
+- [mise](https://mise.jdx.dev/getting-started.html)
+  - manages .NET 10, pkl, and hk
+- [Docker](https://www.docker.com/)
+  - for SQL Server + LocalStack
 
 ## Steps
 
-### 1. Start infrastructure
+### 1. Install tools
+
+```bash
+mise i
+```
+
+This installs .NET 10, pkl, and hk as declared in `mise.toml`.
+
+### 2. Start infrastructure
 
 ```bash
 docker-compose up -d
````
````diff
@@ -16,9 +26,9 @@ docker-compose up -d
 This starts:
 
 - **Azure SQL Edge** on `localhost:1433` (ARM64-compatible SQL Server)
-- **LocalStack** on `localhost:4566` (local SQS emulator queues auto-created via `scripts/init-localstack.sh`)
+- **LocalStack** on `localhost:4566` (local SQS emulator - queues auto-created via `scripts/init-localstack.sh`)
 
-### 2. Create the database
+### 3. Create the database
 
 ```bash
 docker exec -it $(docker ps -q -f ancestor=mcr.microsoft.com/azure-sql-edge) \
````
````diff
@@ -28,14 +38,14 @@ docker exec -it $(docker ps -q -f ancestor=mcr.microsoft.com/azure-sql-edge) \
 
 The worker auto-creates the `ProcessingState` and `ParcelSummary` tables on startup via `DatabaseInitialiser`.
 
-### 3. Set dummy AWS credentials (LocalStack doesn't validate them)
+### 4. Set dummy AWS credentials (LocalStack doesn't validate them)
 
 ```bash
 export AWS_ACCESS_KEY_ID=test
 export AWS_SECRET_ACCESS_KEY=test
 ```
 
-### 4. Configure the API base URL (required)
+### 5. Configure the API base URL (required)
 
 The worker cannot connect without this. Set it via user-secrets:
 
````
````diff
@@ -46,13 +56,13 @@ dotnet user-secrets set "ScanEventApi:BaseUrl" "https://your-api-host" \
 
 Or edit `ScanEventApi:BaseUrl` directly in `src/ScanEventWorker/appsettings.json`.
 
-### 5. Run the worker
+### 6. Run the worker
 
 ```bash
 dotnet run --project src/ScanEventWorker/ScanEventWorker.csproj
 ```
 
-### 6. Verify resumability
+### 7. Verify resumability
 
 Stop the worker (`Ctrl+C`) and restart it. The log will show:
 
````
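The local-setup note that `DatabaseInitialiser` auto-creates the tables on startup, and the resumability check that closes the guide, both rely on initialisation being idempotent. Missing tables must be created, but an existing cursor must never be reset; otherwise every restart would re-read the feed from `EventId=1`. A minimal sketch of that property follows, using a file-backed SQLite database as a stand-in for SQL Server. `CREATE TABLE IF NOT EXISTS` and `INSERT OR IGNORE` are SQLite syntax, and the column names and helper functions are assumptions.

```python
from __future__ import annotations

import os
import sqlite3
import tempfile

# Stand-in schema; the real tables live in SQL Server and their columns may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS ProcessingState (
    Id          INTEGER PRIMARY KEY CHECK (Id = 1),  -- single-row cursor table
    LastEventId INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS ParcelSummary (
    ParcelId    TEXT PRIMARY KEY,
    LastEventId INTEGER NOT NULL,
    Status      TEXT NOT NULL
);
-- Seed the cursor only when the row does not exist yet (first run).
INSERT OR IGNORE INTO ProcessingState (Id, LastEventId) VALUES (1, 1);
"""


def initialise(path: str) -> sqlite3.Connection:
    """Safe on every startup: creates what is missing and never resets the cursor."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def read_cursor(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT LastEventId FROM ProcessingState WHERE Id = 1").fetchone()[0]


def advance_cursor(conn: sqlite3.Connection, event_id: int) -> None:
    with conn:  # committed before the "worker" stops
        conn.execute("UPDATE ProcessingState SET LastEventId = ? WHERE Id = 1", (event_id,))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = os.path.join(tmp, "worker.db")
        first = initialise(db)
        print("first run cursor:", read_cursor(first))
        advance_cursor(first, 42)
        first.close()  # simulates stopping the worker with Ctrl+C after a committed cycle
        second = initialise(db)
        print("after restart cursor:", read_cursor(second))
        second.close()
```

The second `initialise` call finds both tables and the seeded row already present, so the restarted run resumes from the committed cursor rather than from the seed value.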