diff --git a/README.md b/README.md index f669056..0b9c8f7 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,7 @@ This toolkit helps teams build consistent, well-documented APIs for safety net programs—enabling faster integration between benefits systems and reducing the technical barriers to improving service delivery. -**New here?** Start with the [Executive Summary](./docs/presentation/executive-summary.md) for a one-page overview, or view the [Toolkit Overview](https://codeforamerica.github.io/safety-net-openapi/presentation/safety-net-openapi-overview.html) presentation for a detailed walkthrough. +**New here?** Start with the [Executive Summary](./docs/presentation/executive-summary.md) for a one-page overview, or view the [Toolkit Overview](https://codeforamerica.github.io/safety-net-openapi/presentation/safety-net-openapi-overview.html) presentation for a detailed walkthrough. For the technical design, see the [Domain Architecture](./docs/architecture/domain-design.md). ## About This Repository @@ -67,6 +67,9 @@ Visit `http://localhost:3000` for interactive API docs. - [Project Structure](./docs/reference/project-structure.md) — File layout and conventions - [Troubleshooting](./docs/reference/troubleshooting.md) — Common issues and solutions +### Architecture +- [Architecture Overview](./docs/architecture/README.md) — Domain design, API architecture, design decisions, and roadmap + ### Architecture Decisions - [Multi-State Overlays](./docs/architecture-decisions/multi-state-overlays.md) - [Search Patterns](./docs/architecture-decisions/search-patterns.md) diff --git a/docs/architecture/README.md b/docs/architecture/README.md new file mode 100644 index 0000000..cda0889 --- /dev/null +++ b/docs/architecture/README.md @@ -0,0 +1,23 @@ +# Architecture Documentation + +This directory contains architecture documentation for the Safety Net Benefits API. + +> **Status: Proposed Architecture** +> These documents describe the target architecture. 
The current implementation includes only a subset (Intake with Applications and Persons). The remaining domains and functionality are planned for future development. + +## Documents + +| Document | Description | +|----------|-------------| +| [Domain Design](domain-design.md) | Domain organization, entities, data flow, and safety net concerns | +| [API Architecture](api-architecture.md) | API layers, vendor independence, and operational architecture | +| [Design Decisions](design-decisions.md) | Key decisions with rationale and alternatives considered | +| [Roadmap](roadmap.md) | Migration, implementation phases, future considerations, and documentation gaps | + +## Related Resources + +| Resource | Description | +|----------|-------------| +| [api-patterns.yaml](../../packages/schemas/openapi/patterns/api-patterns.yaml) | Machine-readable API design patterns | +| [Architecture Decision Records](../architecture-decisions/) | Formal ADRs for significant decisions | +| [Guides](../guides/) | How-to guides for working with the toolkit | diff --git a/docs/architecture/api-architecture.md b/docs/architecture/api-architecture.md new file mode 100644 index 0000000..876b94c --- /dev/null +++ b/docs/architecture/api-architecture.md @@ -0,0 +1,393 @@ +# API Architecture + +How the Safety Net Benefits API is organized, layered, and operated. + +See also: [Domain Design](domain-design.md) | [Design Decisions](design-decisions.md) | [Roadmap](roadmap.md) + +--- + +## 1. API Layer Organization + +Following API-led connectivity principles, APIs are organized into layers. 
+ +### System APIs vs Process APIs + +| Layer | Purpose | Style | Example | +|-------|---------|-------|---------| +| **System APIs** | Direct access to domain data | RESTful CRUD | `GET /tasks/{id}`, `POST /applications` | +| **Process APIs** | Orchestrate business operations | RPC-style actions | `POST /processes/workflow/tasks/claim` | + +**Key distinctions:** +- **System APIs** own canonical schemas and provide granular CRUD operations on resources +- **Process APIs** call multiple System APIs to perform business operations; they don't access data directly +- Process APIs define purpose-built request/response DTOs optimized for specific use cases + +### Folder Structure + +``` +openapi/ +├── domains/ # System APIs (resource-based) +│ ├── workflow/ +│ │ ├── tasks.yaml +│ │ ├── queues.yaml +│ │ └── components/schemas.yaml # Canonical schemas +│ ├── case-management/ +│ ├── intake/ +│ └── eligibility/ +│ +├── processes/ # Process APIs (domain/resource/action) +│ ├── workflow/ +│ │ ├── tasks/ +│ │ │ ├── claim.yaml +│ │ │ ├── complete.yaml +│ │ │ └── reassign.yaml +│ │ └── verification/ +│ │ ├── start.yaml +│ │ └── complete.yaml +│ ├── case-management/ +│ │ ├── workers/ +│ │ │ └── assign.yaml +│ │ └── cases/ +│ │ └── transfer.yaml +│ ├── communication/ +│ │ └── notices/ +│ │ └── send.yaml +│ └── components/schemas.yaml # Process-specific DTOs +│ +└── components/ # Shared primitives (Address, Name, etc.) +``` + +### Process API Organization + +Process APIs are organized **by domain, then resource, then action**: + +``` +/processes/{domain}/{resource}/{action} +``` + +**Examples:** +``` +/processes/workflow/tasks/claim +/processes/workflow/tasks/complete +/processes/workflow/verification/start +/processes/case-management/workers/assign +/processes/case-management/cases/transfer +/processes/communication/notices/send +``` + +**Convention:** When an operation involves multiple resources, place it under **the resource being acted upon** (not the primary output). 
This matches natural language and improves discoverability: + +| Operation | Resource acted upon | Path | +|-----------|---------------------|------| +| Claim a task | Task | `/processes/workflow/tasks/claim` | +| Assign a worker | Worker | `/processes/case-management/workers/assign` | +| Transfer a case | Case | `/processes/case-management/cases/transfer` | +| Send a notice | Notice | `/processes/communication/notices/send` | + +**Metadata:** Each operation includes actor and capability metadata: + +```yaml +# processes/workflow/tasks/claim.yaml +post: + x-actors: [caseworker] # Who can call this + x-capability: task-management # Business capability +``` + +### What This Repo Provides + +| Asset | Purpose | State Usage | +|-------|---------|-------------| +| Domain architecture | Patterns, entity relationships, terminology | Adopt/adapt | +| System API specs | Base schemas + overlay support | Extend via overlays | +| Process API contracts | Interface definitions (inputs/outputs) | Implement against | +| Mock System APIs | Testing tool | Use for development/testing | +| Reference implementations | Educational examples (TypeScript) | Learn from, don't extend | + +**Important:** Reference implementations are examples, not production code to extend. States implement Process APIs from the contracts in their preferred language. + +### Mock Server Scope + +The mock server provides dynamic responses for **System APIs only**: +- Reads OpenAPI specs and examples +- Maintains mock database state +- Supports CRUD operations + +Process APIs are **not mocked** because: +- Process APIs are orchestration logic—that's what you want to test +- Real Process API implementations call mock System APIs during development +- States implement Process APIs; this repo provides the contracts + +--- + +## 2. Vendor Independence + +This architecture helps states avoid vendor lock-in when procuring backend systems (workflow management, case management, etc.). 
+ +### Adapter Pattern + +``` +┌─────────────────────────────────────────────────────────┐ +│ Process APIs (state's business logic) │ +│ - Implements eligibility, orchestration │ +│ - Calls System API contracts, NOT vendor APIs │ +└───────────────────────┬─────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────┐ +│ System API Contracts (this repo's OpenAPI specs) │ +│ - Canonical domain model (Task, Case, Application) │ +│ - Vendor-agnostic interface │ +└───────────────────────┬─────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────┐ +│ Adapter Layer (thin, replaceable) │ +│ - Maps vendor data ↔ canonical model │ +│ - Implements System API contract │ +└───────────────────────┬─────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────┐ +│ Vendor System (workflow tool, case mgmt, etc.) │ +└─────────────────────────────────────────────────────────┘ +``` + +### Impact of Switching Vendors + +| Layer | Impact | +|-------|--------| +| Process APIs | No change | +| System API Contracts | No change | +| Adapter Layer | Rewrite for new vendor | +| Vendor System | Replace | + +### Guidance for States + +1. **Never call vendor APIs directly from Process APIs** - always go through the System API layer +2. **Keep adapters thin** - translation only, no business logic +3. **Domain model is source of truth** - vendor models map to yours, not vice versa +4. **Test against mocks** - proves your code isn't secretly coupled to a vendor + +--- + +## 3. Operational Architecture + +### Configuration Management + +**Principle:** If a policy analyst needs to change it, it's configuration. If a developer needs to change it, it's code. 
+ +**Business-configurable settings:** + +| Configurable | Example | Changed By | +|--------------|---------|------------| +| Workflow rules | Assignment rules, priority rules | Program managers | +| Eligibility thresholds | Income limits, asset limits, FPL percentages | Policy analysts | +| SLA timelines | Days to process, warning thresholds | Operations managers | +| Notice templates | Content, formatting | Communications staff | +| Feature flags | Enable/disable programs, pilots | Product owners | +| Business calendars | Holidays, office hours | Office managers | + +**Architecture:** + +``` +┌─────────────────────────────────────────────┐ +│ Admin UI (for business users) │ +└───────────────────────┬─────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────┐ +│ Admin/Config APIs │ +│ - GET/PUT /config/eligibility-thresholds │ +│ - GET/PUT /config/workflow-rules │ +│ - GET/PUT /config/sla-settings │ +└───────────────────────┬─────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────┐ +│ Config Store (versioned, audited) │ +└─────────────────────────────────────────────┘ +``` + +**Key requirements:** +- All configuration changes are audited (who changed what, when) +- Configuration is versioned (rollback capability) +- Changes take effect without deployment +- Validation prevents invalid configurations + +### Observability + +**For operations staff to monitor and support the APIs:** + +| Capability | Purpose | Standard | +|------------|---------|----------| +| Health endpoints | Is the system up? | `GET /health`, `GET /ready` | +| Metrics | Request rates, latencies, error rates | Prometheus format | +| Structured logging | Searchable, consistent format | JSON with correlation IDs | +| Distributed tracing | Follow requests across APIs | OpenTelemetry | +| Audit logs | Who did what when | Domain events (e.g., TaskAuditEvent) | +| Alerting hooks | Integration with incident management | Webhooks, PagerDuty, etc. 
| + +**Standard endpoints for all APIs:** + +```yaml +/health: + get: + summary: Liveness check + responses: + '200': { description: Service is running } + +/ready: + get: + summary: Readiness check (dependencies healthy) + responses: + '200': { description: Service is ready to accept traffic } + '503': { description: Service is not ready } + +/metrics: + get: + summary: Prometheus metrics + responses: + '200': { description: Metrics in Prometheus format } +``` + +**Logging standards:** +- All logs include correlation ID for request tracing +- Structured JSON format for searchability +- Standard fields: timestamp, level, service, correlationId, message +- PII is masked or excluded from logs + +**Key SLI Metrics (API-level):** + +| Metric | Description | Target | +|--------|-------------|--------| +| `process_api_latency_seconds` | Process API response time (p50, p95, p99) | p95 < 500ms | +| `system_api_latency_seconds` | System API response time (p50, p95, p99) | p95 < 200ms | +| `error_rate` | API error rate by endpoint (4xx, 5xx) | < 1% | +| `availability` | Service uptime percentage | 99.9% | + +Domain-specific metrics (task completion time, SLA breach rate, etc.) are documented in domain files. See [Workflow Operational Metrics](domains/workflow.md#operational-metrics).
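The logging standards listed under Observability (structured JSON, the five standard fields, a correlation ID on every entry, PII masked) could be sketched roughly as follows. This is an illustration only, not part of the toolkit: `make_log_record`, `mask_ssn`, and the SSN regular expression are hypothetical names and choices, and a real service would more likely configure its logging framework than build records by hand.

```python
import json
import re
import uuid
from datetime import datetime, timezone

# Assumption: SSNs appear as 123-45-6789 or 123456789 in free-text messages.
SSN_PATTERN = re.compile(r"\b\d{3}-?\d{2}-?(\d{4})\b")


def mask_ssn(text):
    """Replace SSN-like numbers with a masked form that keeps the last 4 digits."""
    return SSN_PATTERN.sub(lambda m: "***-**-" + m.group(1), text)


def make_log_record(level, service, message, correlation_id=None):
    """Build one JSON log line with the standard fields (hypothetical helper)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "service": service,
        # Reuse the caller's correlation ID when present so that one request
        # can be followed across APIs; otherwise start a new one.
        "correlationId": correlation_id or str(uuid.uuid4()),
        "message": mask_ssn(message),
    }
    return json.dumps(record)


print(make_log_record("INFO", "intake-api", "Verified SSN 123-45-6789", "abc-123"))
```

Masking at the point where the record is built, rather than in each caller, keeps the "PII is masked or excluded" rule in one place.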
+ +### Performance + +**Caching:** + +| Data Type | TTL | Rationale | +|-----------|-----|-----------| +| TaskType (config) | 5 minutes | Rarely changes, high read volume | +| SLAType (config) | 5 minutes | Rarely changes, used in every task | +| WorkflowRule (config) | 1 minute | May be updated by admins | +| Queue (config) | 1 minute | May be updated by admins | +| User session/permissions | 5 minutes | Balance security vs performance | + +**Pagination:** +- Default limit: 25 items +- Maximum limit: 100 items +- Clients requesting more than 100 receive 100 + +**Query Complexity:** +- JSON Logic rules (WorkflowRule.conditions) are limited to: + - Maximum depth: 5 levels of nesting + - Maximum operations: 20 logical operators per rule + - Evaluation timeout: 100ms +- Search queries (`q` parameter) are limited to 10 filter conditions + +### Reliability + +**Idempotency:** +All state-changing operations support idempotent retries. See [api-patterns.yaml](../../packages/schemas/openapi/patterns/api-patterns.yaml#idempotency) for implementation details. + +**Circuit Breakers:** +Circuit breakers protect the system when external dependencies fail. See [api-patterns.yaml](../../packages/schemas/openapi/patterns/api-patterns.yaml#circuit-breakers) for configuration. + +Key circuit breaker locations: +- External verification sources (IRS, SSA, state databases) +- Vendor system adapters (workflow tools, case management systems) +- Notice delivery services (email, SMS, postal) + +### Security + +**Data Classification:** + +All API fields are classified for appropriate handling. See [api-patterns.yaml](../../packages/schemas/openapi/patterns/api-patterns.yaml#data-classification) for the full taxonomy. 
+ +| Classification | Description | Example Fields | +|----------------|-------------|----------------| +| `pii` | Personally Identifiable Information | SSN, DOB, name, address | +| `sensitive` | Sensitive but not PII | income, case notes, medical info | +| `internal` | Internal operational data | assignedToId, queueId, timestamps | +| `public` | Non-sensitive reference data | programType, taskTypeCode, status | + +**PII Handling:** +- PII is encrypted at rest +- PII is masked in logs (last 4 digits only for SSN) +- PII access is logged for audit +- PII fields are excluded from search indexes + +### Compliance + +**Data Retention:** +See [Roadmap - Data Retention](roadmap.md#needs-architecture-documentation) for retention periods by data type. + +**Right to Deletion:** +- Deletion requests must balance client rights against audit requirements +- Application data may be anonymized rather than deleted if audit trail is required +- States must document their deletion process per program requirements + +**Regulatory References:** +- SNAP: 7 CFR 272.1 (record retention) +- Medicaid: 42 CFR 431.17 (records and reports) +- TANF: 45 CFR 265.2 (data collection) +- HIPAA: Applies to Medicaid-related health information +- FERPA: May apply when education data is used for eligibility + +*Note:* Detailed compliance mapping is state-specific. States should map these requirements to their specific field-level handling. + +--- + +## 4. Quality Attributes Summary + +This section provides a central index of architectural quality attributes (-ilities) and where each is documented. 
+ +| Quality Attribute | Status | Documentation Location | +|-------------------|--------|------------------------| +| **Reliability** | | | +| Idempotency | Addressed | [api-patterns.yaml](../../packages/schemas/openapi/patterns/api-patterns.yaml#idempotency) | +| Circuit breakers | Addressed | [api-patterns.yaml](../../packages/schemas/openapi/patterns/api-patterns.yaml#circuit-breakers) | +| Error handling | Addressed | [api-patterns.yaml](../../packages/schemas/openapi/patterns/api-patterns.yaml#error-handling) | +| Long-running operations | Addressed | [api-patterns.yaml](../../packages/schemas/openapi/patterns/api-patterns.yaml#long-running-operations) | +| **Security** | | | +| Authentication | Addressed | [api-patterns.yaml](../../packages/schemas/openapi/patterns/api-patterns.yaml#authentication) | +| Authorization (RBAC/ABAC) | Addressed | [api-patterns.yaml](../../packages/schemas/openapi/patterns/api-patterns.yaml#authorization) | +| Data classification | Addressed | [api-patterns.yaml](../../packages/schemas/openapi/patterns/api-patterns.yaml#data-classification) | +| Security headers | Addressed | [api-patterns.yaml](../../packages/schemas/openapi/patterns/api-patterns.yaml#security-headers) | +| Audit logging | Addressed | [api-patterns.yaml](../../packages/schemas/openapi/patterns/api-patterns.yaml#audit-logging) | +| **Performance** | | | +| Caching | Addressed | [api-architecture.md](#performance) (this file) | +| Pagination | Addressed | [api-patterns.yaml](../../packages/schemas/openapi/patterns/api-patterns.yaml#list-endpoints) | +| Rate limiting | Addressed | [api-patterns.yaml](../../packages/schemas/openapi/patterns/api-patterns.yaml#rate-limiting) | +| Query complexity limits | Addressed | [api-architecture.md](#performance) (this file) | +| **Observability** | | | +| Health endpoints | Addressed | [api-architecture.md](#observability) (this file) | +| Metrics (API-level) | Addressed | [api-architecture.md](#observability) (this file) | +| 
Metrics (domain-specific) | Addressed | [workflow.md](domains/workflow.md#operational-metrics) | +| Correlation IDs | Addressed | [api-patterns.yaml](../../packages/schemas/openapi/patterns/api-patterns.yaml#correlation-ids) | +| Distributed tracing | Addressed | [api-architecture.md](#observability) (this file) | +| **Compliance** | | | +| Data retention | Partially addressed | [roadmap.md](roadmap.md#needs-architecture-documentation) | +| Right to deletion | Addressed | [api-architecture.md](#compliance) (this file) | +| Regulatory references | Addressed | [api-architecture.md](#compliance) (this file) | +| **Interoperability** | | | +| API versioning | Addressed | [api-patterns.yaml](../../packages/schemas/openapi/patterns/api-patterns.yaml#versioning) | +| ETags/concurrency | Addressed | [api-patterns.yaml](../../packages/schemas/openapi/patterns/api-patterns.yaml#etags) | +| Vendor independence | Addressed | [api-architecture.md](#2-vendor-independence) (this file) | +| **Maintainability** | | | +| Configuration management | Addressed | [api-architecture.md](#configuration-management) (this file) | +| Schema patterns | Addressed | [api-patterns.yaml](../../packages/schemas/openapi/patterns/api-patterns.yaml#schema-patterns) | + +--- + +## Related Resources + +- [api-patterns.yaml](../../packages/schemas/openapi/patterns/api-patterns.yaml) - Detailed API design patterns including security, error handling, versioning +- [Domain Design](domain-design.md) - Domain organization and entity relationships +- [Design Decisions](design-decisions.md) - Rationale for architectural choices diff --git a/docs/architecture/cross-cutting/communication.md b/docs/architecture/cross-cutting/communication.md new file mode 100644 index 0000000..14a29e1 --- /dev/null +++ b/docs/architecture/cross-cutting/communication.md @@ -0,0 +1,313 @@ +# Communication (Cross-Cutting) + +Detailed schemas for the Communication cross-cutting concern. 
See [Domain Design Overview](../domain-design.md) for context. + +## Overview + +Communication is cross-cutting because notices and correspondence can originate from any domain: +- **Intake**: "Application received" +- **Eligibility**: "Approved", "Denied", "Request for information" +- **Workflow**: "Documents needed", "Interview scheduled" +- **Case Management**: "Case worker assigned" + +| Entity | Purpose | +|--------|---------| +| **Notice** | Official communication (approval, denial, RFI, etc.) | +| **Correspondence** | Other communications | +| **DeliveryRecord** | Tracking of delivery status | + +--- + +## Capabilities + +| Capability | Supported By | +|------------|--------------| +| **Caseworker** | | +| Send notice to client | `POST /processes/communication/notices/send` | +| Retry failed delivery | `POST /processes/communication/notices/retry` | +| Track notice delivery | `GET /notices`, `GET /delivery-records` (System APIs) | +| **Supervisor** | | +| Approve notice before sending | `POST /processes/communication/notices/approve` | +| Monitor failed deliveries | `GET /delivery-records?q=status:failed` (System API) | +| **System/Automation** | | +| Auto-send notices on events | `POST /processes/communication/notices/send` (skipReview: true) | +| Retry failed deliveries | `POST /processes/communication/notices/retry` | +| **Client** | | +| View notices in portal | `GET /notices` (System API, client-scoped) | + +**Notes:** +- Notices are triggered by events in other domains (Intake, Eligibility, Workflow, Case Management). +- Notice content comes from templates (see [Configuration Management](../api-architecture.md#configuration-management)). +- Delivery tracking supports multiple channels (postal, email, portal). + +--- + +## Schemas + +### Notice + +Official communications sent to clients. 
+ +```yaml +Notice: + properties: + id: uuid + noticeType: + # Determination + - approval + - denial + - partial_approval + # Information requests + - request_for_information + - interview_scheduled + - interview_missed + # Status + - application_received + - pending_verification + - under_review + # Action + - renewal_due + - recertification_required + - benefits_ending + - benefits_change + # Appeals + - appeal_received + - hearing_scheduled + - appeal_decision + applicationId: uuid + programType: enum + recipientInfo: NoticeRecipientInfo + deliveryMethod: + - postal_mail + - email + - both + - in_person + status: + - draft + - pending_review + - approved + - sent + - delivered + - returned + - failed + language: Language + responseRequired: boolean + responseDueDate: datetime + responseReceivedDate: datetime + denialReasons: DenialReason[] # For denial notices + rfiItems: RequestForInformationItem[] # For RFI notices + generatedByTaskId: uuid # Task that triggered this notice + sentAt: datetime + sentById: uuid + createdAt, updatedAt: datetime +``` + +### NoticeRecipientInfo + +Recipient details for a notice. + +```yaml +NoticeRecipientInfo: + properties: + clientId: uuid + name: Name + address: Address # Mailing address + email: Email # For electronic delivery + preferredLanguage: Language + accommodations: string[] # Accessibility needs +``` + +### DenialReason + +Reason for benefit denial. + +```yaml +DenialReason: + properties: + code: string # Standard denial code + description: string # Human-readable explanation + regulation: string # Regulatory citation + appealable: boolean +``` + +### RequestForInformationItem + +Item requested in an RFI notice. + +```yaml +RequestForInformationItem: + properties: + itemType: string # "income_verification", "identity_document", etc. 
+ description: string # What is needed + dueDate: datetime # When it's due + receivedDate: datetime # When received (if any) + status: + - pending + - received + - waived + - expired +``` + +### Correspondence + +Other communications (non-official notices). + +```yaml +Correspondence: + properties: + id: uuid + correspondenceType: + - client_inquiry + - worker_note + - inter_agency + - third_party + direction: + - inbound # From client/external + - outbound # To client/external + applicationId: uuid + caseId: uuid + clientId: uuid + subject: string + body: string + attachmentIds: uuid[] # Document references + sentById: uuid # Worker who sent (if outbound) + receivedAt: datetime + createdAt, updatedAt: datetime +``` + +### DeliveryRecord + +Tracking of notice/correspondence delivery. + +```yaml +DeliveryRecord: + properties: + id: uuid + noticeId: uuid # Or correspondenceId + deliveryMethod: + - postal_mail + - email + - in_person + - portal + status: + - pending + - sent + - delivered + - bounced + - returned + - failed + trackingNumber: string # For postal mail + sentAt: datetime + deliveredAt: datetime + failureReason: string + retryCount: integer + createdAt, updatedAt: datetime +``` + +--- + +## Process APIs + +Process APIs orchestrate business operations by calling System APIs. They follow the pattern `POST /processes/{domain}/{resource}/{action}` and use `x-actors` and `x-capability` metadata. + +See [API Architecture](../api-architecture.md) for the full Process API pattern. + +### Notice Operations + +| Endpoint | Actors | Description | +|----------|--------|-------------| +| `POST /processes/communication/notices/send` | caseworker, system | Generate and send a notice | +| `POST /processes/communication/notices/approve` | supervisor | Approve a pending notice | +| `POST /processes/communication/notices/retry` | caseworker, system | Retry a failed delivery | + +--- + +### Send Notice + +Generate and send a notice to a client. 
+ +```yaml +POST /processes/communication/notices/send +x-actors: [caseworker, system] +x-capability: communication + +requestBody: + noticeType: string # approval, denial, request_for_information, etc. + applicationId: uuid # Related application + clientId: uuid # Recipient + programType: string # snap, medicaid, tanf + templateId: uuid # Notice template to use (optional) + templateData: object # Data to populate template + deliveryMethod: + - postal_mail + - email + - both + skipReview: boolean # Auto-approve (for system-generated) + +responses: + 200: + notice: Notice # Created notice + deliveryRecord: DeliveryRecord + +# Orchestrates: +# 1. Load notice template based on noticeType and programType +# 2. Populate template with client data and templateData +# 3. Create Notice record +# 4. If skipReview or system actor, set status: approved +# 5. If requires review, set status: pending_review +# 6. If approved, create DeliveryRecord and initiate delivery +# 7. If triggered by task, link via generatedByTaskId +``` + +### Approve Notice + +Supervisor approves a notice pending review. + +```yaml +POST /processes/communication/notices/approve +x-actors: [supervisor] +x-capability: communication + +requestBody: + noticeId: uuid # Notice to approve + modifications: object # Optional edits to notice content + notes: string # Approval notes + +responses: + 200: + notice: Notice # Approved notice + deliveryRecord: DeliveryRecord + +# Orchestrates: +# 1. Validate notice is in pending_review status +# 2. Apply any modifications +# 3. Update Notice.status → approved +# 4. Create DeliveryRecord and initiate delivery +``` + +### Retry Delivery + +Retry a failed notice delivery. 
+ +```yaml +POST /processes/communication/notices/retry +x-actors: [caseworker, system] +x-capability: communication + +requestBody: + noticeId: uuid # Notice to retry + deliveryMethod: string # Retry with same or different method + updatedAddress: Address # If address was incorrect (optional) + updatedEmail: Email # If email was incorrect (optional) + +responses: + 200: + deliveryRecord: DeliveryRecord # New delivery attempt + +# Orchestrates: +# 1. Validate previous delivery failed +# 2. Update recipient info if provided +# 3. Increment DeliveryRecord.retryCount +# 4. Create new DeliveryRecord with status: pending +# 5. Initiate delivery +``` diff --git a/docs/architecture/design-decisions.md b/docs/architecture/design-decisions.md new file mode 100644 index 0000000..cb6d974 --- /dev/null +++ b/docs/architecture/design-decisions.md @@ -0,0 +1,304 @@ +# Design Decisions + +Key decisions made during design, with alternatives considered. These are **proposed decisions** - review and adjust before implementation. + +See also: [Domain Design](domain-design.md) | [API Architecture](api-architecture.md) | [Roadmap](roadmap.md) + +> **How to use this log**: Each decision includes the options we considered and why we chose one over others. If circumstances change or new information emerges, revisit the rationale to determine if a different choice makes more sense. + +--- + +## Decision Log + +### Where does Application live? + +| Option | Considered | Chosen | +|--------|------------|--------| +| Intake | Application captures what the client reports | Yes | +| Eligibility | Application is fundamentally about determining eligibility | No | +| Case Management | Application is one event in a larger case lifecycle | No | + +*Rationale*: Application is the client's perspective - what they told us. Eligibility interprets that data per program rules. Case Management tracks the ongoing relationship across multiple applications. 
+ +*Reconsider if*: Applications become tightly coupled to eligibility rules rather than being a neutral record of client-reported data. + +--- + +### How to handle living arrangements and eligibility groupings? + +| Option | Considered | Chosen | +|--------|------------|--------| +| Single "Household" entity | Simple, but conflates factual and regulatory concepts | No | +| Snapshots only | Each application captures composition at that moment | Partially | +| Split: LivingArrangement + EligibilityUnit | Factual data persists; programs interpret into eligibility units | Yes | + +*Rationale*: "Household" is a regulatory term with different meanings per program (IRS, SNAP, Medicaid). We use `LivingArrangement` for the factual "who do you live with" data (in Client Management and Intake), and `EligibilityUnit` for program-specific groupings (in Eligibility). Regulatory terms like "household" or "tax unit" appear in descriptions. + +*Reconsider if*: Living arrangement changes are infrequent and the complexity of tracking both isn't justified, or if all programs use the same grouping rules. + +--- + +### Is Income its own domain? + +| Option | Considered | Chosen | +|--------|------------|--------| +| Own domain | Complex enough to warrant separation | No | +| Part of Eligibility | Only useful for eligibility | No | +| Split: Income (Intake) + verified income (Eligibility) | Matches reported vs interpreted pattern | Yes | + +*Rationale*: Follows the same pattern as living arrangements: Intake records what the client reports, and Eligibility holds each program's interpretation of it. + +*Reconsider if*: Income tracking becomes significantly more complex (e.g., real-time income verification, multiple income sources with independent lifecycles) and warrants dedicated APIs. + +--- + +### Case Management vs Workflow: one or two domains?
+ +| Option | Considered | Chosen | +|--------|------------|--------| +| Combined | Simpler, fewer domains | No | +| Separate | Clear separation of concerns | Yes | + +*Rationale*: They answer different questions. Workflow = "What needs to be done?" Case Management = "Who's responsible for this relationship?" + +*Reconsider if*: The separation creates too much complexity in practice, or if case workers primarily interact with the system through tasks (making them effectively the same). + +--- + +### Where does Verification live? + +| Option | Considered | Chosen | +|--------|------------|--------| +| Own domain | Verification is complex | No | +| Part of Workflow | Verification is work that needs to be done | Yes | +| Part of Case Management | Case workers do verification | No | + +*Rationale*: Verification tasks are work items with SLAs and outcomes - fits naturally with Workflow. + +*Reconsider if*: Verification becomes a complex subsystem with its own rules engine, third-party integrations, and document processing pipelines. + +--- + +### Is Reporting its own domain? + +| Option | Considered | Chosen | +|--------|------------|--------| +| Own domain | Could hold report definitions, metrics | No | +| Cross-cutting concern | Aggregates data from all domains | Yes | + +*Rationale*: Reporting doesn't own entities - it consumes data from other domains. Audit events live where actions happen. + +*Reconsider if*: Federal reporting requirements become complex enough to warrant standardized report definitions, scheduling, and delivery tracking as first-class entities. + +--- + +### Terminology: what to call people receiving benefits? 
+ +| Option | Considered | Chosen | +|--------|------------|--------| +| Person | Generic | No | +| Client | Common in social work | Yes | +| Participant | Common in federal programs | No | +| Beneficiary | Implies already receiving benefits | No | + +*Rationale*: "Client" is widely used in social services and clearly indicates someone the agency serves. + +*Reconsider if*: Integrating with systems that use different terminology (e.g., "participant" in federal systems) and alignment is important. + +--- + +### What financial data belongs in Client Management vs Intake? + +| Option | Considered | Chosen | +|--------|------------|--------| +| All in Intake | Simpler, fresh data each application | No | +| All in Client Management | Maximum pre-population | No | +| Split by stability | Stable income persists; point-in-time data in Intake | Yes | + +**Persist in Client Management:** +- Income (SSI, SSDI, pensions, retirement, child support) - verified once, rarely changes +- Employer - useful for pre-population + +**Keep in Intake (point-in-time):** +- Income (current wages/earnings) +- Resource (vehicles, property, bank balances) +- Expense (rent, utilities) + +*Rationale*: Only persist data that (1) is verified once and rarely changes, (2) provides real value for pre-populating future applications, and (3) is useful for case workers to see across applications. Assets and expenses are only used for point-in-time eligibility determination - there's no value in persisting them beyond the application. + +*Reconsider if*: There's a need to track asset/expense changes over time for fraud detection, or if pre-populating assets significantly reduces client burden and error rates. + +--- + +### Should entities have distinct names across domains? 
+ +| Option | Considered | Chosen | +|--------|------------|--------| +| Distinct names per domain | Self-documenting, explicit | No | +| Same name, domain provides context | Simpler, less cognitive load | Yes | + +*Rationale*: If entities are organized under domains with distinct API paths, the domain context already provides disambiguation. Using the same name (`Income`) in both Client Management and Intake is simpler and more natural. The path tells you the difference: `/clients/{id}/income` vs `/applications/{id}/income`. + +*Reconsider if*: Developers frequently work across domains and find the shared naming confusing, or if schemas need to be referenced in a shared context where domain isn't clear. + +--- + +### Explicit Tasks vs Workflow Engine? + +| Option | Considered | Chosen | +|--------|------------|--------| +| Explicit Task entities | Simple, flexible, follows existing patterns | Yes | +| BPMN workflow engine | Declarative, visual modeling | No | + +*Rationale*: Explicit tasks are simpler and sufficient for v1. A workflow engine can be layered on top later if needed. + +*Reconsider if*: Workflows become complex enough that declarative definitions and visual modeling would significantly reduce implementation effort, or if non-developers need to modify workflows. + +--- + +### JSON Logic for Workflow Rule Conditions + +| Option | Considered | Chosen | +|--------|------------|--------| +| Hardcoded condition fields | Simple but requires schema changes for new conditions | No | +| Custom expression language | Maximum flexibility but proprietary | No | +| JSON Logic | Standard, portable, well-documented | Yes | +| MongoDB-style queries | Familiar syntax, but less expressive | No | +| OPA/Rego | Powerful but heavy, separate runtime | No | + +*Rationale*: [JSON Logic](https://jsonlogic.com/) is an open standard for expressing conditions as JSON objects. It has implementations in JavaScript, Python, Java, Go, and other languages. 
Workflow rules can define arbitrary conditions without schema changes—new condition types are added by exposing new context variables, not by changing the schema. + +*Example*: +```json +{ + "and": [ + { "==": [{ "var": "task.programType" }, "snap"] }, + { "<": [{ "var": "application.household.youngestChildAge" }, 6] } + ] +} +``` + +*Reconsider if*: Conditions become complex enough to require a full rules engine (e.g., Drools), or if business users need a visual rule builder (which would generate JSON Logic underneath). + +--- + +### Configurable vs Hardcoded Task and SLA Types + +| Option | Considered | Chosen | +|--------|------------|--------| +| Hardcoded enums | Simple but inflexible; schema changes for new types | No | +| Configuration entities | TaskType and SLAType as lookup tables with `code` as PK | Yes | +| UUIDs for configuration | Standard approach but less user-friendly | No | + +*Rationale*: Task types and SLA types are configuration data that changes as programs evolve. Using `code` as the primary key (e.g., `verify_income`, `snap_expedited`) is more readable and user-friendly than UUIDs. New task types can be added without schema changes. + +*Reconsider if*: Configuration data needs to be synchronized across systems where code collisions are possible (UUIDs would guarantee uniqueness). + +--- + +### System APIs vs Process APIs? + +| Option | Considered | Chosen | +|--------|------------|--------| +| Single API layer | Simpler, fewer moving parts | No | +| Two layers (System + Process) | Clear separation of data access vs orchestration | Yes | + +*Rationale*: System APIs provide RESTful CRUD access to domain data. Process APIs orchestrate business operations by calling System APIs. This separation means Process APIs contain business logic while System APIs remain simple and reusable. + +*Reconsider if*: The overhead of maintaining two layers isn't justified by the complexity of the business processes, or if most operations map 1:1 to CRUD actions. 
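The split between the two layers can be pictured with a small sketch. Everything here is illustrative rather than part of the toolkit: `TaskSystemApi`, `claim_task`, and the `pending` / `in_progress` status values are assumed names, and an in-memory dictionary stands in for a real System API. The point is only that the System layer stays plain CRUD while the business rule ("a task can be claimed once") sits in the Process layer.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Task:
    id: str
    status: str = "pending"               # assumed status value
    assigned_to_id: Optional[str] = None


class TaskSystemApi:
    """Stand-in for a System API: CRUD over tasks, no business rules."""

    def __init__(self) -> None:
        self._tasks: Dict[str, Task] = {}

    def create(self, task: Task) -> Task:
        self._tasks[task.id] = task
        return task

    def get(self, task_id: str) -> Task:
        return self._tasks[task_id]

    def update(self, task: Task) -> Task:
        self._tasks[task.id] = task
        return task


def claim_task(tasks: TaskSystemApi, task_id: str, worker_id: str) -> Task:
    """Stand-in for a Process API: business logic composed from System API calls."""
    task = tasks.get(task_id)
    if task.assigned_to_id is not None:
        raise ValueError(f"task {task_id} is already assigned")
    task.assigned_to_id = worker_id
    task.status = "in_progress"           # assumed status value
    return tasks.update(task)
```

Because `claim_task` only depends on the `get` and `update` calls, the same function could run against a mock System API during development and a real one later.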
+ +--- + +### What should the mock server cover? + +| Option | Considered | Chosen | +|--------|------------|--------| +| All APIs | Complete testing environment | No | +| System APIs only | Mock data layer, test real orchestration | Yes | + +*Rationale*: Process APIs are orchestration logic—that's what you want to test. Mocking them defeats the purpose. Real Process API implementations call mock System APIs during development. + +*Reconsider if*: Teams need to develop against Process APIs before implementations exist, or if Process API behavior is complex enough to warrant contract testing via mocks. + +--- + +### How to organize Process APIs? + +| Option | Considered | Chosen | +|--------|------------|--------| +| By actor (client/, caseworker/, admin/) | Intuitive grouping by who uses it | No | +| By capability (applications/, eligibility/, tasks/) | Actor-agnostic, same operation available to multiple actors | Partially | +| By domain, then resource, then action | Clear hierarchy, matches domain structure | Yes | + +*Rationale*: Many operations are used by multiple actors (e.g., both clients and caseworkers can submit applications). Actor metadata (`x-actors: [client, caseworker]`) handles authorization without duplicating endpoints. Organizing by domain provides clear ownership and aligns with the System API structure. + +*Path pattern*: `/processes/{domain}/{resource}/{action}` + +*Examples*: +- `/processes/workflow/tasks/claim` +- `/processes/case-management/workers/assign` +- `/processes/communication/notices/send` + +*Convention*: When an operation involves multiple resources, place it under the resource being acted upon (not the primary output). This matches natural language and improves discoverability. + +See [api-patterns.yaml](../../packages/schemas/openapi/patterns/api-patterns.yaml) for the `x-actors` and `x-capability` extension definitions. 
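As a rough illustration of the path pattern and the `x-actors` metadata, the sketch below splits a Process API path into its three parts and checks an actor against an operation's `x-actors` list. The helper names (`parse_process_path`, `is_actor_allowed`) are hypothetical, and the lowercase kebab-case restriction on path segments is an assumption drawn from the examples above, not a rule stated in `api-patterns.yaml`.

```python
import re
from typing import Any, Mapping, NamedTuple

# Assumption: segments are lowercase kebab-case, as in the documented examples.
_PROCESS_PATH = re.compile(
    r"^/processes/(?P<domain>[a-z0-9-]+)/(?P<resource>[a-z0-9-]+)/(?P<action>[a-z0-9-]+)$"
)


class ProcessRoute(NamedTuple):
    domain: str
    resource: str
    action: str


def parse_process_path(path: str) -> ProcessRoute:
    """Split /processes/{domain}/{resource}/{action} into named parts."""
    match = _PROCESS_PATH.match(path)
    if match is None:
        raise ValueError(f"not a Process API path: {path!r}")
    return ProcessRoute(**match.groupdict())


def is_actor_allowed(operation: Mapping[str, Any], actor: str) -> bool:
    """Check an actor against the operation's x-actors list (empty if absent)."""
    return actor in operation.get("x-actors", [])
```

For example, `parse_process_path("/processes/workflow/tasks/claim")` yields the domain `workflow`, the resource `tasks`, and the action `claim`, and one operation definition with `x-actors: [client, caseworker]` serves both actors without a second endpoint.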
+ +*Reconsider if*: Actor-specific behavior diverges significantly (different request/response shapes), making shared endpoints awkward. + +--- + +### What is the purpose of reference implementations? + +| Option | Considered | Chosen | +|--------|------------|--------| +| Production-ready code to extend | States fork and customize | No | +| Educational examples | States learn patterns, implement from scratch | Yes | + +*Rationale*: Reference implementations demonstrate how to implement Process APIs against System API contracts. States implement in their preferred language/framework. Extending reference code creates maintenance burden and hidden coupling. + +*Reconsider if*: Implementation patterns are complex enough that reference code provides significant value, or if a common framework emerges across states. + +--- + +### How to achieve vendor independence? + +| Option | Considered | Chosen | +|--------|------------|--------| +| Standardize on specific vendors | Simpler, less abstraction | No | +| Adapter pattern | Thin translation layer between contracts and vendors | Yes | + +*Rationale*: Process APIs call System API contracts, not vendor APIs directly. Adapters translate between canonical models and vendor-specific implementations. Switching vendors means rewriting adapters, not business logic. + +*Reconsider if*: Vendor capabilities diverge so significantly that adapters become complex business logic themselves. + +--- + +### What's configurable vs code? + +| Option | Considered | Chosen | +|--------|------------|--------| +| Everything in code | Simpler deployment, version controlled | No | +| Split by who changes it | Policy analyst changes = config; developer changes = code | Yes | + +*Rationale*: Workflow rules, eligibility thresholds, SLA timelines, and notice templates change frequently and shouldn't require deployments. Business users can adjust these through Admin APIs. Configuration is versioned and audited. 
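One way to picture "versioned and audited" configuration is an append-only history per key. This is a minimal sketch under assumptions: `ConfigStore`, `ConfigVersion`, and the key name used in the test are hypothetical, and a real Admin API would persist the history rather than hold it in memory.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass(frozen=True)
class ConfigVersion:
    version: int
    value: Any
    changed_by: str
    changed_at: datetime


class ConfigStore:
    """Append-only configuration: every change adds a version, nothing is overwritten."""

    def __init__(self) -> None:
        self._history: Dict[str, List[ConfigVersion]] = {}

    def set(self, key: str, value: Any, changed_by: str) -> ConfigVersion:
        versions = self._history.setdefault(key, [])
        entry = ConfigVersion(
            version=len(versions) + 1,
            value=value,
            changed_by=changed_by,
            changed_at=datetime.now(timezone.utc),
        )
        versions.append(entry)
        return entry

    def get(self, key: str) -> Any:
        """Return the current (latest) value for a key."""
        return self._history[key][-1].value

    def history(self, key: str) -> List[ConfigVersion]:
        """Return a copy of the audit trail for a key, oldest first."""
        return list(self._history[key])
```

A policy analyst changing an SLA timeline would then call `set` through an Admin API, and the previous value stays available in `history` for audits.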
+ +*Reconsider if*: Configuration complexity grows to the point where it's effectively code, or if audit/versioning requirements are better served by version control. + +--- + +### Should there be an Experience Layer? + +| Option | Considered | Chosen | +|--------|------------|--------| +| Experience Layer now | Tailored APIs for each client type (mobile, web, caseworker portal) | No | +| Process APIs serve all clients | Clients call Process APIs directly | Yes | +| GraphQL in the future | Flexible querying when client needs diverge | Deferred | + +*What is an Experience Layer?* An Experience Layer (sometimes called "Backend for Frontend" or BFF) is an API layer that sits above Process APIs and tailors responses for specific client applications. For example, a mobile app might need a lightweight response with only essential fields, while a caseworker dashboard might need aggregated data from multiple domains in a single call. The Experience Layer handles this translation so Process APIs remain client-agnostic. + +*Rationale*: An Experience Layer adds complexity that isn't justified yet. Process APIs are sufficient for current use cases. Adding this layer now would mean maintaining three API layers before we understand the actual client requirements. + +*Reconsider if*: Client applications need significantly different data shapes (e.g., mobile app needs minimal payloads, web dashboard needs aggregated views), or if multiple teams are building frontends with duplicated data-fetching logic. + +*Future direction*: When an Experience Layer becomes necessary, GraphQL is likely the best choice. It allows clients to request exactly the fields they need, reducing over-fetching and enabling frontend teams to evolve independently. A GraphQL gateway could sit above Process APIs without changing the underlying architecture. 
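For readers unfamiliar with the idea, the core job of an Experience Layer can be reduced to field selection over a Process API response. The sketch below is purely conceptual: `shape_for_client` is a hypothetical helper, not a planned API, and it only handles top-level fields.

```python
from typing import Any, Dict, Iterable


def shape_for_client(payload: Dict[str, Any], fields: Iterable[str]) -> Dict[str, Any]:
    """Keep only the requested top-level fields, similar in spirit to a GraphQL selection."""
    return {name: payload[name] for name in fields if name in payload}
```

A mobile client might request only `id` and `status`, while a caseworker dashboard requests the full record. Until such needs actually diverge, clients can apply this kind of trimming themselves on top of Process API responses.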
+ diff --git a/docs/architecture/domain-design.md b/docs/architecture/domain-design.md new file mode 100644 index 0000000..ca6c4f1 --- /dev/null +++ b/docs/architecture/domain-design.md @@ -0,0 +1,311 @@ +# Domain Design + +Domain organization, entities, data flow, and safety net specific concerns for the Safety Net Benefits API. + +See also: [API Architecture](api-architecture.md) | [Design Decisions](design-decisions.md) | [Roadmap](roadmap.md) + +> **This is design documentation, not implementation documentation.** +> +> This document describes the *proposed* domain organization - the target architecture we are designing toward. The design itself is still evolving based on research and feedback. +> +> | Domain | Design Status | Implementation Status | +> |--------|---------------|----------------------| +> | Workflow | Ready for review | Not started | +> | Intake | Work in progress | Partial (Applications, Persons, Households, Incomes) | +> | Case Management | Work in progress | Not started | +> | Eligibility | Work in progress | Not started | +> | Client Management | Work in progress | Not started | +> | Communication | Work in progress | Not started | +> | Scheduling | Work in progress | Not started | +> | Document Management | Work in progress | Not started | + +--- + +## 1. 
Domain Organization + +### Overview + +The Safety Net Benefits API is organized into 7 domains, with 4 cross-cutting concerns: + +| Domain | Purpose | +|--------|---------| +| **Client Management** | Persistent identity and relationships for people receiving benefits | +| **Intake** | Application submission from the client's perspective | +| **Eligibility** | Program-specific interpretation and determination | +| **Case Management** | Ongoing client relationships and staff assignments | +| **Workflow** | Work items, tasks, SLAs, and verification | +| **Scheduling** | Appointments and interviews | +| **Document Management** | Files and uploads | + +**Cross-cutting concerns:** +- **Communication** - Notices and correspondence can originate from any domain (application received, documents needed, eligibility determined, appointment scheduled, etc.) +- **Reporting** - Each domain exposes data that reporting systems consume; audit events live where actions happen +- **Configuration Management** - Business-configurable rules, thresholds, and settings that can be changed without code deployments +- **Observability** - Health checks, metrics, logging, and tracing for operations staff + +### Domain Details + +#### Client Management + +Persistent information about people applying for or receiving benefits. + +| Entity | Purpose | +|--------|---------| +| **Client** | Persistent identity - name, DOB, SSN, demographics (things that don't change often) | +| **Relationship** | Connections between clients - spouse, parent/child, sibling, etc. 
| +| **LivingArrangement** | Who the client reports living with (versioned over time) | +| **ContactInfo** | Addresses, phone numbers, email (may change but persists across applications) | +| **Income** | Stable income sources (SSI, SSDI, pensions, retirement, child support) - verified once, rarely changes | +| **Employer** | Past/current employers (optional, for pre-population) | + +**Key decisions:** +- "Client" = people applying for or receiving benefits +- People mentioned on applications but not applying (absent parents, sponsors) are NOT persisted as Clients - they exist only in Intake +- Relationships are stored from the client's perspective +- Only persist financial data that is stable and provides pre-population value (stable income sources, employer history) +- Do NOT persist point-in-time eligibility data (vehicles, property, bank balances, expenses) - these belong in Intake + +#### Intake + +The application as the client experiences it - what they report. + +| Entity | Purpose | +|--------|---------| +| **Application** | The submission requesting benefits | +| **Person** | People mentioned on the application (household members, absent parents, sponsors, etc.) 
| +| **Income** | Income the client claims | +| **Expense** | Expenses the client claims | +| **Resource** | Resources/assets the client claims | +| **LivingArrangement** | Who lives where, relationships as reported | + +**Key decisions:** +- This is the "source of truth" for what the client told us +- Different types of people on an application: + - **Household members** - people in the eligibility unit (seeking benefits) + - **Other occupants** - live there but not part of benefits household + - **Related parties** - absent parents, sponsors, non-custodial parents + - **Representatives** - authorized representatives, application assisters +- Application is client-facing; eligibility interpretation happens in Eligibility domain + +#### Eligibility + +Program-specific interpretation of application data and benefit determination. + +| Entity | Purpose | +|--------|---------| +| **EligibilityRequest** | An evaluation of a client + program (initial, recertification, or change) | +| **EligibilityUnit** | Program-specific grouping (e.g., SNAP "household", Medicaid "tax unit") | +| **Determination** | The outcome for a client + program | +| **VerificationRequirement** | What a program requires to be verified and how | + +**Key decisions:** +- `EligibilityRequest` handles all evaluation types via `requestType`: initial applications, scheduled recertifications, client-initiated renewals, and mid-certification changes +- "EligibilityUnit" is the entity; regulatory terms like "household" or "tax unit" appear in descriptions +- Eligibility happens at the intersection of: **who** (client) + **what** (program) + **when** (point in time) +- A single application may contain multiple clients applying for multiple programs - each combination gets its own EligibilityRequest +- Recertifications link to the previous Determination, creating a history chain + +#### Case Management + +Ongoing client relationships and staff assignments. 
**[Detailed schemas →](domains/case-management.md)** + +| Entity | Purpose | +|--------|---------| +| **Case** | The ongoing relationship with a client/household | +| **CaseWorker** | Staff member who processes applications | +| **Supervisor** | Extends CaseWorker with approval authority, team capacity, escalation handling | +| **Office** | Geographic or organizational unit (county, regional, state) | +| **Assignment** | Who is responsible for what | +| **Caseload** | Workload for a case worker | +| **Team** | Group of case workers | + +**Key decisions:** +- Case Management is about relationships: "Who's handling this? What's the history?" +- Office enables geographic routing and reporting by county/region +- Separate from Workflow (which is about work items) + +#### Workflow + +Work items, tasks, and SLA tracking. **[Detailed schemas →](domains/workflow.md)** + +| Entity | Purpose | +|--------|---------| +| **Task** | A work item requiring action | +| **Queue** | Organizes tasks by team, county, program, or skill | +| **WorkflowRule** | Defines automatic task routing and prioritization logic | +| **VerificationTask** | Task to verify data - either validation (accuracy) or program verification (evidence standards) | +| **VerificationSource** | External services/APIs for data validation (IRS, ADP, state databases) | +| **TaskAuditEvent** | Immutable audit trail | + +**Key decisions:** +- Workflow is about work items: "What needs to be done? Is it on track?" +- Queues organize tasks for routing and monitoring +- WorkflowRules enable automatic task routing and prioritization based on program, office, skills, and client attributes +- Verification has two purposes: + - **Data validation**: Is the intake data accurate? (check against external sources) + - **Program verification**: Does the data meet program evidence standards? 
+- VerificationTask connects Intake data → External Sources → Eligibility requirements +- Tasks are assigned to CaseWorkers (connects to Case Management) + +#### Communication (Cross-Cutting) + +Official notices and correspondence that can originate from any domain. **[Detailed schemas →](cross-cutting/communication.md)** + +| Entity | Purpose | +|--------|---------| +| **Notice** | Official communication (approval, denial, RFI, etc.) | +| **Correspondence** | Other communications | +| **DeliveryRecord** | Tracking of delivery status | + +**Key decisions:** +- Communication is cross-cutting because notices can be triggered by events in any domain: + - Intake: "Application received" + - Eligibility: "Approved", "Denied", "Request for information" + - Workflow: "Documents needed", "Interview scheduled" + - Case Management: "Case worker assigned" +- Entities live in a Communication domain but are consumed/triggered by all domains + +#### Scheduling + +Time-based coordination. + +| Entity | Purpose | +|--------|---------| +| **Appointment** | Scheduled meeting | +| **Interview** | Required interview for eligibility | +| **Reminder** | Notification of upcoming events | + +#### Document Management + +Files and uploads. + +| Entity | Purpose | +|--------|---------| +| **Document** | Metadata about a document | +| **Upload** | The actual file | + +--- + +## 2. 
Data Flow Between Domains + +``` +╔═════════════════════════════════════════════════════════════════════════════╗ +║ CROSS-CUTTING: Communication, Reporting, Configuration Mgmt, Observability ║ +╚═════════════════════════════════════════════════════════════════════════════╝ + +┌─────────────────────────────────────────────────────────────────────┐ +│ CLIENT PERSPECTIVE │ +└─────────────────────────────────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────────┐ +│ INTAKE │ +│ Application, Person, Income, Expense, Resource, LivingArrangement │ +│ "What the client told us" │ +└─────────────────────────────────────────────────────────────────────┘ + │ │ + ▼ │ + ┌───────────────────────────────┐ │ + │ CLIENT MANAGEMENT │ │ + │ Client, Relationship, │ │ + │ LivingArrangement, Income │ │ + │ "Persist people seeking │ │ + │ benefits" │ │ + └───────────────────────────────┘ │ + │ + ┌────────────────────────────────────────┤ + │ │ + │ (SNAP, TANF) │ (MAGI Medicaid - + │ Caseworker review │ automated path) + ▼ │ +┌───────────────────────────────┐ │ +│ CASE MANAGEMENT │ │ +│ Case, CaseWorker, Supervisor,│ │ +│ Assignment, Caseload │ │ +│ "Who's responsible" │ │ +└───────────────────────────────┘ │ + │ │ + ▼ ▼ +┌───────────────────────────────┐ ┌─────────────────────────────────┐ +│ WORKFLOW │ │ ELIGIBILITY │ +│ Task, VerificationTask, │──▶│ EligibilityRequest, │ +│ SLA, TaskAuditEvent │ │ EligibilityUnit, Determination │ +│ "What work needs to be done" │◀──│ "Program-specific │ +└───────────────────────────────┘ │ interpretation" │ + └─────────────────────────────────┘ +``` + +**Flow notes:** +- Intake data flows to Client Management (persist clients) and feeds into Eligibility +- Case workers are typically assigned to review intake data before eligibility determination +- Workflow tasks support the eligibility process (verification, document review) +- **MAGI Medicaid** can often be determined automatically 
without caseworker involvement (no asset test, standardized income rules, electronic data verification) +- **SNAP and TANF** typically require caseworker review due to asset tests, complex household rules, and interview requirements + +--- + +## 3. Safety Net Specific Concerns + +### Regulatory/Compliance + +| Concern | Example | +|---------|---------| +| **Mandated timelines** | SNAP: 30-day processing, 7-day expedited; Medicaid: 45-day determination | +| **SLA tracking** | Federal reporting on timeliness rates | +| **Audit trails** | Everything must be documented for federal audits | +| **Notice requirements** | Specific notices at specific points (denial, approval, RFI) | + +### Multi-Program Complexity + +| Concern | Example | +|---------|---------| +| **One application, multiple programs** | Client applies for SNAP, Medicaid, and TANF together | +| **Multiple clients per application** | Household members each applying for different programs | +| **Program-specific households** | SNAP household ≠ Medicaid tax unit ≠ IRS household | +| **Different timelines per program** | SNAP 30-day vs Medicaid 45-day | + +### Operational + +| Concern | Example | +|---------|---------| +| **Document verification** | Tasks to verify income, identity, residency (program-specific) | +| **Request for Information (RFI)** | Client has X days to respond before adverse action | +| **Inter-agency handoffs** | Tasks may transfer between county offices, state agencies | +| **Accommodations** | Language, disability, or other special handling flags | +| **Caseload management** | Assigning/balancing work across case workers | +| **Recertification** | Periodic re-evaluation of eligibility | +| **Appeals** | Formal appeal processes with their own timelines | + +### Privacy + +| Concern | Example | +|---------|---------| +| **PII protection** | All domains contain sensitive information | +| **Role-based access** | Different visibility for workers, supervisors, auditors | + +--- + +## 4. 
Detailed Schemas + +Detailed schemas have been moved to domain-specific files for better organization: + +| Domain | File | +|--------|------| +| Workflow | [domains/workflow.md](domains/workflow.md) | +| Case Management | [domains/case-management.md](domains/case-management.md) | +| Communication | [cross-cutting/communication.md](cross-cutting/communication.md) | + +*Note: Client Management, Intake, Eligibility, Scheduling, and Document Management schemas will be added as those domains are implemented. Reporting aggregates data from other domains and doesn't have its own schemas.* + +For operational concerns (Configuration Management, Observability), see [API Architecture](api-architecture.md). + +--- + +## Related Documents + +| Document | Description | +|----------|-------------| +| [API Architecture](api-architecture.md) | API layers, vendor independence, operational architecture | +| [Design Decisions](design-decisions.md) | Key decisions with rationale and alternatives | +| [Roadmap](roadmap.md) | Migration, implementation phases, future considerations | +| [api-patterns.yaml](../../packages/schemas/openapi/patterns/api-patterns.yaml) | Machine-readable API design patterns | diff --git a/docs/architecture/domains/case-management.md b/docs/architecture/domains/case-management.md new file mode 100644 index 0000000..b1600b1 --- /dev/null +++ b/docs/architecture/domains/case-management.md @@ -0,0 +1,463 @@ +# Case Management Domain + +Detailed schemas for the Case Management domain. See [Domain Design Overview](../domain-design.md) for context. + +## Overview + +The Case Management domain manages ongoing client relationships, staff, and organizational structure. 
+ +| Entity | Purpose | +|--------|---------| +| **Case** | The ongoing relationship with a client/household | +| **CaseWorker** | Staff member who processes applications | +| **Supervisor** | Extends CaseWorker with approval authority | +| **Office** | Geographic or organizational unit (county, regional, state) | +| **Assignment** | Who is responsible for what | +| **Caseload** | Workload for a case worker | +| **Team** | Group of case workers | + +### Cases vs Tasks + +A **Case** is the long-lived relationship with a client or household—it spans years, multiple applications, and multiple programs. A **Task** is a discrete unit of work with a deadline. + +- A case worker can be **assigned to a case** (ongoing responsibility for a client) or **assigned to a task** (one-time work item) +- Tasks can be **case-level** (e.g., annual renewal, quality audit) or **application-level** (e.g., verify income for a specific application) +- Transferring a case typically transfers its active tasks as well + +See [Workflow domain](workflow.md#tasks-vs-cases) for more detail on the Task entity. + +--- + +## Capabilities + +See [Workflow domain](workflow.md) for the complete workflow capabilities table. Case Management provides the worker, team, and organizational data that Workflow uses for task routing and assignment. + +**Domain responsibilities:** +- **Case Management**: Tracks *who* is assigned and manages worker/team/office data +- **Workflow**: Tracks *task state* and applies routing rules + +--- + +## Schemas + +### Office + +Geographic or organizational unit for routing and reporting. 
+ +```yaml +Office: + properties: + id: uuid + name: string # "County A Office", "Regional Office - North" + officeType: + - county # County-level office + - regional # Regional office overseeing multiple counties + - state # State-level office + - satellite # Satellite/outreach location + parentOfficeId: uuid # For hierarchies (e.g., county reports to regional) + countyCode: string # FIPS county code if applicable + address: Address + phoneNumber: PhoneNumber + email: Email + timezone: string # For SLA calculations + programs: enum[] # Programs served: snap, medicaid, tanf + status: + - active + - inactive + - temporarily_closed + createdAt, updatedAt: datetime +``` + +### CaseWorker + +Staff member who processes applications and tasks. + +```yaml +CaseWorker: + properties: + id: uuid + name: Name + employeeId: string + email: Email + phoneNumber: PhoneNumber + officeId: uuid # Primary office assignment + role: + - intake_worker + - eligibility_worker + - quality_reviewer + - appeals_specialist + status: + - active + - inactive + - on_leave + supervisorId: uuid + teamId: uuid + skills: CaseWorkerSkill[] # Skills, certifications, languages + programs: enum[] # Programs certified to work: snap, medicaid, tanf + workloadCapacity: integer # Max concurrent tasks + currentWorkload: integer # Current assigned tasks (computed) + createdAt, updatedAt: datetime +``` + +### CaseWorkerSkill + +Skills, certifications, and language abilities for a case worker. 
+ +```yaml +CaseWorkerSkill: + properties: + skillId: string # Unique identifier for the skill type + name: string # "SNAP Eligibility", "Expedited Processing", "Spanish" + skillType: + - certification # Formal credential (training, exam) + - language # Language proficiency + - specialization # Area of expertise without formal cert + # For certifications: + certification: # Only present when skillType = certification + issuedDate: date + expirationDate: date + issuingAuthority: string # Who issued the certification + # For languages: + proficiencyLevel: # Only present when skillType = language + - basic + - conversational + - fluent + - native + status: + - active + - inactive + - expired # For certifications past expiration + - pending_renewal +``` + +### Supervisor + +Extends CaseWorker with supervisory responsibilities. + +```yaml +Supervisor: + extends: CaseWorker + properties: + # Inherits all CaseWorker fields, plus: + approvalAuthority: # What this supervisor can approve + - eligibility_determination + - expedited_processing + - denial + - appeal_decision + - exception_request + teamCapacity: integer # Max team members they can supervise + currentTeamSize: integer # Current direct reports (computed) + canHandleEscalations: boolean + escalationTypes: # Types of escalations they handle + - sla_breach + - client_complaint + - complex_case + - inter_agency +``` + +### Case + +The ongoing relationship with a client or household. + +```yaml +Case: + properties: + id: uuid + clientId: uuid # Primary client + householdClientIds: uuid[] # All clients in the household + officeId: uuid # Assigned office + assignedWorkerId: uuid # Primary case worker + status: + - active + - inactive + - closed + - transferred + programs: enum[] # Active programs: snap, medicaid, tanf + openedDate: datetime + closedDate: datetime + closureReason: string + createdAt, updatedAt: datetime +``` + +### Assignment + +Tracks who is responsible for what work. 
+ +```yaml +Assignment: + properties: + id: uuid + assignmentType: + - case # Assigned to a case + - application # Assigned to an application + - task # Assigned to a task + referenceId: uuid # ID of case, application, or task + assignedToId: uuid # CaseWorker or Supervisor + assignedById: uuid # Who made the assignment + assignedAt: datetime + reason: string # Why this assignment was made + status: + - active + - reassigned + - completed + createdAt, updatedAt: datetime +``` + +### Caseload + +Workload tracking for a case worker. + +```yaml +Caseload: + properties: + id: uuid + caseWorkerId: uuid + asOfDate: date # Snapshot date + # Counts + activeCases: integer + activeTasks: integer + pendingTasks: integer + overdueTasks: integer + # By program + casesByProgram: + snap: integer + medicaid: integer + tanf: integer + # SLA status + tasksOnTrack: integer + tasksAtRisk: integer + tasksBreached: integer + createdAt: datetime +``` + +### Team + +Group of case workers. + +```yaml +Team: + properties: + id: uuid + name: string # "SNAP Intake Team A" + description: string + officeId: uuid # Office this team belongs to + supervisorId: uuid # Team supervisor + programs: enum[] # Programs this team handles + status: + - active + - inactive + createdAt, updatedAt: datetime +``` + +--- + +## Key Relationships + +``` +Office (1) ──────< (many) CaseWorker +Office (1) ──────< (many) Team +Team (1) ────────< (many) CaseWorker +Supervisor (1) ──< (many) CaseWorker (via supervisorId) +CaseWorker (1) ──< (many) Task (via assignedToId) +CaseWorker (1) ──< (many) Case (via assignedWorkerId) +``` + +--- + +## Process APIs + +Process APIs orchestrate business operations by calling System APIs. They follow the path pattern `/processes/{domain}/{resource}/{action}`, are usually invoked with `POST`, and carry `x-actors` and `x-capability` metadata. + +See [API Architecture](../api-architecture.md) for the full Process API pattern.
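To give a feel for what "orchestrate" means in this domain, here is a minimal sketch of a task assignment that checks capacity using the `workloadCapacity` and `currentWorkload` fields from the CaseWorker schema above. It is a simplification under assumptions: `assign_task` is a hypothetical function, the Python field names are snake_case mirrors of the schema fields, assignments live in a plain list, and Caseload and audit updates are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CaseWorker:
    id: str
    workload_capacity: int       # mirrors workloadCapacity
    current_workload: int = 0    # mirrors currentWorkload


@dataclass
class Assignment:
    assignment_type: str         # "case", "application", or "task"
    reference_id: str
    assigned_to_id: str
    status: str = "active"


def assign_task(worker: CaseWorker, task_id: str, assignments: List[Assignment]) -> Assignment:
    """Assign a task to a worker, closing any earlier active assignment for that task."""
    if worker.current_workload >= worker.workload_capacity:
        raise ValueError(f"worker {worker.id} is at capacity")
    for existing in assignments:
        if existing.reference_id == task_id and existing.status == "active":
            existing.status = "reassigned"
    new_assignment = Assignment("task", task_id, worker.id)
    assignments.append(new_assignment)
    worker.current_workload += 1
    return new_assignment
```

Keeping the earlier record with status `reassigned`, instead of deleting it, preserves the assignment history that the `Assignment.status` values imply.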
+ +### Assignment Operations + +| Endpoint | Actors | Description | +|----------|--------|-------------| +| `POST /processes/case-management/workers/assign` | supervisor, system | Assign worker to case, application, or task | +| `POST /processes/case-management/cases/transfer` | supervisor | Transfer case to different office/worker | + +### Workload Management + +| Endpoint | Actors | Description | +|----------|--------|-------------| +| `POST /processes/case-management/teams/rebalance` | supervisor | Redistribute tasks across team members | +| `POST /processes/case-management/workers/update-availability` | caseworker, supervisor | Update worker status and availability | +| `GET /processes/case-management/workers/capacity` | supervisor, system | Get worker capacity for assignment decisions | + +--- + +### Assign Worker + +Assign a worker to a case, application, or task. + +```yaml +POST /processes/case-management/workers/assign +x-actors: [supervisor, system] +x-capability: case-management + +requestBody: + assignmentType: + - case + - application + - task + referenceId: uuid # ID of case, application, or task + assignedToId: uuid # CaseWorker to assign + reason: string # Why this assignment + +responses: + 200: + assignment: Assignment # New assignment record + previousAssignment: Assignment # If reassigning + +# Orchestrates: +# 1. Validate worker has capacity (workloadCapacity vs currentWorkload) +# 2. Validate worker has required skills/programs +# 3. Close previous Assignment if exists +# 4. Create new Assignment record +# 5. Update Case/Task.assignedToId +# 6. Update Caseload for both workers (if reassignment) +# 7. Create TaskAuditEvent if task assignment +``` + +### Transfer Case + +Transfer a case to a different office or worker. 
+ +```yaml +POST /processes/case-management/cases/transfer +x-actors: [supervisor] +x-capability: case-management + +requestBody: + caseId: uuid # Case to transfer + targetOfficeId: uuid # New office (optional) + targetWorkerId: uuid # New worker (optional) + transferReason: + - client_moved # Client relocated + - workload_balance # Rebalancing + - skill_match # Needs specialist + - client_request # Client requested + notes: string # Transfer details + +responses: + 200: + case: Case # Updated case + assignment: Assignment # New assignment + transferredTasks: Task[] # Tasks moved with case + +# Orchestrates: +# 1. Validate supervisor authority over case +# 2. Update Case.officeId and/or Case.assignedWorkerId +# 3. Transfer all active tasks to new worker/queue +# 4. Create Assignment records +# 5. Update Caseload for both workers +# 6. If office changed, re-route tasks through WorkflowRules +``` + +### Rebalance Team Workload + +Redistribute tasks across team members based on capacity. + +```yaml +POST /processes/case-management/teams/rebalance +x-actors: [supervisor] +x-capability: case-management + +requestBody: + teamId: uuid # Team to rebalance + strategy: + - by_capacity # Distribute by available capacity + - by_skill # Match skills to tasks + - even # Equal distribution + includeTaskTypes: string[] # Only these task types (optional) + excludeWorkerIds: uuid[] # Workers to exclude (e.g., on leave) + +responses: + 200: + reassignments: ReassignmentSummary[] + totalMoved: integer + newDistribution: WorkerLoadSummary[] + +# Orchestrates: +# 1. Get all team members and their current Caseload +# 2. Get all unassigned/redistributable tasks in team's queues +# 3. Calculate optimal distribution based on strategy +# 4. Batch reassign tasks +# 5. Update Caseload for all affected workers +# 6. Create TaskAuditEvents for all reassignments +``` + +### Update Worker Availability + +Update a worker's status and availability, triggering reassignment if needed. 
+ +```yaml +POST /processes/case-management/workers/update-availability +x-actors: [caseworker, supervisor] +x-capability: case-management + +requestBody: + workerId: uuid # Worker to update + status: + - active + - on_leave + - inactive + effectiveDate: date # When status takes effect + expectedReturnDate: date # For on_leave + reassignTasks: boolean # Reassign current tasks? + reassignTo: + - queue # Return to queue + - team # Distribute to team + - specific_worker # Assign to targetWorkerId + targetWorkerId: uuid # For specific_worker + +responses: + 200: + worker: CaseWorker # Updated worker + reassignedTasks: integer # Count if reassigned + +# Orchestrates: +# 1. Update CaseWorker.status +# 2. If reassignTasks and status != active: +# a. Get all tasks assigned to worker +# b. Reassign based on strategy +# c. Update Caseload +# d. Create TaskAuditEvents +# 3. If returning from leave, optionally reclaim tasks +``` + +### Get Worker Capacity + +Get real-time capacity information for assignment decisions. + +```yaml +GET /processes/case-management/workers/capacity +x-actors: [supervisor, system] +x-capability: case-management + +parameters: + workerId: uuid # Specific worker (optional) + teamId: uuid # All workers on team (optional) + officeId: uuid # All workers in office (optional) + programType: string # Filter by program certification + requiredSkills: string[] # Filter by skills + +responses: + 200: + workers: WorkerCapacity[] + - workerId: uuid + name: string + currentLoad: integer + maxCapacity: integer + availableCapacity: integer + skills: string[] + programs: string[] + tasksAtRisk: integer # Tasks approaching SLA + +# Orchestrates: +# 1. Query CaseWorkers matching filters +# 2. Get current Caseload for each +# 3. Calculate available capacity +# 4. 
Return sorted by available capacity +``` diff --git a/docs/architecture/domains/workflow.md b/docs/architecture/domains/workflow.md new file mode 100644 index 0000000..f20ef67 --- /dev/null +++ b/docs/architecture/domains/workflow.md @@ -0,0 +1,853 @@ +# Workflow Domain + +Detailed schemas for the Workflow domain. See [Domain Design Overview](../domain-design.md) for context. + +## Overview + +The Workflow domain manages work items, tasks, SLA tracking, and task routing. + +| Entity | Purpose | +|--------|---------| +| **Task** | A work item requiring action | +| **Queue** | Organizes tasks by team, county, program, or skill | +| **WorkflowRule** | Defines automatic task routing and prioritization logic | +| **VerificationTask** | Task to verify data (extends Task) | +| **VerificationSource** | External services/APIs for data validation | +| **TaskSLAInfo** | SLA tracking details (embedded in Task) | +| **TaskAuditEvent** | Immutable audit trail | + +### Tasks vs Cases + +**Tasks** and **Cases** serve different purposes: + +| | Task | Case | +|---|------|------| +| **Lifespan** | Short-lived (created → worked → completed) | Long-lived (spans years, multiple programs) | +| **Purpose** | A discrete unit of work with a deadline | The ongoing relationship with a client/household | +| **Examples** | Verify income, determine eligibility, send notice | The Smith household's SNAP and Medicaid participation | +| **Owned by** | Workflow domain | Case Management domain | + +**Tasks can be linked at two levels:** + +- **Application-level tasks**: Tied to a specific application (e.g., verify income for application #123, determine eligibility for a new SNAP application) +- **Case-level tasks**: Tied to the ongoing case, not a specific application (e.g., annual renewal review, case maintenance, quality audit) + +Both `applicationId` and `caseId` are optional on a Task—a task will have one or both depending on context. 
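The linking rule above can be sketched as a small validation helper. This is a minimal sketch, assuming that a task with neither `applicationId` nor `caseId` should be rejected; the `TaskContext` class and `task_level` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional
from uuid import UUID, uuid4


@dataclass
class TaskContext:
    """Hypothetical holder for the two optional Task references."""
    application_id: Optional[UUID] = None
    case_id: Optional[UUID] = None


def task_level(ctx: TaskContext) -> str:
    """Classify a task by which references are present.

    Assumption: a task with neither reference is treated as invalid,
    since the text says a task has one or both.
    """
    has_application = ctx.application_id is not None
    has_case = ctx.case_id is not None
    if not has_application and not has_case:
        raise ValueError("task needs an applicationId, a caseId, or both")
    if has_application and has_case:
        return "application_and_case"
    return "application" if has_application else "case"


print(task_level(TaskContext(application_id=uuid4())))  # application-level task
print(task_level(TaskContext(case_id=uuid4())))         # case-level task
```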
+ +--- + +## Capabilities + +| Capability | Supported By | +|------------|--------------| +| **Supervisor - Tasks** | | +| Create task manually | `POST /processes/workflow/tasks/create` | +| Reassign task to worker/queue | `POST /processes/workflow/tasks/reassign` | +| Set or change task priority | `POST /processes/workflow/tasks/reassign` (with priority) | +| Bulk reassign or reprioritize | `POST /processes/workflow/tasks/bulk-reassign` | +| Escalate task | `POST /processes/workflow/tasks/escalate` | +| **Supervisor - Cases** | | +| Assign worker to case | `POST /processes/case-management/workers/assign` | +| Transfer case to office/worker | `POST /processes/case-management/cases/transfer` | +| **Supervisor - Monitoring** | | +| Monitor task queues | `GET /queues`, `GET /tasks` (System APIs) | +| Monitor team workload | `GET /caseloads` (Case Mgmt System API) | +| Monitor deadlines and alerts | `GET /tasks?q=slaStatus:at_risk` (System API) | +| **Caseworker** | | +| Claim task from queue | `POST /processes/workflow/tasks/claim` | +| Complete task with outcome | `POST /processes/workflow/tasks/complete` | +| Release task to queue | `POST /processes/workflow/tasks/release` | +| Escalate task | `POST /processes/workflow/tasks/escalate` | +| Start verification | `POST /processes/workflow/verification/start` | +| Complete verification | `POST /processes/workflow/verification/complete` | +| **System/Automation** | | +| Create task on events | `POST /processes/workflow/tasks/create` | +| Route and prioritize by rules | `POST /processes/workflow/tasks/route` | +| Auto-verify data | `POST /processes/workflow/verification/start` | +| **Future** | | +| Forecast staffing needs | See [Future Considerations](../roadmap.md) | +| Run productivity/backlog reports | TBD | + +**Notes:** +- Task creation is event-driven: triggered by application submission, eligibility determination, verification needs, etc. 
+- Case capabilities reference [Case Management](case-management.md) Process APIs; task capabilities reference Workflow Process APIs. +- Staff and organizational entities (CaseWorker, Office, Team, Caseload) are in the [Case Management domain](case-management.md). +- Workflow tracks *task state* changes. Case Management tracks *who* is assigned and assignment history. +- Auto-assign rules (`WorkflowRule`) live here; auto-assign data (Office, Caseload, Skills) lives in Case Management. + +--- + +## Schemas + +### Task + +The core work item representing an action that needs to be completed. + +```yaml +Task: + properties: + id: uuid + taskTypeCode: string # Reference to TaskType.code (e.g., "verify_income") + status: + - pending + - in_progress + - awaiting_client + - awaiting_verification + - awaiting_review + - returned_to_queue # Caseworker released task + - completed + - cancelled + - escalated + priority: + - expedited # 7-day SNAP, emergency + - high # Approaching deadline + - normal # Standard processing + - low # Deferred/backlog + # Context: a task is linked to an application, a case, or both + applicationId: uuid # Reference to Application (Intake) - for application-level tasks + caseId: uuid # Reference to Case (Case Management) - for case-level tasks + assignedToId: uuid # Reference to CaseWorker (Case Management) + queueId: uuid # Reference to Queue + officeId: uuid # Reference to Office (Case Management) + programType: enum # TODO: Standardize ProgramType enum across all schemas + requiredSkills: string[] # Skills needed to work this task + dueDate: datetime # SLA deadline + slaTypeCode: string # Reference to SLAType.code (e.g., "snap_expedited") + slaInfo: TaskSLAInfo # SLA tracking details (computed from slaTypeCode) + sourceInfo: TaskSourceInfo # What triggered this task + parentTaskId: uuid # For subtasks + blockedByTaskIds: uuid[] # Dependencies + outcomeInfo: TaskOutcomeInfo # Completion details + createdAt, updatedAt: datetime +``` + +### Queue + 
+Organizes tasks into logical groupings for routing and monitoring. + +```yaml +Queue: + properties: + id: uuid + name: string # "SNAP Intake - County A" + description: string + queueType: + - team # For a specific team + - office # For a specific office/county + - program # For a specific program + - skill # For tasks requiring specific skills + - general # Default/catch-all + teamId: uuid # Optional: linked Team + officeId: uuid # Optional: linked Office + programType: enum # TODO: Standardize ProgramType enum across all schemas + requiredSkills: string[] # Skills needed to work tasks in this queue + isDefault: boolean # Default queue for unassigned tasks + priority: integer # Queue processing priority (lower = higher priority) + status: + - active + - inactive + - paused # Temporarily not accepting new tasks + createdAt, updatedAt: datetime +``` + +### WorkflowRule + +Defines automatic task routing and prioritization logic. Uses [JSON Logic](https://jsonlogic.com/) for flexible, extensible conditions. 
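To make the idea of a JSON Logic condition concrete, here is a minimal sketch of an evaluator covering only the operators used in this document's examples (`var`, `and`, `==`, `<`, `<=`, `in`). It is not the official JSON Logic implementation: it uses strict equality, returns plain booleans from `and`, and treats a missing variable as "no match" for numeric comparisons. All of these are simplifying assumptions.

```python
from typing import Any


def get_var(data: Any, path: str) -> Any:
    """Resolve a dotted path such as 'task.programType'; None if missing."""
    current = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def evaluate(rule: Any, data: dict) -> Any:
    """Evaluate a small subset of JSON Logic against a context dict."""
    if isinstance(rule, list):
        return [evaluate(item, data) for item in rule]
    if not isinstance(rule, dict):
        return rule  # literal value
    operator, args = next(iter(rule.items()))
    if operator == "var":
        return get_var(data, args)
    values = [evaluate(arg, data) for arg in args]
    if operator == "and":
        return all(values)
    if operator == "==":
        return values[0] == values[1]
    if operator in ("<", "<="):
        left, right = values
        if left is None or right is None:
            return False  # assumption: missing data never matches
        return left < right if operator == "<" else left <= right
    if operator == "in":
        return values[0] in values[1]
    raise ValueError(f"unsupported operator: {operator}")


context = {"task": {"programType": "snap", "officeId": "county-a-id"}}
rule = {
    "and": [
        {"==": [{"var": "task.programType"}, "snap"]},
        {"==": [{"var": "task.officeId"}, "county-a-id"]},
    ]
}
print(evaluate(rule, context))
```

A production system would more likely rely on an existing JSON Logic library; the sketch only shows how a rule's `conditions` object and the task context fit together.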
+ +```yaml +WorkflowRule: + properties: + id: uuid + name: string # "Route SNAP to County A", "Expedite households with children under 6" + description: string + ruleType: + - assignment # Routes tasks to queues/teams/workers + - priority # Sets task priority level + evaluationOrder: integer # Rule evaluation order (lower = evaluated first) + isActive: boolean + # Conditions - JSON Logic expression evaluated against task + application context + conditions: object # JSON Logic expression (see examples below) + # Action - what happens when conditions match + # For assignment rules: + assignmentStrategy: + - specific_queue # Assign to targetQueueId + - specific_team # Assign to targetTeamId + - round_robin # Distribute evenly across team/queue members + - least_loaded # Assign to worker with lowest caseload + - skill_match # Match task requiredSkills to worker skills + targetQueueId: uuid # For specific_queue strategy + targetTeamId: uuid # For specific_team strategy + fallbackQueueId: uuid # If primary assignment fails + # For priority rules: + targetPriority: + - expedited + - high + - normal + - low + createdAt, updatedAt: datetime +``` + +**JSON Logic Condition Examples:** + +```json +// Route SNAP tasks from County A to specific queue +{ + "and": [ + { "==": [{ "var": "task.programType" }, "snap"] }, + { "==": [{ "var": "task.officeId" }, "county-a-id"] } + ] +} + +// Expedite for households with children under 6 +{ + "<": [{ "var": "application.household.youngestChildAge" }, 6] +} + +// High priority when deadline within 5 days +{ + "<=": [{ "var": "task.daysUntilDeadline" }, 5] +} + +// Skill-based routing for appeals +{ + "in": [{ "var": "task.taskTypeCode" }, ["appeal_review", "hearing_preparation"]] +} +``` + +**Available context variables:** +- `task.*` - Task fields (taskTypeCode, programType, officeId, daysUntilDeadline, etc.) +- `application.*` - Application data (household, income, etc.) 
+- `case.*` - Case data (if case-level task) + +### TaskSLAInfo + +SLA tracking details embedded in Task. The SLA type is referenced via `Task.slaTypeCode`. + +```yaml +TaskSLAInfo: + properties: + # Note: slaTypeCode is on Task, not here (avoids duplication) + slaDeadline: datetime + clockStartDate: datetime + clockPausedAt: datetime # When paused (awaiting client) + totalPausedDays: integer + slaStatus: + - on_track + - at_risk + - breached + - paused + - completed + warningThresholdDays: integer # Computed from SLAType config +``` + +### TaskAuditEvent + +Immutable audit trail for task actions. + +```yaml +TaskAuditEvent: + properties: + id: uuid + taskId: uuid + eventType: + - created + - assigned + - reassigned + - returned_to_queue # Caseworker released task + - status_changed + - priority_changed + - queue_changed + - note_added + - due_date_changed + - escalated + - completed + - cancelled + - sla_warning + - sla_breached + previousValue: string + newValue: string + performedById: uuid + systemGenerated: boolean + notes: string + occurredAt: datetime (readonly) +``` + +### VerificationSource + +External services and APIs available for data validation. + +```yaml +VerificationSource: + properties: + id: uuid + name: string # "IRS Income Verification", "ADP Employment", "State Wage Database" + sourceType: + - federal_agency # IRS, SSA, DHS/SAVE + - state_database # State wage records, DMV + - commercial_service # ADP, Equifax, LexisNexis + - financial_institution # Banks (for asset verification) + dataTypes: [] # What this source can verify: income, employment, identity, etc. 
+ integrationMethod: + - realtime_api # Real-time API call + - batch # Batch file exchange + - manual_lookup # Manual lookup by worker + trustLevel: + - authoritative # IRS, SSA - can override client-reported data + - supplementary # Supports but doesn't override + - reference # For comparison only + status: + - active + - inactive + - maintenance + createdAt, updatedAt: datetime +``` + +### VerificationTask + +Task to verify intake data - either for accuracy (data validation) or program requirements (program verification). + +```yaml +VerificationTask: + extends: Task + properties: + # Inherits Task fields (id, status, priority, assignedToId, queueId, officeId, etc.) + verificationType: + - data_validation # Is the intake data accurate? + - program_verification # Does it meet program requirements? + - both # Satisfies both purposes + # What's being verified (Intake reference) + applicationId: uuid + dataPath: string # Path to specific data (e.g., "income[0].amount", "person[2].citizenship") + reportedValue: string # The value client reported + # For data validation + verificationSourceId: uuid # Which external source to check + sourceResult: + matchStatus: + - match + - mismatch + - partial_match + - not_found + - source_unavailable + sourceValue: string # Value returned from external source + confidence: number # Match confidence (0-100) if applicable + retrievedAt: datetime + # For program verification + eligibilityRequestId: uuid # Which eligibility request this is for + verificationRequirementId: uuid # Which program requirement applies + documentIds: uuid[] # Supporting documents submitted + # Outcome + outcome: + - verified + - not_verified + - discrepancy_found + - waived + - pending_documentation + resolution: # If discrepancy found + - client_corrected # Client updated their reported data + - source_error # External source had incorrect data + - data_accepted # Accepted despite mismatch (with justification) + - referred_for_review # Escalated for 
supervisor review + resolutionNotes: string + verifiedAt: datetime + verifiedById: uuid # CaseWorker who completed verification +``` + +--- + +## Configuration Schemas + +These schemas define configurable lookup data that can be extended without schema changes. + +### TaskType + +Defines the types of tasks that can be created. New task types can be added without schema changes. + +```yaml +TaskType: + properties: + code: string (PK) # "verify_income", "eligibility_determination" + category: + - verification # Document/data verification tasks + - determination # Eligibility determination tasks + - communication # Client communication tasks + - review # Supervisor/quality review tasks + - inter_agency # Inter-agency coordination tasks + - renewal # Renewal/recertification tasks + - appeal # Appeals processing tasks + name: string # "Verify Income", "Eligibility Determination" + description: string + defaultSLATypeCode: string # Reference to SLAType.code + defaultPriority: string # Default priority for this task type + requiredSkills: string[] # Default skills needed + isActive: boolean +``` + +**Example task types:** + +| Code | Category | Name | Default SLA | +|------|----------|------|-------------| +| `verify_income` | verification | Verify Income | snap_standard | +| `verify_identity` | verification | Verify Identity | snap_standard | +| `eligibility_determination` | determination | Eligibility Determination | snap_standard | +| `expedited_screening` | determination | Expedited Screening | snap_expedited | +| `supervisor_review` | review | Supervisor Review | internal_review | +| `renewal_review` | renewal | Renewal Review | renewal_standard | +| `appeal_review` | appeal | Appeal Review | appeal_standard | + +### SLAType + +Defines SLA configurations for different programs and task types. 
+ +```yaml +SLAType: + properties: + code: string (PK) # "snap_expedited", "medicaid_standard" + name: string # "SNAP Expedited Processing" + programType: enum # TODO: Standardize ProgramType enum across all schemas + durationDays: integer # 7, 30, 45, etc. + warningThresholdDays: integer # Days before deadline to show warning + pauseOnStatuses: string[] # Task statuses that pause the clock + isActive: boolean +``` + +**Example SLA types:** + +| Code | Program | Duration | Warning | +|------|---------|----------|---------| +| `snap_standard` | snap | 30 days | 5 days | +| `snap_expedited` | snap | 7 days | 2 days | +| `medicaid_standard` | medicaid | 45 days | 7 days | +| `medicaid_disability` | medicaid | 90 days | 14 days | +| `tanf_standard` | tanf | 30 days | 5 days | +| `appeal_standard` | (any) | varies by state | 7 days | + +--- + +## API Design Notes + +### Batch Operations + +For bulk task management during surges, the Task API should support batch operations: + +```yaml +# Batch update endpoint +PATCH /tasks/batch + requestBody: + taskIds: uuid[] # Tasks to update + updates: + assignedToId: uuid # Reassign to this worker + queueId: uuid # Move to this queue + priority: string # Change priority + status: string # Change status + responses: + 200: + updated: integer # Count of successfully updated tasks + failed: TaskUpdateError[] # Any failures +``` + +### Skill Matching + +When using `skill_match` assignment strategy: +1. Task's `requiredSkills` are compared against CaseWorker's `skills` +2. Only workers with all required skills are considered +3. Among qualified workers, `least_loaded` logic is applied + +--- + +## Process APIs + +Process APIs orchestrate business operations by calling System APIs. They follow the pattern `POST /processes/{domain}/{resource}/{action}` and use `x-actors` and `x-capability` metadata. + +See [API Architecture](../api-architecture.md) for the full Process API pattern. 
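To illustrate how a routing Process API might apply the `skill_match` strategy from the Skill Matching notes above, here is a minimal sketch. The `Worker` class, its `current_load` field, and the `pick_worker` function are hypothetical; the real data would come from CaseWorker and Caseload in Case Management.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Worker:
    """Hypothetical view of a CaseWorker for assignment decisions."""
    worker_id: str
    skills: set[str]
    current_load: int  # assumed to come from the worker's Caseload


def pick_worker(required_skills: list[str], workers: list[Worker]) -> Optional[Worker]:
    """Apply skill_match: filter by skills, then choose the least loaded."""
    required = set(required_skills)
    # Steps 1 and 2: keep only workers who hold every required skill.
    qualified = [w for w in workers if required <= w.skills]
    if not qualified:
        return None  # caller could fall back to fallbackQueueId
    # Step 3: least_loaded among qualified workers (first wins on ties).
    return min(qualified, key=lambda w: w.current_load)


team = [
    Worker("w1", {"snap", "spanish"}, current_load=12),
    Worker("w2", {"snap"}, current_load=3),
    Worker("w3", {"snap", "spanish"}, current_load=7),
]
chosen = pick_worker(["snap", "spanish"], team)
print(chosen.worker_id if chosen else "no qualified worker")
```

Returning `None` when nobody qualifies keeps the fallback decision with the caller, which matches the `fallbackQueueId` field on WorkflowRule.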
+ +### Task Lifecycle + +| Endpoint | Actors | Description | +|----------|--------|-------------| +| `POST /processes/workflow/tasks/create` | supervisor, system | Create a new task (manual or event-triggered) | +| `POST /processes/workflow/tasks/claim` | caseworker | Claim an unassigned task from a queue | +| `POST /processes/workflow/tasks/complete` | caseworker | Complete a task with outcome | +| `POST /processes/workflow/tasks/release` | caseworker | Return a task to the queue | +| `POST /processes/workflow/tasks/reassign` | supervisor | Reassign a task to different worker/queue | +| `POST /processes/workflow/tasks/escalate` | caseworker, supervisor | Escalate a task to supervisor | +| `POST /processes/workflow/tasks/bulk-reassign` | supervisor | Reassign multiple tasks | + +### Task Routing + +| Endpoint | Actors | Description | +|----------|--------|-------------| +| `POST /processes/workflow/tasks/route` | system | Apply workflow rules to determine queue/assignment | + +### Verification + +| Endpoint | Actors | Description | +|----------|--------|-------------| +| `POST /processes/workflow/verification/start` | caseworker, system | Initiate external data verification | +| `POST /processes/workflow/verification/complete` | caseworker, system | Record verification result | + +--- + +### Create Task + +Create a new task, either manually by a supervisor or triggered by system events. + +```yaml +POST /processes/workflow/tasks/create +x-actors: [supervisor, system] +x-capability: workflow + +requestBody: + taskType: string # Type of task (verify_income, eligibility_determination, etc.) 
+ applicationId: uuid # Associated application (optional) + caseId: uuid # Associated case (optional) + programType: string # snap, medicaid, tanf + priority: string # expedited, high, normal, low (optional, can be set by rules) + dueDate: datetime # Explicit deadline (optional, can be calculated from SLA) + requiredSkills: string[] # Skills needed (optional) + notes: string # Context for the task + sourceInfo: # What triggered this task + sourceType: string # application_submitted, determination_complete, manual, etc. + sourceId: uuid # ID of triggering entity + sourceDomain: string # intake, eligibility, case-management, etc. + skipRouting: boolean # If true, don't apply routing rules + targetQueueId: uuid # Direct assignment to queue (if skipRouting) + targetWorkerId: uuid # Direct assignment to worker (if skipRouting) + +responses: + 201: + task: Task # Created task + assignment: Assignment # If worker was assigned + rulesApplied: string[] # Routing/priority rules that matched + +# Orchestrates: +# 1. Create Task with provided fields +# 2. Calculate SLA deadline based on taskType and programType +# 3. If not skipRouting, apply WorkflowRules for priority and queue +# 4. If rule assigns to worker, create Assignment +# 5. Create TaskAuditEvent (created) +# 6. Update Caseload if worker assigned +``` + +### Claim Task + +Caseworker claims an unassigned task from a queue. + +```yaml +POST /processes/workflow/tasks/claim +x-actors: [caseworker] +x-capability: workflow + +requestBody: + taskId: uuid # Task to claim + notes: string # Optional claim notes + +responses: + 200: + task: Task # Updated task with assignedToId set + assignment: Assignment # New assignment record + +# Orchestrates: +# 1. Validate task is unassigned and in valid queue +# 2. Check worker has required skills (CaseWorker.skills) +# 3. Update Task.assignedToId, Task.status → in_progress +# 4. Create Assignment record +# 5. Create TaskAuditEvent (assigned) +# 6. 
Update Caseload for worker +``` + +### Complete Task + +Caseworker completes a task with an outcome. + +```yaml +POST /processes/workflow/tasks/complete +x-actors: [caseworker] +x-capability: workflow + +requestBody: + taskId: uuid # Task to complete + outcome: string # Task-specific outcome + notes: string # Completion notes + createFollowUp: boolean # Whether to create follow-up task + +responses: + 200: + task: Task # Task with status: completed + followUpTask: Task # If createFollowUp was true + +# Orchestrates: +# 1. Validate task is assigned to requesting worker +# 2. Update Task.status → completed, Task.outcomeInfo +# 3. Create TaskAuditEvent (completed) +# 4. Update Caseload for worker +# 5. If createFollowUp, create new Task and route it +# 6. If task type requires notice, trigger notice generation +``` + +### Release Task + +Caseworker returns a task to the queue (cannot complete it). + +```yaml +POST /processes/workflow/tasks/release +x-actors: [caseworker] +x-capability: workflow + +requestBody: + taskId: uuid # Task to release + reason: string # Why releasing (required) + suggestedSkills: string[] # Skills needed to complete + +responses: + 200: + task: Task # Task with status: returned_to_queue + +# Orchestrates: +# 1. Validate task is assigned to requesting worker +# 2. Update Task.status → returned_to_queue, clear assignedToId +# 3. Optionally update Task.requiredSkills +# 4. Update Assignment.status → reassigned +# 5. Create TaskAuditEvent (returned_to_queue) +# 6. Re-route task using WorkflowRules +``` + +### Reassign Task + +Supervisor reassigns a task to a different worker or queue. 
+ +```yaml +POST /processes/workflow/tasks/reassign +x-actors: [supervisor] +x-capability: workflow + +requestBody: + taskId: uuid # Task to reassign + targetWorkerId: uuid # Assign to specific worker (optional) + targetQueueId: uuid # Move to queue (optional) + reason: string # Reassignment reason (required) + priority: string # Optionally change priority + +responses: + 200: + task: Task # Updated task + assignment: Assignment # New assignment record + +# Orchestrates: +# 1. Validate supervisor has authority over this task +# 2. Update Task.assignedToId or Task.queueId +# 3. Optionally update Task.priority +# 4. Create Assignment record +# 5. Create TaskAuditEvent (reassigned, priority_changed) +# 6. Update Caseload for affected workers +``` + +### Escalate Task + +Escalate a task to supervisor for review. + +```yaml +POST /processes/workflow/tasks/escalate +x-actors: [caseworker, supervisor] +x-capability: workflow + +requestBody: + taskId: uuid # Task to escalate + escalationType: + - sla_risk # At risk of missing SLA + - complex_case # Needs supervisor input + - policy_question # Policy clarification needed + - client_complaint # Client escalated issue + notes: string # Escalation details (required) + +responses: + 200: + task: Task # Task with status: escalated + escalatedTo: Supervisor # Supervisor who received escalation + +# Orchestrates: +# 1. Update Task.status → escalated +# 2. Identify appropriate supervisor (by team, escalation type) +# 3. Create Assignment to supervisor +# 4. Create TaskAuditEvent (escalated) +# 5. Optionally send notification to supervisor +``` + +### Bulk Reassign Tasks + +Supervisor reassigns multiple tasks during surge or rebalancing. 
+ +```yaml +POST /processes/workflow/tasks/bulk-reassign +x-actors: [supervisor] +x-capability: workflow + +requestBody: + taskIds: uuid[] # Tasks to reassign + strategy: + - to_worker # Assign all to targetWorkerId + - to_queue # Move all to targetQueueId + - distribute # Distribute across targetWorkerIds + targetWorkerId: uuid # For to_worker strategy + targetQueueId: uuid # For to_queue strategy + targetWorkerIds: uuid[] # For distribute strategy + reason: string # Bulk reassignment reason + +responses: + 200: + updated: integer # Count of successfully updated + failed: TaskError[] # Any failures with reasons + assignments: Assignment[] # New assignment records + +# Orchestrates: +# 1. Validate supervisor authority over all tasks +# 2. For distribute strategy, balance by current workload +# 3. Batch update Tasks +# 4. Batch create Assignments +# 5. Batch create TaskAuditEvents +# 6. Update Caseload for all affected workers +``` + +### Route Task + +Apply workflow rules to determine task queue/assignment (typically system-initiated). + +```yaml +POST /processes/workflow/tasks/route +x-actors: [system] +x-capability: workflow + +requestBody: + taskId: uuid # Task to route + skipRules: boolean # Direct assignment without rules + targetQueueId: uuid # For skipRules: true + targetWorkerId: uuid # For skipRules: true + +responses: + 200: + task: Task # Task with queue/assignment set + rulesApplied: string[] # Names of rules that matched + assignment: Assignment # If worker was assigned + +# Orchestrates: +# 1. Load active WorkflowRules ordered by evaluationOrder +# 2. Evaluate priority rules → set Task.priority +# 3. Evaluate assignment rules → set Task.queueId +# 4. If strategy allows direct assignment, assign to worker +# 5. Create TaskAuditEvent (queue_changed, assigned) +``` + +### Start Verification + +Initiate external data verification for a verification task. 
+ +```yaml +POST /processes/workflow/verification/start +x-actors: [caseworker, system] +x-capability: workflow + +requestBody: + taskId: uuid # VerificationTask to start + verificationSourceId: uuid # Which source to query + manualOverride: boolean # Skip automated check + +responses: + 200: + verificationTask: VerificationTask + 202: + verificationTask: VerificationTask + estimatedCompletion: datetime # For async verification + +# Orchestrates: +# 1. Validate VerificationSource is active +# 2. Update VerificationTask.status → awaiting_verification +# 3. If realtime_api source, call external API +# 4. If batch source, queue for batch processing +# 5. Create TaskAuditEvent (status_changed) +``` + +### Complete Verification + +Record verification result and resolve any discrepancies. + +```yaml +POST /processes/workflow/verification/complete +x-actors: [caseworker, system] +x-capability: workflow + +requestBody: + taskId: uuid # VerificationTask to complete + outcome: + - verified + - not_verified + - discrepancy_found + - waived + - pending_documentation + sourceResult: # If from external source + matchStatus: string + sourceValue: string + confidence: number + resolution: string # If discrepancy_found + resolutionNotes: string + documentIds: uuid[] # Supporting documents + +responses: + 200: + verificationTask: VerificationTask + discrepancyAlert: boolean # True if needs review + +# Orchestrates: +# 1. Update VerificationTask with outcome and resolution +# 2. If discrepancy requires review, escalate +# 3. Create TaskAuditEvent (completed) +# 4. Update EligibilityRequest if verification affects eligibility +# 5. If all verifications complete, trigger next workflow step +``` + +--- + +## Operational Metrics + +Domain-specific SLI metrics for monitoring workflow health and performance. For API-level metrics, see [API Architecture - Observability](../api-architecture.md#observability). 
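As background for the SLA-related metrics in this section, the sketch below shows one way a task's `slaStatus` could be derived from `TaskSLAInfo` fields and then aggregated into a breach rate. The precedence of `completed` and `paused` over the time-based states, and the function names, are assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone


def sla_status(
    now: datetime,
    sla_deadline: datetime,
    warning_threshold_days: int,
    paused: bool = False,
    completed: bool = False,
) -> str:
    """Classify a task into one of the TaskSLAInfo.slaStatus values."""
    if completed:
        return "completed"  # assumption: completion wins over everything
    if paused:
        return "paused"     # assumption: a paused clock is never at risk
    if now > sla_deadline:
        return "breached"
    if sla_deadline - now <= timedelta(days=warning_threshold_days):
        return "at_risk"
    return "on_track"


def breach_rate(statuses: list[str]) -> float:
    """Share of tasks whose status is 'breached' (0.0 for an empty list)."""
    if not statuses:
        return 0.0
    return statuses.count("breached") / len(statuses)


# Example using the snap_expedited values: 7-day duration, 2-day warning.
start = datetime(2025, 1, 1, tzinfo=timezone.utc)
deadline = start + timedelta(days=7)
for day_offset in (2, 5.5, 8):
    now = start + timedelta(days=day_offset)
    print(day_offset, sla_status(now, deadline, warning_threshold_days=2))
```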
+ +### Task Metrics + +| Metric | Description | Labels | Target | +|--------|-------------|--------|--------| +| `task_completion_time_seconds` | Time from task creation to completion | taskType, programType, priority | p95 < SLA | +| `task_wait_time_seconds` | Time task spends unassigned in queue | queueId, programType | p95 < 4 hours | +| `tasks_in_queue` | Current tasks waiting in queue | queueId, programType, priority | Trend down | +| `tasks_by_status` | Current task count by status | status, programType | N/A | + +### SLA Metrics + +| Metric | Description | Labels | Target | +|--------|-------------|--------|--------| +| `sla_breach_rate` | Percentage of tasks that breach SLA | slaTypeCode, programType | < 5% | +| `sla_at_risk_count` | Tasks currently at risk of SLA breach | slaTypeCode, queueId | Alert threshold | +| `days_until_breach_distribution` | Distribution of days remaining before SLA breach | programType | Monitor trend | + +### Verification Metrics + +| Metric | Description | Labels | Target | +|--------|-------------|--------|--------| +| `verification_success_rate` | External verification API success rate | sourceId, sourceType | > 99% | +| `verification_latency_seconds` | Time to receive verification response | sourceId, integrationMethod | p95 < 10s | +| `verification_match_rate` | Rate of matches vs mismatches | sourceId, verificationType | Monitor trend | +| `verification_source_availability` | Availability of each verification source | sourceId | > 99.5% | + +### Assignment Metrics + +| Metric | Description | Labels | Target | +|--------|-------------|--------|--------| +| `assignment_count` | Tasks assigned per period | workerId, queueId | Balance across workers | +| `reassignment_rate` | Rate of tasks being reassigned | queueId, reason | < 10% | +| `escalation_rate` | Rate of tasks being escalated | escalationType, queueId | Monitor trend | + +### Alert Thresholds + +| Condition | Threshold | Action | +|-----------|-----------|--------| +| 
SLA breach imminent | > 10 tasks at risk in queue | Page supervisor | +| Verification source down | Availability < 95% for 5 min | Enable manual fallback | +| Queue depth spike | > 2x normal volume | Alert capacity planning | +| Worker overload | > 40 active tasks per worker | Rebalance assignments | diff --git a/docs/architecture/roadmap.md b/docs/architecture/roadmap.md new file mode 100644 index 0000000..2259c2b --- /dev/null +++ b/docs/architecture/roadmap.md @@ -0,0 +1,266 @@ +# Roadmap + +Migration plan, implementation phases, future considerations, and documentation gaps. + +See also: [Domain Design](domain-design.md) | [API Architecture](api-architecture.md) | [Design Decisions](design-decisions.md) + +--- + +## 1. Migration Considerations + +### Current Schema Mapping + +| Current Entity | Proposed Domain | Proposed Entity | Notes | +|----------------|-----------------|-----------------|-------| +| Person | Client Management | Client | Rename - "Client" indicates someone we serve | +| Household | Split | LivingArrangement (Client Mgmt/Intake) + EligibilityUnit (Eligibility) | Separate factual from regulatory | +| Application | Intake | Application | Move to Intake domain | +| HouseholdMember | Intake | Person | Simplify name, domain provides context | +| Income | Split | Income (Client Mgmt - stable) + Income (Intake - reported) | Split by stability | + +### New Entities Needed + +| Entity | Domain | Priority | +|--------|--------|----------| +| Task | Workflow | High | +| CaseWorker | Case Management | High | +| Supervisor | Case Management (extends CaseWorker) | High | +| Notice | Communication | High | +| Case | Case Management | Medium | +| EligibilityRequest | Eligibility | Medium | +| EligibilityUnit | Eligibility | Medium | +| Determination | Eligibility | Medium | +| LivingArrangement | Client Management / Intake | Medium | +| Appointment | Scheduling | Low | +| Document | Document Management | Low | + +--- + +## 2. 
Implementation Phases + +### Phase 1: Workflow & Case Management (Priority) +1. Create Workflow domain (Task, VerificationTask, TaskAuditEvent) +2. Create Case Management domain (CaseWorker, Assignment) +3. Create Communication entities (Notice) - cross-cutting, consumed by multiple domains + +### Phase 2: Domain Reorganization +1. Restructure existing schemas into domain folders +2. Create Client Management domain (rename Person → Client) +3. Create Intake domain (reorganize Application) +4. Create Eligibility domain (EligibilityUnit, EligibilityRequest, Determination) + +### Phase 3: Additional Domains +1. Create Scheduling domain +2. Create Document Management domain + +--- + +## 3. Future Considerations + +Potential domains and functionality not included in the current design, for future evaluation. + +### High Priority + +**Benefits/Issuance** +- Benefit amounts and calculations +- EBT card issuance and management +- Payment tracking +- Benefit history and adjustments + +*Rationale*: Core to safety net programs - what happens after eligibility is determined. Currently out of scope but essential for end-to-end benefits administration. + +**Appeals** +- Appeal requests +- Fair hearing scheduling +- Hearing outcomes and decisions +- Appeal workflow (distinct from standard eligibility workflow) + +*Rationale*: Required by law for all safety net programs. Has distinct workflow, timelines, and participants (hearing officers). Currently only represented as task types. + +### Medium Priority + +**Staffing Forecasting** +- Project task volume based on historical patterns and upcoming deadlines +- Calculate required staff hours vs current capacity +- Identify staffing gaps by office, queue, or program +- Potential entities: `StaffingForecast`, `DeadlineProjection` + +*Rationale*: Helps supervisors plan staffing during surges and avoid SLA breaches. Depends on mature Task and Caseload data to be useful. 
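
The "required staff hours vs current capacity" calculation above can be sketched as simple arithmetic. This is a hypothetical illustration only: the `QueueForecast` fields and the `staffing_gap_hours` helper are invented for this example and are not part of the proposed `StaffingForecast` or `DeadlineProjection` entities, which are not yet designed.

```python
from dataclasses import dataclass


@dataclass
class QueueForecast:
    """Hypothetical inputs for one queue over a planning period."""
    queue_id: str
    projected_tasks: int          # expected task volume for the period
    avg_handle_hours: float       # average staff hours needed per task
    available_staff_hours: float  # staff hours currently scheduled


def staffing_gap_hours(forecast: QueueForecast) -> float:
    """Positive result means a shortfall; negative means spare capacity."""
    required = forecast.projected_tasks * forecast.avg_handle_hours
    return required - forecast.available_staff_hours


recert_queue = QueueForecast(
    queue_id="recertifications",
    projected_tasks=120,
    avg_handle_hours=1.5,
    available_staff_hours=160.0,
)
gap = staffing_gap_hours(recert_queue)
print(f"{recert_queue.queue_id}: gap of {gap} staff hours")
```

A real forecast would also need the historical volume patterns and deadline projections mentioned above; this sketch covers only the final comparison step.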
+ +**Change Reporting** +- Mid-certification changes reported by clients +- Change processing and verification +- Impact assessment on current benefits +- Change-triggered recertifications + +*Rationale*: Common client interaction between certifications. Changes can affect eligibility and benefit amounts. Related to but distinct from Intake (not a new application). + +**Programs** +- Program definitions (SNAP, TANF, Medicaid, etc.) +- Eligibility rules and criteria +- Income/asset limits +- Deduction rules +- Program-specific configurations + +*Rationale*: Reference data needed across all domains. Currently assumed but not explicitly modeled. Could be configuration vs. a domain. + +### Low Priority + +**Fraud/Integrity** +- Fraud investigations +- Overpayment identification and tracking +- Recovery efforts +- Intentional Program Violations (IPVs) +- Disqualification periods + +*Rationale*: Important for program integrity but specialized function. Often handled by separate units with different workflows. + +**Referrals** +- Referrals to other services (employment, housing, childcare) +- Partner agency connections +- Community resource linking +- Referral tracking and outcomes + +*Rationale*: Valuable for holistic client support but secondary to core benefits administration. May vary significantly by state/agency. + +**Provider Management** +- Healthcare providers (Medicaid) +- SNAP authorized retailers +- TANF service providers +- Provider enrollment and verification + +*Rationale*: Program-specific (primarily Medicaid). Often managed by separate systems. Complex enough to be its own domain. + +**Quality Assurance** +- Case reviews and audits +- Error tracking and categorization +- Corrective action plans +- Federal reporting metrics (timeliness, accuracy) + +*Rationale*: Important for compliance but often aggregated from other domains. May be better as cross-cutting reporting than a separate domain. + +--- + +## 4. 
Documentation Gaps + +Topics identified but not yet fully documented or implemented. + +**Recently Addressed:** +- Performance specifics (caching TTLs, pagination limits, query complexity) → [api-architecture.md](api-architecture.md#performance) +- Circuit breaker pattern → [api-patterns.yaml](../../packages/schemas/openapi/patterns/api-patterns.yaml#circuit-breakers) +- Data classification annotations → [api-patterns.yaml](../../packages/schemas/openapi/patterns/api-patterns.yaml#data-classification) +- Domain-specific SLI metrics → [workflow.md](domains/workflow.md#operational-metrics) +- Quality attributes summary → [api-architecture.md](api-architecture.md#4-quality-attributes-summary) + +### Added to api-patterns.yaml (Not Yet Implemented) + +The following patterns have been added to [api-patterns.yaml](../../packages/schemas/openapi/patterns/api-patterns.yaml) but require implementation: + +| Pattern | Description | Implementation Required | +|---------|-------------|------------------------| +| **Error Handling** | Standard error response structure, error codes, HTTP status guidance | Update `common-responses.yaml` schemas, mock server error responses | +| **API Versioning** | URL-path versioning strategy, deprecation headers | Update OpenAPI specs with version prefix, mock server routing | +| **Idempotency** | `Idempotency-Key` header for safe retries | Mock server must track keys and return stored responses | +| **Batch Operations** | `POST /{resources}/batch` pattern | Add batch endpoints to specs, mock server batch handling | +| **Authentication** | OAuth 2.0/OIDC, API keys, mTLS options; state-configurable IdP | OpenAPI security schemes, mock server token validation | +| **Authorization** | Scopes, RBAC roles, ABAC rules, field-level filtering | Middleware for scope/role checks, response filtering | +| **Rate Limiting** | Request limits with standard headers | API gateway configuration, mock server rate limit simulation | +| **Security Headers** | HSTS, 
CORS, cache control | API gateway or middleware configuration | +| **Audit Logging** | Required fields, sensitive access logging, PII handling | Logging infrastructure, correlation ID propagation | +| **Process API Metadata** | `x-actors`, `x-capability` extensions | Validation rules, documentation generation | +| **Correlation IDs** | `X-Correlation-ID` header for request tracing | Header propagation, logging integration | +| **ETags / Optimistic Concurrency** | `If-Match`, `If-None-Match` for conflict prevention | ETag generation, conditional request handling | +| **Sorting** | `sort` query parameter for list endpoints | Add to list endpoint specs, mock server support | +| **Long-Running Operations** | Async pattern with operation status polling | Operation status endpoints, background job infrastructure | + +Each section in `api-patterns.yaml` is marked with `# STATUS: Not yet implemented` to indicate work remaining. + +**Note on state configurability:** Authentication and authorization patterns define the interface contract, not specific provider implementations. States configure their own identity providers (Okta, Azure AD, state-specific IdP) and may customize role definitions while maintaining interoperability. + +### API Patterns to Consider + +Additional patterns that may be valuable depending on implementation needs. These are not commitments—evaluate each based on actual requirements. 
+ +| Pattern | Description | Consider When | +|---------|-------------|---------------| +| **Webhooks / Event Subscriptions** | Subscribe to events, delivery guarantees | Event-driven integration is needed | +| **Partial Responses / Field Selection** | `?fields=id,name` to reduce payload size | Mobile or bandwidth-constrained clients emerge | +| **Advanced Caching** | `Cache-Control` directives, `Vary` headers | Caching strategy is defined | +| **Hypermedia / HATEOAS** | `_links` in responses for discoverability | API discovery becomes important | +| **Content Negotiation** | `Accept` header handling, multiple formats | Non-JSON formats are needed | +| **Health Check Details** | Detailed `/health` and `/ready` patterns | Observability standards are finalized | + +### Needs Architecture Documentation + +**Data Retention & Archival** + +| Data Type | Active Retention | Archive | Purge | +|-----------|------------------|---------|-------| +| Applications | 7 years after closure | Cold storage | Per state policy | +| Audit logs | 7 years | Immutable archive | Never (compliance) | +| PII | Per program requirements | Encrypted archive | On request + retention period | +| Session/tokens | 24 hours | N/A | Immediate | + +*Considerations*: +- Federal programs have specific retention requirements +- Right to deletion must balance against audit requirements +- Archived data must remain queryable for audits + +*Compliance Cross-References*: + +| Program | Regulation | Requirement | +|---------|------------|-------------| +| SNAP | 7 CFR 272.1 | Record retention requirements | +| Medicaid | 42 CFR 431.17 | Records and reports | +| TANF | 45 CFR 265.2 | Data collection and reporting | +| All | HIPAA | Protected health information (Medicaid) | +| All | FERPA | Education records (when used for eligibility) | + +*See also*: [API Architecture - Compliance](api-architecture.md#compliance) for field-level handling and right-to-deletion process. 
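
The Applications row of the retention table above can be read as a date rule: a record stays in active storage for 7 years after closure, then moves to the archive tier. The sketch below illustrates that rule under stated assumptions: the function names are hypothetical, the 7-year boundary is treated as exclusive (the record archives on the anniversary date), and purge timing is left out because the table defers it to state policy.

```python
from datetime import date
from typing import Optional

ACTIVE_RETENTION_YEARS = 7  # "7 years after closure" from the table above


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Raised only for Feb 29 when the target year is not a leap year.
        return d.replace(year=d.year + years, day=28)


def application_storage_tier(closed_on: Optional[date], today: date) -> str:
    """Return "active" or "archive" for an application record.

    Open applications (closed_on is None) are always active. Purge is not
    modeled here because it depends on state policy.
    """
    if closed_on is None:
        return "active"
    archive_from = add_years(closed_on, ACTIVE_RETENTION_YEARS)
    return "active" if today < archive_from else "archive"


print(application_storage_tier(date(2018, 6, 15), date(2025, 6, 14)))
print(application_storage_tier(date(2018, 6, 15), date(2025, 6, 15)))
```

Whichever boundary convention a state adopts, archived records still need to remain queryable for audits, as noted in the considerations above.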
+ +**Event-Driven Architecture / Webhooks** + +For external system integration without polling. + +| Event | Trigger | Typical Consumers | +|-------|---------|-------------------| +| `application.submitted` | New application received | Document management, eligibility engine | +| `determination.completed` | Eligibility decided | Notice generation, benefits issuance | +| `task.sla_warning` | Task approaching deadline | Supervisor dashboards, alerting | +| `task.assigned` | Task assignment changed | Caseworker notifications | + +*Pattern*: +- Events published to message broker (not direct HTTP calls) +- Webhook subscriptions for external consumers +- At-least-once delivery with idempotent consumers +- Event schema versioning aligned with API versioning + +**Integration Patterns** + +How legacy systems and external services connect. + +| Pattern | Use Case | Example | +|---------|----------|---------| +| API Gateway | All external access | Authentication, rate limiting, routing | +| Adapter | Vendor system integration | Workflow vendor → System API translation | +| Anti-corruption layer | Legacy system integration | Mainframe → modern API translation | +| Event bridge | Async integration | Real-time updates to data warehouse | +| Batch file | Legacy batch systems | Nightly SSA data exchange | + +### Separate Documents (Future) + +**Testing Strategy** + +Warrants its own document covering: +- Contract testing (Process APIs against System API contracts) +- Mock server usage patterns +- Integration test data management +- Performance/load testing approach + +**State Security Implementation Guide** + +The security patterns in `api-patterns.yaml` define the interface contract. A separate guide may be needed for states covering: +- Identity provider setup (Okta, Azure AD, state IdP) +- Role mapping to state organizational structure +- Break-glass procedures and emergency access +- Compliance documentation (FedRAMP, StateRAMP, etc.) 
diff --git a/packages/schemas/openapi/patterns/api-patterns.yaml b/packages/schemas/openapi/patterns/api-patterns.yaml index 1ac98b4..551d3da 100644 --- a/packages/schemas/openapi/patterns/api-patterns.yaml +++ b/packages/schemas/openapi/patterns/api-patterns.yaml @@ -5,11 +5,32 @@ version: "1.0" # ============================================================================= -# Naming Conventions +# Pattern Applicability +# ============================================================================= +# This project has two API layers (see docs/architecture/domain-design.md): +# +# SYSTEM APIs: RESTful CRUD access to domain data +# - Example: GET /tasks/{id}, POST /applications, PATCH /cases/{id} +# - Located in: openapi/domains/ +# +# PROCESS APIs: RPC-style orchestration of business operations +# - Pattern: POST /processes/{domain}/{resource}/{action} +# - Example: POST /processes/workflow/tasks/claim +# - Located in: openapi/processes/{domain}/{resource}/{action}.yaml +# +# Each section below indicates which API type it applies to: +# [SYSTEM] - System APIs only +# [PROCESS] - Process APIs only +# [BOTH] - Both API types +# [SYSTEM, PROCESS*] - Required for System, optional/varies for Process + +# ============================================================================= +# Naming Conventions [BOTH] # ============================================================================= naming: # URL path segments: kebab-case (lowercase with hyphens) - # Example: /user-profiles, /order-items + # System API example: /user-profiles, /order-items + # Process API example: /processes/applications/submit paths: kebab-case # Path parameters: camelCase inside braces @@ -21,11 +42,13 @@ naming: query_parameters: camelCase # Operation IDs: camelCase verb + noun - # Example: listPersons, createApplication, getHouseholdById + # System API example: listPersons, createApplication, getHouseholdById + # Process API example: submitApplication, determineEligibility, reassignTasks 
operation_ids: camelCase # Schema/component names: PascalCase - # Example: Person, ApplicationCreate, HouseholdList + # System API example: Person, ApplicationCreate, HouseholdList (resource schemas) + # Process API example: SubmitApplicationRequest, SubmitApplicationResponse (DTOs) schemas: PascalCase # File names: kebab-case @@ -33,8 +56,9 @@ naming: files: kebab-case # ============================================================================= -# List Endpoint Pattern +# List Endpoint Pattern [SYSTEM] # ============================================================================= +# Process APIs do not have list endpoints - they perform actions, not queries. list_endpoints: # All list endpoints (GET /resources) must include these parameters required_parameters: @@ -73,8 +97,9 @@ list_endpoints: response_schema_suffix: List # ============================================================================= -# Search Query Syntax +# Search Query Syntax [SYSTEM] # ============================================================================= +# Process APIs do not use search - they receive explicit request payloads. search_syntax: parameter_name: q description: | @@ -114,8 +139,11 @@ search_syntax: example: "address.state:CA" # ============================================================================= -# CRUD Operations +# CRUD Operations [SYSTEM] # ============================================================================= +# Process APIs are RPC-style (POST with action verbs), not RESTful CRUD. 
+# Process API pattern: POST /processes/{domain}/{resource}/{action} +# Example: POST /processes/workflow/tasks/claim, POST /processes/case-management/workers/assign crud_operations: # CREATE - POST /resources create: @@ -181,8 +209,10 @@ crud_operations: error_responses: [404, 500] # ============================================================================= -# Shared Components +# Shared Components [BOTH] # ============================================================================= +# Error responses are shared. Query parameters are System API only. +# Process APIs define their own request/response DTOs. shared_components: # Error responses - use $ref to these error_responses: @@ -225,8 +255,11 @@ shared_components: maxLength: 320 # ============================================================================= -# Schema Patterns +# Schema Patterns [SYSTEM, PROCESS*] # ============================================================================= +# System APIs: Use resource schemas with standard base fields (id, createdAt, updatedAt) +# Process APIs: Use purpose-built DTOs (Request/Response suffixes) - may reference +# System API schemas but define their own request/response shapes. 
schema_patterns: # Every resource should have these standard fields resource_base_fields: @@ -289,8 +322,9 @@ schema_patterns: suffix: List # ============================================================================= -# File Structure +# File Structure [BOTH] # ============================================================================= +# System APIs in openapi/domains/, Process APIs in openapi/processes/ file_structure: # Main API specs go in openapi/ root api_specs: @@ -326,8 +360,122 @@ file_structure: postman_collection: "generated/postman-collection.json" # ============================================================================= -# Validation Rules +# Process API Metadata [PROCESS] +# ============================================================================= +# STATUS: Not yet implemented in validation +# Process APIs use OpenAPI extension fields to declare metadata about +# who can call them and what capability they provide. +process_api_metadata: + description: | + Process APIs are organized by domain, then resource, then action: + + /processes/{domain}/{resource}/{action} + + Examples: + /processes/workflow/tasks/claim + /processes/workflow/verification/start + /processes/case-management/workers/assign + /processes/communication/notices/send + + This enables documentation generation, authorization rules, and API discovery. + + path_convention: + pattern: "/processes/{domain}/{resource}/{action}" + components: + domain: The owning domain (workflow, case-management, intake, eligibility, communication, etc.) + resource: The resource being acted upon (tasks, workers, cases, notices, etc.) + action: The operation being performed (claim, assign, transfer, send, etc.) + convention: | + When an operation involves multiple resources, place it under the resource + being acted upon (not the primary output). This matches natural language + and improves discoverability. 
+ examples: + - path: /processes/workflow/tasks/claim + description: Claim a task (task is acted upon) + - path: /processes/case-management/workers/assign + description: Assign a worker (worker is acted upon) + - path: /processes/case-management/cases/transfer + description: Transfer a case (case is acted upon) + - path: /processes/communication/notices/send + description: Send a notice (notice is acted upon) + + extensions: + x-actors: + description: Which actors/roles can call this endpoint + type: array + location: Operation level (under paths/{path}/{method}) + values: + - client # Person applying for/receiving benefits + - authorized_rep # Representative acting on behalf of client + - caseworker # Staff processing applications + - supervisor # Team lead with approval authority + - system # Automated/batch processes + example: | + paths: + /processes/intake/applications/submit: + post: + x-actors: [client, caseworker, authorized_rep] + summary: Submit an application + + x-capability: + description: The business capability this endpoint provides + type: string + location: Operation level (under paths/{path}/{method}) + purpose: | + Groups related Process API endpoints for documentation and discovery. + Should match the domain in the Process API path for consistency. + values: + # Core domains + - client-management # Managing client records and relationships + - intake # Submitting and managing applications + - eligibility # Determining eligibility + - workflow # Tasks, verification, SLA tracking + - case-management # Cases, assignments, workload + - scheduling # Appointments and interviews + - document-management # Uploading and managing documents + # Cross-cutting + - communication # Notices and correspondence + - reporting # Generating reports + - configuration # Managing system configuration + note: | + The x-capability should match the {domain} segment in the Process API path. 
+ Example: /processes/workflow/tasks/claim uses x-capability: workflow + example: | + paths: + /processes/intake/applications/submit: + post: + x-capability: intake + x-actors: [client, caseworker] + + x-idempotent: + description: Whether this operation is idempotent (safe to retry) + type: boolean + location: Operation level + default: false for POST + note: All Process API POST operations should specify this explicitly + example: | + paths: + /processes/intake/applications/submit: + post: + x-idempotent: true # Safe to retry with same Idempotency-Key + + usage_in_authorization: | + The x-actors extension can be used by API gateways or middleware to + enforce role-based access control: + 1. Extract user's role from JWT token + 2. Check if role is in x-actors array for the endpoint + 3. Return 403 Forbidden if not authorized + + usage_in_documentation: | + Documentation generators can use these extensions to: + - Group endpoints by x-capability + - Show which roles can access each endpoint + - Generate role-specific API documentation + +# ============================================================================= +# Validation Rules [BOTH] # ============================================================================= +# Syntax and lint rules apply to both. Pattern checks differ by API type. 
validation: # These are enforced by npm run validate layers: @@ -350,3 +498,979 @@ validation: - List responses have required properties - Error responses use shared $refs - CRUD operations follow patterns + +# ============================================================================= +# Error Handling [BOTH] +# ============================================================================= +# STATUS: Not yet implemented in mock server or validation +error_handling: + # Standard error response body structure + response_body: + required_fields: + - name: code + type: string + description: Machine-readable error code (e.g., "VALIDATION_ERROR", "RESOURCE_NOT_FOUND") + - name: message + type: string + description: Human-readable error description + optional_fields: + - name: details + type: array + description: Array of specific field-level errors + items: + field: Path to the field with error + code: Field-specific error code + message: Field-specific error message + - name: retryable + type: boolean + description: Whether the client should retry the request + - name: retryAfter + type: integer + description: Seconds to wait before retry (for rate limiting) + + # Error code taxonomy + error_codes: + validation: + - MISSING_REQUIRED_FIELD + - INVALID_FORMAT + - INVALID_VALUE + - VALUE_OUT_OF_RANGE + - FIELD_TOO_LONG + business_rules: + - DUPLICATE_RESOURCE + - INVALID_STATE_TRANSITION + - PREREQUISITE_NOT_MET + - RESOURCE_LOCKED + - SLA_CONSTRAINT_VIOLATED + authorization: + - UNAUTHORIZED + - FORBIDDEN + - INSUFFICIENT_PERMISSIONS + system: + - SERVICE_UNAVAILABLE + - UPSTREAM_TIMEOUT + - INTERNAL_ERROR + + # HTTP status code guidance + http_status_guidance: + 400: + description: Malformed request (can't parse JSON, missing required query params) + retryable: false + 401: + description: Authentication required or invalid credentials + retryable: false + 403: + description: Authenticated but not authorized for this resource/action + retryable: false + 404: + 
description: Resource not found + retryable: false + 409: + description: Conflict (duplicate resource, concurrent modification) + retryable: true + note: Client should fetch current state and retry + 422: + description: Valid JSON but fails business validation + retryable: false + 429: + description: Rate limited + retryable: true + note: Wait for the interval given in the Retry-After header or the retryAfter body field + 500: + description: Unexpected server error + retryable: true + 503: + description: Service temporarily unavailable + retryable: true + +# ============================================================================= +# API Versioning [BOTH] +# ============================================================================= +# STATUS: Not yet implemented +versioning: + strategy: url-path + description: Version in URL path (e.g., /v1/tasks, /v2/tasks) + + current_version: v1 + supported_versions: + - v1 + + compatibility_rules: + backward_compatible: + - Adding new optional request fields + - Adding new response fields + - Adding new endpoints + - Adding new enum values (clients must ignore unknown values) + - Adding new optional query parameters + breaking_changes: + - Removing fields + - Changing field types + - Renaming fields + - Removing enum values + - Changing URL paths + - Making optional fields required + + deprecation: + notice_period_months: 6 + headers: + - name: Deprecation + value: "true" + description: Indicates endpoint is deprecated + - name: Sunset + value: "" + description: Date when endpoint will be removed + - name: Link + value: "; rel=\"successor-version\"" + description: Link to replacement endpoint/documentation + +# ============================================================================= +# Idempotency [PROCESS required, SYSTEM for POST only] +# ============================================================================= +# STATUS: Not yet implemented in mock server +# Process APIs: Required for all POST operations (actions may have side effects) +# System APIs: Required for POST
(create) operations only +idempotency: + description: | + Ensures that retrying a request (due to network issues, timeouts, etc.) + does not cause duplicate side effects. Critical for Process APIs. + + applies_to: + process_apis: All POST operations (required) + system_apis: POST operations that create resources + + mechanism: + header_name: Idempotency-Key + format: Client-generated UUID v4 + required: true for applicable operations + behavior: | + 1. Client generates unique key and includes in request header + 2. Server checks if key has been seen within retention window + 3. If seen: return stored response (do not re-execute) + 4. If new: execute operation, store response with key + retention_hours: 24 + note: Keys are scoped per client/tenant + + response_headers: + - name: Idempotent-Replayed + values: ["true", "false"] + description: Indicates whether this response is a replay of a previous request + + error_cases: + - code: IDEMPOTENCY_KEY_REUSED + status: 422 + description: Same key used with different request body + - code: IDEMPOTENCY_KEY_MISSING + status: 400 + description: Required idempotency key not provided + +# ============================================================================= +# Batch Operations [SYSTEM primarily, PROCESS rarely] +# ============================================================================= +# STATUS: Not yet implemented +# System APIs: Common for bulk CRUD (e.g., PATCH /tasks/batch) +# Process APIs: Rare; most process operations are single-action. If needed, +# the process itself handles multiple items (e.g., bulk-reassign). +batch_operations: + description: | + For operations that need to affect multiple resources in a single request. + Useful for bulk updates, imports, and administrative operations. + Primarily for System APIs; Process APIs typically handle batching internally. 
+ + endpoint_pattern: "/{resources}/batch" + method: POST + + request_body: + structure: + operations: + type: array + max_items: 100 + items: + action: + type: string + enum: [create, update, delete] + id: + type: string + description: Required for update/delete, omit for create + data: + type: object + description: Resource payload (for create/update) + + response: + structure: + total: Total operations requested + succeeded: Count of successful operations + failed: Count of failed operations + results: + type: array + items: + index: Position in request array + action: The action attempted + status: succeeded | failed + id: Resource ID (for successful creates) + error: Error details (for failures) + + behavior: + atomicity: false + description: | + Batch operations use a partial success model - individual operations + may succeed or fail independently. The client must check the results array + to determine the outcome of each operation. + ordering: Operations processed sequentially in array order + rate_limiting: Batch counts as N requests toward rate limit + + idempotency: + description: Batch operations require idempotency key + scope: Entire batch (not individual operations) + +# ============================================================================= +# Authentication [BOTH] +# ============================================================================= +# STATUS: Not yet implemented +# NOTE: States will use their own identity providers. This section defines the +# interface contract, not the specific provider implementation. +authentication: + description: | + How clients prove their identity. The API layer validates tokens but does + not implement the identity provider - that's state-specific infrastructure.
+
+  # Supported authentication methods (state chooses which to implement)
+  supported_methods:
+    oauth2_oidc:
+      description: OAuth 2.0 with OpenID Connect (recommended for user-facing apps)
+      token_type: Bearer
+      header: "Authorization: Bearer "
+      token_format: JWT (recommended) or opaque
+      discovery: "/.well-known/openid-configuration"
+      note: State configures their IdP (Okta, Azure AD, Auth0, state-specific)
+
+    api_key:
+      description: API keys for server-to-server integration
+      header: "X-API-Key: "
+      use_cases:
+        - Batch jobs
+        - Legacy system integration
+        - Internal service-to-service
+      note: Simpler but less secure; use OAuth for user context
+
+    mtls:
+      description: Mutual TLS with client certificates
+      use_cases:
+        - High-security integrations
+        - Federal system connections
+      note: Certificate management is state responsibility
+
+  # OpenAPI security scheme definitions (for spec files)
+  openapi_security_schemes:
+    bearerAuth:
+      type: http
+      scheme: bearer
+      bearerFormat: JWT
+      description: OAuth 2.0 Bearer token
+
+    apiKeyAuth:
+      type: apiKey
+      in: header
+      name: X-API-Key
+      description: API key for server-to-server calls
+
+  # Token validation requirements
+  token_validation:
+    required_claims:
+      - sub: Subject (user or service identifier)
+      - iat: Issued at timestamp
+      - exp: Expiration timestamp
+    recommended_claims:
+      - iss: Issuer (IdP identifier)
+      - aud: Audience (this API)
+      - scope: OAuth scopes granted
+      - roles: User roles (for RBAC)
+      - tenant_id: State/county identifier (if multi-tenant)
+
+  # State configuration guidance
+  state_configuration:
+    description: |
+      Each state configures their identity provider. The API layer needs:
+      - JWKS endpoint (for JWT validation)
+      - Issuer URL (for token verification)
+      - Audience value (this API's identifier)
+      - Clock skew tolerance
+    environment_variables:
+      - AUTH_JWKS_URI: "https://idp.example.gov/.well-known/jwks.json"
+      - AUTH_ISSUER: "https://idp.example.gov"
+      - AUTH_AUDIENCE: "safety-net-api"
+      - AUTH_CLOCK_SKEW_SECONDS: 60
+
+# =============================================================================
+# Authorization [BOTH]
+# =============================================================================
+# STATUS: Not yet implemented
+# NOTE: Authorization rules may vary by state. This defines the patterns;
+# specific role definitions are state-configurable.
+authorization:
+  description: |
+    What authenticated users can do. Combines role-based access control (RBAC)
+    with attribute-based rules (ABAC) for fine-grained permissions.
+
+  # OAuth scopes (coarse-grained API access)
+  scopes:
+    description: OAuth scopes control broad API access categories
+    standard_scopes:
+      - name: "applications:read"
+        description: Read applications
+      - name: "applications:write"
+        description: Create/update applications
+      - name: "tasks:read"
+        description: Read tasks
+      - name: "tasks:write"
+        description: Update task status, assignments
+      - name: "cases:read"
+        description: Read case information
+      - name: "cases:write"
+        description: Update cases
+      - name: "config:read"
+        description: Read configuration
+      - name: "config:admin"
+        description: Modify configuration (admin only)
+      - name: "audit:read"
+        description: Read audit logs
+
+  # Role-based access control
+  rbac:
+    description: |
+      Roles provide bundles of permissions. Roles are state-configurable but
+      should follow this general hierarchy.
+    standard_roles:
+      - name: client
+        description: Person applying for/receiving benefits
+        typical_scopes: [applications:read, applications:write]
+        constraints: Own data only
+
+      - name: authorized_representative
+        description: Representative acting on behalf of clients
+        typical_scopes: [applications:read, applications:write]
+        constraints: Delegated clients only
+
+      - name: caseworker
+        description: Staff processing applications and tasks
+        typical_scopes: [applications:read, tasks:read, tasks:write, cases:read]
+        constraints: Assigned work only
+
+      - name: supervisor
+        description: Team lead with approval authority
+        typical_scopes: [applications:read, tasks:read, tasks:write, cases:read, cases:write]
+        constraints: Team's work, can reassign
+
+      - name: office_manager
+        description: Office-level administration
+        typical_scopes: [applications:read, tasks:read, cases:read, config:read]
+        constraints: Office scope
+
+      - name: program_admin
+        description: Program configuration and reporting
+        typical_scopes: [config:read, config:admin, audit:read]
+        constraints: No case-level PII access
+
+      - name: auditor
+        description: Read-only access for compliance review
+        typical_scopes: [applications:read, tasks:read, cases:read, audit:read]
+        constraints: Read-only, all data
+
+    state_customization: |
+      States may define additional roles or modify scope assignments.
+      Role names should remain consistent for interoperability.
+
+  # Attribute-based access control
+  abac:
+    description: |
+      Fine-grained rules based on resource and user attributes.
+      Evaluated after RBAC role check passes.
+    example_rules:
+      - rule: "Caseworker can only access assigned cases"
+        condition: "resource.assignedToId == user.id"
+
+      - rule: "Supervisor can access team's cases"
+        condition: "resource.assignedToId IN user.teamMemberIds"
+
+      - rule: "Office manager can access office's cases"
+        condition: "resource.officeId == user.officeId"
+
+      - rule: "Can only view cases for programs user is certified for"
+        condition: "resource.programType IN user.certifiedPrograms"
+
+  # Field-level authorization
+  field_level:
+    description: |
+      Some fields require additional authorization beyond endpoint access.
+      Same endpoint may return different fields based on role.
+    sensitive_fields:
+      - field: socialSecurityNumber
+        full_access: [auditor, assigned_caseworker_during_verification]
+        masked_access: [supervisor, office_manager]  # Last 4 only
+        no_access: [client]  # Cannot see own SSN in API response
+
+      - field: internalNotes
+        full_access: [caseworker, supervisor]
+        no_access: [client, authorized_representative]
+
+    implementation: |
+      Use response filtering at the API layer. Do not rely on clients
+      to hide fields - the API must not return unauthorized data.
+
+# =============================================================================
+# Rate Limiting [BOTH]
+# =============================================================================
+# STATUS: Not yet implemented
+rate_limiting:
+  description: |
+    Protects the API from abuse and ensures fair resource allocation.
+    Typically implemented at API gateway level.
+
+  # Standard headers
+  headers:
+    - name: X-RateLimit-Limit
+      description: Maximum requests allowed in window
+    - name: X-RateLimit-Remaining
+      description: Requests remaining in current window
+    - name: X-RateLimit-Reset
+      description: Unix timestamp when window resets
+    - name: Retry-After
+      description: Seconds to wait before retrying (on 429 response)
+
+  # Rate limit tiers (state-configurable)
+  default_tiers:
+    - tier: standard
+      requests_per_minute: 60
+      requests_per_hour: 1000
+      applies_to: Default for authenticated users
+
+    - tier: elevated
+      requests_per_minute: 300
+      requests_per_hour: 5000
+      applies_to: Trusted internal services
+
+    - tier: batch
+      requests_per_minute: 10
+      requests_per_hour: 100
+      note: Batch endpoints; each batch counts as N requests
+
+  # Response when rate limited
+  rate_limit_response:
+    status: 429
+    body:
+      code: RATE_LIMIT_EXCEEDED
+      message: Too many requests
+      retryAfter:
+
+# =============================================================================
+# Security Headers [BOTH]
+# =============================================================================
+# STATUS: Not yet implemented
+security_headers:
+  description: Standard HTTP security headers for all responses
+
+  required_headers:
+    - name: Strict-Transport-Security
+      value: "max-age=31536000; includeSubDomains"
+      description: Enforce HTTPS
+
+    - name: X-Content-Type-Options
+      value: "nosniff"
+      description: Prevent MIME type sniffing
+
+    - name: X-Frame-Options
+      value: "DENY"
+      description: Prevent clickjacking
+
+    - name: Cache-Control
+      value: "no-store"
+      description: Prevent caching of sensitive data
+      note: May be relaxed for public/static endpoints
+
+  cors:
+    description: Cross-Origin Resource Sharing (state-configurable)
+    configuration:
+      allowed_origins: State configures allowed frontend domains
+      allowed_methods: [GET, POST, PATCH, DELETE, OPTIONS]
+      allowed_headers: [Authorization, Content-Type, X-API-Key, Idempotency-Key]
+      expose_headers: [X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset]
+      max_age: 86400
+
+# =============================================================================
+# Audit Logging [BOTH]
+# =============================================================================
+# STATUS: Not yet implemented
+audit_logging:
+  description: |
+    All API access must be logged for compliance and security monitoring.
+    Logs must NOT contain PII - use correlation IDs to link to secure storage.
+
+  required_fields:
+    - timestamp: ISO 8601 format
+    - correlation_id: Request correlation ID
+    - user_id: Authenticated user identifier
+    - user_role: Role used for this request
+    - client_ip: Originating IP address
+    - method: HTTP method
+    - path: Request path (without query params)
+    - resource_type: Type of resource accessed
+    - resource_id: ID of resource accessed (if applicable)
+    - action: Logical action (create, read, update, delete)
+    - status_code: HTTP response status
+    - response_time_ms: Request duration
+
+  sensitive_access_logging:
+    description: Additional logging for sensitive data access
+    triggers:
+      - SSN viewed (even read access)
+      - PII exported
+      - Bulk data access
+      - Configuration changes
+      - Failed authorization attempts
+    additional_fields:
+      - fields_accessed: Which sensitive fields were returned
+      - business_justification: If provided by user
+      - supervisor_notified: For break-glass access
+
+  pii_handling:
+    description: Never log PII directly
+    guidance:
+      - Log resource IDs, not resource content
+      - Log field names accessed, not field values
+      - Use correlation ID to link to audit detail in secure storage
+      - Mask any PII that must appear in logs (e.g., last 4 of SSN)
+
+# =============================================================================
+# Correlation IDs [BOTH]
+# =============================================================================
+# STATUS: Not yet implemented
+correlation_ids:
+  description: |
+    Unique identifier that follows a request through all services and logs.
+    Essential for distributed tracing and debugging.
+
+  header:
+    name: X-Correlation-ID
+    format: UUID v4
+    behavior: |
+      1. If client provides X-Correlation-ID, use it
+      2. If not provided, generate a new UUID
+      3. Include in all downstream service calls
+      4. Include in all log entries
+      5. Return in response header
+
+  response_header:
+    name: X-Correlation-ID
+    description: Echo back the correlation ID used for this request
+
+  logging_integration:
+    description: All log entries must include correlationId field
+    example: |
+      {
+        "timestamp": "2024-01-15T10:30:00Z",
+        "level": "info",
+        "correlationId": "550e8400-e29b-41d4-a716-446655440000",
+        "service": "task-api",
+        "message": "Task created",
+        "taskId": "abc-123"
+      }
+
+  propagation:
+    description: Pass correlation ID to all downstream calls
+    guidance:
+      - Include X-Correlation-ID header in HTTP calls to other services
+      - Include in message queue message headers
+      - Include in async job metadata
+
+# =============================================================================
+# ETags and Optimistic Concurrency [SYSTEM]
+# =============================================================================
+# STATUS: Not yet implemented
+etags:
+  description: |
+    ETags enable optimistic concurrency control (prevent lost updates) and
+    conditional requests (caching). System APIs should support ETags for
+    resources that can be updated.
+
+  response_headers:
+    ETag:
+      description: Opaque identifier for the current resource version
+      format: Quoted string (e.g., "abc123" or W/"abc123" for weak)
+      generation: Hash of resource content or version number
+
+  request_headers:
+    If-Match:
+      description: Update only if ETag matches (optimistic locking)
+      use_case: Prevent lost updates when multiple users edit same resource
+      behavior: |
+        1. Client GETs resource, receives ETag
+        2. Client PATCHes with If-Match: "etag-value"
+        3. If ETag matches current: apply update, return new ETag
+        4. If ETag doesn't match: return 412 Precondition Failed
+
+    If-None-Match:
+      description: Return resource only if ETag differs (caching)
+      use_case: Avoid re-downloading unchanged resources
+      behavior: |
+        1. Client GETs resource, caches response with ETag
+        2. Client GETs again with If-None-Match: "etag-value"
+        3. If ETag matches: return 304 Not Modified (no body)
+        4. If ETag differs: return 200 with new resource and ETag
+
+  error_responses:
+    412:
+      code: PRECONDITION_FAILED
+      description: If-Match ETag doesn't match current version
+      message: Resource has been modified by another request
+      guidance: Client should re-fetch and retry or notify user of conflict
+
+    304:
+      description: Not Modified - resource unchanged since ETag
+      body: Empty (client uses cached version)
+
+  implementation_notes:
+    - ETags should be opaque to clients (don't expose version numbers directly)
+    - Use strong ETags for byte-for-byte comparison
+    - Use weak ETags (W/"...") for semantic equivalence
+    - Consider ETag scope: entire resource or specific fields
+
+# =============================================================================
+# Sorting [SYSTEM]
+# =============================================================================
+# STATUS: Not yet implemented
+sorting:
+  description: |
+    Standard query parameter for sorting list endpoint results.
+    Sorting is typically used with System APIs that return collections.
+
+  parameter:
+    name: sort
+    location: query
+    type: string
+    description: |
+      Comma-separated list of fields to sort by. Prefix with - for descending.
+      First field is primary sort, subsequent fields are secondary sorts.
+
+  syntax:
+    ascending: "fieldName"
+    descending: "-fieldName"
+    multiple: "field1,-field2,field3"
+
+  examples:
+    - sort=createdAt  # Oldest first
+    - sort=-createdAt  # Newest first
+    - sort=status,-priority  # By status (asc), then priority (desc)
+    - sort=-dueDate,createdAt  # By due date (desc), then created (asc)
+
+  default_behavior:
+    description: If no sort specified, use sensible default per resource
+    typical_defaults:
+      - List endpoints: -createdAt (newest first)
+      - Search results: relevance score (if applicable)
+
+  error_handling:
+    invalid_field:
+      status: 400
+      code: INVALID_SORT_FIELD
+      message: "Unknown sort field: {field}"
+    unsortable_field:
+      status: 400
+      code: FIELD_NOT_SORTABLE
+      message: "Field '{field}' does not support sorting"
+
+  implementation_notes:
+    - Document sortable fields in OpenAPI spec
+    - Consider index implications for sortable fields
+    - Limit number of sort fields (e.g., max 3)
+
+# =============================================================================
+# Long-Running Operations [PROCESS primarily]
+# =============================================================================
+# STATUS: Not yet implemented
+long_running_operations:
+  description: |
+    Pattern for operations that take too long for a synchronous response.
+    Common in Process APIs for complex business operations (e.g., batch
+    eligibility determination, bulk document processing).
+
+  threshold:
+    description: Operations expected to take >30 seconds should be async
+    note: Exact threshold depends on client requirements and infrastructure
+
+  pattern:
+    name: Polling with Operation Resource
+    flow: |
+      1. Client POSTs to initiate operation
+      2. Server returns 202 Accepted with operation ID and status URL
+      3. Client polls status URL until complete
+      4. When complete, status includes result or result URL
+
+  initiation:
+    request:
+      method: POST
+      path: "/processes/{domain}/{resource}/{action}"
+      body: Operation parameters
+      headers:
+        Idempotency-Key: Required (operations are idempotent)
+
+    response:
+      status: 202 Accepted
+      headers:
+        Location: URL to poll for status
+      body:
+        operationId: UUID
+        status: "pending" | "running"
+        statusUrl: "/operations/{operationId}"
+        estimatedCompletionTime: ISO 8601 datetime (optional)
+
+  status_polling:
+    request:
+      method: GET
+      path: "/operations/{operationId}"
+
+    response_pending:
+      status: 200
+      body:
+        operationId: UUID
+        status: "pending" | "running"
+        progress: 0-100 (optional)
+        message: Human-readable status (optional)
+
+    response_completed:
+      status: 200
+      body:
+        operationId: UUID
+        status: "completed"
+        completedAt: ISO 8601 datetime
+        result: Inline result (if small)
+        resultUrl: URL to fetch result (if large)
+
+    response_failed:
+      status: 200
+      body:
+        operationId: UUID
+        status: "failed"
+        error:
+          code: Error code
+          message: Error description
+        failedAt: ISO 8601 datetime
+
+  cancellation:
+    description: Allow clients to cancel long-running operations
+    request:
+      method: DELETE
+      path: "/operations/{operationId}"
+    response:
+      status: 200 (cancelled) or 409 (cannot cancel - already complete)
+
+  polling_guidance:
+    initial_delay: 1 second
+    max_delay: 30 seconds
+    backoff: Exponential with jitter
+    note: Include Retry-After header in status responses
+
+  alternative_patterns:
+    webhooks:
+      description: Server calls client webhook when complete
+      trade_off: More complex but avoids polling overhead
+    server_sent_events:
+      description: Client maintains connection for real-time updates
+      trade_off: More complex, connection management issues
+
+# =============================================================================
+# Circuit Breakers [BOTH]
+# =============================================================================
+# STATUS: Not yet implemented
+circuit_breakers:
+  description: |
+    Protect the system when external dependencies fail. Circuit breakers prevent
+    cascade failures by failing fast when a dependency is unavailable.
+
+  when_to_use:
+    - External verification sources (IRS, SSA, state databases)
+    - Vendor system adapters (workflow tools, case management systems)
+    - Notice delivery services (email, SMS, postal)
+    - Any external API with potential for latency or availability issues
+
+  states:
+    closed:
+      description: Normal operation - requests pass through
+      behavior: Track failure rate
+    open:
+      description: Dependency is failing - fail fast without calling
+      behavior: Return fallback response or error immediately
+    half_open:
+      description: Testing if dependency has recovered
+      behavior: Allow limited requests through to test
+
+  configuration:
+    failure_threshold:
+      description: Number/percentage of failures before opening circuit
+      recommended: 5 failures or 50% error rate in 30 seconds
+    reset_timeout:
+      description: Time to wait before trying half-open
+      recommended: 30 seconds
+    half_open_requests:
+      description: Number of test requests in half-open state
+      recommended: 3 requests
+    timeout:
+      description: Request timeout before counting as failure
+      recommended: 10 seconds for external APIs
+
+  fallback_strategies:
+    verification_sources:
+      - strategy: Manual fallback
+        description: Route to manual verification queue
+        example: "If IRS unavailable, create manual verification task"
+      - strategy: Cached data
+        description: Use last known good data with staleness indicator
+        example: "Use cached wage data from previous verification"
+      - strategy: Graceful degradation
+        description: Continue without this data source
+        example: "Skip optional verification, flag for later"
+
+    vendor_adapters:
+      - strategy: Queue for retry
+        description: Store operation for later replay
+        example: "Queue task creation for retry when vendor recovers"
+      - strategy: Local fallback
+        description: Use local implementation temporarily
+        example: "Use mock database until vendor system recovers"
+
+  monitoring:
+    metrics:
+      - circuit_state: Current state (closed/open/half_open)
+      - circuit_failures: Failure count that triggered open
+      - circuit_success_rate: Success rate after recovery
+    alerts:
+      - Circuit opened (dependency down)
+      - Circuit stuck open for > 5 minutes
+      - High failure rate approaching threshold
+
+  openapi_extension:
+    name: x-circuit-breaker
+    location: Operation level
+    example: |
+      paths:
+        /processes/workflow/verification/start:
+          post:
+            x-circuit-breaker:
+              enabled: true
+              fallback: manual_verification
+              timeout_ms: 10000
+
+# =============================================================================
+# Data Classification [BOTH]
+# =============================================================================
+# STATUS: Not yet implemented
+data_classification:
+  description: |
+    All API fields should be classified for appropriate handling. Classification
+    drives encryption, logging, access control, and retention policies.
+
+  classifications:
+    pii:
+      description: Personally Identifiable Information - can identify an individual
+      handling:
+        - Encrypt at rest
+        - Mask in logs (show last 4 digits max)
+        - Log all access for audit
+        - Exclude from search indexes unless necessary
+        - Subject to right-to-deletion requests
+      retention: Per program requirements, typically 7 years after case closure
+
+    sensitive:
+      description: Sensitive data that is not PII but requires protection
+      handling:
+        - Encrypt at rest
+        - Exclude from general logs
+        - Restrict access by role
+        - May appear in audit logs
+      retention: Same as PII
+
+    internal:
+      description: Internal operational data
+      handling:
+        - Standard encryption at rest
+        - May appear in logs
+        - Standard access controls
+      retention: Per operational needs
+
+    public:
+      description: Non-sensitive reference data
+      handling:
+        - Standard encryption at rest
+        - May be cached
+        - May appear in logs
+        - No special access restrictions
+      retention: Indefinite for reference data
+
+  pii_fields:
+    description: Fields that should be classified as PII
+    fields:
+      - socialSecurityNumber
+      - dateOfBirth
+      - firstName
+      - lastName
+      - middleName
+      - maidenName
+      - address (all address fields)
+      - phoneNumber
+      - email
+      - driversLicenseNumber
+      - passportNumber
+      - alienNumber
+      - bankAccountNumber
+      - routingNumber
+      - medicaidId
+      - medicareId
+
+  sensitive_fields:
+    description: Fields that should be classified as sensitive
+    fields:
+      - income (all income fields)
+      - assets (all asset fields)
+      - expenses
+      - medicalConditions
+      - disabilityStatus
+      - immigrationStatus
+      - citizenshipStatus
+      - criminalHistory
+      - drugScreeningResults
+      - caseNotes
+      - internalNotes
+
+  openapi_extensions:
+    x-data-classification:
+      description: Classification level for this field
+      type: string
+      enum: [pii, sensitive, internal, public]
+      location: Schema property level
+      example: |
+        Person:
+          properties:
+            socialSecurityNumber:
+              type: string
+              x-data-classification: pii
+              x-encrypt: true
+            income:
+              type: number
+              x-data-classification: sensitive
+
+    x-encrypt:
+      description: Whether this field must be encrypted at rest
+      type: boolean
+      default: false (true for pii/sensitive)
+      location: Schema property level
+
+    x-mask-in-logs:
+      description: How to mask this field in logs
+      type: string
+      enum: [full, last4, hash, exclude]
+      default: exclude for pii, full for sensitive
+      location: Schema property level
+
+  implementation_notes:
+    - Use schema preprocessing to validate classifications are set
+    - Generate data handling documentation from classifications
+    - API gateway/middleware can enforce masking based on extensions
+    - Audit logging can use classifications to determine detail level
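
The note that middleware can enforce masking from the `x-data-classification` and `x-mask-in-logs` extensions could be sketched roughly as follows. This is an illustrative sketch, not part of the toolkit: `PERSON_SCHEMA`, `mask_value`, and `mask_for_logs` are hypothetical names, the `"***"` placeholder is arbitrary, and it assumes that `full` means "replace the whole value" and that unclassified fields are dropped from logs.

```python
import hashlib

# Hypothetical schema fragment using the extensions described in the patterns file.
PERSON_SCHEMA = {
    "properties": {
        "socialSecurityNumber": {
            "type": "string",
            "x-data-classification": "pii",
            "x-mask-in-logs": "last4",
        },
        "email": {"type": "string", "x-data-classification": "pii"},
        "income": {"type": "number", "x-data-classification": "sensitive"},
        "status": {"type": "string", "x-data-classification": "internal"},
    }
}

# Defaults from the patterns file: "exclude for pii, full for sensitive".
# Assumption: internal/public fields have no default mask and pass through.
DEFAULT_MODE = {"pii": "exclude", "sensitive": "full"}


def mask_value(value, mode):
    """Mask a single value according to an x-mask-in-logs mode."""
    text = str(value)
    if mode == "full":
        return "***"
    if mode == "last4":
        # Values of four characters or fewer would be revealed in full, so hide them.
        return "***" + text[-4:] if len(text) > 4 else "***"
    if mode == "hash":
        # Unsalted SHA-256 is for illustration only; a keyed hash (HMAC) would
        # be a safer choice for low-entropy values such as SSNs.
        return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    raise ValueError(f"unknown mask mode: {mode}")


def mask_for_logs(record, schema):
    """Return a copy of record that is safe to log under the schema's extensions."""
    masked = {}
    for name, value in record.items():
        prop = schema["properties"].get(name)
        classification = prop.get("x-data-classification") if prop else None
        if classification is None:
            continue  # assumption: fail closed on unclassified fields
        mode = prop.get("x-mask-in-logs", DEFAULT_MODE.get(classification))
        if mode is None:
            masked[name] = value  # internal/public: log as-is
        elif mode != "exclude":
            masked[name] = mask_value(value, mode)
    return masked


record = {
    "socialSecurityNumber": "123-45-6789",
    "email": "a@example.gov",
    "income": 1200,
    "status": "active",
}
print(mask_for_logs(record, PERSON_SCHEMA))
```

Dropping unclassified fields is a deliberately conservative choice here; it pairs with the first implementation note, since schema preprocessing that rejects unclassified properties would make that branch unreachable in practice.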