Feature Branch: 001-no-show-predictor
Date: 2026-01-28
Plan: plan.md
This document defines the data entities, relationships, and validation rules for the no-show predictor system. The model supports synthetic data generation that aligns with common EHR schema patterns (e.g., Athena, Epic), enabling realistic ML training and seamless future integration with real data sources.
Source Systems: Practice management and EHR systems (schema based on industry-standard field naming)
┌─────────────┐ ┌─────────────────┐ ┌─────────────┐
│ Patient │──1:N──│ Appointment │──N:1──│ Provider │
└─────────────┘ └─────────────────┘ └─────────────┘
│ │ │
│ │ │
▼ ▼ ▼
┌─────────────┐ ┌─────────────────┐ ┌─────────────┐
│ Insurance │ │ Prediction │ │ Department │
└─────────────┘ └─────────────────┘ └─────────────┘
Represents a patient with demographic and behavioral attributes.
| Field | Type | Source Field | Constraints | Description |
|---|---|---|---|---|
patientid |
integer | patientid |
PK | Unique patient identifier |
enterpriseid |
integer | enterpriseid |
Unique | MRN equivalent in Epic |
patient_gender |
string | patient_gender |
Required: M, F, Other | Patient sex |
patient_age_bucket |
string | Calculated | Required | Age range: 0-17, 18-39, 40-64, 65+ |
patient_race_ethnicity |
string | patient_race_ethnicity |
Optional | Race/ethnicity (derived from patient_race, patient_ethnicity) |
patient_email |
string | patient_email |
Optional, email format | Contact email (synthetic) |
patient_zip_code |
string | patient_zip_code |
Optional, 5-digit | Patient address zip code |
portal_enterpriseid |
numeric | portal_enterpriseid |
Optional | Portal enterprise ID |
portal_last_login |
datetime | portal_last_login |
Optional | Last patient portal login |
Computed Properties:
patient_age_bucket: Derived from date of birthhistorical_no_show_count: Count of past appointments with status indicating no-showhistorical_no_show_rate:historical_no_show_count / total_past_appointmentsportal_engaged:portal_last_loginwithin last 90 days
Validation Rules:
patient_age_bucketmust be one of defined bucketspatient_zip_codemust be 5 digits
Represents a healthcare provider.
| Field | Type | Source Field | Constraints | Description |
|---|---|---|---|---|
providerid |
integer | providerid |
PK | Unique provider identifier |
pro_providerid |
integer | pro_providerid |
Optional | Source system provider ID |
providerfirstname |
string | providerfirstname |
Required | Provider first name |
providerlastname |
string | providerlastname |
Required | Provider last name |
providertype |
string | providertype |
Required | Type: Physician, NP, PA, etc. |
providertypecategory |
string | providertypecategory |
Optional | Provider type category |
provider_specialty |
string | provider_specialty |
Required | Medical specialty |
provider_specialty_service_line |
string | provider_specialty_service_line |
Optional | Service line mapping |
providernpinumber |
string | providernpinumber |
Optional | NPI number |
provider_affiliation |
string | provider_affiliation |
Optional | Employed, Affiliated, etc. |
entitytype |
string | entitytype |
Optional | Provider entity type |
billableyn |
string | billableyn |
Optional | Billable flag (Y/N) |
patientfacingname |
string | patientfacingname |
Optional | Patient-facing display name |
Computed Properties:
display_name:{providerfirstname} {providerlastname}, {provider_specialty}reportingname: Full name for reporting (from source)schedulingname: Name used in scheduling
Represents a clinic/department location.
| Field | Type | Source Field | Constraints | Description |
|---|---|---|---|---|
departmentid |
integer | departmentid |
PK | Unique department identifier |
departmentname |
string | departmentname |
Required | Department name |
departmentspecialty |
string | departmentspecialty |
Optional | Department specialty |
billingname |
string | billingname |
Optional | Billing name |
placeofservicecode |
string | placeofservicecode |
Optional | Place of service code |
placeofservicetype |
string | placeofservicetype |
Optional | Office, Telehealth, etc. |
providergroupid |
integer | providergroupid |
Optional | Provider group ID |
departmentgroup |
string | departmentgroup |
Optional | Department group |
contextid |
integer | contextid |
Optional | Context/org ID |
contextname |
string | contextname |
Optional | Context name (Region A, Region B) |
market |
string | market |
Optional | Market region |
division |
string | division |
Optional | Division/business group |
business_unit |
string | business_unit |
Optional | Business unit |
Represents a scheduled medical appointment. Maps to EHR appointment source tables.
| Field | Type | Source Field | Constraints | Description |
|---|---|---|---|---|
appointmentid |
integer | appointmentid |
PK | Unique appointment identifier |
parentappointmentid |
integer | parentappointmentid |
Optional | Parent for grouped appointments |
patientid |
integer | patientid |
FK → Patient | Patient reference |
providerid |
integer | providerid |
FK → Provider | Provider reference |
departmentid |
integer | departmentid |
FK → Department | Department reference |
referringproviderid |
integer | referringproviderid |
Optional | Referring provider |
referralauthid |
integer | referralauthid |
Optional | Referral authorization ID |
appointmentdate |
date | appointmentdate |
Required | Scheduled date |
appointmentstarttime |
string | appointmentstarttime |
Required | Start time (HH:MM format) |
appointmentdatetime |
datetime | Calculated | Required | Combined date and time |
appointmentduration |
integer | appointmentduration |
Required | Duration in minutes |
appointmenttypeid |
integer | appointmenttypeid |
Required | Appointment type ID |
appointmenttypename |
string | appointmenttypename |
Required | Appointment type name |
appointmentstatus |
string | appointmentstatus |
Required | Current status |
appointmentcreateddatetime |
datetime | appointmentcreateddatetime |
Required | When created |
appointmentcreatedby |
string | appointmentcreatedby |
Optional | Created by user |
appointmentscheduleddatetime |
datetime | appointmentscheduleddatetime |
Required | When scheduled |
scheduledby |
string | scheduledby |
Optional | Scheduled by user |
appointmentcheckindatetime |
datetime | appointmentcheckindatetime |
Optional | Check-in time |
appointmentcheckoutdatetime |
datetime | appointmentcheckoutdatetime |
Optional | Check-out time |
appointmentcancelleddatetime |
datetime | appointmentcancelleddatetime |
Optional | Cancellation time |
cancelledby |
string | cancelledby |
Optional | Cancelled by user |
appointmentcancelreason |
string | appointmentcancelreason |
Optional | Cancellation reason |
rescheduledappointmentid |
integer | rescheduledappointmentid |
Optional | Original appointment if rescheduled |
rescheduleddatetime |
datetime | rescheduleddatetime |
Optional | Reschedule timestamp |
rescheduledby |
string | rescheduledby |
Optional | Rescheduled by user |
startcheckindatetime |
datetime | startcheckindatetime |
Optional | Check-in process start |
stopsignoffdatetime |
datetime | stopsignoffdatetime |
Optional | Sign-off process stop |
appointmentdeleteddatetime |
datetime | appointmentdeleteddatetime |
Optional | Deletion timestamp |
claimid |
integer | claimid |
Optional | Associated claim ID |
cycletime |
numeric | cycletime |
Optional | Scheduling to appointment time |
frozenyn |
string | frozenyn |
Optional | Frozen flag (Y/N) |
appointmentfrozenreason |
string | appointmentfrozenreason |
Optional | Reason for frozen |
virtual_flag |
string | Calculated | Required | Virtual-Telephone, Virtual-Video, Non-Virtual |
new_patient_flag |
string | Calculated | Required | NEW PATIENT or EST PATIENT |
webschedulableyn |
integer | webschedulableyn |
Optional | Online schedulable (1/0) |
Computed Properties:
appointmentdatetime: Combined fromappointmentdate+appointmentstarttimelead_time_days: Days betweenappointmentscheduleddatetimeandappointmentdatetimeday_of_week: Extracted fromappointmentdate(Monday=0...Sunday=6)hour_of_day: Extracted fromappointmentstarttime(0-23)is_past:appointmentdatetime < nowvirtual_flag: Derived fromappointmenttypenamecontaining VIRTUAL, TELEHEALTH, or PHONEnew_patient_flag: Derived fromappointmenttypenamecontaining NEW or CONSULTno_show: Derived fromappointmentstatusand check-in timestamps
Status Values (from appointmentstatus):
Scheduled- Future appointmentChecked In- Patient arrivedChecked Out/Complete- Visit completedCancelled- Appointment cancelledNo Show- Patient did not attendRescheduled- Moved to different time
Validation Rules:
appointmentscheduleddatetimemust be ≤appointmentdatetimeappointmentdatetimemust be during business hours (configurable)
Represents patient insurance information. Derived from appointment and encounter data.
| Field | Type | Source Field | Constraints | Description |
|---|---|---|---|---|
primarypatientinsuranceid |
integer | primarypatientinsuranceid |
PK | Primary insurance ID |
sipg1 |
string | sipg1 |
Optional | Specific payer grouping |
sipg2 |
string | sipg2 |
Optional | General payer grouping |
insurance_plan_1_company_description |
string | insurance_plan_1_company_description |
Optional | Insurance company name |
insurance_group_id |
string | insurance_group_id |
Optional | Insurance group ID |
Represents an ML model prediction for an appointment.
| Field | Type | Constraints | Description |
|---|---|---|---|
prediction_id |
string | PK, UUID format | Unique identifier |
appointmentid |
integer | FK → Appointment, Required | Appointment reference |
no_show_probability |
decimal | Required, 0.0-1.0 | Model confidence |
risk_level |
enum | Computed: Low, Medium, High | Categorized risk |
risk_factors |
array[RiskFactor] | Required | Contributing factors |
model_version |
string | Required | Model version used |
predicted_at |
datetime | Auto-generated | When prediction was made |
Computed Properties:
risk_levelcalculation:Low: probability < 0.3Medium: probability 0.3-0.6High: probability > 0.6
Represents a contributing factor to a prediction (embedded in Prediction).
| Field | Type | Constraints | Description |
|---|---|---|---|
factor_name |
string | Required | Feature name |
factor_value |
string | Required | Actual value |
contribution |
decimal | Required, 0.0-1.0 | SHAP-like importance |
direction |
enum | Required: Increases, Decreases | Effect on risk |
Example:
{
"factor_name": "previous_no_shows",
"factor_value": "3",
"contribution": 0.45,
"direction": "Increases"
}Represents a suggested action based on prediction analysis (agent-generated).
| Field | Type | Constraints | Description |
|---|---|---|---|
recommendation_id |
string | PK, UUID format | Unique identifier |
appointment_id |
string | FK → Appointment, Required | Related appointment |
action_type |
enum | Required: ConfirmationCall, Reminder, Overbook, NoAction | Suggested action |
priority |
enum | Required: Urgent, High, Medium, Low | Action priority |
rationale |
string | Required | Explanation for recommendation |
created_at |
datetime | Auto-generated | When generated |
Features extracted for ML model training (derived from production schema):
| Feature | Type | Source | Production Field(s) | Description |
|---|---|---|---|---|
patient_age_bucket |
cat | Patient | patient_age_bucket (calculated) |
Age range category |
patient_gender |
cat | Patient | patient_gender |
Gender category |
patient_zip_code |
cat | Patient | patient_zip_code |
Geographic indicator |
patient_race_ethnicity |
cat | Patient | patient_race_ethnicity |
Demographics |
portal_engaged |
bool | Patient | portal_last_login |
Portal activity in 90 days |
sipg2 |
cat | Insurance | sipg2 |
General payer grouping |
historical_no_show_rate |
float | Calculated | Appointment history | Past no-show percentage |
historical_no_show_count |
int | Calculated | Appointment history | Past no-show count |
lead_time_days |
int | Appointment | appointmentscheduleddatetime, appointmentdatetime |
Days scheduled ahead |
appointmenttypename |
cat | Appointment | appointmenttypename |
Visit type name |
virtual_flag |
cat | Appointment | Calculated from type | Virtual/Non-Virtual |
new_patient_flag |
cat | Appointment | Calculated from type | New vs established |
day_of_week |
cat | Appointment | appointmentdate |
Day of week |
hour_of_day |
int | Appointment | appointmentstarttime |
Hour of appointment |
appointmentduration |
int | Appointment | appointmentduration |
Duration in minutes |
providerid |
cat | Appointment | providerid |
Provider (encoded) |
provider_specialty |
cat | Provider | provider_specialty |
Provider specialty |
providertype |
cat | Provider | providertype |
Physician, NP, PA |
departmentspecialty |
cat | Department | departmentspecialty |
Department specialty |
placeofservicetype |
cat | Department | placeofservicetype |
Office, Telehealth |
market |
cat | Department | market |
Geographic market |
webschedulableyn |
bool | Appointment | webschedulableyn |
Online scheduled |
cycletime |
float | Appointment | cycletime |
Schedule to appt time |
Target Variable: Derived no_show boolean from appointmentstatus (No Show status or no check-in for past appointments)
┌──────────────┐
┌───────│ Scheduled │───────┐
│ └──────────────┘ │
│ │ │
cancel() │ checkin() │ time passes +
│ │ no checkin
▼ ▼ ▼
┌─────────────┐ ┌──────────────┐ ┌─────────────┐
│ Cancelled │ │ Checked In │ │ No Show │
└─────────────┘ └──────────────┘ └─────────────┘
│
checkout()|
▼
┌──────────────┐
│ Complete │
└──────────────┘
For synthetic data generation, matching production schema patterns:
-
Higher no-show probability when:
- Patient has history of no-shows (strongest predictor, +0.25)
lead_time_days> 14 days (+0.10)sipg2= Medicaid or Self-Pay (+0.08)- Monday or Friday appointments (+0.05)
new_patient_flag= 'NEW PATIENT' (+0.05)portal_engaged= false (+0.04)hour_of_dayin afternoon (14:00-16:00) (+0.03)
-
Lower no-show probability when:
- Patient has 100% attendance history (-0.15)
lead_time_days< 3 days (-0.08)- Early morning appointments (07:00-09:00) (-0.05)
virtual_flag= 'Virtual-Video' (-0.05)webschedulableyn= 1 (self-scheduled) (-0.03)
-
Base no-show rate: ~22% (matches typical outpatient)
Generate realistic care sequences per patient:
| Journey Type | Pattern | Frequency |
|---|---|---|
| Routine Care | Annual wellness → follow-up PRN | 40% of patients |
| Chronic Management | Initial → monthly/quarterly follow-ups (6-12 visits/year) | 25% of patients |
| Episodic | Single visit or 2-3 visit episode, then gap | 20% of patients |
| Referral Chain | PCP visit → specialist referral → specialist follow-up | 10% of patients |
| Care Abandonment | 2-4 appointments → no-show → no-show → drops out | 5% of patients |
Journey Generation Rules:
- Each patient has 5-50 appointments over the 24-month window (avg ~20)
- Follow-up appointments reference prior visit via
parentappointmentidor scheduling patterns - Care abandonment journeys show escalating no-show probability (50% → 70% → 90%)
- Referral chains link PCP departments to specialists via
referringproviderid
| Period | No-Show Modifier | Notes |
|---|---|---|
| Dec 20-Jan 5 | +15% | Holiday season |
| Jul 1-Aug 15 | +8% | Summer vacation |
| Mon after long weekend | +10% | Extended weekends |
| Week before school starts | +5% | Back-to-school chaos |
| Tax season (Apr 1-15) | +3% | Competing priorities |
| Field | Distribution | Notes |
|---|---|---|
patient_age_bucket |
15% 0-17, 30% 18-39, 35% 40-64, 20% 65+ | Typical outpatient |
patient_gender |
55% F, 44% M, 1% Other | Primary care skew |
sipg2 |
40% Commercial, 25% Medicare, 20% Medicaid, 15% Self-Pay | Payer mix |
virtual_flag |
70% Non-Virtual, 20% Video, 10% Phone | Post-COVID typical |
new_patient_flag |
25% NEW, 75% EST | Established patient majority |
lead_time_days |
Gamma(shape=2, scale=7), clipped 0-90 | Right-skewed |
appointmentduration |
15 (40%), 30 (35%), 45 (15%), 60 (10%) | Standard slots |
| Query | Entities | Expected Frequency |
|---|---|---|
| Appointments by date range | Appointment | High |
| Appointments by patient | Appointment, Patient | Medium |
| Appointments by provider and date | Appointment, Provider | Medium |
| High-risk appointments (prediction > threshold) | Appointment, Prediction | High |
| Patient history | Patient, Appointment | Medium |
| Department schedule | Appointment, Department | Medium |
-- Appointment queries (matches production patterns)
CREATE INDEX idx_appointment_date ON appointments(appointmentdate);
CREATE INDEX idx_appointment_datetime ON appointments(appointmentdatetime);
CREATE INDEX idx_appointment_patient ON appointments(patientid);
CREATE INDEX idx_appointment_provider_date ON appointments(providerid, appointmentdate);
CREATE INDEX idx_appointment_department ON appointments(departmentid, appointmentdate);
CREATE INDEX idx_appointment_status ON appointments(appointmentstatus);
-- Prediction queries
CREATE INDEX idx_prediction_appointment ON predictions(appointmentid);
CREATE INDEX idx_prediction_probability ON predictions(no_show_probability DESC);
-- Patient queries
CREATE INDEX idx_patient_enterprise ON patients(enterpriseid);Common field mappings between practice management and EHR systems:
| PM Field | EHR Field | Notes |
|---|---|---|
appointmentid |
parentappointmentid |
Epic uses parent ID for grouping |
patientid |
patient_id |
Direct mapping |
enterpriseid |
MRN | MRN equivalent in Epic |
providerid |
Shows provider name | Epic shows name, not ID |
referringproviderid |
Shows provider name | Epic shows name, not ID |
appointmenttypeid |
appointmenttypename |
Epic uses name, not ID |
primarypatientinsuranceid |
Insurance class name | Epic shows high-level grouping |
| Field | Usage |
|---|---|
contextname |
Delineates market regions |
market |
Geographic market identifier |
division |
Organizational division |
business_unit |
Custom department grouping |
| Entity | Count | Notes |
|---|---|---|
| Patient | 5,000 | Synthetic patients with realistic distributions |
| Provider | 100 | Various specialties and types |
| Department | 40 | Mix of specialties and locations |
| Appointment | 100,000 | 24 months history + 2 weeks future (captures seasonality) |
| Insurance Records | 5,000 | Linked to patients |
| Prediction | ~2,000 | Generated on-demand for future appointments |
| Recommendation | ~500 | Generated per agent session |
| Version | Date | Changes |
|---|---|---|
| 1.0 | 2026-01-28 | Initial data model |
| 2.0 | 2026-01-28 | Aligned with common EHR schema patterns |