Closed
68 changes: 68 additions & 0 deletions README.md
@@ -1,3 +1,71 @@
# Yet Another Synthea Fork
Holds a snapshot fork of the great Synthea<sup>TM</sup> resource, plus helper scripts for generating synthetic data with it. It mainly collects notes and takeaways from generating data for different products and use cases in my ventures.

## Generate Bundles for InterSystems OMOP Server
This generates FHIR resources that we then _convert_ into FHIR Bulk Export format (.zip). It generates about 100-150 patients for each state in the script's list, and produces a separate .zip for the organizations, the practitioners, and the population.

We construct a Bulk FHIR export because I hit an issue every time I try to post bundles or run bulk FHIR imports with InterSystems FHIR-related products. This way, as seen in the script below, we can use `sed` to rewrite the problematic references so they post without issue: a simple pre-loading data quality step before loading to Bronze.

Fixing reference format...

```
# Rewrite conditional references (Type?identifier=system|value) as Type/value
find . -type f -exec sed -i 's#?identifier=https://github\.com/synthetichealth/synthea|#/#g' {} +
find . -type f -exec sed -i 's#?identifier=http://hl7\.org/fhir/sid/us-npi|#/#g' {} +
```
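
To make the effect of that substitution concrete, here is a minimal sketch that applies the same two patterns to a stream instead of editing files in place. The `fix_refs` helper and the sample identifier value are made up for illustration; only the two patterns come from the commands above.

```shell
# Hypothetical helper: apply both reference rewrites to stdin.
fix_refs() {
  sed -e 's#?identifier=https://github\.com/synthetichealth/synthea|#/#g' \
      -e 's#?identifier=http://hl7\.org/fhir/sid/us-npi|#/#g'
}

# Sample line with an invented identifier value.
echo '{"reference":"Organization?identifier=https://github.com/synthetichealth/synthea|1234"}' | fix_refs
# prints: {"reference":"Organization/1234"}
```

Piping through a function like this is handy for spot-checking a single file before letting `sed -i` rewrite the whole output folder.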

![alt text](image.png)

## Load Order

1. Load `hosp-$State.zip` (organizations) first.
2. Load `prac-$State.zip` (practitioners) second.
3. Load `pop-$State.zip` (population) last.
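
As a rough sketch of that ordering, the snippet below prints one state's archives in the order they should be loaded. It assumes the `hosp-`, `prac-`, and `pop-` prefixes that the generation script writes into `upload`; the `load_order` function name is hypothetical, and the actual upload step is left out because it depends on your server.

```shell
# Hypothetical helper: list one state's archives in load order.
# $1 = state name, $2 = folder holding the archives (defaults to "upload").
load_order() {
  lo_state="$1"
  lo_dir="${2:-upload}"
  for lo_prefix in hosp prac pop; do
    printf '%s/%s-%s.zip\n' "$lo_dir" "$lo_prefix" "$lo_state"
  done
}

load_order Michigan
```

Organizations and practitioners go first so that the references in the population archive resolve when it is loaded.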

## Attestation
When loading into the InterSystems OMOP Server (with FHIR conversion capabilities), the generated data set loaded at about 10 minutes per state.

![alt text](image-1.png)

Each state had very good coverage with varying resource counts using the default modules included in this repo.

![alt text](image-2.png)

## Samples
Included in the `upload` folder is example output for the state of Michigan.


Below is the script, to give an idea of how I iterate over the states and generate FHIR for testing.

`intersystems_omop_data.sh`

```bash
rm -rf output
rm -rf upload
mkdir upload

declare -a states=("Alabama" "Alaska" "Arizona" "Arkansas" "California" "Colorado" "Connecticut" "Delaware" "Florida" "Georgia" "Hawaii" "Idaho" "Illinois" "Indiana" "Iowa" "Kansas" "Kentucky" "Louisiana" "Maine" "Maryland" "Massachusetts" "Michigan" "Minnesota" "Mississippi" "Missouri" "Montana" "Nebraska" "Nevada" "Ohio" "Oklahoma" "Oregon" "Pennsylvania" "Tennessee" "Texas" "Utah" "Vermont" "Virginia" "Washington" "Wisconsin" "Wyoming")


# Iterate over the state names directly so each value stays one quoted word
for state in "${states[@]}"; do
  echo "$state"
  # Random population size between 100 and 150
  count=$(shuf -i 100-150 -n 1)
  ./run_synthea -s 234 -p "$count" "$state" --exporter.baseDirectory="./output/output_${state}/"
  cd "output/output_${state}/fhir" || exit 1
  # Rewrite conditional references (Type?identifier=system|value) as Type/value
  find . -type f -exec sed -i 's#?identifier=https://github\.com/synthetichealth/synthea|#/#g' {} +
  find . -type f -exec sed -i 's#?identifier=http://hl7\.org/fhir/sid/us-npi|#/#g' {} +
  # Flatten each group of JSON files into NDJSON (one JSON document per line) and zip it
  jq -c . hospital*.json > hospital.ndjson
  zip -r "hosp-${state}.zip" hospital.ndjson
  jq -c . practitioner*.json > practitioner.ndjson
  zip -r "prac-${state}.zip" practitioner.ndjson
  # Note: *.json also matches the hospital and practitioner files
  jq -c . *.json > "pop-${state}.ndjson"
  zip -r "pop-${state}.zip" "pop-${state}.ndjson"
  cp ./*.zip ../../../upload
  cd ../../../
  pwd
done
```
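
After a run, it can help to confirm that every state ended up with all three archives before loading anything. The following is only a sketch under that assumption: `check_uploads` is a hypothetical helper, not part of the script above, and it relies solely on the `hosp-`, `prac-`, and `pop-` file names the script produces.

```shell
# Hypothetical check: report any expected archive missing from a folder.
# $1 = folder, remaining arguments = state names.
# Returns 0 when everything is present, 1 otherwise.
check_uploads() {
  cu_dir="$1"
  shift
  cu_missing=0
  for cu_state in "$@"; do
    for cu_prefix in hosp prac pop; do
      cu_file="$cu_dir/$cu_prefix-$cu_state.zip"
      if [ ! -f "$cu_file" ]; then
        echo "missing: $cu_file"
        cu_missing=1
      fi
    done
  done
  return "$cu_missing"
}

check_uploads upload Michigan || echo "some archives are missing"
```

Passing the same state names used in the `states` array gives a quick pass/fail signal for the whole run.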


# Synthea<sup>TM</sup> Patient Generator ![Build Status](https://github.com/synthetichealth/synthea/workflows/.github/workflows/ci-build-test.yml/badge.svg?branch=master) [![codecov](https://codecov.io/gh/synthetichealth/synthea/branch/master/graph/badge.svg)](https://codecov.io/gh/synthetichealth/synthea)

Synthea<sup>TM</sup> is a Synthetic Patient Population Simulator. The goal is to output synthetic, realistic (but not real), patient data and associated health records in a variety of formats.
Binary file added image-1.png
Binary file added image-2.png
Binary file added image.png
26 changes: 26 additions & 0 deletions intersystems_omop_data.sh
@@ -0,0 +1,26 @@
# adjust to your liking
rm -rf output
rm -rf upload
mkdir upload

# Full list, including multi-word states; the shorter list on the next line overrides it.
declare -a states=("Alabama" "Alaska" "Arizona" "Arkansas" "California" "Colorado" "Connecticut" "Delaware" "Florida" "Georgia" "Hawaii" "Idaho" "Illinois" "Indiana" "Iowa" "Kansas" "Kentucky" "Louisiana" "Maine" "Maryland" "Massachusetts" "Michigan" "Minnesota" "Mississippi" "Missouri" "Montana" "Nebraska" "Nevada" "New Hampshire" "New Jersey" "New Mexico" "New York" "North Carolina" "North Dakota" "Ohio" "Oklahoma" "Oregon" "Pennsylvania" "Rhode Island" "South Carolina" "South Dakota" "Tennessee" "Texas" "Utah" "Vermont" "Virginia" "Washington" "West Virginia" "Wisconsin" "Wyoming")
declare -a states=("Alabama" "Alaska" "Arizona" "Arkansas" "California" "Colorado" "Connecticut" "Delaware" "Florida" "Georgia" "Hawaii" "Idaho" "Illinois" "Indiana" "Iowa" "Kansas" "Kentucky" "Louisiana" "Maine" "Maryland" "Massachusetts" "Michigan" "Minnesota" "Mississippi" "Missouri" "Montana" "Nebraska" "Nevada" "Ohio" "Oklahoma" "Oregon" "Pennsylvania" "Tennessee" "Texas" "Utah" "Vermont" "Virginia" "Washington" "Wisconsin" "Wyoming")


# Iterate over the state names directly so each value stays one quoted word
for state in "${states[@]}"; do
  echo "$state"
  # Random population size between 100 and 150
  count=$(shuf -i 100-150 -n 1)
  ./run_synthea -s 234 -p "$count" "$state" --exporter.baseDirectory="./output/output_${state}/"
  cd "output/output_${state}/fhir" || exit 1
  # Rewrite conditional references (Type?identifier=system|value) as Type/value
  find . -type f -exec sed -i 's#?identifier=https://github\.com/synthetichealth/synthea|#/#g' {} +
  find . -type f -exec sed -i 's#?identifier=http://hl7\.org/fhir/sid/us-npi|#/#g' {} +
  # Flatten each group of JSON files into NDJSON (one JSON document per line) and zip it
  jq -c . hospital*.json > hospital.ndjson
  zip -r "hosp-${state}.zip" hospital.ndjson
  jq -c . practitioner*.json > practitioner.ndjson
  zip -r "prac-${state}.zip" practitioner.ndjson
  # Note: *.json also matches the hospital and practitioner files
  jq -c . *.json > "pop-${state}.ndjson"
  zip -r "pop-${state}.zip" "pop-${state}.ndjson"
  cp ./*.zip ../../../upload
  cd ../../../
  pwd
done
214 changes: 214 additions & 0 deletions src/main/resources/modules/sickle_cell_disease.json
@@ -0,0 +1,214 @@
{
"name": "sickle_cell_disease",
"remarks": [
"Simplified sickle cell disease (SCD) natural history for Synthea.",
"This is a starter module intended for refinement.",
"It oversamples SCD to ensure enough cases in small populations."
],
"states": {
"Initial": {
"type": "Initial",
"direct_transition": "Decide_SCD"
},

"Terminal": {
"type": "Terminal"
},

"Decide_SCD": {
"type": "Simple",
"remarks": [
"In reality SCD prevalence depends heavily on ancestry.",
"Here we oversample: 10% of patients get SCD to make testing easier.",
"Adjust the distributions to match true prevalence later."
],
"distributed_transition": [
{
"distribution": 0.90,
"transition": "No_SCD"
},
{
"distribution": 0.10,
"transition": "SCD_Diagnosis"
}
]
},

"No_SCD": {
"type": "Simple",
"remarks": [
"Patients without SCD simply exit this module."
],
"direct_transition": "Terminal"
},

"SCD_Diagnosis": {
"type": "ConditionOnset",
"remarks": [
"Onset of sickle cell disease, assumed identified early in life.",
"SNOMED: 127040003 |Sickle cell anemia (disorder)| is used here."
],
"target_encounter": "SCD_Newborn_Encounter",
"codes": [
{
"system": "SNOMED-CT",
"code": "127040003",
"display": "Sickle cell anemia (disorder)"
}
],
"direct_transition": "SCD_Newborn_Encounter"
},

"SCD_Newborn_Encounter": {
"type": "Encounter",
"encounter_class": "outpatient",
"remarks": [
"Newborn / early childhood encounter when SCD is recognized.",
"Could be a screening visit after newborn screening."
],
"codes": [
{
"system": "SNOMED-CT",
"code": "185349003",
"display": "Encounter for screening for other specified conditions"
}
],
"direct_transition": "Enter_Chronic_SCD"
},

"Enter_Chronic_SCD": {
"type": "SetAttribute",
"attribute": "scd_has_had_crisis",
"value": false,
"direct_transition": "Chronic_SCD"
},

"Chronic_SCD": {
"type": "Delay",
"remarks": [
"Represents time between potential vaso-occlusive crises.",
"Each period lasts 6-24 months (mean ~15) with a 30% crisis chance, so crises average roughly one every ~4 years in this simplified model.",
"Adjust the range or add age-based logic to refine."
],
"range": {
"low": 6,
"high": 24,
"unit": "months"
},
"distributed_transition": [
{
"distribution": 0.70,
"transition": "No_Crisis_This_Period"
},
{
"distribution": 0.30,
"transition": "Pain_Crisis_Encounter"
}
]
},

"No_Crisis_This_Period": {
"type": "Simple",
"remarks": [
"Time passes with no crisis; loop back to Chronic_SCD.",
"You could add chronic complications (nephropathy, stroke risk, etc.) here later."
],
"direct_transition": "Chronic_SCD"
},

"Pain_Crisis_Encounter": {
"type": "Encounter",
"encounter_class": "emergency",
"remarks": [
"Vaso-occlusive pain crisis leading to an ED encounter.",
"Could extend with labs, imaging, or inpatient admission."
],
"reason": "SCD_Diagnosis",
"codes": [
{
"system": "SNOMED-CT",
"code": "443620000",
"display": "Sickle cell crisis (disorder)"
}
],
"direct_transition": "Pain_Crisis_Treatment"
},

"Pain_Crisis_Treatment": {
"type": "MedicationOrder",
"remarks": [
"Generic pain management for crisis.",
"Very simplified: a single opioid order.",
"Swap this RxNorm code for one that matches your formulary if desired."
],
"codes": [
{
"system": "RxNorm",
"code": "8640",
"display": "Morphine sulfate 10 MG/ML Injectable Solution"
}
],
"direct_transition": "Mark_First_Crisis"
},

"Mark_First_Crisis": {
"type": "SetAttribute",
"remarks": [
"Flag that patient has had at least one crisis.",
"We then decide whether to initiate hydroxyurea."
],
"attribute": "scd_has_had_crisis",
"value": true,
"distributed_transition": [
{
"distribution": 0.50,
"transition": "Hydroxyurea_Start"
},
{
"distribution": 0.50,
"transition": "Post_Crisis_Recovery"
}
]
},

"Hydroxyurea_Start": {
"type": "MedicationOrder",
"remarks": [
"Start chronic hydroxyurea therapy after first crisis (simplified)."
],
"codes": [
{
"system": "RxNorm",
"code": "36567",
"display": "Hydroxyurea 500 MG Oral Capsule"
}
],
"direct_transition": "Post_Crisis_Recovery"
},

"Post_Crisis_Recovery": {
"type": "EncounterEnd",
"remarks": [
"End of the crisis encounter; patient returns to chronic state.",
"We could add increased mortality risk by sending some proportion to SCD_Death instead."
],
"direct_transition": "Chronic_SCD"
},

"SCD_Death": {
"type": "Death",
"remarks": [
"Optional: SCD-attributable death.",
"Not wired into the flow yet; you can add conditional transitions to this state based on age, number of crises, etc."
],
"codes": [
{
"system": "ICD-10-CM",
"code": "D57.1",
"display": "Sickle-cell disease without crisis"
}
],
"direct_transition": "Terminal"
}
}
}
4 changes: 2 additions & 2 deletions src/main/resources/synthea.properties
@@ -1,5 +1,5 @@
# Starting with a properties file because it requires no additional dependencies

exporter.fhir.bulk_data = true
exporter.baseDirectory = ./output/
exporter.use_uuid_filenames = false
exporter.subfolders_by_id_substring = false
@@ -21,7 +21,7 @@ exporter.fhir.use_us_core_ig = true
exporter.fhir.us_core_version = 5.0.1
exporter.fhir.transaction_bundle = true
# using bulk_data=true will ignore exporter.pretty_print
-exporter.fhir.bulk_data = false
+exporter.fhir.bulk_data = true
# hostname embedded into the generated parameters file. Defaults to http://localhost:8080/
#exporter.fhir.bulk_data.parameter_hostname = http://example.org/
# included_ and excluded_resources list out the resource types to include/exclude in the csv exporters.
Binary file added upload/hosp-Michigan.zip
Binary file not shown.
Binary file added upload/pop-Michigan.zip
Binary file not shown.
Binary file added upload/prac-Michigan.zip
Binary file not shown.