Andy Gregorowicz edited this page May 6, 2022 · 8 revisions

Synthea Fixed Records

Synthea has a feature that allows for fixed demographic information, sometimes referred to as fixed records. Users are able to supply demographic information to Synthea that will be used within the simulation and exported in the records that are generated. Further, users can submit multiple sets of demographic information for the same person. This is used with Synthea's split record feature to simulate the types of variation in demographic information typically found across information systems. As an example, a person may have their first name listed as "William" in the electronic health record system of their primary care provider, but their first name may be "Bill" in the electronic health record system of an urgent care provider. This differing demographic information can be provided to Synthea and it will export records where one lists the first name as "William", while another lists it as "Bill".

Note: This is an experimental feature that is undergoing heavy revision on the fixed-records-enhancements branch. Expect the format and behavior of this functionality to change.

Current Functionality

How it works

Users supply a file to Synthea using the -f flag that contains record groups. A record group is a collection of records that represent a single individual. When using the fixed records feature, you cannot specify a Synthea population size, as that will be determined by the number of record groups in the provided file. Similarly, the age range and gender command line flags cannot be used as that information will also be provided in the file.

An example file with a single record group is as follows:

[
  {
    "records": [
      {
        "LIST_ID": "DH",
        "RECORD_ID": "19436338",
        "CHILD_SURNAME": "Fredrich",
        "CHILD_GIVEN_NAME": "Alex or Alex Neil",
        "DOB_YEAR": "2014",
        "DOB_MONTH": "7",
        "DOB_DAY": "30",
        "GENDER": "M",
        "PHONE_AREA_CODE": "303",
        "PHONE_NUMBER": "1515151",
        "ADDRESS_STREET1": "12439 Josephine St",
        "ADDRESS_STREET2": "",
        "ADDRESS_CITY": "Thornton",
        "ADDRESS_STATE": "Colorado",
        "ADDRESS_COUNTRY": "",
        "ADDRESS_ZIPCODE": "80241-2744",
        "PARENT1_SURNAME": "Fredric",
        "PARENT1_GIVEN_NAME": "Kimberly Adrienne",
        "PARENT1_EMAIL": "[email protected]"
      },
      {
        "LIST_ID": "CH",
        "RECORD_ID": "19436336",
        "CHILD_SURNAME": "Fredrich",
        "CHILD_GIVEN_NAME": "Alex Neil",
        "DOB_YEAR": "2014",
        "DOB_MONTH": "7",
        "DOB_DAY": "30",
        "GENDER": "M",
        "PHONE_AREA_CODE": "303",
        "PHONE_NUMBER": "8585858",
        "ADDRESS_STREET1": "josephine st.",
        "ADDRESS_STREET2": "",
        "ADDRESS_CITY": "Thornton",
        "ADDRESS_STATE": "Colorado",
        "ADDRESS_COUNTRY": "",
        "ADDRESS_ZIPCODE": "80241-2744",
        "PARENT1_SURNAME": "Fredrich-Smith",
        "PARENT1_GIVEN_NAME": "Kimberly MNU",
        "PARENT1_EMAIL": "[email protected]"
      },
      {
        "LIST_ID": "KP",
        "RECORD_ID": "19436337",
        "CHILD_SURNAME": "Fredrich",
        "CHILD_GIVEN_NAME": "Alex Neil",
        "DOB_YEAR": "2004",
        "DOB_MONTH": "7",
        "DOB_DAY": "30",
        "GENDER": "M",
        "PHONE_AREA_CODE": "303",
        "PHONE_NUMBER": "3737373",
        "ADDRESS_STREET1": "3986 Oneida St",
        "ADDRESS_STREET2": "",
        "ADDRESS_CITY": "Littleton",
        "ADDRESS_STATE": "Colorado",
        "ADDRESS_COUNTRY": "",
        "ADDRESS_ZIPCODE": "80122-4303",
        "PARENT1_SURNAME": "Fredrich",
        "PARENT1_GIVEN_NAME": "Kimberly",
        "PARENT1_EMAIL": "[email protected]"
      }
    ],
    "count": 3
  }
]

When using this feature, the configuration property exporter.split_records will be set to true. In the case of the record group shown in the example, 3 records will be output. When running the Synthea simulation, the first record in the group will be used to drive the simulation. That means that the date of birth, gender, city and state provided in that record are what Synthea will use to drive the actual simulation.
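As a minimal sketch of how the file shape relates to a run (this is not Synthea's own loader), the Python below reads a record-group file, counts the groups to get the population size, and pulls out the first record of each group. The helper name `summarize_record_groups` and the trimmed sample are assumptions made for illustration; the field names are taken from the example file.

```python
import json


def summarize_record_groups(text):
    """Return (population_size, driving_records) for a record-group file.

    Hypothetical helper for illustration only; it is not part of Synthea.
    """
    groups = json.loads(text)
    # One person is simulated per record group, so the file fixes the population size.
    population_size = len(groups)
    driving = []
    for group in groups:
        # The first record supplies the demographics that drive the simulation.
        first = group["records"][0]
        driving.append({
            "dob": (int(first["DOB_YEAR"]), int(first["DOB_MONTH"]), int(first["DOB_DAY"])),
            "gender": first["GENDER"],
            "city": first["ADDRESS_CITY"],
            "state": first["ADDRESS_STATE"],
            "records_to_export": len(group["records"]),
        })
    return population_size, driving


# Trimmed copy of the example file, keeping only the fields used above.
SAMPLE = """
[
  {
    "records": [
      {"LIST_ID": "DH", "DOB_YEAR": "2014", "DOB_MONTH": "7", "DOB_DAY": "30",
       "GENDER": "M", "ADDRESS_CITY": "Thornton", "ADDRESS_STATE": "Colorado"},
      {"LIST_ID": "CH", "DOB_YEAR": "2014", "DOB_MONTH": "7", "DOB_DAY": "30",
       "GENDER": "M", "ADDRESS_CITY": "Thornton", "ADDRESS_STATE": "Colorado"},
      {"LIST_ID": "KP", "DOB_YEAR": "2004", "DOB_MONTH": "7", "DOB_DAY": "30",
       "GENDER": "M", "ADDRESS_CITY": "Littleton", "ADDRESS_STATE": "Colorado"}
    ],
    "count": 3
  }
]
"""

size, driving = summarize_record_groups(SAMPLE)
print(size, driving[0])
```

Note that the third record's differing birth year and city do not affect the simulation in this sketch, because only the first record is treated as the driver.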

When an individual is simulated, Synthea will run a simulation for the person as it normally would. At the end of the simulation, it will check whether the individual has visited at least as many providers as there are records in the record group. If not, Synthea will rerun the simulation with a different random seed.

When it has finished a simulation with the appropriate number of providers, it will export the records. Records will have different demographic information based on what is listed in the records in the record group. In the example, the first record output will have the first name of "Alex or Alex Neil" while the second record will have a first name of "Alex Neil". In cases where a person visits more providers than there are records in the record group, health records will be combined so that the output matches the number of records.
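The rerun-and-combine behavior can be sketched roughly as follows. Everything here is a stand-in: `simulate_providers` fakes a simulation run, the `max_attempts` cap is an addition so that the sketch always terminates, and the round-robin grouping is only one possible way to combine records, since the description above does not say how Synthea chooses which records to merge.

```python
import random


def simulate_providers(seed):
    """Stand-in for one simulation run: returns the providers visited.

    Purely hypothetical; a real Synthea run produces a full health record.
    """
    rng = random.Random(seed)
    return [f"provider-{i}" for i in range(rng.randint(1, 6))]


def run_until_enough_providers(record_count, first_seed=0, max_attempts=1000):
    """Rerun with a different seed until enough providers have been visited."""
    for attempt in range(max_attempts):
        providers = simulate_providers(first_seed + attempt)
        if len(providers) >= record_count:
            return providers
    # This cap is an assumption of the sketch, added so the loop cannot run forever.
    raise RuntimeError("no run visited enough providers")


def assign_providers_to_records(providers, record_count):
    """Fold the visited providers into exactly record_count output records.

    Round-robin assignment is an assumption made for illustration.
    """
    buckets = [[] for _ in range(record_count)]
    for index, provider in enumerate(providers):
        buckets[index % record_count].append(provider)
    return buckets


providers = run_until_enough_providers(record_count=3)
print(assign_providers_to_records(providers, 3))
```

Each bucket corresponds to one exported record, which would then carry the demographics of the matching entry in the record group.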

Caveats

The first record in the record group is used as "ground truth" in the clinical simulation, so it needs to be information that Synthea understands. Cities must be present in the Synthea demographics file. Note that not all cities or towns are present in the demographics file, so be sure to verify that the city being used is one that Synthea has information for.

Users should be careful about creating large record groups for younger people. The person must visit at least as many providers as there are records in the group. If the date of birth would create a 1-year-old individual but there are 10 records in the record group, Synthea is likely to enter an infinite loop, as there probably isn't a path in Synthea simulations for a 1-year-old to visit 10 separate providers.

The current format allows for the creation of invalid dates, such as February 30. Some users may want such dates to mimic data systems that store invalid dates, but the Synthea tools cannot handle them. Synthea can handle valid dates that represent typos and are completely unreasonable for people alive today, such as "February 11, 22" (the actual year 22) or "February 11, 2202". Just don't use dates like these in the first record.
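A file could be checked for these date problems before it is handed to Synthea. The sketch below is one possible pre-flight check, not a Synthea tool: the function name `check_dob`, the three labels, and the 120-year cutoff for "unreasonable" are all assumptions. Dates labeled `invalid` should not appear anywhere in the file, while dates labeled `implausible` are acceptable as long as they are not in the first record of a group.

```python
from datetime import date


def check_dob(year, month, day, today=None, max_age_years=120):
    """Classify a date of birth as 'invalid', 'implausible', or 'ok'.

    Hypothetical helper; the 120-year cutoff is an arbitrary assumption.
    """
    today = today or date.today()
    try:
        dob = date(int(year), int(month), int(day))
    except ValueError:
        # Not a real calendar date (for example February 30) or not numeric.
        return "invalid"
    if dob > today or today.year - dob.year > max_age_years:
        # A real calendar date, but not a sensible one for a living person.
        return "implausible"
    return "ok"


reference_day = date(2022, 5, 6)
print(check_dob("2014", "2", "30", today=reference_day))  # invalid
print(check_dob("22", "2", "11", today=reference_day))    # implausible
print(check_dob("2202", "2", "11", today=reference_day))  # implausible
print(check_dob("2014", "7", "30", today=reference_day))  # ok
```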

Enhanced Version

How it works

There are three main concepts to understand when using fixed records. They are:

  • Entity - This is a single person. A person will live in different places throughout their life. They may also change their name or other contact information over time.
  • Seed - An Entity will have one or more seeds. A seed is the "ground truth" for the entity. It is the correct location, date of birth, etc. for an entity for a particular time range. A series of seeds for an entity will show how their demographic information changes over time.
  • Variant - A seed may have variants. A variant represents some deviation from the information in the seed. It could be a typo in a name or address. It could show a previous address. It can be wrong in all sorts of ways. The intent is to use this to emulate the types of errors typically seen in demographic information stored in real-world systems.
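One way to picture the relationship between these three concepts is as nested data types. The Python dataclasses below are only an illustration of that hierarchy; the class and attribute names are assumptions and do not mirror Synthea's Java classes, and only a few of the demographic fields are modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Variant:
    """A deviation from a seed, such as a typo or an outdated address."""
    variant_id: str
    given_name: str
    city: str


@dataclass
class Seed:
    """Ground truth for an entity during one time range."""
    seed_id: str
    start: str
    end: Optional[str]  # None models a seed that is still current
    given_name: str
    city: str
    variants: List[Variant] = field(default_factory=list)


@dataclass
class Entity:
    """A single person, described by one or more seeds over time."""
    individual_id: str
    date_of_birth: str
    gender: str
    seeds: List[Seed] = field(default_factory=list)

    def variant_count(self):
        """Total number of variants across all seeds."""
        return sum(len(seed.variants) for seed in self.seeds)


# Illustrative values only.
person = Entity(
    individual_id="1234",
    date_of_birth="1944-03-17",
    gender="F",
    seeds=[
        Seed("5678", "1944-03-17", "1958-03-17", "Rita Ebony", "ROXBORO",
             variants=[Variant("91011", "Margarita Ebony", "ROXBORO")]),
        Seed("1416", "1970-03-18", None, "Rita Ebony", "RALEIGH",
             variants=[Variant("1618", "M E", "RAHLEY")]),
    ],
)
print(len(person.seeds), person.variant_count())
```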

Users supply a file to Synthea using the -f flag that contains entities. When using the fixed records feature, you cannot specify a Synthea population size, as that will be determined by the number of entities in the provided file. Similarly, the age range and gender command line flags cannot be used as that information will also be provided in the file.

An example file with a single entity is as follows:

{
  "records": [
    {
      "individualId": "1234",
      "dateOfBirth": "1944-03-17",
      "gender": "F",
      "seeds": [
        {
          "seedId": "5678",
          "period": {
            "start": "1944-03-17",
            "end": "1958-03-17"
          },
          "givenName": "Rita Ebony",
          "familyName": "Noble",
          "phone": "9194819376",
          "addressLines": [
            " 865 WHITT TOWN RD"
          ],
          "city": "ROXBORO",
          "state": "NC",
          "zipCode": "27574",
          "variants": [
            {
              "variantId": "91011",
              "dateOfBirth": "1944-03-17",
              "gender": "F",
              "givenName": "Margarita Ebony",
              "familyName": "Noble",
              "phone": "9194819376",
              "addressLines": [
                " 865 WHITT TOWN RD"
              ],
              "city": "ROXBORO",
              "state": "NC",
              "zipCode": "27574"
            },
            {
              "variantId": "1213",
              "dateOfBirth": "1944-04-17",
              "gender": "F",
              "givenName": "Margarita Ebony",
              "familyName": "Noble",
              "phone": "9194819376",
              "addressLines": [
                " 865 WHITT TOWN RD"
              ],
              "city": "ROXBORO",
              "state": "NC",
              "zipCode": "27574"
            }
          ]
        },
        {
          "seedId": "1415",
          "period": {
            "start": "1958-03-18",
            "end": "1970-03-17"
          },
          "givenName": "Rita Ebony",
          "familyName": "Noble",
          "phone": "9194819376",
          "addressLines": [
            "7217 BRIGHTON HILL LN"
          ],
          "city": "RALEIGH",
          "state": "NC",
          "zipCode": "27616",
          "variants": [
            {
              "variantId": "1617",
              "dateOfBirth": "1944-04-17",
              "gender": "F",
              "givenName": "Margarita Ebony",
              "familyName": "Noble",
              "phone": "9194819376",
              "addressLines": [
                "7217 BRIGHTON-HILL"
              ],
              "city": "RAHLEY",
              "state": "NC",
              "zipCode": "27616"
            }
          ]
        },
        {
          "seedId": "1416",
          "period": {
            "start": "1970-03-18"
          },
          "givenName": "Rita Ebony",
          "familyName": "Noble",
          "phone": "9194819376",
          "addressLines": [
            "7217 BRIGHTON HILL LN"
          ],
          "city": "RALEIGH",
          "state": "NC",
          "zipCode": "27616",
          "variants": [
            {
              "variantId": "1618",
              "dateOfBirth": "1944-04-17",
              "gender": "F",
              "givenName": "M E",
              "familyName": "Noble",
              "phone": "9194819376",
              "addressLines": [
                "7217 BRIGHTON-HILL"
              ],
              "city": "RAHLEY",
              "state": "NC",
              "zipCode": "27616"
            }
          ]
        }
      ]
    }
  ]
}
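To show how the seed periods in a file like this might be used, the sketch below picks the seed that applies to an entity on a given date. It is a hypothetical reader, not Synthea code: the function name `seed_for_date` is an assumption, and so is the treatment of both period bounds as inclusive, which is inferred from the example, where one seed ends on 1958-03-17 and the next starts on 1958-03-18. A missing `end` is treated as an open-ended period.

```python
from datetime import date


def seed_for_date(entity, when):
    """Return the seed whose period covers the given date, or None.

    Hypothetical helper; inclusive bounds are an assumption.
    """
    for seed in entity["seeds"]:
        period = seed["period"]
        start = date.fromisoformat(period["start"])
        end_text = period.get("end")
        # A seed without an end date is treated as still current.
        end = date.fromisoformat(end_text) if end_text else date.max
        if start <= when <= end:
            return seed
    return None


# Trimmed copy of the example entity, keeping only the fields used above.
ENTITY = {
    "individualId": "1234",
    "seeds": [
        {"seedId": "5678", "city": "ROXBORO",
         "period": {"start": "1944-03-17", "end": "1958-03-17"}},
        {"seedId": "1415", "city": "RALEIGH",
         "period": {"start": "1958-03-18", "end": "1970-03-17"}},
        {"seedId": "1416", "city": "RALEIGH",
         "period": {"start": "1970-03-18"}},
    ],
}

current = seed_for_date(ENTITY, date(2000, 1, 1))
print(current["seedId"], current["city"])
```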
