Sage-Bionetworks · andrewelamb · Dec 16, 2025 · Dec 17, 2025 · Dec 17, 2025 · Dec 17, 2025
@@ -15,8 +15,8 @@ The CSV data model described in this tutorial formalizes this structure:
 
 Here is the Patient described above represented as a CSV data model:
 
-| Attribute | DependsOn |
-|---|---|
+| Attribute | DependsOn           |
+|-----------|---------------------|
 | Patient   | "Age, Gender, Name" |
 | Age       |                     |
 | Gender    |                     |
@@ -48,9 +48,20 @@ The end goal is to create a JSON Schema that can be used in Curator. A JSON Sche
 
 Note: Individual columns are covered later on this page.
 
+These columns must be present in your CSV data model:
+
+- [Attribute](#attribute)
+- [DependsOn](#dependson)
+- [Description](#description)
+- [Valid Values](#valid-values)
+- [Required](#required)
+- [Parent](#parent)
+- [Validation Rules](#validation-rules)
+
 Defining data types:
 
 - Put a unique data type name in the `Attribute` column.
+- Put the value `DataType` in the `Parent` column.
 - List at least one attribute in the `DependsOn` column (comma-separated).
 - Optionally add a description to the `Description` column.
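
The column and data type rules above can be checked before handing a data model to the generator. The following is a minimal sketch using only the standard library; the inlined CSV, `missing_columns`, and `find_data_types` are hypothetical names made up for illustration and are not part of the Synapse Python Client.

```python
import csv
import io

# Columns the text above lists as mandatory in a CSV data model.
REQUIRED_COLUMNS = [
    "Attribute", "DependsOn", "Description", "Valid Values",
    "Required", "Parent", "Validation Rules",
]

# Hypothetical data model, inlined so the sketch is self-contained.
CSV_TEXT = '''Attribute,DependsOn,Description,Valid Values,Required,Parent,Validation Rules
Patient,"Age, Gender, Name",A person in the study,,,DataType,
Age,,,,,,
Gender,,,,,,
Name,,,,,,
'''


def missing_columns(fieldnames):
    """Return the required columns that are absent from the header."""
    return [column for column in REQUIRED_COLUMNS if column not in fieldnames]


def find_data_types(rows):
    """Map each row marked as a data type to its DependsOn attributes."""
    data_types = {}
    for row in rows:
        if row["Parent"].strip() == "DataType":
            # DependsOn is a quoted, comma-separated list inside one CSV cell.
            names = [name.strip() for name in row["DependsOn"].split(",")]
            data_types[row["Attribute"]] = [name for name in names if name]
    return data_types


reader = csv.DictReader(io.StringIO(CSV_TEXT))
rows = list(reader)
missing = missing_columns(reader.fieldnames)
data_types = find_data_types(rows)

print("Missing columns:", missing)
print("Data types:", data_types)
```

Quoting the `DependsOn` cell is what keeps `Age, Gender, Name` in a single field; without the quotes a CSV parser would split it into three columns.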
 
@@ -79,8 +90,8 @@ Set of possible values for the current attribute. This attribute will be an enum
 Data Model:
 
 | Attribute | DependsOn | Valid Values          |
-|---|---|---|
-| Patient    | "Gender"  |                       |
+|-----------|-----------|-----------------------|
+| Patient   | "Gender"  |                       |
 | Gender    |           | "Female, Male, Other" |
 
 JSON Schema output:
@@ -107,8 +118,8 @@ Note: Leaving this empty is the equivalent of `False`.
 Data Model:
 
 | Attribute | DependsOn      | Required |
-|---|---|---|
-| Patient    | "Gender, Age"  |          |
+|-----------|----------------|----------|
+| Patient   | "Gender, Age"  |          |
 | Gender    |                | True     |
 | Age       |                | False    |
 
@@ -131,8 +142,14 @@ JSON Schema output:
 }
 ```
 
+### Parent
+
+Put the value `DataType` in this column if this row is a data type. This column is currently used to find all the data types in the data model; any other value is ignored.
+
 ### columnType
 
+**NOTE: While this is not required, it is strongly recommended that this column is present and that all attributes are typed explicitly.**
+
 The data type of this attribute. See [type](https://json-schema.org/understanding-json-schema/reference/type).
 
 Must be one of:
@@ -147,11 +164,11 @@ Must be one of:
 
 Data Model:
 
-| Attribute | DependsOn         | columnType  |
-|---|---|---|
-| Patient   | "Gender, Hobbies" |             |
-| Gender    |                   | string      |
-| Hobbies   |                   | string_list |
+| Attribute | DependsOn         | columnType  | Parent   |
+|-----------|-------------------|-------------|----------|
+| Patient   | "Gender, Hobbies" |             | DataType |
+| Gender    |                   | string      |          |
+| Hobbies   |                   | string_list |          |
 
 JSON Schema output:
 
@@ -196,11 +213,11 @@ The format of this attribute. See [format](https://json-schema.org/understanding
 
 Data Model:
 
-| Attribute       | DependsOn            | columnType  | Format |
-|---|---|---|---|
-| Patient         | "Gender, Birth Date" |             |        |
-| Gender          |                      | string      |        |
-| Birth Date      |                      | string      | date   |
+| Attribute       | DependsOn            | columnType  | Format | Parent   |
+|-----------------|----------------------|-------------|--------|----------|
+| Patient         | "Gender, Birth Date" |             |        | DataType |
+| Gender          |                      | string      |        |          |
+| Birth Date      |                      | string      | date   |          |
 
 JSON Schema output:
 
@@ -229,11 +246,11 @@ The regex pattern this attribute must match. The type of this attribute must be
 
 Data Model:
 
-| Attribute | DependsOn     | columnType  | Pattern |
-|---|---|---|---|
-| Patient   | "Gender, ID"  |             |         |
-| Gender    |               | string      |         |
-| ID        |               | string      | [a-f]   |
+| Attribute | DependsOn     | columnType  | Pattern | Parent   |
+|-----------|---------------|-------------|---------|----------|
+| Patient   | "Gender, ID"  |             |         | DataType |
+| Gender    |               | string      |         |          |
+| ID        |               | string      | [a-f]   |          |
 
 JSON Schema output:
 
@@ -262,12 +279,12 @@ The range that this attribute's numeric values must fall within. The type of thi
 
 Data Model:
 
-| Attribute    | DependsOn                  | columnType  | Minimum | Maximum |
-|---|---|---|---|---|
-| Patient      | "Age, Weight, Health Score"  |             |         |         |
-| Age          |                            | integer     | 0       | 120     |
-| Weight       |                            | number      | 0.0     |         |
-| Health Score |                            | number      | 0.0     | 1.0     |
+| Attribute    | DependsOn                   | columnType  | Minimum | Maximum | Parent   |
+|--------------|-----------------------------|-------------|---------|---------|----------|
+| Patient      | "Age, Weight, Health Score" |             |         |         | DataType |
+| Age          |                             | integer     | 0       | 120     |          |
+| Weight       |                             | number      | 0.0     |         |          |
+| Health Score |                             | number      | 0.0     | 1.0     |          |
 
 JSON Schema output:
 
@@ -299,11 +316,13 @@ JSON Schema output:
 }
 ```
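
To illustrate what `Minimum` and `Maximum` mean for a value, here is a minimal sketch of the inclusive bounds check that the JSON Schema `minimum` and `maximum` keywords describe. The `PROPERTIES` fragments are hand-written from the table above and are only assumed to resemble the generated output; `in_range` is a hypothetical helper, not a validator shipped with Synapse.

```python
# Hand-written fragments based on the Minimum/Maximum table above.
# Assumption: the generated schema carries equivalent keywords.
PROPERTIES = {
    "Age": {"type": "integer", "minimum": 0, "maximum": 120},
    "Weight": {"type": "number", "minimum": 0.0},
    "Health Score": {"type": "number", "minimum": 0.0, "maximum": 1.0},
}


def in_range(value, schema):
    """Check a number against optional inclusive minimum/maximum bounds."""
    if "minimum" in schema and value < schema["minimum"]:
        return False
    if "maximum" in schema and value > schema["maximum"]:
        return False
    return True


print(in_range(120, PROPERTIES["Age"]))
print(in_range(121, PROPERTIES["Age"]))
print(in_range(500.0, PROPERTIES["Weight"]))
print(in_range(1.5, PROPERTIES["Health Score"]))
```

`Weight` has no `Maximum`, so only the lower bound applies to it.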
 
-### Validation Rules (deprecated)
+### Validation Rules
+
+This column is currently deprecated.
 
-This is a remnant from Schematic. It is still used (for now) to translate certain validation rules to other JSON Schema keywords.
+This is a remnant from Schematic. It is still required and in use (for now) to translate certain validation rules to other JSON Schema keywords.
 
-If you are starting a new data model, DO NOT use this column.
+If you are starting a new data model, DO NOT fill out this column; leave it blank.
 
 If you have an existing data model using any of the following validation rules, follow these instructions to update it:
 
@@ -315,22 +334,23 @@ If you have an existing data model using any of the following validation rules,
 
 ## Conditional dependencies
 
-The `DependsOn` and `Valid Values` columns can be used together to flexibly define conditional logic for determining the relevant attributes for a data type.
+The `DependsOn`, `Valid Values` and `Parent` columns can be used together to flexibly define conditional logic for determining the relevant attributes for a data type.
 
 In this example we have the `Patient` data type. The `Patient` can be diagnosed as healthy or with cancer. For Patients with cancer we also want to collect info about their cancer type, and any cancers in their family history.
 
 Data Model:
 
-| Attribute      | DependsOn                      | Valid Values        | Required | columnType  |
-|---|---|---|---|---|
-| Patient        | "Diagnosis"                    |                     |          |             |
-| Diagnosis      |                                | "Healthy, Cancer"   | True     | string      |
-| Cancer         | "Cancer Type, Family History"  |                     |          |             |
-| Cancer Type    |                                | "Brain, Lung, Skin" | True     | string      |
-| Family History |                                | "Brain, Lung, Skin" | True     | string_list |
+| Attribute      | DependsOn                     | Valid Values        | Required | columnType  | Parent   |
+|----------------|-------------------------------|---------------------|----------|-------------|----------|
+| Patient        | "Diagnosis"                   |                     |          |             | DataType |
+| Diagnosis      |                               | "Healthy, Cancer"   | True     | string      |          |
+| Cancer         | "Cancer Type, Family History" |                     |          |             |          |
+| Cancer Type    |                               | "Brain, Lung, Skin" | True     | string      |          |
+| Family History |                               | "Brain, Lung, Skin" | True     | string_list |          |
 
 To demonstrate this, see the above example with the `Patient` and `Cancer` data types:
 
+- `Patient` is a data type, but `Cancer` is not, as defined by the `Parent` column.
 - `Diagnosis` is an attribute of `Patient`.
 - `Diagnosis` has `Valid Values` of `Healthy` and `Cancer`.
 - `Cancer` is also a data type.
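
One way to read the conditional logic in this example is sketched below: when the chosen value of an attribute names another row that has its own `DependsOn` entries, those attributes become relevant too. This is an interpretation of the example table for illustration only. `relevant_attributes` is a hypothetical helper and does not reflect how Curator or the schema generator implements the condition.

```python
# Rows from the example table that have DependsOn entries.
DEPENDS_ON = {
    "Patient": ["Diagnosis"],
    "Cancer": ["Cancer Type", "Family History"],
}

# Valid Values from the example table.
VALID_VALUES = {
    "Diagnosis": ["Healthy", "Cancer"],
    "Cancer Type": ["Brain", "Lung", "Skin"],
    "Family History": ["Brain", "Lung", "Skin"],
}


def relevant_attributes(name, record):
    """List attributes for `name`, following values that name another row."""
    attributes = []
    for attribute in DEPENDS_ON.get(name, []):
        attributes.append(attribute)
        value = record.get(attribute)
        # A valid value such as "Cancer" that also has DependsOn entries
        # pulls in the attributes of that row.
        if (
            isinstance(value, str)
            and value in VALID_VALUES.get(attribute, [])
            and value in DEPENDS_ON
        ):
            attributes.extend(relevant_attributes(value, record))
    return attributes


healthy = relevant_attributes("Patient", {"Diagnosis": "Healthy"})
cancer = relevant_attributes("Patient", {"Diagnosis": "Cancer"})
print(healthy)
print(cancer)
```

With this reading, a healthy patient only needs `Diagnosis`, while a patient diagnosed with cancer also needs `Cancer Type` and `Family History`.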

@@ -81,8 +81,7 @@ synapse [-h] [--version] [-u SYNAPSEUSER] [-p SYNAPSE_AUTH_TOKEN] [-c CONFIGPATH
 - [test-encoding](#test-encoding): test character encoding to help diagnose problems
 - [get-sts-token](#get-sts-token): Get an STS token for access to AWS S3 storage underlying Synapse
 - [migrate](#migrate): Migrate Synapse entities to a different storage location
-
-
+- [generate-json-schema](#generate-json-schema): Generate JSON Schema(s) from a data model
 
 ### `get`
 
@@ -544,3 +543,18 @@ synapse migrate [-h] [--source_storage_location_ids [SOURCE_STORAGE_LOCATION_IDS
 | `--csv_log_path`                | Named      | Path where to log a csv documenting the changes from the migration.                                                                                                                                                                            |         |
 | `--dryRun`                      | Named      | Dry run, files will be indexed by not migrated.                                                                                                                                                                                                | False   |
 | `--force`                       | Named      | Bypass interactive prompt confirming migration.                                                                                                                                                                                                | False   |
+
+### `generate-json-schema`
+
+Generate JSON Schema(s) from a data model
+
+```bash
+synapse generate-json-schema [-h] [--data-types data_type1, data_type2] [--output dir_name] [--data-model-labels class_label] data_model_path
+```
+
+| Name                     | Type       | Description                                                         |
+|--------------------------|------------|---------------------------------------------------------------------|
+| `data_model_path`        | Positional | Data model path or URL                                              |
+| `--data-types`           | Named      | Optional list of data types to create JSON Schema for               |
+| `--output`               | Named      | Optional. Either a file path ending in '.json', or a directory path |
+| `--data-model-labels`    | Named      | Either 'class_label', or 'display_label'                            |
@@ -5,11 +5,13 @@ Synapse supports a subset of features from [json-schema-draft-07](https://json-s
 In this tutorial, you will learn how to create these JSON Schema using an existing data model.
 
 ## Tutorial Purpose
-You will create a JSON schema using your data model.
+
+You will create a JSON Schema from your data model, using the Python client as a library. To use the CLI tool instead, see the [documentation](../command_line_client.md).
 
 ## Prerequisites
+
 * You have a working [installation](../installation.md) of the Synapse Python Client.
-* You have a data model, see this [example data model](https://github.com/Sage-Bionetworks/schematic/blob/develop/tests/data/example.model.column_type_component.csv).
+* You have a data model; see the [data model documentation](../../explanations/curator_data_model.md).
 
 ## 1. Imports
 
@@ -20,29 +22,72 @@ You will create a JSON schema using your data model.
 ## 2. Set up your variables
 
 ```python
-{!docs/tutorials/python/tutorial_scripts/schema_operations.py!lines=4-10}
+{!docs/tutorials/python/tutorial_scripts/schema_operations.py!lines=4-11}
 ```
 
 To create a JSON Schema you need a data model, and the data types you want to create.
 The data model must be in either CSV or JSON-LD form. The data model may be a local path or a URL.
-[Example data model](https://github.com/Sage-Bionetworks/schematic/blob/develop/tests/data/example.model.column_type_component.csv).
+See the [data model documentation](../../explanations/curator_data_model.md).
 
 The data types must exist in your data model. This can be a list of data types, or `None` to create all data types in the data model.
 
 ## 3. Log into Synapse
+
 ```python
-{!docs/tutorials/python/tutorial_scripts/schema_operations.py!lines=12-13}
+{!docs/tutorials/python/tutorial_scripts/schema_operations.py!lines=13-14}
 ```
 
+## 4. Create a JSON Schema
+
+Create a JSON Schema
 
-## 4. Create the JSON Schema
-Create the JSON Schema(s)
 ```python
-{!docs/tutorials/python/tutorial_scripts/schema_operations.py!lines=15-23}
+{!docs/tutorials/python/tutorial_scripts/schema_operations.py!lines=16-23}
 ```
-You should see the first JSON Schema for the datatype(s) you selected printed.
+
+The first JSON Schema for the data type you selected should be printed.
 It will look like [this schema](https://repo-prod.prod.sagebase.org/repo/v1/schema/type/registered/dpetest-test.schematic.Patient).
+Because the `output` parameter is set to the "temp" directory, the file is created as "temp/Patient.json".
+
+## 5. Create multiple JSON Schema
+
+Create multiple JSON Schema
+
+```python
+{!docs/tutorials/python/tutorial_scripts/schema_operations.py!lines=26-32}
+```
+
+The `data_types` parameter is a list and can have multiple data types.
+
+## 6. Create every JSON Schema
+
+Create every JSON Schema
+
+```python
+{!docs/tutorials/python/tutorial_scripts/schema_operations.py!lines=34-39}
+```
+
+If you don't set the `data_types` parameter, a JSON Schema will be created for every data type in the data model.
+
+## 7. Create a JSON Schema with a certain path
+
+Create a JSON Schema at a specific file path
+
+```python
+{!docs/tutorials/python/tutorial_scripts/schema_operations.py!lines=41-47}
+```
+
+If you have only one data type and set the `output` parameter to a file path (ending in `.json`), the JSON Schema file will have that path.
+
+## 8. Create a JSON Schema in the current working directory
+
+Create a JSON Schema in the current working directory
+
+```python
+{!docs/tutorials/python/tutorial_scripts/schema_operations.py!lines=49-54}
+```
 
+If you don't set the `output` parameter, the JSON Schema file will be created in the current working directory.
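
The three `output` cases described in steps 4 to 8 can be summarized in a small sketch. `resolve_output_paths` is a hypothetical helper written for illustration, not a function of the Synapse Python Client; the `<DataType>.json` file name follows the "temp/Patient.json" example above, and raising an error for a `.json` path with several data types is an assumption, since the tutorial only describes the single data type case.

```python
from pathlib import Path


def resolve_output_paths(data_types, output=None):
    """Sketch of where each schema file would land for a given `output`."""
    if output is None:
        # No output given: files go to the current working directory.
        directory = Path.cwd()
    elif str(output).endswith(".json"):
        # A file path is only described for a single data type.
        if len(data_types) != 1:
            raise ValueError("a .json output path needs exactly one data type")
        return [Path(output)]
    else:
        # Anything else is treated as a directory.
        directory = Path(output)
    return [directory / f"{name}.json" for name in data_types]


print(resolve_output_paths(["Patient"], "temp"))
print(resolve_output_paths(["Patient", "Biospecimen"], "temp"))
print(resolve_output_paths(["Patient"], "test.json"))
print(resolve_output_paths(["Patient"]))
```

In the tutorial script itself, the `file_paths` value returned by `generate_jsonschema` is the place to look for the paths that were actually written.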
 
 ## Source Code for this Tutorial
 

@@ -8,17 +8,47 @@
 # Example: ["Patient", "Biospecimen"] or None
 DATA_TYPE = ["Patient"]
 # Directory where JSON Schema files will be saved
-OUTPUT_DIRECTORY = "./"
+OUTPUT_DIRECTORY = "temp"
 
 syn = Synapse()
 syn.login()
 
 schemas, file_paths = generate_jsonschema(
     data_model_source=DATA_MODEL_SOURCE,
-    output_directory=OUTPUT_DIRECTORY,
-    data_type=DATA_TYPE,
-    data_model_labels="class_label",
+    output=OUTPUT_DIRECTORY,
+    data_types=DATA_TYPE,
     synapse_client=syn,
 )
 
 print(schemas[0])
+
+
+# Create JSON Schemas for multiple data types
+schemas, file_paths = generate_jsonschema(
+    data_model_source=DATA_MODEL_SOURCE,
+    output=OUTPUT_DIRECTORY,
+    data_types=["Patient", "Biospecimen"],
+    synapse_client=syn,
+)
+
+# Create JSON Schemas for all data types
+schemas, file_paths = generate_jsonschema(
+    data_model_source=DATA_MODEL_SOURCE,
+    output=OUTPUT_DIRECTORY,
+    synapse_client=syn,
+)
+
+# Specify path for JSON Schema
+schemas, file_paths = generate_jsonschema(
+    data_model_source=DATA_MODEL_SOURCE,
+    data_types=DATA_TYPE,
+    output="test.json",
+    synapse_client=syn,
+)
+
+# Create JSON Schema in cwd
+schemas, file_paths = generate_jsonschema(
+    data_model_source=DATA_MODEL_SOURCE,
+    data_types=DATA_TYPE,
+    synapse_client=syn,
+)