Changes from 21 commits
27 commits
59a0455
added generate json schema command
andrewelamb Dec 16, 2025
671bf21
added tests
andrewelamb Dec 17, 2025
2b8ee3f
add documentation
andrewelamb Dec 17, 2025
bfd1711
remove uneeded classes
andrewelamb Dec 17, 2025
0b97f3b
add value error
andrewelamb Dec 30, 2025
2d5a882
handle merge conflict
andrewelamb Dec 30, 2025
78b7d14
add CLI tests using urls
andrewelamb Dec 30, 2025
369ffcf
move URL tests to integration file
andrewelamb Dec 30, 2025
81e5156
added info.logging statement for file paths
andrewelamb Dec 30, 2025
9f17ecc
fix command documentation
andrewelamb Dec 30, 2025
a25ec2e
add comments to tutorial script
andrewelamb Dec 30, 2025
a6de851
fix issue with no outoput and not datatypes
andrewelamb Dec 30, 2025
7b6b7d1
Update docs/tutorials/command_line_client.md
andrewelamb Dec 30, 2025
15ba0d3
update data model documentation
andrewelamb Dec 30, 2025
44e58e2
add new minimal data model and tests
andrewelamb Dec 30, 2025
8caa98f
Update docs/explanations/curator_data_model.md
andrewelamb Dec 30, 2025
04336df
ran pre-commit
andrewelamb Dec 30, 2025
e0294c6
add links to columns
andrewelamb Dec 31, 2025
076e521
add note to columnType
andrewelamb Dec 31, 2025
2158fdb
rearange notes
andrewelamb Dec 31, 2025
0d7725b
Merge pull request #1298 from Sage-Bionetworks/update_relationships
andrewelamb Dec 31, 2025
0988bbe
improve error message
andrewelamb Dec 31, 2025
37c9bec
add use case when JSON Schema path is provided but the dir doesnt exist
andrewelamb Dec 31, 2025
4311284
clean up docstring
andrewelamb Dec 31, 2025
3b49b10
clean up tutorial verbage
andrewelamb Dec 31, 2025
f1667e0
fix dirname logic
andrewelamb Dec 31, 2025
3857b74
fix docstring example
andrewelamb Dec 31, 2025
96 changes: 58 additions & 38 deletions docs/explanations/curator_data_model.md
@@ -15,8 +15,8 @@ The CSV data model described in this tutorial formalizes this structure:

Here is the Patient described above represented as a CSV data model:

| Attribute | DependsOn |
|---|---|
| Attribute | DependsOn |
|-----------|---------------------|
| Patient | "Age, Gender, Name" |
| Age | |
| Gender | |
@@ -48,9 +48,20 @@ The end goal is to create a JSON Schema that can be used in Curator. A JSON Sche

Note: Individual columns are covered later on this page.

These columns must be present in your CSV data model:

- [Attribute](#attribute)
- [DependsOn](#dependson)
- [Description](#description)
- [Valid Values](#valid-values)
- [Required](#required)
- [Parent](#parent)
- [Validation Rules](#validation-rules)

Defining data types:

- Put a unique data type name in the `Attribute` column.
- Put the value `DataType` in the `Parent` column.
- List at least one attribute in the `DependsOn` column (comma-separated).
- Optionally add a description to the `Description` column.
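
The column list and the steps above can be sketched as a short script that writes a minimal CSV data model. This is only an illustration: the row contents (including the description text) and the helper name `build_data_model_csv` are assumptions made for the example, not part of the Synapse client.

```python
import csv
import io

# Columns listed above as required in a CSV data model.
REQUIRED_COLUMNS = [
    "Attribute",
    "DependsOn",
    "Description",
    "Valid Values",
    "Required",
    "Parent",
    "Validation Rules",
]

# Hypothetical example rows: one data type (Patient) and its three attributes.
ROWS = [
    {
        "Attribute": "Patient",
        "DependsOn": "Age, Gender, Name",
        "Description": "A person receiving care",
        "Parent": "DataType",
    },
    {"Attribute": "Age"},
    {"Attribute": "Gender"},
    {"Attribute": "Name"},
]


def build_data_model_csv(rows):
    """Return CSV text with every required column; missing cells stay blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=REQUIRED_COLUMNS, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = build_data_model_csv(ROWS)
print(csv_text)
```

Because the `DependsOn` value contains commas, the `csv` module quotes it automatically, which matches the quoted style used in the tables on this page.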

@@ -79,8 +90,8 @@ Set of possible values for the current attribute. This attribute will be an enum
Data Model:

| Attribute | DependsOn | Valid Values |
|---|---|---|
| Patient | "Gender" | |
|-----------|-----------|-----------------------|
| Patient | "Gender" | |
| Gender | | "Female, Male, Other" |

JSON Schema output:
@@ -107,8 +118,8 @@ Note: Leaving this empty is the equivalent of `False`.
Data Model:

| Attribute | DependsOn | Required |
|---|---|---|
| Patient | "Gender, Age" | |
|-----------|----------------|----------|
| Patient | "Gender, Age" | |
| Gender | | True |
| Age | | False |

@@ -131,8 +142,14 @@ JSON Schema output:
}
```

### Parent

Put the value `DataType` in this column if this row is a data type. Other values are currently ignored. It is currently used to find all the data types in the data model.
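
As a rough illustration of how the `Parent` column identifies data types, the sketch below filters the rows of a small CSV data model. The CSV content and the function name `find_data_types` are assumptions for the example; this is not the client's actual implementation.

```python
import csv
import io

# Hypothetical data model with a single data type.
CSV_TEXT = """Attribute,DependsOn,Parent
Patient,"Gender, Age",DataType
Gender,,
Age,,
"""


def find_data_types(csv_text):
    """Return the Attribute of every row whose Parent cell is 'DataType'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["Attribute"] for row in reader if row["Parent"].strip() == "DataType"]


print(find_data_types(CSV_TEXT))  # ['Patient']
```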

### columnType

**NOTE: While this is not required, it is strongly recommended that this column is present and that all attributes are typed explicitly.**

The data type of this attribute. See [type](https://json-schema.org/understanding-json-schema/reference/type).

Must be one of:
@@ -147,11 +164,11 @@ Must be one of:

Data Model:

| Attribute | DependsOn | columnType |
|---|---|---|
| Patient | "Gender, Hobbies" | |
| Gender | | string |
| Hobbies | | string_list |
| Attribute | DependsOn | columnType | Parent |
|-----------|-------------------|-------------|----------|
| Patient | "Gender, Hobbies" | | DataType |
| Gender | | string | |
| Hobbies | | string_list | |

JSON Schema output:

@@ -196,11 +213,11 @@ The format of this attribute. See [format](https://json-schema.org/understanding

Data Model:

| Attribute | DependsOn | columnType | Format |
|---|---|---|---|
| Patient | "Gender, Birth Date" | | |
| Gender | | string | |
| Birth Date | | string | date |
| Attribute | DependsOn | columnType | Format | Parent |
|-----------------|----------------------|-------------|--------|----------|
| Patient | "Gender, Birth Date" | | | DataType |
| Gender | | string | | |
| Birth Date | | string | date | |

JSON Schema output:

@@ -229,11 +246,11 @@ The regex pattern this attribute must match. The type of this attribute must be

Data Model:

| Attribute | DependsOn | columnType | Pattern |
|---|---|---|---|
| Patient | "Gender, ID" | | |
| Gender | | string | |
| ID | | string | [a-f] |
| Attribute | DependsOn | columnType | Pattern | Parent |
|-----------|---------------|-------------|---------|----------|
| Patient | "Gender, ID" | | | DataType |
| Gender | | string | | |
| ID | | string | [a-f] | |
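
One detail worth keeping in mind for the `Pattern` column: JSON Schema patterns are not implicitly anchored, so `[a-f]` accepts any string that contains at least one of those letters. The sketch below approximates that behavior with Python's `re.search`. JSON Schema uses the ECMA 262 regex dialect, so this is only an approximation, though it should hold for a pattern this simple. The helper name `matches_pattern` is made up for the example.

```python
import re

PATTERN = "[a-f]"


def matches_pattern(value, pattern=PATTERN):
    """Unanchored match, mirroring how a JSON Schema `pattern` is applied."""
    return re.search(pattern, value) is not None


for candidate in ["abc", "x9f", "XYZ", "123"]:
    print(candidate, matches_pattern(candidate))

# To require that the whole value consists of the letters a-f, anchor the pattern.
print(matches_pattern("x9f", pattern="^[a-f]+$"))
```

If the intent is that an `ID` contains only the letters a to f, an anchored pattern such as `^[a-f]+$` is the safer choice.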

JSON Schema output:

@@ -262,12 +279,12 @@ The range that this attribute's numeric values must fall within. The type of thi

Data Model:

| Attribute | DependsOn | columnType | Minimum | Maximum |
|---|---|---|---|---|
| Patient | "Age, Weight, Health Score" | | | |
| Age | | integer | 0 | 120 |
| Weight | | number | 0.0 | |
| Health Score | | number | 0.0 | 1.0 |
| Attribute | DependsOn | columnType | Minimum | Maximum | Parent |
|--------------|-----------------------------|-------------|---------|---------|----------|
| Patient | "Age, Weight, Health Score" | | | | DataType |
| Age | | integer | 0 | 120 | |
| Weight | | number | 0.0 | | |
| Health Score | | number | 0.0 | 1.0 | |
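
The ranges in the table above can be read as inclusive bounds, assuming `Minimum` and `Maximum` map to the JSON Schema `minimum` and `maximum` keywords (which are inclusive). The sketch below checks a few values against those ranges; the `RANGES` dictionary and the `in_range` helper are illustrative names only.

```python
# Ranges copied from the table above; None means the bound is not set.
RANGES = {
    "Age": (0, 120),
    "Weight": (0.0, None),
    "Health Score": (0.0, 1.0),
}


def in_range(attribute, value):
    """Return True if value satisfies the inclusive bounds for attribute."""
    minimum, maximum = RANGES[attribute]
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    return True


print(in_range("Age", 120))            # True: the maximum itself is allowed
print(in_range("Age", 121))            # False
print(in_range("Weight", 500.0))       # True: no maximum is set
print(in_range("Health Score", -0.1))  # False
```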

JSON Schema output:

@@ -299,11 +316,13 @@ JSON Schema output:
}
```

### Validation Rules (deprecated)
### Validation Rules

This column is currently deprecated.

This is a remnant from Schematic. It is still used (for now) to translate certain validation rules to other JSON Schema keywords.
This is a remnant from Schematic. It is still required and in use (for now) to translate certain validation rules to other JSON Schema keywords.

If you are starting a new data model, DO NOT use this column.
If you are starting a new data model, DO NOT fill out this column; just leave it blank.

If you have an existing data model using any of the following validation rules, follow these instructions to update it:

@@ -315,22 +334,23 @@ If you have an existing data model using any of the following validation rules,

## Conditional dependencies

The `DependsOn` and `Valid Values` columns can be used together to flexibly define conditional logic for determining the relevant attributes for a data type.
The `DependsOn`, `Valid Values` and `Parent` columns can be used together to flexibly define conditional logic for determining the relevant attributes for a data type.

In this example we have the `Patient` data type. The `Patient` can be diagnosed as healthy or with cancer. For Patients with cancer we also want to collect info about their cancer type, and any cancers in their family history.

Data Model:

| Attribute | DependsOn | Valid Values | Required | columnType |
|---|---|---|---|---|
| Patient | "Diagnosis" | | | |
| Diagnosis | | "Healthy, Cancer" | True | string |
| Cancer | "Cancer Type, Family History" | | | |
| Cancer Type | | "Brain, Lung, Skin" | True | string |
| Family History | | "Brain, Lung, Skin" | True | string_list |
| Attribute | DependsOn | Valid Values | Required | columnType | Parent |
|----------------|-------------------------------|---------------------|----------|-------------|----------|
| Patient | "Diagnosis" | | | | DataType |
| Diagnosis | | "Healthy, Cancer" | True | string | |
| Cancer | "Cancer Type, Family History" | | | | |
| Cancer Type | | "Brain, Lung, Skin" | True | string | |
| Family History | | "Brain, Lung, Skin" | True | string_list | |

To demonstrate this, see the above example with the `Patient` and `Cancer` data types:

- `Patient` is a data type, but `Cancer` is not, as defined by the `Parent` column.
- `Diagnosis` is an attribute of `Patient`.
- `Diagnosis` has `Valid Values` of `Healthy` and `Cancer`.
- `Cancer` also has its own row, with `Cancer Type` and `Family History` in its `DependsOn` column.
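
One way to read the conditional logic in this example is sketched below: when the value chosen for an attribute names another row that has its own `DependsOn` list, the attributes in that list become relevant too. This is an interpretation of the table, not the Curator implementation. The `MODEL` dictionary and the `relevant_attributes` helper are hypothetical, and how the generated JSON Schema encodes the condition is not shown here.

```python
# Hypothetical in-memory form of the data model table above.
MODEL = {
    "Patient": {"depends_on": ["Diagnosis"], "valid_values": []},
    "Diagnosis": {"depends_on": [], "valid_values": ["Healthy", "Cancer"]},
    "Cancer": {"depends_on": ["Cancer Type", "Family History"], "valid_values": []},
    "Cancer Type": {"depends_on": [], "valid_values": ["Brain", "Lung", "Skin"]},
    "Family History": {"depends_on": [], "valid_values": ["Brain", "Lung", "Skin"]},
}


def relevant_attributes(data_type, record):
    """List the attributes to collect for a record of the given data type."""
    relevant = []
    pending = list(MODEL[data_type]["depends_on"])
    while pending:
        attribute = pending.pop(0)
        if attribute in relevant:
            continue
        relevant.append(attribute)
        chosen = record.get(attribute)
        # A chosen valid value that is also a row in the model pulls in
        # that row's DependsOn attributes.
        if chosen in MODEL[attribute]["valid_values"] and chosen in MODEL:
            pending.extend(MODEL[chosen]["depends_on"])
    return relevant


print(relevant_attributes("Patient", {"Diagnosis": "Healthy"}))
# ['Diagnosis']
print(relevant_attributes("Patient", {"Diagnosis": "Cancer"}))
# ['Diagnosis', 'Cancer Type', 'Family History']
```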
18 changes: 16 additions & 2 deletions docs/tutorials/command_line_client.md
@@ -81,8 +81,7 @@ synapse [-h] [--version] [-u SYNAPSEUSER] [-p SYNAPSE_AUTH_TOKEN] [-c CONFIGPATH
- [test-encoding](#test-encoding): test character encoding to help diagnose problems
- [get-sts-token](#get-sts-token): Get an STS token for access to AWS S3 storage underlying Synapse
- [migrate](#migrate): Migrate Synapse entities to a different storage location


- [generate-json-schema](#generate-json-schema): Generate JSON Schema(s) from a data model

### `get`

@@ -544,3 +543,18 @@ synapse migrate [-h] [--source_storage_location_ids [SOURCE_STORAGE_LOCATION_IDS
| `--csv_log_path` | Named | Path where to log a csv documenting the changes from the migration. | |
| `--dryRun` | Named | Dry run, files will be indexed but not migrated. | False |
| `--force` | Named | Bypass interactive prompt confirming migration. | False |

### `generate-json-schema`

Generate JSON Schema(s) from a data model

```bash
synapse generate-json-schema [-h] [--data-types data_type1, data_type2] [--output dir_name] [--data-model-labels class_label] data_model_path
```

| Name | Type | Description |
|--------------------------|------------|---------------------------------------------------------------------|
| `data_model_path` | Positional | Data model path or URL |
| `--data-types` | Named | Optional list of data types to create JSON Schema for |
| `--output` | Named | Optional. Either a file path ending in '.json', or a directory path |
| `--data-model-labels` | Named | Either 'class_label', or 'display_label' |
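
For scripted use, the command line can be assembled from the documented options. The sketch below only builds and prints the command. The file name `data_model.csv` and the helper `build_command` are assumptions for the example. `--data-types` is left out because the usage line alone does not make the expected separator clear.

```python
import shlex


def build_command(data_model_path, output=None, data_model_labels=None):
    """Assemble an argument list for `synapse generate-json-schema`."""
    args = ["synapse", "generate-json-schema"]
    if output is not None:
        args += ["--output", output]
    if data_model_labels is not None:
        args += ["--data-model-labels", data_model_labels]
    # The data model path (or URL) is the positional argument and goes last.
    args.append(data_model_path)
    return args


command = build_command(
    "data_model.csv", output="temp", data_model_labels="class_label"
)
print(shlex.join(command))
```

On a machine where the `synapse` CLI is installed, the same list could be passed to `subprocess.run(command, check=True)`.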
63 changes: 54 additions & 9 deletions docs/tutorials/python/schema_operations.md
@@ -5,11 +5,13 @@ Synapse supports a subset of features from [json-schema-draft-07](https://json-s
In this tutorial, you will learn how to create these JSON Schema using an existing data model.

## Tutorial Purpose
You will create a JSON schema using your data model.

You will create a JSON schema from your data model using the Python client as a library. To use the CLI tool instead, see the [documentation](../command_line_client.md).

## Prerequisites

* You have a working [installation](../installation.md) of the Synapse Python Client.
* You have a data model, see this [example data model](https://github.com/Sage-Bionetworks/schematic/blob/develop/tests/data/example.model.column_type_component.csv).
* You have a data model, see this [data model documentation](../../explanations/curator_data_model.md).

## 1. Imports

@@ -20,29 +22,72 @@ You will create a JSON schema using your data model.
## 2. Set up your variables

```python
{!docs/tutorials/python/tutorial_scripts/schema_operations.py!lines=4-10}
{!docs/tutorials/python/tutorial_scripts/schema_operations.py!lines=4-11}
```

To create a JSON Schema you need a data model, and the data types you want to create.
The data model must be in either CSV or JSON-LD form. The data model may be a local path or a URL.
[Example data model](https://github.com/Sage-Bionetworks/schematic/blob/develop/tests/data/example.model.column_type_component.csv).
[Data model documentation](../../explanations/curator_data_model.md).

The data types must exist in your data model. This can be a list of data types, or `None` to create all data types in the data model.

## 3. Log into Synapse

```python
{!docs/tutorials/python/tutorial_scripts/schema_operations.py!lines=12-13}
{!docs/tutorials/python/tutorial_scripts/schema_operations.py!lines=13-14}
```

## 4. Create a JSON Schema

Create a JSON Schema

## 4. Create the JSON Schema
Create the JSON Schema(s)
```python
{!docs/tutorials/python/tutorial_scripts/schema_operations.py!lines=15-23}
{!docs/tutorials/python/tutorial_scripts/schema_operations.py!lines=16-23}
```
You should see the first JSON Schema for the datatype(s) you selected printed.

You should see the first JSON Schema for the datatype you selected printed.
It will look like [this schema](https://repo-prod.prod.sagebase.org/repo/v1/schema/type/registered/dpetest-test.schematic.Patient).
Because the `output` parameter is set to the path of a "temp" directory, the file will be created as "temp/Patient.json".

## 5. Create multiple JSON Schema

Create multiple JSON Schema

```python
{!docs/tutorials/python/tutorial_scripts/schema_operations.py!lines=26-32}
```

The `data_types` parameter is a list and can have multiple data types.

## 6. Create every JSON Schema

Create every JSON Schema

```python
{!docs/tutorials/python/tutorial_scripts/schema_operations.py!lines=34-39}
```

If you don't set the `data_types` parameter, a JSON Schema will be created for every data type in the data model.

## 7. Create a JSON Schema with a certain path

Create a JSON Schema

```python
{!docs/tutorials/python/tutorial_scripts/schema_operations.py!lines=41-47}
```

If you have only one data type and set the `output` parameter to a file path (ending in `.json`), the JSON Schema file will be written to that path.

## 8. Create a JSON Schema in the current working directory

Create a JSON Schema

```python
{!docs/tutorials/python/tutorial_scripts/schema_operations.py!lines=49-54}
```

If you don't set the `output` parameter, the JSON Schema file will be created in the current working directory.
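
The three `output` cases described in this tutorial (a directory, a `.json` file path, or nothing) can be summarized in a small sketch. It is not the library's code. The helper name `resolve_output_path` is hypothetical, and the file name used in the current working directory is assumed to follow the same `<data type>.json` naming as the directory case.

```python
from pathlib import Path


def resolve_output_path(data_type, output=None):
    """Guess where the JSON Schema file for one data type would be written."""
    if output is None:
        # No output given: assumed to be <data type>.json in the working directory.
        return Path.cwd() / f"{data_type}.json"
    if output.endswith(".json"):
        # A file path ending in .json is used as-is.
        return Path(output)
    # Anything else is treated as a directory.
    return Path(output) / f"{data_type}.json"


print(resolve_output_path("Patient", "temp"))
print(resolve_output_path("Patient", "test.json"))
print(resolve_output_path("Patient"))
```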

## Source Code for this Tutorial

38 changes: 34 additions & 4 deletions docs/tutorials/python/tutorial_scripts/schema_operations.py
@@ -8,17 +8,47 @@
 # Example: ["Patient", "Biospecimen"] or None
 DATA_TYPE = ["Patient"]
 # Directory where JSON Schema files will be saved
-OUTPUT_DIRECTORY = "./"
+OUTPUT_DIRECTORY = "temp"

 syn = Synapse()
 syn.login()

 schemas, file_paths = generate_jsonschema(
     data_model_source=DATA_MODEL_SOURCE,
-    output_directory=OUTPUT_DIRECTORY,
-    data_type=DATA_TYPE,
-    data_model_labels="class_label",
+    output=OUTPUT_DIRECTORY,
+    data_types=DATA_TYPE,
     synapse_client=syn,
 )

 print(schemas[0])
+
+
+# Create JSON Schemas for multiple data types
+schemas, file_paths = generate_jsonschema(
+    data_model_source=DATA_MODEL_SOURCE,
+    output=OUTPUT_DIRECTORY,
+    data_types=["Patient", "Biospecimen"],
+    synapse_client=syn,
+)
+
+# Create JSON Schemas for all data types
+schemas, file_paths = generate_jsonschema(
+    data_model_source=DATA_MODEL_SOURCE,
+    output=OUTPUT_DIRECTORY,
+    synapse_client=syn,
+)
+
+# Specify path for JSON Schema
+schemas, file_paths = generate_jsonschema(
+    data_model_source=DATA_MODEL_SOURCE,
+    data_types=DATA_TYPE,
+    output="test.json",
+    synapse_client=syn,
+)
+
+# Create JSON Schema in cwd
+schemas, file_paths = generate_jsonschema(
+    data_model_source=DATA_MODEL_SOURCE,
+    data_types=DATA_TYPE,
+    synapse_client=syn,
+)