Commits (27)
- 59a0455 added generate json schema command (andrewelamb, Dec 16, 2025)
- 671bf21 added tests (andrewelamb, Dec 17, 2025)
- 2b8ee3f add documentation (andrewelamb, Dec 17, 2025)
- bfd1711 remove uneeded classes (andrewelamb, Dec 17, 2025)
- 0b97f3b add value error (andrewelamb, Dec 30, 2025)
- 2d5a882 handle merge conflict (andrewelamb, Dec 30, 2025)
- 78b7d14 add CLI tests using urls (andrewelamb, Dec 30, 2025)
- 369ffcf move URL tests to integration file (andrewelamb, Dec 30, 2025)
- 81e5156 added info.logging statement for file paths (andrewelamb, Dec 30, 2025)
- 9f17ecc fix command documentation (andrewelamb, Dec 30, 2025)
- a25ec2e add comments to tutorial script (andrewelamb, Dec 30, 2025)
- a6de851 fix issue with no outoput and not datatypes (andrewelamb, Dec 30, 2025)
- 7b6b7d1 Update docs/tutorials/command_line_client.md (andrewelamb, Dec 30, 2025)
- 15ba0d3 update data model documentation (andrewelamb, Dec 30, 2025)
- 44e58e2 add new minimal data model and tests (andrewelamb, Dec 30, 2025)
- 8caa98f Update docs/explanations/curator_data_model.md (andrewelamb, Dec 30, 2025)
- 04336df ran pre-commit (andrewelamb, Dec 30, 2025)
- e0294c6 add links to columns (andrewelamb, Dec 31, 2025)
- 076e521 add note to columnType (andrewelamb, Dec 31, 2025)
- 2158fdb rearange notes (andrewelamb, Dec 31, 2025)
- 0d7725b Merge pull request #1298 from Sage-Bionetworks/update_relationships (andrewelamb, Dec 31, 2025)
- 0988bbe improve error message (andrewelamb, Dec 31, 2025)
- 37c9bec add use case when JSON Schema path is provided but the dir doesnt exist (andrewelamb, Dec 31, 2025)
- 4311284 clean up docstring (andrewelamb, Dec 31, 2025)
- 3b49b10 clean up tutorial verbage (andrewelamb, Dec 31, 2025)
- f1667e0 fix dirname logic (andrewelamb, Dec 31, 2025)
- 3857b74 fix docstring example (andrewelamb, Dec 31, 2025)
18 changes: 16 additions & 2 deletions docs/tutorials/command_line_client.md
@@ -81,8 +81,7 @@ synapse [-h] [--version] [-u SYNAPSEUSER] [-p SYNAPSE_AUTH_TOKEN] [-c CONFIGPATH
- [test-encoding](#test-encoding): test character encoding to help diagnose problems
- [get-sts-token](#get-sts-token): Get an STS token for access to AWS S3 storage underlying Synapse
- [migrate](#migrate): Migrate Synapse entities to a different storage location


- [generate-json-schema](#generate-json-schema): Generate JSON Schema(s) from a data model

### `get`

@@ -544,3 +543,18 @@ synapse migrate [-h] [--source_storage_location_ids [SOURCE_STORAGE_LOCATION_IDS
| `--csv_log_path` | Named | Path where to log a csv documenting the changes from the migration. | |
| `--dryRun` | Named | Dry run, files will be indexed but not migrated. | False |
| `--force` | Named | Bypass interactive prompt confirming migration. | False |

### `generate-json-schema`

Generate JSON Schema(s) from a data model

```bash
synapse generate-json-schema [-h] [--data-types [DATA_TYPES ...]] [--output OUTPUT] [--data-model-labels {class_label,display_label}] data_model_path
```

| Name | Type | Description |
|--------------------------|------------|---------------------------------------------------------------------|
| `data_model_path`        | Positional | Path or URL to a CSV or JSON-LD data model                            |
| `--data-types`           | Named      | Optional list of data types to create JSON Schemas for. Defaults to all data types in the model |
| `--output`               | Named      | Optional. Either a file path ending in '.json' or a directory path. Defaults to the current working directory |
| `--data-model-labels`    | Named      | Either 'class_label' (default) or 'display_label'                     |
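
For example, a run against a local CSV data model might look like this (the file and directory names are illustrative):

```bash
synapse generate-json-schema \
    --data-types Patient Biospecimen \
    --output ./schemas \
    --data-model-labels class_label \
    example.model.csv
```

With a directory passed to `--output`, this writes `Patient.json` and `Biospecimen.json` into `./schemas`.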
63 changes: 54 additions & 9 deletions docs/tutorials/python/schema_operations.md
@@ -5,11 +5,13 @@ Synapse supports a subset of features from [json-schema-draft-07](https://json-s
In this tutorial, you will learn how to create these JSON Schema using an existing data model.

## Tutorial Purpose
You will create a JSON schema using your data model.

You will create a JSON Schema from your data model using the Python client as a library. To use the CLI instead, see the [command line documentation](../command_line_client.md).

## Prerequisites

* You have a working [installation](../installation.md) of the Synapse Python Client.
* You have a data model, see this [example data model](https://github.com/Sage-Bionetworks/schematic/blob/develop/tests/data/example.model.column_type_component.csv).
* You have a data model; see the [data model documentation](../../explanations/curator_data_model.md).

## 1. Imports

@@ -20,29 +22,72 @@ You will create a JSON schema using your data model.
## 2. Set up your variables

```python
{!docs/tutorials/python/tutorial_scripts/schema_operations.py!lines=4-10}
{!docs/tutorials/python/tutorial_scripts/schema_operations.py!lines=4-11}
```

To create a JSON Schema you need a data model and the data types you want to create schemas for.
The data model must be in either CSV or JSON-LD form and may be a local path or a URL.
See this [example data model](https://github.com/Sage-Bionetworks/schematic/blob/develop/tests/data/example.model.column_type_component.csv) and the [data model documentation](../../explanations/curator_data_model.md).

The data types must exist in your data model. The `data_types` argument can be a list of data types, or `None` to create JSON Schemas for every data type in the data model.
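
For illustration, either of these forms works as the data model source (the path, URL, and data types shown here are hypothetical placeholders):

```python
# A local CSV data model (hypothetical path)
DATA_MODEL_SOURCE = "data/example.model.csv"

# ...or a URL pointing to a raw CSV or JSON-LD data model (hypothetical URL)
DATA_MODEL_SOURCE = "https://example.org/models/example.model.jsonld"

# Data types defined in the model, or None to generate schemas for all of them
DATA_TYPES = ["Patient", "Biospecimen"]
```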

## 3. Log into Synapse

```python
{!docs/tutorials/python/tutorial_scripts/schema_operations.py!lines=12-13}
{!docs/tutorials/python/tutorial_scripts/schema_operations.py!lines=13-14}
```

## 4. Create a JSON Schema

Create a JSON Schema

## 4. Create the JSON Schema
Create the JSON Schema(s)
```python
{!docs/tutorials/python/tutorial_scripts/schema_operations.py!lines=15-23}
{!docs/tutorials/python/tutorial_scripts/schema_operations.py!lines=16-23}
```
You should see the first JSON Schema for the datatype(s) you selected printed.

You should see the first JSON Schema for the datatype you selected printed.
It will look like [this schema](https://repo-prod.prod.sagebase.org/repo/v1/schema/type/registered/dpetest-test.schematic.Patient).
Because the `output` parameter is set to a "temp" directory, the file will be created as "temp/Patient.json".

## 5. Create multiple JSON Schemas

Create multiple JSON Schemas

```python
{!docs/tutorials/python/tutorial_scripts/schema_operations.py!lines=25-30}
```

The `data_types` parameter is a list and can have multiple data types.

## 6. Create every JSON Schema

Create every JSON Schema

```python
{!docs/tutorials/python/tutorial_scripts/schema_operations.py!lines=32-36}
```

If you don't set the `data_types` parameter, a JSON Schema will be created for every data type in the data model.

## 7. Create a JSON Schema at a specific file path

Create a JSON Schema

```python
{!docs/tutorials/python/tutorial_scripts/schema_operations.py!lines=38-43}
```

If you have only one data type and set the `output` parameter to a file path (ending in ".json"), the JSON Schema file will be written to that path.

## 8. Create a JSON Schema in the current working directory

Create a JSON Schema

```python
{!docs/tutorials/python/tutorial_scripts/schema_operations.py!lines=45-48}
```

If you don't set the `output` parameter, the JSON Schema file will be created in the current working directory.

## Source Code for this Tutorial

33 changes: 29 additions & 4 deletions docs/tutorials/python/tutorial_scripts/schema_operations.py
@@ -8,17 +8,42 @@
# Example: ["Patient", "Biospecimen"] or None
DATA_TYPE = ["Patient"]
# Directory where JSON Schema files will be saved
OUTPUT_DIRECTORY = "./"
OUTPUT_DIRECTORY = "temp"

syn = Synapse()
syn.login()

schemas, file_paths = generate_jsonschema(
    data_model_source=DATA_MODEL_SOURCE,
    output_directory=OUTPUT_DIRECTORY,
    data_type=DATA_TYPE,
    data_model_labels="class_label",
    output=OUTPUT_DIRECTORY,
    data_types=DATA_TYPE,
    synapse_client=syn,
)

print(schemas[0])

schemas, file_paths = generate_jsonschema(
    data_model_source=DATA_MODEL_SOURCE,
    output=OUTPUT_DIRECTORY,
    data_types=["Patient", "Biospecimen"],
    synapse_client=syn,
)

schemas, file_paths = generate_jsonschema(
    data_model_source=DATA_MODEL_SOURCE,
    output=OUTPUT_DIRECTORY,
    synapse_client=syn,
)

schemas, file_paths = generate_jsonschema(
    data_model_source=DATA_MODEL_SOURCE,
    data_types=DATA_TYPE,
    output="test.json",
    synapse_client=syn,
)

schemas, file_paths = generate_jsonschema(
    data_model_source=DATA_MODEL_SOURCE,
    data_types=DATA_TYPE,
    synapse_client=syn,
)
51 changes: 51 additions & 0 deletions synapseclient/__main__.py
@@ -35,6 +35,7 @@
    SynapseHTTPError,
    SynapseNoCredentialsError,
)
from synapseclient.extensions.curator.schema_generation import generate_jsonschema
> **Review comment (Member):** What happens when someone doesn't have the extension package installed?
>
> **Reply (Member):** It should be tested; however, thinking about it, I don't think this would cause any runtime or static typing issues. What I believe happens is that, due to the static typing checks I put in place, this would be fine. The optional install of the curation extension only adds a few library dependencies that the curator code needs at runtime. A function like `generate_jsonschema` should always be importable regardless of whether the extension package is installed, but when someone uses it they would get an error because pandas is not installed.
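
A minimal sketch of how a caller could guard against the missing optional dependency (the error handling and message are illustrative assumptions, not part of this PR):

```python
import sys

from synapseclient.extensions.curator.schema_generation import generate_jsonschema


def generate_json_schema(args, syn):
    """Generate JSON Schema(s) from a data model."""
    try:
        # The import above succeeds even without the optional extension
        # installed; the extra dependencies (e.g. pandas) are only needed
        # when generate_jsonschema() is actually called.
        generate_jsonschema(
            data_model_source=args.data_model_path,
            output=args.output,
            data_types=args.data_types,
            data_model_labels=args.data_model_labels,
            synapse_client=syn,
        )
    except ImportError as ex:
        # Illustrative handling only: surface a readable message instead of
        # a bare traceback when an optional dependency is missing.
        sys.stderr.write(f"Missing optional dependency for the curator extension: {ex}\n")
        sys.exit(1)
```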

from synapseclient.wiki import Wiki

tracer = trace.get_tracer("synapseclient")
@@ -801,6 +802,17 @@ def migrate(args, syn):
    result.as_csv(args.csv_log_path)


def generate_json_schema(args, syn):
    """Generate JSON Schema(s) from a data model."""
    _, path = generate_jsonschema(
        data_model_source=args.data_model_path,
        output=args.output,
        data_types=args.data_types,
        data_model_labels=args.data_model_labels,
        synapse_client=syn,
    )


def build_parser():
"""Builds the argument parser and returns the result."""

@@ -1793,6 +1805,45 @@ def build_parser():
        help="Bypass interactive prompt confirming migration",
    )

    parser_generate_json_schema = subparsers.add_parser(
        "generate-json-schema", help="Generates JSON Schema file(s) from a data model."
    )
    parser_generate_json_schema.add_argument(
        "data_model_path",
        type=str,
        help="Required path to CSV or JSONLD data model. Must be a path to a local file, or a URL.",
    )
    parser_generate_json_schema.add_argument(
        "--data-types",
        nargs="*",
        type=str,
        default=None,
        help="Optional list of data types to generate schema for. If not provided, schema will be generated for all data types in the model.",
    )
    parser_generate_json_schema.add_argument(
        "--output",
        type=str,
        default=None,
        help=(
            "Optional path. "
            "If None, output file(s) will be created in the current working directory as ./<data-type>.json. "
            "If a directory path, output file(s) will be created in the specified directory as <data-type>.json. "
            "If a file path, schema will be written to the specified file. "
        ),
    )
    parser_generate_json_schema.add_argument(
        "--data-model-labels",
        type=str,
        default="class_label",
        choices=["class_label", "display_label"],
        help=(
            "Optional label format for properties in the generated schema. "
            "'class_label' uses standard attribute names (default). "
            "'display_label' uses display names when valid."
        ),
    )
    parser_generate_json_schema.set_defaults(func=generate_json_schema)

    parser_migrate.set_defaults(func=migrate)

    return parser