[SYNPY-1696] Add create JSON Schema CLI command #1294

andrewelamb · 2025-12-17T21:09:03Z

Problem:

The create JSON Schema command was not moved over to the Curator extension from Schematic. SYNPY-1696
Additionally, the existing code creates two copies of each JSON Schema file. SYNPY-1677
Finally, it was requested that users should be able to specify the exact path of the output file. SYNPY-1675

Solution:

A CLI Command was added for creating JSON Schema.
The code for creating JSON Schemas was massively simplified. The MetadataModel, JsonSchemaGeneratorDirector and JsonSchemaComponentGenerator classes were removed.
The output parameter can take either a path for the output JSON file (one datatype only) or a directory where all the output JSON files will go.

thomasyu888 · 2025-12-30T17:50:16Z

synapseclient/__main__.py

    SynapseHTTPError,
    SynapseNoCredentialsError,
 )
+from synapseclient.extensions.curator.schema_generation import generate_jsonschema


What happens when someone doesn't have the extension package installed?

It should be tested; however, thinking about it, I don't think this would cause any runtime or static typing issues.

I believe this is fine because of the static typing checks I put in place. The optional install of the curation extension only adds a few library dependencies that the curator code needs at runtime. Code like this generate_jsonschema function should always be importable regardless of whether they "installed" the extension package, but when they use it they would get an error because pandas is not installed.
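As a rough illustration of the behavior described above (importable everywhere, failing only when called), here is a minimal sketch of a deferred import. The names `load_optional_dependency` and `generate_schema_stub` are hypothetical and are not part of synapseclient; the real module may handle this differently.

```python
import importlib
from types import ModuleType


def load_optional_dependency(module_name: str) -> ModuleType:
    """Import an optional module, raising a descriptive error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this feature but is not installed. "
            "Install the optional extension dependencies to use it."
        ) from exc


def generate_schema_stub(module_name: str = "pandas") -> ModuleType:
    """Hypothetical stand-in for a function that needs an optional dependency."""
    # The dependency is imported only when the function is called, so
    # importing the module that defines this function never fails.
    return load_optional_dependency(module_name)
```

With this pattern, a top-level `from ... import generate_schema_stub` succeeds without pandas; the `ImportError` only surfaces on the first call.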

docs/tutorials/python/tutorial_scripts/schema_operations.py

docs/tutorials/command_line_client.md

synapseclient/__main__.py

tests/unit/synapseclient/unit_test_commandline.py

BryanFauble

There are a few changes to wrap up, but this is looking great otherwise!

docs/tutorials/command_line_client.md

BryanFauble

I suggested a last change, everything else looked good to me!

thomasyu888 · 2025-12-30T19:08:49Z

🔥 LGTM! Let's wait for @linglp to take a look too, for knowledge transfer's sake.

linglp

Hi @andrewelamb ! I think the output parameter would need to be more robust for relative paths and invalid directories such as foo.txt. Please see my comments below.

docs/tutorials/python/schema_operations.md

synapseclient/extensions/curator/schema_generation.py

linglp · 2025-12-31T19:33:12Z

synapseclient/extensions/curator/schema_generation.py

-    schemas, file_paths = generator.generate_jsonschema(
-        data_model_labels=data_model_labels
-    )
+    if output is not None and not output.endswith(".json"):


If output is invalid, such as foo.txt, it creates a directory literally named foo.txt and writes schemas into it like foo.txt/Patient.json. That seems like a bug.

I think the logic below would make more sense. You might want to make a separate function for this so that it can be tested better:

Determine the output directory based on:

1. If output is an existing directory → use it
2. If output is a .json path → use its parent directory
3. Otherwise → use the current directory
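The three rules above could be sketched as a small, separately testable helper. This is only an illustration of the suggestion; `resolve_output_dir` is a hypothetical name and not a function from the PR.

```python
from pathlib import Path
from typing import Optional


def resolve_output_dir(output: Optional[str]) -> Path:
    """Pick the directory that schema files would be written to."""
    if output is not None:
        path = Path(output).expanduser()
        # Rule 1: an existing directory is used as-is.
        if path.is_dir():
            return path
        # Rule 2: a .json file path contributes its parent directory.
        if path.suffix == ".json":
            return path.parent
    # Rule 3: anything else (including no output) falls back to the
    # current working directory.
    return Path.cwd()
```

Note that under these rules a value such as foo.txt silently falls back to the current directory rather than raising an error.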

Also relative file paths don't seem to work:

synapse generate-json-schema /Users/lpeng/code/synapsePythonClient/tests/unit/synapseclient/extensions/schema_files/data_models/example.model.csv --data-types Patient --output ~/schemas/output.json

I am getting an error:

      File "/Users/lpeng/code/synapsePythonClient/synapseclient/extensions/curator/schema_generation.py", line 5519, in _write_data_model
        export_json(json_doc=json_schema_dict, file_path=json_schema_path, indent=2)
        ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/Users/lpeng/code/synapsePythonClient/synapseclient/extensions/curator/schema_generation.py", line 5395, in export_json
        with open(file_path, "w", encoding="utf8") as fle:
             ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    FileNotFoundError: [Errno 2] No such file or directory: '/Users/lpeng/schemas/output.json'

I would expect that the schemas directory got created since it didn't exist and I could generate output.json under it, and it shouldn't error out?

I would expect that the schemas directory got created since it didn't exist and I could generate output.json under it, and it shouldn't error out?

This was a use case I hadn't considered. I'll add it.
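One way to cover this case, shown here only as a sketch, is to expand `~` and create the parent directory before opening the file. The function name `export_json_with_parents` is hypothetical; the parameter names mirror the `export_json` call in the traceback, but the actual fix in the PR may look different.

```python
import json
from pathlib import Path


def export_json_with_parents(json_doc: dict, file_path: str, indent: int = 2) -> Path:
    """Write json_doc to file_path, creating missing parent directories."""
    path = Path(file_path).expanduser()
    # Create missing parents (for example a not-yet-existing "schemas"
    # directory); exist_ok=True makes this a no-op if they already exist.
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", encoding="utf8") as fle:
        json.dump(json_doc, fle, indent=indent)
    return path
```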

If output is invalid, such as foo.txt, it creates a directory literally named foo.txt and writes schemas into it like foo.txt/Patient.json. That seems like a bug.

This is a tough one. If it created a directory foo.txt and wrote the JSON Schema there, then foo.txt is a valid directory name for the file system. There doesn't seem to be a Python function for checking whether a string is a valid directory name. Either the command creates the file where the user asked, or it raises an exception, so it doesn't seem like a bug to me.
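To illustrate the point that validity can only really be established by trying, here is a small sketch that attempts the creation and reports whether it worked. `can_create_directory` is a hypothetical helper written for this example, not something in the code base.

```python
import tempfile
from pathlib import Path


def can_create_directory(parent: Path, name: str) -> bool:
    """Return True if `name` exists or can be created as a directory under `parent`."""
    target = parent / name
    try:
        target.mkdir(parents=True, exist_ok=True)
    except OSError:
        # Raised for names the file system rejects, or when a regular
        # file with that name already exists.
        return False
    return target.is_dir()


with tempfile.TemporaryDirectory() as tmp:
    # A name with a file-like extension is attempted like any other name.
    print(can_create_directory(Path(tmp), "foo.txt"))
```

On common file systems the foo.txt attempt is expected to succeed, which is why rejecting it would need an explicit rule based on the extension rather than a file system check.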

Could you test if this will work:

    from pathlib import Path
    from typing import Optional


    def validate_output_path(output: Optional[str], data_types: list[str]) -> list[Path]:
        """Only allow .json files or directories (no extension)."""
        if output is None:
            return [Path.cwd() / f"{dtype}.json" for dtype in data_types]

        output_path = Path(output).expanduser()
        extension = output_path.suffix  # Gets ".txt", ".json", etc.

        # Case 1: .json file
        if extension == ".json":
            if len(data_types) > 1:
                raise ValueError(
                    f"Cannot write {len(data_types)} schemas to single file '{output}'"
                )
            output_path = output_path.resolve()
            output_path.parent.mkdir(parents=True, exist_ok=True)
            return [output_path]

        # Case 2: Directory (no extension)
        if not extension:
            output_path = output_path.resolve()
            output_path.mkdir(parents=True, exist_ok=True)
            return [output_path / f"{dtype}.json" for dtype in data_types]

        # Case 3: Any other extension - REJECT
        raise ValueError(
            f"Invalid output path '{output}'. "
            f"Extension '{extension}' is not allowed. "
            f"Use either:\n"
            f"  - A .json file: 'output.json' (for single schema)\n"
            f"  - A directory: 'schemas' or 'schemas/' (for multiple schemas)"
        )


    validate_output_path("foo.txt", ["Patient"])  # raises ValueError: '.txt' is not allowed

I tested some use cases and it seems to work on my end.

synapseclient/extensions/curator/schema_generation.py

andrewelamb added 3 commits December 16, 2025 14:05

added generate json schema command

59a0455

added tests

671bf21

add documentation

2b8ee3f

andrewelamb requested a review from a team as a code owner December 17, 2025 21:09

andrewelamb marked this pull request as draft December 17, 2025 21:09

andrewelamb added 2 commits December 17, 2025 13:11

remove uneeded classes

bfd1711

add value error

0b97f3b

andrewelamb changed the title ~~Synpy 1696~~ [SYNPY-1696] Add create JSON Schema CLI command Dec 30, 2025

handle merge conflict

2d5a882

andrewelamb marked this pull request as ready for review December 30, 2025 17:27

add CLI tests using urls

78b7d14

andrewelamb requested review from BryanFauble and linglp December 30, 2025 17:36

thomasyu888 reviewed Dec 30, 2025

BryanFauble reviewed Dec 30, 2025

docs/tutorials/python/tutorial_scripts/schema_operations.py

BryanFauble reviewed Dec 30, 2025

docs/tutorials/command_line_client.md

BryanFauble reviewed Dec 30, 2025

synapseclient/__main__.py

BryanFauble reviewed Dec 30, 2025

tests/unit/synapseclient/unit_test_commandline.py (outdated)

BryanFauble requested changes Dec 30, 2025

andrewelamb added 4 commits December 30, 2025 10:07

move URL tests to integration file

369ffcf

added info.logging statement for file paths

81e5156

fix command documentation

9f17ecc

add comments to tutorial script

a25ec2e

andrewelamb requested a review from BryanFauble December 30, 2025 18:32

fix issue with no outoput and not datatypes

a6de851

BryanFauble reviewed Dec 30, 2025

docs/tutorials/command_line_client.md (outdated)

BryanFauble approved these changes Dec 30, 2025

andrewelamb and others added 2 commits December 30, 2025 11:09

Update docs/tutorials/command_line_client.md

7b6b7d1

Co-authored-by: BryanFauble <[email protected]>

update data model documentation

15ba0d3

andrewelamb and others added 7 commits December 30, 2025 12:15

add new minimal data model and tests

44e58e2

Update docs/explanations/curator_data_model.md

8caa98f

Co-authored-by: BryanFauble <[email protected]>

ran pre-commit

04336df

add links to columns

e0294c6

add note to columnType

076e521

rearange notes

2158fdb

Merge pull request #1298 from Sage-Bionetworks/update_relationships

0d7725b

Update relationships

linglp reviewed Dec 31, 2025

andrewelamb added 6 commits December 31, 2025 12:15

improve error message

0988bbe

add use case when JSON Schema path is provided but the dir doesnt exist

37c9bec

clean up docstring

4311284

clean up tutorial verbage

3b49b10

fix dirname logic

f1667e0

fix docstring example

3857b74
