Skip to content

Conversation

@andrewelamb
Copy link
Contributor

@andrewelamb andrewelamb commented Dec 17, 2025

Problem:

The create JSON Schema command was not moved over to the Curator extension from Schematic. SYNPY-1696
Additionally, the existing code creates two copies of each JSON Schema file. SYNPY-1677
Finally, it was requested that users should be able to specify the exact path of the output file. SYNPY-1675

Solution:

  • A CLI Command was added for creating JSON Schema.
  • The code for creating JSON Schemas was massively simplified. The MetadataModel, JsonSchemaGeneratorDirector and JsonSchemaComponentGenerator classes were removed.
  • The output parameter can take in either a path for the output JSON file (one datatype only) or a directory where all the output JSPN files will go.

@andrewelamb andrewelamb requested a review from a team as a code owner December 17, 2025 21:09
@andrewelamb andrewelamb marked this pull request as draft December 17, 2025 21:09
@andrewelamb andrewelamb changed the title Synpy 1696 [SYNPY-1696] Add create JSON Schema CLI command Dec 30, 2025
@andrewelamb andrewelamb marked this pull request as ready for review December 30, 2025 17:27
SynapseHTTPError,
SynapseNoCredentialsError,
)
from synapseclient.extensions.curator.schema_generation import generate_jsonschema
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when someone doesn't have the extension package installed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be tested, however thinking about it I don't think that this would cause any runtime or static typing issues.

What I believe to happen is that due to the static typing checks I put in place this would be fine. The optional install of the curation extension is only for a few library dependencies for the curator code to work at runtime. The code like this generate_jsonschema function should always be available regardless if they "installed" the extension package, but then they use it they would get an error because pandas is not installed.

Copy link
Member

@BryanFauble BryanFauble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few changes to wrap up, but this is looking great otherwise!

Copy link
Member

@BryanFauble BryanFauble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggested a last change, everything else looked good to me!

@thomasyu888
Copy link
Member

🔥 LGTM! Let's wait for @linglp to take a look too for knowledge transfer sake

Copy link
Contributor

@linglp linglp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @andrewelamb ! I think the output parameter would need to be more robust for relative paths and invalid directories such as foo.txt. Please see my comments below.

schemas, file_paths = generator.generate_jsonschema(
data_model_labels=data_model_labels
)
if output is not None and not output.endswith(".json"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If output is invalid, such as foo.txt, it creates a directory literally named foo.txt and writes schemas into it like foo.txt/Patient.json. That seems like a bug.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the logic below would make more sense. You might want to make a separate function for this so that it can be tested better:

Determine output directory based on:
    1. If output is an existing directory and the directory exists → use it
    2. If output is a .json path → use its parent directory  
    3. Otherwise → use current directory

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also relative file paths don't seem to work:

synapse generate-json-schema /Users/lpeng/code/synapsePythonClient/tests/unit/synapseclient/extensions/schema_files/data_models/example.model.csv --data-types Patient --output ~/schemas/output.json

I am getting an error:

    ^
  File "/Users/lpeng/code/synapsePythonClient/synapseclient/extensions/curator/schema_generation.py", line 5519, in _write_data_model
    export_json(json_doc=json_schema_dict, file_path=json_schema_path, indent=2)
    ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lpeng/code/synapsePythonClient/synapseclient/extensions/curator/schema_generation.py", line 5395, in export_json
    with open(file_path, "w", encoding="utf8") as fle:
         ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/Users/lpeng/schemas/output.json'

I would expect that the schemas directory got created since it didn't exist and I could generate output.json under it, and it shouldn't error out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect that the schemas directory got created since it didn't exist and I could generate output.json under it, and it shouldn't error out?

This was a use case I hadn't considered. I'll add it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If output is invalid, such as foo.txt, it creates a directory literally named foo.txt and writes schemas into it like foo.txt/Patient.json. That seems like a bug.

So this is tough, if it created a directory foo.txt and wrote the JSON Schema there, then it is a valid directory name for the file system. There doesn't seem to be a Python function for checking if a string is a valid directory. So either it will create the file where the user asked, or it will raise an exception, so it doesn't seem like a bug to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you test if this will work:

def validate_output_path(output: Optional[str], data_types: list[str]) -> list[Path]:
    """Only allow .json files or directories (no extension)."""
    
    if output is None:
        return [Path.cwd() / f"{dtype}.json" for dtype in data_types]
    
    output_path = Path(output).expanduser()
    extension = output_path.suffix  # Gets ".txt", ".json", etc.
    
    # Case 1: .json file
    if extension == ".json":
        if len(data_types) > 1:
            raise ValueError(
                f"Cannot write {len(data_types)} schemas to single file '{output}'"
            )
        output_path = output_path.resolve()
        output_path.parent.mkdir(parents=True, exist_ok=True)
        return [output_path]
    
    # Case 2: Directory (no extension)s
    if not extension:
        output_path = output_path.resolve()
        output_path.mkdir(parents=True, exist_ok=True)
        return [output_path / f"{dtype}.json" for dtype in data_types]

    # Case 3: Any other extension - REJECT
    raise ValueError(
        f"Invalid output path '{output}'. "
        f"Extension '{extension}' is not allowed. "
        f"Use either:\n"
        f"  - A .json file: 'output.json' (for single schema)\n"
        f"  - A directory: 'schemas' or 'schemas/' (for multiple schemas)"
    )

    

validate_output_path("foo,txt", ["Patient"])

I tested some use cases and it seems to work on my end.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants