-
Notifications
You must be signed in to change notification settings - Fork 73
[SYNPY-1696] Add create JSON Schema CLI command #1294
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
| SynapseHTTPError, | ||
| SynapseNoCredentialsError, | ||
| ) | ||
| from synapseclient.extensions.curator.schema_generation import generate_jsonschema |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens when someone doesn't have the extension package installed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be tested, however thinking about it I don't think that this would cause any runtime or static typing issues.
What I believe to happen is that due to the static typing checks I put in place this would be fine. The optional install of the curation extension is only for a few library dependencies for the curator code to work at runtime. The code like this generate_jsonschema function should always be available regardless if they "installed" the extension package, but then they use it they would get an error because pandas is not installed.
BryanFauble
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few changes to wrap up, but this is looking great otherwise!
BryanFauble
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggested a last change, everything else looked good to me!
|
🔥 LGTM! Let's wait for @linglp to take a look too for knowledge transfer sake |
Co-authored-by: BryanFauble <[email protected]>
Co-authored-by: BryanFauble <[email protected]>
Update relationships
linglp
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @andrewelamb ! I think the output parameter would need to be more robust for relative paths and invalid directories such as foo.txt. Please see my comments below.
| schemas, file_paths = generator.generate_jsonschema( | ||
| data_model_labels=data_model_labels | ||
| ) | ||
| if output is not None and not output.endswith(".json"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If output is invalid, such as foo.txt, it creates a directory literally named foo.txt and writes schemas into it like foo.txt/Patient.json. That seems like a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the logic below would make more sense. You might want to make a separate function for this so that it can be tested better:
Determine output directory based on:
1. If output is an existing directory and the directory exists → use it
2. If output is a .json path → use its parent directory
3. Otherwise → use current directory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also relative file paths don't seem to work:
synapse generate-json-schema /Users/lpeng/code/synapsePythonClient/tests/unit/synapseclient/extensions/schema_files/data_models/example.model.csv --data-types Patient --output ~/schemas/output.json
I am getting an error:
^
File "/Users/lpeng/code/synapsePythonClient/synapseclient/extensions/curator/schema_generation.py", line 5519, in _write_data_model
export_json(json_doc=json_schema_dict, file_path=json_schema_path, indent=2)
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/lpeng/code/synapsePythonClient/synapseclient/extensions/curator/schema_generation.py", line 5395, in export_json
with open(file_path, "w", encoding="utf8") as fle:
~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/Users/lpeng/schemas/output.json'
I would expect that the schemas directory got created since it didn't exist and I could generate output.json under it, and it shouldn't error out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect that the schemas directory got created since it didn't exist and I could generate
output.jsonunder it, and it shouldn't error out?
This was a use case I hadn't considered. I'll add it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If
outputis invalid, such asfoo.txt, it creates a directory literally named foo.txt and writes schemas into it likefoo.txt/Patient.json. That seems like a bug.
So this is tough, if it created a directory foo.txt and wrote the JSON Schema there, then it is a valid directory name for the file system. There doesn't seem to be a Python function for checking if a string is a valid directory. So either it will create the file where the user asked, or it will raise an exception, so it doesn't seem like a bug to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you test if this will work:
def validate_output_path(output: Optional[str], data_types: list[str]) -> list[Path]:
"""Only allow .json files or directories (no extension)."""
if output is None:
return [Path.cwd() / f"{dtype}.json" for dtype in data_types]
output_path = Path(output).expanduser()
extension = output_path.suffix # Gets ".txt", ".json", etc.
# Case 1: .json file
if extension == ".json":
if len(data_types) > 1:
raise ValueError(
f"Cannot write {len(data_types)} schemas to single file '{output}'"
)
output_path = output_path.resolve()
output_path.parent.mkdir(parents=True, exist_ok=True)
return [output_path]
# Case 2: Directory (no extension)s
if not extension:
output_path = output_path.resolve()
output_path.mkdir(parents=True, exist_ok=True)
return [output_path / f"{dtype}.json" for dtype in data_types]
# Case 3: Any other extension - REJECT
raise ValueError(
f"Invalid output path '{output}'. "
f"Extension '{extension}' is not allowed. "
f"Use either:\n"
f" - A .json file: 'output.json' (for single schema)\n"
f" - A directory: 'schemas' or 'schemas/' (for multiple schemas)"
)
validate_output_path("foo,txt", ["Patient"])
I tested some use cases and it seems to work on my end.
Problem:
The create JSON Schema command was not moved over to the Curator extension from Schematic. SYNPY-1696
Additionally, the existing code creates two copies of each JSON Schema file. SYNPY-1677
Finally, it was requested that users should be able to specify the exact path of the output file. SYNPY-1675
Solution:
MetadataModel,JsonSchemaGeneratorDirectorandJsonSchemaComponentGeneratorclasses were removed.outputparameter can take in either a path for the output JSON file (one datatype only) or a directory where all the output JSPN files will go.