@andrewelamb
Contributor

@andrewelamb andrewelamb commented Sep 30, 2025

SYNPY-1583

Problem:

The Schema Organization and JSON Schema functionality was not implemented in the same way as the other Synapse entities (the OOP model).

Solution:

This refactors the use of the JSON Schema API endpoints in a more consistent way.

@andrewelamb andrewelamb requested a review from a team as a code owner September 30, 2025 18:11
@andrewelamb andrewelamb marked this pull request as draft September 30, 2025 18:11
Member

@BryanFauble BryanFauble left a comment

Good start on these changes! Please re-request review when you are further along and want me to take another peek.

@BryanFauble BryanFauble requested a review from a team September 30, 2025 21:44
Contributor

@linglp linglp left a comment

Hi @andrewelamb ! Thanks for your hard work. For this round of review, I mainly focused on checking if the examples work and also the organization of the code. I have not yet checked the format of the docstring. Here's a summary:

  1. I am wondering whether list_json_schemas should become a method under SchemaOrganization, like update_organization_acl. Currently, to list all the JSON schemas, we still have to do:
async def json_schema_list():
    org = SchemaOrganization("linglp.test.schemas")
    json_schemas = list_json_schemas("linglp.test.schemas", synapse_client=syn)
    async for schema in json_schemas:
        print(f"Schema: {schema}")

But I think we should be able to do: org.list_schemas() to list out all the json schemas in a given org.

  1. Would it make sense to follow our existing pattern of using fill_from_dict to fill API responses and remove from_response completely? Something like:
class JsonSchemaOrganization:
    def fill_from_dict(self, response: Dict[str, Any]) -> "JsonSchemaOrganization":
        """Converts a response from the REST API into this dataclass."""
        self.name = response.get("name")
        self.id = response.get("id")
        self.created_on = response.get("createdOn")
        self.created_by = response.get("createdBy")
        return self

org = JsonSchemaOrganization(name="my.org")
org.fill_from_dict(api_response)

And then you can test fill_from_dict in a unit test to ensure all the attributes get filled correctly.

  1. I think, for consistency, get_async should always return either the JSON Schema or the organization, and never None.
    As an example, the following currently returns None:
async def get_organization_info():
    org = SchemaOrganization("linglp.test.schemas") # this organization actually exists 
    org.id = "12345"  # Set ID manually to a random string
    result = await org.get_async() 
    print(result)
asyncio.run(get_organization_info())

But I think I would want to reliably process the organization when I use this function in the real world:

# This works sometimes, fails other times
org = SchemaOrganization("my.org.name")
result = await org.get_async()

# I don't want to do this: 
if result: 
    print(f"Got org: {result.name}")
else:
    print("Failed to get org") 

I want to make sure this always works:

org = SchemaOrganization("my.org.name")
result = await org.get_async()  # Always returns the org object
  1. I think all async examples need to be wrapped in async functions.
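
For the get_async point above, one possible shape is to return the object itself on success and raise on failure, so callers never need a None check. This is only a minimal sketch: OrganizationStub and the KNOWN_ORGANIZATIONS lookup table are hypothetical stand-ins for the real class and API call, and raising ValueError is an assumption, not the client's actual behavior.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical in-memory "server" state: organization name -> stored id.
KNOWN_ORGANIZATIONS: Dict[str, str] = {"my.org.name": "571"}


@dataclass
class OrganizationStub:
    """Hypothetical stand-in for SchemaOrganization (not the real class)."""

    name: str
    id: Optional[str] = None

    async def get_async(self) -> "OrganizationStub":
        """Fill in stored fields and return self; raise if the org is unknown."""
        if self.name not in KNOWN_ORGANIZATIONS:
            raise ValueError(f"Organization not found: {self.name}")
        # Overwrite any locally set id with the stored value.
        self.id = KNOWN_ORGANIZATIONS[self.name]
        return self


async def get_organization_info() -> OrganizationStub:
    org = OrganizationStub("my.org.name")
    org.id = "12345"  # a stale, manually set id
    return await org.get_async()


result = asyncio.run(get_organization_info())
print(result.name, result.id)
```

With this shape the caller either gets a populated object back or sees an exception, which removes the `if result:` branch from calling code.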

}
],
)
def test_from_response_with_exception(self, response: dict[str, Any]):
Contributor

nit: Tests emphasize TypeError, but should we also cover functional validation, such as missing required fields, empty lists/objects, and bad values to ensure robust handling?

Contributor Author

@linglp I apologize, I missed this comment until now. Could you elaborate?

@andrewelamb
Contributor Author

@linglp Thanks for the early feedback! I'll start addressing your comments below.

@andrewelamb
Contributor Author

andrewelamb commented Oct 2, 2025

Hi @andrewelamb ! Thanks for your hard work. For this round of review, I mainly focused on checking if the examples work and also the organization of the code. I have not yet checked the format of the docstring. Here's a summary:

  1. I am wondering whether list_json_schemas should become a method under SchemaOrganization, like update_organization_acl. Currently, to list all the JSON schemas, we still have to do:
async def json_schema_list():
    org = SchemaOrganization("linglp.test.schemas")
    json_schemas = list_json_schemas("linglp.test.schemas", synapse_client=syn)
    async for schema in json_schemas:
        print(f"Schema: {schema}")

But I think we should be able to do: org.list_schemas() to list out all the json schemas in a given org.

SchemaOrganization has a method:

    async def get_json_schema_list_async(
        self, synapse_client: Optional["Synapse"] = None
    ) -> list["JSONSchema"]:
        """
        Gets the list of JSON Schemas that are part of this organization

        Returns: A list of JSONSchema objects

        Arguments:
            synapse_client: If not passed in and caching was not disabled by
                `Synapse.allow_client_caching(False)` this will use the last created
                instance from the Synapse class constructor

        Example: Get a list of JSONSchemas that belong to the SchemaOrganization
             

            ```python
            from synapseclient.models import SchemaOrganization
            from synapseclient import Synapse
            import asyncio

            syn = Synapse()
            syn.login()

            org = SchemaOrganization("dpetest")
            schemas = asyncio.run(org.get_json_schema_list_async())
            ```

        """
        response = list_json_schemas(self.name, synapse_client=synapse_client)
        schemas = []
        async for item in response:
            schemas.append(JSONSchema.from_response(item))
        return schemas

This gets all the JSONSchemas that are part of the org. Did you mean something different? The return will be changed to an AsyncGenerator shortly.
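
The planned switch to an AsyncGenerator could look roughly like the sketch below. It is self-contained and uses hypothetical stand-ins (SchemaStub, OrganizationStub, fake_list_json_schemas, and the method name get_json_schemas_async) rather than the real classes and API call; the response keys schemaName and organizationName are also assumptions.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, AsyncGenerator, Dict, List


@dataclass
class SchemaStub:
    """Hypothetical stand-in for JSONSchema; holds only the identifying names."""

    name: str
    organization_name: str


async def fake_list_json_schemas(
    organization_name: str,
) -> AsyncGenerator[Dict[str, Any], None]:
    """Stand-in for the paginated API call; yields canned response items."""
    for schema_name in ("schema.one", "schema.two"):
        yield {"schemaName": schema_name, "organizationName": organization_name}


@dataclass
class OrganizationStub:
    """Hypothetical stand-in for SchemaOrganization."""

    name: str

    async def get_json_schemas_async(self) -> AsyncGenerator[SchemaStub, None]:
        """Yield schemas one at a time instead of building a list first."""
        async for item in fake_list_json_schemas(self.name):
            yield SchemaStub(item["schemaName"], item["organizationName"])


async def collect_schema_names(org_name: str) -> List[str]:
    org = OrganizationStub(org_name)
    return [schema.name async for schema in org.get_json_schemas_async()]


names = asyncio.run(collect_schema_names("my.org.name"))
print(names)
```

Yielding items lets a caller stop early or process schemas as pages arrive, at the cost of needing `async for` (or an async comprehension) at the call site.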

@andrewelamb
Contributor Author

  1. Would it make sense to follow our existing pattern of using fill_from_dict to fill API responses and remove from_response completely? Something like:
class JsonSchemaOrganization:
    def fill_from_dict(self, response: Dict[str, Any]) -> "JsonSchemaOrganization":
        """Converts a response from the REST API into this dataclass."""
        self.name = response.get("name")
        self.id = response.get("id")
        self.created_on = response.get("createdOn")
        self.created_by = response.get("createdBy")
        return self

org = JsonSchemaOrganization(name="my.org")
org.fill_from_dict(api_response)

And then you can test fill_from_dict in a unit test to ensure all the attributes get filled correctly.

The reason I have it like this is for situations like SchemaOrganization.get_json_schema_list_async:

    async def get_json_schema_list_async(
        self, synapse_client: Optional["Synapse"] = None
    ) -> list["JSONSchema"]:
        """
        Gets the list of JSON Schemas that are part of this organization

        Returns: A list of JSONSchema objects

        Arguments:
            synapse_client: If not passed in and caching was not disabled by
                `Synapse.allow_client_caching(False)` this will use the last created
                instance from the Synapse class constructor

        Example: Get a list of JSONSchemas that belong to the SchemaOrganization
             

            ```python
            from synapseclient.models import SchemaOrganization
            from synapseclient import Synapse
            import asyncio

            syn = Synapse()
            syn.login()

            org = SchemaOrganization("dpetest")
            schemas = asyncio.run(org.get_json_schema_list_async())
            ```

        """
        response = list_json_schemas(self.name, synapse_client=synapse_client)
        schemas = []
        async for item in response:
            schemas.append(JSONSchema.from_response(item))
        return schemas

JSONSchema requires a name and organization_name. If we were to use fill_from_dict like you suggest we would have to do something like:

        response = list_json_schemas(self.name, synapse_client=synapse_client)
        schemas = []
        async for item in response:
            schemas.append(JSONSchema("temp.name", "temp.org.name").fill_from_dict(item))
        return schemas

The from_response method is just a convenience function that handles the name and organization_name.
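
To make the trade-off concrete, here is a minimal sketch of the two approaches side by side. SchemaStub is a hypothetical stand-in for JSONSchema, and the response keys schemaName and organizationName are assumptions about the API payload.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class SchemaStub:
    """Hypothetical stand-in for JSONSchema with its two required fields."""

    name: str
    organization_name: str

    def fill_from_dict(self, response: Dict[str, Any]) -> "SchemaStub":
        """Overwrite this instance's fields from an API-style response."""
        self.name = response["schemaName"]
        self.organization_name = response["organizationName"]
        return self

    @classmethod
    def from_response(cls, response: Dict[str, Any]) -> "SchemaStub":
        """Build an instance straight from a response, with no placeholder names."""
        return cls(
            name=response["schemaName"],
            organization_name=response["organizationName"],
        )


item = {"schemaName": "my.schema", "organizationName": "my.org"}

# fill_from_dict needs an existing instance, hence the throwaway names.
via_fill = SchemaStub("temp.name", "temp.org.name").fill_from_dict(item)
# from_response builds the instance directly.
via_classmethod = SchemaStub.from_response(item)

print(via_fill == via_classmethod)
```

Both paths end with the same field values; the classmethod only removes the need for placeholder arguments when the required fields come from the response itself.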

@andrewelamb andrewelamb changed the title JSON Schema OOP models [SYNPY-1583] JSON Schema OOP models Oct 17, 2025
@andrewelamb andrewelamb marked this pull request as ready for review October 17, 2025 18:08
Member

@BryanFauble BryanFauble left a comment

I left a few comments, but nothing major that I saw. Thanks for this work!

Contributor

@SageGJ SageGJ left a comment

Took a quick first pass of schema_organization.py. I can review the other files later, but I wanted to give you these comments sooner rather than later.

Contributor

@linglp linglp left a comment

@andrewelamb I tested out most parts, and most examples look good to me. Thanks for addressing my previous comments. But I think there's a bug when you get JSON Schema using JSONSchema.from_uri because it only gets you the latest version of the schema.

For testing, I created two versions of the schema:

test_org_name = "test.version.bug"
schema_name = "testVersionSchema"
schemas = JSONSchema(name=schema_name, organization_name=test_org_name)
print(schemas.get_body(version="4.0.0"))
print(schemas.get_body(version="1.0.0"))

You should see two different versions printed out, which is correct.

But if I use from_uri, I got only version 4.0.0:

uri_v1 = f"{test_org_name}-{schema_name}-1.0.0"
schema_from_uri_v1 = JSONSchema.from_uri(uri_v1)

uri_v4 = f"{test_org_name}-{schema_name}-4.0.0"
schema_from_uri_v4 = JSONSchema.from_uri(uri_v4)

print(schema_from_uri_v1.get_body())
print(schema_from_uri_v4.get_body())

Also, when deleting a schema, I assume you can do something like:

schema = JSONSchema(organization_name="my.org", name="test.schema")
schema.delete()

But what if I want to delete a specific version of the schema? How can I do that? Could you add an example?

syn = Synapse()
syn.login()

org = SchemaOrganization("my.org.name")
Contributor

Nit: the example here demonstrates how to store an organization. Shouldn't it demonstrate how to store a schema instead?

Contributor Author

We do, but that's done in the JSONSchema class

Contributor

This example is under JSONSchemaProtocol, so the example should be related to JSONSchema, right? I saw that the get method has an example related to JSONSchema.

Contributor Author

Oh I see what you mean. Thanks for catching that.

@andrewelamb
Contributor Author

Thanks for checking this!

@andrewelamb I tested out most parts, and most examples look good to me. Thanks for addressing my previous comments. But I think there's a bug when you get JSON Schema using JSONSchema.from_uri because it only gets you the latest version of the schema.

For testing, I created two versions of the schema:

test_org_name = "test.version.bug"
schema_name = "testVersionSchema"
schemas = JSONSchema(name=schema_name, organization_name=test_org_name)
print(schemas.get_body(version="4.0.0"))
print(schemas.get_body(version="1.0.0"))

You should see two different versions printed out, which is correct.

But if I use from_uri, I got only version 4.0.0:

uri_v1 = f"{test_org_name}-{schema_name}-1.0.0"
schema_from_uri_v1 = JSONSchema.from_uri(uri_v1)

uri_v4 = f"{test_org_name}-{schema_name}-4.0.0"
schema_from_uri_v4 = JSONSchema.from_uri(uri_v4)

print(schema_from_uri_v1.get_body())
print(schema_from_uri_v4.get_body())

This is not really a bug. What's happening is that the JSONSchema object doesn't have a specific version. If you look at the attributes, there's no semantic version. It represents the superset of ALL versions of that schema. So when you use the from_uri method with a semantic version, such as org.name-schema.name-0.0.1, the semantic version is ignored.

uri_v1 = f"{test_org_name}-{schema_name}-1.0.0"
schema_from_uri_v1 = JSONSchema.from_uri(uri_v1)

uri_v4 = f"{test_org_name}-{schema_name}-4.0.0"
schema_from_uri_v4 = JSONSchema.from_uri(uri_v4)

Note that schema_from_uri_v1 and schema_from_uri_v4 will be the same thing: they both represent the JSONSchema entity in Synapse at f"{test_org_name}-{schema_name}", which has two versions [4.0.0, 1.0.0].

The problem you are running into is with the get_body method. When you use it without the version argument, it will default to getting the most recent version (4.0.0). The way to do it is:

uri = f"{test_org_name}-{schema_name}"
schema_from_uri = JSONSchema.from_uri(uri)
print(schema_from_uri.get_body(version="1.0.0"))
print(schema_from_uri.get_body(version="4.0.0"))
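
The behavior described here (one object covering all versions, with get_body falling back to the newest version) can be sketched as follows. SchemaStub, its bodies dictionary, and version_key are hypothetical stand-ins; the real class fetches bodies from Synapse rather than from memory.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn "4.0.0" into (4, 0, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


@dataclass
class SchemaStub:
    """Hypothetical stand-in for JSONSchema: no version attribute of its own."""

    name: str
    organization_name: str
    # Hypothetical local store: semantic version -> schema body.
    bodies: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def get_body(self, version: Optional[str] = None) -> Dict[str, Any]:
        """Return the body for `version`, or for the newest version if omitted."""
        if version is None:
            version = max(self.bodies, key=version_key)
        return self.bodies[version]


schema = SchemaStub(
    name="testVersionSchema",
    organization_name="test.version.bug",
    bodies={"1.0.0": {"title": "v1"}, "4.0.0": {"title": "v4"}},
)

latest_body = schema.get_body()
first_body = schema.get_body(version="1.0.0")
print(latest_body, first_body)
```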

Also, when deleting a schema, I assume you can do something like:

schema = JSONSchema(organization_name="my.org", name="test.schema")
schema.delete()

But what if I want to delete a specific version of the schema? How can I do that? Could you add an example?

Yes, thanks for catching this! The delete methods now have a version argument that allows for deleting specific versions.
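
As a rough illustration of what a version argument on delete might mean, here is a minimal sketch. SchemaStub and its versions list are hypothetical stand-ins, and the assumption that omitting version removes every version is mine, not something confirmed in this thread.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SchemaStub:
    """Hypothetical stand-in for JSONSchema that tracks versions in memory."""

    name: str
    organization_name: str
    versions: List[str] = field(default_factory=list)

    def delete(self, version: Optional[str] = None) -> None:
        """Delete one version if given; otherwise delete every version (assumed)."""
        if version is None:
            self.versions.clear()
        elif version in self.versions:
            self.versions.remove(version)
        else:
            raise ValueError(f"Unknown version: {version}")


schema = SchemaStub("test.schema", "my.org", versions=["1.0.0", "4.0.0"])

schema.delete(version="1.0.0")  # remove a single version
remaining_after_one = list(schema.versions)

schema.delete()  # remove everything that is left
remaining_after_all = list(schema.versions)

print(remaining_after_one, remaining_after_all)
```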

@andrewelamb andrewelamb requested a review from linglp October 21, 2025 16:46
@andrewelamb andrewelamb requested a review from SageGJ October 21, 2025 17:26
Contributor

@linglp linglp left a comment

Thanks @andrewelamb. LGTM! Thanks for addressing my previous comments.

If JSONSchema.from_uri("my.org-my.schema-0.0.4") returns a JSONSchema object, you still need to use the version parameter to retrieve a specific version’s content, e.g.:

js = JSONSchema.from_uri("my.org-my.schema-0.0.4")
body = js.get_body(version="0.0.4")

This makes me wonder why we need from_uri at all. Wouldn’t it be simpler to do:

js = JSONSchema(organization_name="my.org", name="my.schema")
body = js.get_body(version="0.0.4")

@andrewelamb
Contributor Author

@linglp You're correct that from_uri isn't strictly needed; it's a convenience method. It's simply to make life easier programmatically when you get a URI as an input. For example:

uri = some_function()

schema = JSONSchema.from_uri(uri)

as opposed to:

uri = some_function()
uri_parts = uri.split("-")
org_name = uri_parts[0]
name = uri_parts[1]
schema = JSONSchema(name, org_name)

When I was working on something for Curator using the current functionality, this would have made my life easier.
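
For reference, the parsing that from_uri saves callers from writing can be sketched like this. parse_schema_uri is a hypothetical helper, not part of the client, and it assumes that neither the organization name nor the schema name contains a hyphen.

```python
from typing import Optional, Tuple


def parse_schema_uri(uri: str) -> Tuple[str, str, Optional[str]]:
    """Split "org-schema" or "org-schema-version" into its parts.

    Assumes "-" never appears inside the organization or schema name.
    """
    parts = uri.split("-")
    if len(parts) not in (2, 3):
        raise ValueError(f"Unexpected schema URI format: {uri}")
    org_name, schema_name = parts[0], parts[1]
    # The version is optional; a version-agnostic object can simply ignore it.
    version = parts[2] if len(parts) == 3 else None
    return org_name, schema_name, version


with_version = parse_schema_uri("my.org-my.schema-0.0.4")
without_version = parse_schema_uri("my.org-my.schema")
print(with_version, without_version)
```

Returning the version separately leaves it to the caller to decide whether to pass it on, for example to a get_body(version=...) call.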

@andrewelamb andrewelamb merged commit 5fb94d4 into develop Oct 22, 2025
27 of 28 checks passed
@andrewelamb andrewelamb deleted the SYNPY-1583 branch October 22, 2025 17:02